NeuralSeek Learning Lab
Introduction and Objectives
NeuralSeek is an AI-powered answer-generation engine for business. Unlike most AI systems, NeuralSeek provides a clickable path to fact-check AI-generated responses, data analytics to improve AI natural language, and step-by-step instructions for using AI to clean and maintain accurate source data. It is a business-ready solution for using AI in a professional workplace. It works by taking a user question and assembling a corpus of backing information from a KnowledgeBase like Watson Discovery or ElasticSearch.
NeuralSeek uses this corpus to conduct just-in-time training to generate a conversational answer to the user question. As a user you just see an answer in natural language – not a list of options, paragraphs, or references from source material. NeuralSeek attempts to fill in the gaps and join related thoughts – just like a live customer support agent would when answering user questions.
In this lab you will learn:
- How to import PDFs into NeuralSeek and build your KnowledgeBase.
- About NeuralSeek's language capabilities and the Match Input feature.
- Four ways to fine-tune your answers within NeuralSeek.
- Document Score Range
- Document Date Penalty
- Snippet Size
- Misinformation Tolerance
Important Terms to Know
Coverage Score: Coverage measures how many documents, or sections of a document, discuss the subject area of a user question. A low coverage score is not necessarily bad, depending on the question. A high coverage score, on the other hand, may indicate conflicting or confusing source material, so check that your KnowledgeBase does not contain contradictory information.
Confidence Score: How strongly NeuralSeek believes the corporate knowledge it found answers the user question. The higher the score, the better.
Document Date Penalty: If turned on (set above 0), the Document Date Penalty searches each document's text for dates. If multiple dates are found, it takes the newest one. The penalty is calculated as the number of years between that date and the current date, multiplied by the penalty percentage you set. Use this to weed out old documents when you want answers focused on current topics only.
Document Score Range: NeuralSeek determines over time the maximum possible score from your KnowledgeBase. This varies based on the actual documents and queries; the maximum possible score is 100%. The score range is the window below that maximum that documents must fall into to be considered. For example, a Document Score Range setting of 20% will consider only documents that score between 80% and 100% of the max.
Misinformation Tolerance: This is the tolerance for generating text about topics that have no backing KnowledgeBase material. 0 is the strictest, 100 is the loosest. The lower the misinformation tolerance, the harder it is for a user to make the system talk about something fake, untrue, or unaligned with your company, so this is an important setting for corporate deployments.
Note that the tighter the misinformation tolerance, the less ability the system has to deal with misspellings or misphrasings. At the tightest setting it may decline to answer, saying something like, “I don’t know about that. I’m not confident.” At the loosest, it will attempt to answer the question. You’ll have to experiment with questions and settings to get the impact you want.
Snippet Size: Use the Snippet Size setting to window relevant details in a document that apply to the user question without specifically mentioning it. When the KnowledgeBase finds relevant text, this is the average character count returned.
Semantic Match Score: NeuralSeek generates responses directly from content in corporate sources. To make that alignment visible, it computes semantic match scores, which compare the generated response with the ground-truth documentation and show how closely the response matches the meaning conveyed in the source documents. This helps ensure accuracy and builds confidence in the reliability of NeuralSeek's responses.
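NeuralSeek's exact semantic match computation is not described here, but the idea of scoring a response against its source can be sketched with a toy similarity measure. The function below is purely illustrative (simple word-overlap Jaccard similarity on a 0–100 scale), not NeuralSeek's actual algorithm:

```python
import re

def semantic_match(answer: str, source: str) -> float:
    """Toy 0-100 score of how much the answer's wording overlaps the source."""
    def tokens(text: str) -> set:
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    a, s = tokens(answer), tokens(source)
    if not a or not s:
        return 0.0
    # Jaccard similarity: shared words over total distinct words
    return 100.0 * len(a & s) / len(a | s)

print(semantic_match(
    "FileNet Content Manager supports modern browsers.",
    "Modern browsers are supported by FileNet Content Manager."))
```

A real system would compare meaning (e.g. with embeddings) rather than surface words, but the intent is the same: a high score means the generated answer stays close to the ground-truth material.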
What You Will Need For This Lab
For this lab you will need access to NeuralSeek and Watson Discovery. The instances for NeuralSeek and Watson Discovery should already be set up; if they are not, follow the instructions in this VIDEO. You will also need 4 documents (PDF preferred) to upload into the KnowledgeBase.
The documents are:
- 1_Customer 360 and Match 360 with Watson FAQ.pdf
- 2_Small Partner Automation FAQ.pdf
- 3_FNCM – FAQ for FileNet Content Manager_11/15/17.pdf
- 4_FNCM – FAQ for FileNet Content Manager_1_6_23
You can download the documents needed for the KnowledgeBase here.
How to import PDFs into NeuralSeek and Build Your KnowledgeBase
The first thing you want to do is open Watson Discovery. Once it is open you will see a section called “My projects”, which lists all the projects you have created to integrate with NeuralSeek. For this lab, you will create a new project.

After selecting “New Project” you will be asked to name it. You can name it whatever you like, but for this example name it “Live Lab”.
For the project type you will select “None of the above – I’m working on a custom project”.

You will then be asked to select your data source. Since we’re just uploading documents, select “Upload data”.

The final thing you will be asked to do is name the collection (the collection is where all the uploaded documents will be stored). You can name it whatever you like; for this example we’ll name it “Learning Lab”.

Now that is all done, it’s time to upload your documents. The documents you will need to upload are found here.
*Note* For the first part of this lab we will only be uploading the first 3 documents.
- 1_Customer 360 and Match 360 with Watson FAQ.pdf
- 2_Small Partner Automation FAQ.pdf
- 3_FNCM – FAQ for FileNet Content Manager_11/15/17.pdf
We will upload the 4th document later on in the lab.

The best way to think of the KnowledgeBase is as a library: each project made in Watson Discovery and integrated with NeuralSeek draws only on the documents uploaded within it, and it won’t look for any references you don’t want it to.
Once the documents are uploaded, the next step is to integrate the project with NeuralSeek.
To integrate the project with NeuralSeek we’ll first want to go back to “My Projects” and select the project we’ve been working on so far.

After selecting the current project let’s head over to “Integrate and Deploy” found in the menu on the left hand side.

Once we are in this section click on API information and copy the Project ID.

Now that the Project ID is copied, we can open up NeuralSeek in a new window to integrate our KnowledgeBase with it.


With NeuralSeek open, go to the “Configure” tab at the top of the page and open the “Corporate Knowledge Base Details” section. Here you can paste the Project ID into the Discovery Project ID field.


Your KnowledgeBase is now connected with NeuralSeek.
Back in Watson Discovery, using the menu on the left hand side select “Manage Collections”. Please keep this page open as we will refer back to it later in the lab.


NeuralSeek's Language Capabilities and the Match Input Language Feature
NeuralSeek can answer and respond in multiple languages. NeuralSeek currently supports taking questions and delivering answers in English, Spanish, Portuguese, French, German, Italian, Arabic, Korean, Chinese, and Japanese. Believe it or not, the system knows even more languages than those in the list.
You can see how NeuralSeek will respond to other languages outside of what’s listed by following the example below.
First, go to the “Seek” tab at the top of the page. Then on the right side next to the “Seek” button, in the “Respond in” dropdown select “Match Input”.

Next open a new window and go to Google Translate and translate “How does someone know if FileNet content manager works best for their business?” into a language that’s not part of the original NeuralSeek language options found in the “Respond in” dropdown in NeuralSeek.

Then, copy the translated question from Google Translate and paste it into NeuralSeek, and select “Seek”. Let’s see if it answers it.

You can check NeuralSeek’s response by copying and pasting the answer back into Google Translate.

As you can see, through the “Match Input” feature NeuralSeek’s language capabilities go beyond the list of languages provided.
Four Ways to Fine-Tune Your Answers Within NeuralSeek
The quality of the output of NeuralSeek is directly correlated to the quality of the KnowledgeBase documents loaded into Watson Discovery. If you are getting an undesired output from NeuralSeek, it’s important to investigate the content that you have provided in Discovery. The more content and documentation you can provide the better. It’s also important to check that documents don’t contradict each other.
NeuralSeek is designed to always give an answer. This means that NeuralSeek will sometimes give a wrong answer. This is an unavoidable consequence of a system designed to always answer.
You can use the following four methods to fine-tune your answers within NeuralSeek.
Note that because this lab uses a small document collection and a limited set of queries, you may not see a major difference as you fine-tune your answers.
Document Score Range

NeuralSeek determines over time the maximum possible score from your KnowledgeBase. This varies based on the actual documents and queries; the maximum possible score is 100%. The score range is the window below that maximum that documents must fall into to be considered. For example, a setting of 20% will consider only documents that score between 80% and 100% of the max.
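The windowing logic described above can be sketched in a few lines. This is an illustrative reconstruction of the documented behavior, not NeuralSeek source code, and the function name is made up for the example:

```python
def in_score_range(doc_score: float, max_score: float, score_range_pct: float) -> bool:
    """True if doc_score falls within score_range_pct percent below max_score."""
    cutoff = max_score * (1 - score_range_pct / 100)
    return cutoff <= doc_score <= max_score

# With a 20% range and a max possible score of 100,
# only documents scoring 80-100 are considered:
print(in_score_range(85, 100, 20))  # True
print(in_score_range(60, 100, 20))  # False
```

A narrow range (low percentage) keeps only the very best-matching documents; a wide range (high percentage) lets weaker matches contribute to the answer.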
You can see how it works by following the example below.
Low Document Score Example
In the “Configure” tab under the “Corporate Knowledge Base Details” section change the Document Score Range to 20%. Then select “Save” at the bottom of the page and refresh the page.

Next, go to the “Seek” tab at the top of the page to ask a question. For this example let’s ask “How is the enterprise video streaming connector relevant to a tech business?”. Once the question is entered, select “Seek”.

Looking at the KnowledgeBase Results you can see that we received a Coverage Score of 10. Now let’s see what happens when we raise the Document Score Range.
High Document Score Example
To do this, we’ll go back to the “Configure” tab at the top of the page and go to the “Corporate KnowledgeBase Details” section, then go to the Document Score Range and select 80%.

We’ll select “Save” at the bottom of the page and refresh the page to update it. Then we’ll go back to the “Seek” tab at the top of the page and ask “How is the enterprise video streaming connector relevant to a tech business?” again.

You can see that by raising the Document Score Range, the KnowledgeBase Coverage Score rose to 18.
Now that you have explored this feature we recommend returning the Document Score Range to its original state of around 80%. You can find the Document Score Range setting back under the “Configure” tab at the top of the page under the “Corporate Knowledge Base Details” section.
Document Date Penalty

If turned on (set above 0), the Document Date Penalty searches each document's text for dates. If multiple dates are found, it takes the newest one. The penalty is the number of years between that date and the current date, multiplied by the penalty percentage you set. Use this to weed out old documents when you want answers focused on current topics only.
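The penalty formula above (years of age times the configured percentage) can be sketched as follows. The function name is illustrative, not NeuralSeek's API, and the dates mirror the two FNCM FAQ documents used in this lab:

```python
from datetime import date

def date_penalty(newest_doc_date: date, penalty_pct: float, today: date) -> float:
    """Percentage points deducted: document age in years times the penalty %."""
    years_old = (today - newest_doc_date).days / 365.25
    return years_old * penalty_pct

# With a 2% penalty, the older FNCM FAQ (11/15/17) is penalized far more
# heavily than the newer one (1/6/23):
today = date(2023, 6, 1)
print(date_penalty(date(2023, 1, 6), 2.0, today))   # under 1 point
print(date_penalty(date(2017, 11, 15), 2.0, today))  # roughly 11 points
```

This is why, once the penalty is enabled, the answer source in the example below shifts toward the newer document.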
You can see how it works by adjusting the Document Date Penalty and following the example below.
Document Date Penalty Example
In order to have a baseline to compare the filter to, under the “Configure” tab and the “Corporate KnowledgeBase Details” section turn the “Document Date Penalty” to 0. Then select “Save” at the bottom of the page, and refresh the page.
Next, go to the “Seek” tab at the top of the page and ask the question “Which browsers are supported?”. Looking below the response at the KnowledgeBase Context you can see which documents NeuralSeek used to generate the answer.

The easiest way to see the Document Date Penalty feature in action is to use a document with a newer date such as document “4_FNCM – FAQ for FileNet Content Manager_1/6/23” from the Learning Labs folder downloaded earlier.
Go back over to Watson Discovery and upload it into your collection by selecting “Upload data”.

Once it’s finished uploading, go back to the “Configure” tab in NeuralSeek and change the Document Date Penalty to anything above zero. For this example let’s do “2%”. Select “Save” at the bottom of the page and refresh the page.
Next, go to the “Seek” tab and ask “Which browsers are supported?” again. Looking below the response at the KnowledgeBase Context you can see the source for the answer has changed due to the filter that is now in place.

Now that you have explored this feature we recommend returning the Document Date Penalty to its original state of 0%. You can find the Document Date Penalty setting back under the “Configure” tab at the top of the page under the “Corporate Knowledge Base Details” section.
Snippet Size

The Snippet Size setting windows relevant details in a document that apply to the user question without specifically mentioning it. When the KnowledgeBase finds relevant text, this is the average character count returned.
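The windowing idea can be sketched as follows: find where a query term appears, then return roughly the configured number of characters of surrounding context. This is an assumed illustration of the behavior, not NeuralSeek's implementation:

```python
def snippet(document: str, term: str, snippet_size: int) -> str:
    """Return up to snippet_size characters of the document centered on term."""
    pos = document.lower().find(term.lower())
    if pos == -1:
        return ""
    # Center the window on the middle of the matched term
    start = max(0, pos + len(term) // 2 - snippet_size // 2)
    return document[start:start + snippet_size]

doc = ("FileNet Content Manager supports several browsers. "
       "Supported browsers include the current releases of major vendors. "
       "Older releases may work but are not tested.")
print(snippet(doc, "browsers", 100))
```

A small snippet size returns only the sentence fragment around the match; a large one pulls in surrounding sentences that give the language model more context, which is exactly the effect the examples below demonstrate.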
You can see how it works by adjusting the settings and following the examples below.
Small Snippet Size Example
Under the “Configure” tab at the top of the page and under the “Corporate Knowledge Base Details” section, reduce the snippet size to 100. Then select “Save” at the bottom of the page, and refresh the page to update the setting.

Next, go to the “Seek” tab at the top of the page and ask the question “How does integrating Match 360 with homegrown MDM systems help organizations enhance their customer 360 views?”.
If you look down at the “KnowledgeBase Context” section, you can see which documents NeuralSeek collected the answers from. You can take a closer look at the snippet of source material if you open the dropdowns for the individual documents.

Now let’s see what happens if we expand the snippet size.
Large Snippet Size Example
To do this, we’ll go back to the Snippet Size setting under the “Configure” tab at the top of the page under the “Corporate Knowledge Base Details” section. Set the snippet size as 1000. Then select “Save” at the bottom of the page, and refresh the page to update the setting.

Next, go to the “Seek” tab at the top of the page and ask the question “How does integrating Match 360 with homegrown MDM systems help organizations enhance their customer 360 views?”.
If you look down at the “KnowledgeBase Context” section, you can see which documents NeuralSeek collected the answers from. You can take a closer look at the snippet of source material if you open the dropdowns for the individual documents.

You can see by changing the snippet size, NeuralSeek used a larger scope to search for answers in the document.
Now that you have explored this feature we recommend returning the Snippet Size to its original size of around 625. You can find the Snippet Size setting back under the “Configure” tab at the top of the page under the “Corporate Knowledge Base Details” section.
Misinformation Tolerance

This is the tolerance for generating text about topics that have no backing KnowledgeBase material. 0 is the strictest, 100 is the loosest. The lower the misinformation tolerance, the harder it is for a user to make the system talk about something fake, untrue, or unaligned with your company, so this is an important setting for corporate deployments.
Note that the tighter the misinformation tolerance, the less ability the system has to deal with misspellings or misphrasings. At the tightest setting it may decline to answer, saying something like, “I don’t know about that. I’m not confident.” At the loosest, it will attempt to answer the question. You’ll have to experiment with questions and settings to get the impact you want.
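The trade-off above can be sketched as a simple confidence threshold. This is an illustrative model of the behavior, not NeuralSeek's internal logic; the function name and the 0–100 scales are assumptions for the example:

```python
def should_answer(confidence: float, misinformation_tolerance: float) -> bool:
    """Answer only when confidence meets the bar implied by the tolerance.

    Both values are on a 0-100 scale: tolerance 0 requires near-certain
    KnowledgeBase backing, while tolerance 100 always attempts an answer.
    """
    required_confidence = 100 - misinformation_tolerance
    return confidence >= required_confidence

# A misspelled question like "What is IMB?" matches corporate content
# only weakly (say, confidence 40):
print(should_answer(40, 0))    # tolerance 0: declines to answer
print(should_answer(40, 100))  # tolerance 100: attempts an answer anyway
```

This mirrors what the examples below show: at tolerance 0 the misspelling blocks a response, while at tolerance 100 the system looks past it.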
You can find the “Misinformation Tolerance” setting under the “Configure” tab under the “Confidence & Warning Thresholds” section.
You can see how it works by adjusting the settings and following the example below.
Low Misinformation Tolerance Example
In the “Configure” tab, under the “Confidence & Warning Thresholds” section, reduce the Misinformation Tolerance to 0. Then select “Save” at the bottom of the page, and refresh the page to update the setting.

Next, go to the “Seek” tab at the top of the page and ask a question with a misspelling. For this example we’ll ask “What is IMB?”.

Looking at the answer, you can see that by turning the Misinformation Tolerance down to 0, it became highly restricted and wasn’t able to provide a response when the company name was misspelled.
Now let’s see what happens when we turn the Misinformation Tolerance up to 100.
High Misinformation Tolerance Example
To do this, we’ll go back to the Misinformation Tolerance setting under the “Configure” tab and select 100. Then select “Save” at the bottom of the page, and refresh the page to update the setting.

Next, go back to the “Seek” tab at the top of the page and ask the same question, “What is IMB?”.

Now you will see that with the higher Misinformation Tolerance, NeuralSeek was able to look past the misspelling and understand the root of the question.
Now that you have explored this feature we recommend returning the Misinformation Tolerance to its original value of 100. You can find the Misinformation Tolerance setting under the “Configure” tab at the top of the page under the “Confidence & Warning Thresholds” section.
Summary
You have now seen first hand how NeuralSeek works and how customizable it is. With experience and practice, you will find the methods that work best for you.
By completing this learning lab:
- You learned how to import PDFs into NeuralSeek and build your KnowledgeBase.
- You learned about NeuralSeek’s language capabilities and how expansive the Match Input Language feature is.
- You learned four ways to fine-tune your answers in NeuralSeek:
- Document Score Range
- Document Date Penalty
- Snippet Size
- Misinformation Tolerance