Ingestions

The Ingestions plugins ingest user or enterprise data into custom datasets that the user or enterprise defines.

Navigate to Ingesting Data to learn more about the process of ingesting data into Ask Sage.


Table of contents
  1. List of Ingestions Plugins & Agents
  2. CSV
  3. Content into Dataset
  4. File
  5. Plain/text content

List of Ingestions Plugins & Agents

Index | Title | Access | Description of the Plugin/Agent | Category
1 | CSV Lines | Paid users only | Ingest each line of a CSV file as a separate training | Ingestion
2 | Content into Dataset | Paid users only | Train text content into a specific dataset | Ingestion
3 | File | Paid users only | Import file content, split it into chunks, train the chunks into a dataset, summarize the content, and ingest the summaries | Ingestion
4 | plain/text content | Paid users only | Import plain/text content, split it into chunks, train the chunks into a dataset, summarize the content, and ingest the summaries | Ingestion

CSV

The CSV Lines plugin is used to ingest each entry/line in a CSV file as a separate training.

Step 1 - Navigate to the Ask Sage Prompt Settings section, select Prompt Templates, and then select the CSV Lines plugin.

Step 2 - Click on the Choose File button to upload the CSV file you want to ingest.

Step 3 - Provide a short description of the CSV file you are ingesting.

Step 4 - Select the dataset you want to ingest the CSV file into.

Step 5 - Click on the Submit button.

Step 6 - Execute the prefilled prompt generated by the plugin, which loops through the CSV and executes the prompt against each line.

After ingesting your CSV file, you can proceed to ask questions or generate text from the data you ingested.
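
As a rough illustration of what "a separate training per line" means, the sketch below reads CSV text and builds one record per non-empty row. It is not the plugin's implementation: the function name `rows_to_trainings`, the record layout, and the choice to skip blank lines are all assumptions made for this example.

```python
import csv
import io

def rows_to_trainings(csv_text: str, description: str, dataset: str) -> list[dict]:
    """Build one training record per non-empty CSV row (hypothetical layout)."""
    reader = csv.reader(io.StringIO(csv_text))
    trainings = []
    for line_number, row in enumerate(reader, start=1):
        if not any(cell.strip() for cell in row):
            continue  # assumption: blank lines carry nothing worth ingesting
        trainings.append({
            "dataset": dataset,          # dataset selected in the plugin form
            "description": description,  # short description of the CSV file
            "line": line_number,
            "content": ", ".join(cell.strip() for cell in row),
        })
    return trainings

sample = "name,role\nAda,engineer\n\nGrace,admiral\n"
records = rows_to_trainings(sample, "Team roster", "my-dataset")
for record in records:
    print(record["line"], record["content"])
```

Each record stands for one training, so a CSV with many rows produces many small trainings rather than one large one.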


Content into Dataset

The Content into Dataset plugin is used to ingest text content into a specific dataset. It is useful when the text content is not stored in a file (for example, a CSV or PDF).

For example, you can use the Summarize Website plugin to summarize a website's content and then ingest the summary into a dataset with the Content into Dataset plugin.

Step 1 - Navigate to the Ask Sage Prompt Settings section, select Prompt Templates, and then select the Content into Dataset plugin.

Step 2 - Enter the text content you want to ingest into the dataset. (Recommended: 500 tokens per ingestion.)

Step 3 - Provide a short description of the text content you are ingesting.

Step 4 - Select the dataset you want to ingest the text content into.

Step 5 - Click on the Submit button.

Step 6 - Execute the prefilled prompt generated by the plugin, which ingests the text content into the dataset.

The expected output is similar to that of the CSV Lines plugin: you receive a confirmation of the ingestion and can then ask questions or generate text from the data you ingested.
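
To check a piece of text against the 500-token recommendation in Step 2 before pasting it in, a quick estimate can help. The sketch below is only a heuristic: the four-characters-per-token ratio is a common rule of thumb for English text, not the tokenizer Ask Sage uses, and `estimate_tokens` and `within_recommendation` are hypothetical helper names.

```python
import math

RECOMMENDED_TOKENS = 500  # recommendation stated in Step 2 above

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a real tokenizer will give different counts."""
    return math.ceil(len(text) / chars_per_token)

def within_recommendation(text: str) -> bool:
    """True when the estimated size is at or below the recommended size."""
    return estimate_tokens(text) <= RECOMMENDED_TOKENS

short_note = "Ask Sage ingests text into datasets. " * 10
long_note = short_note * 20
print(estimate_tokens(short_note), within_recommendation(short_note))
print(estimate_tokens(long_note), within_recommendation(long_note))
```

Text that fails the check is a candidate for splitting into several ingestions or for the plain/text content plugin, which chunks large content for you.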


File

The File plugin is used to ingest file content, split it into chunks, train the chunks into a dataset, summarize the content, and ingest the summaries.

Supported file types are listed in the plugin description and in the Ingesting Data section.
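
To make the "split it into chunks" stage concrete, here is a minimal sketch of fixed-size chunking under the 2,000-token limit mentioned in the steps for this plugin. It treats whitespace-separated words as tokens, which is an approximation, and `split_into_chunks` is a hypothetical helper rather than the plugin's actual code.

```python
MAX_TOKENS_PER_CHUNK = 2000  # limit stated in the steps for this plugin

def split_into_chunks(text: str, tokens_per_chunk: int) -> list[str]:
    """Split text into consecutive chunks of at most tokens_per_chunk words.

    Words stand in for tokens here; a real tokenizer counts differently.
    """
    if not 1 <= tokens_per_chunk <= MAX_TOKENS_PER_CHUNK:
        raise ValueError(
            f"tokens_per_chunk must be between 1 and {MAX_TOKENS_PER_CHUNK}"
        )
    words = text.split()
    return [
        " ".join(words[start:start + tokens_per_chunk])
        for start in range(0, len(words), tokens_per_chunk)
    ]

document = " ".join(f"word{i}" for i in range(4500))
chunks = split_into_chunks(document, tokens_per_chunk=2000)
print(len(chunks), [len(chunk.split()) for chunk in chunks])
```

Smaller chunk sizes produce more chunks and therefore more summaries to review at the confirmation stage.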

Step 1 - Navigate to the Ask Sage Prompt Settings section, select Prompt Templates, and then select the File plugin.

Step 2 - Click on the Choose File button to upload the file you want to ingest.

Step 3 - Select the file reader strategy from the dropdown list. The options are:

  1. Auto (default): The system automatically selects the most appropriate file reading strategy based on the file type and content. It aims to balance speed and accuracy.

  2. Fast: This strategy prioritizes speed over accuracy. It is useful when you need to quickly process a large number of files and can tolerate some loss in detail or accuracy.

  3. Hi_res (for OCR recognition): This strategy is designed for high-resolution processing, particularly for Optical Character Recognition (OCR). It is useful for extracting text from images or scanned documents where high accuracy is required.

If you are unsure which strategy to choose, you can leave it as the default “Auto” setting.

Step 4 - Provide a short description of the file content you are ingesting.

Step 5 - Enter the number of tokens you want to ingest per chunk. (Max 2,000 tokens per chunk for training)

Step 6 - Enter the prompt you want to use to summarize the content. (Keep default if unsure)

Step 7 - Select the dataset you want to ingest the file content into.

Step 8 - Click on the Submit button.

Step 9 - Execute the prefilled prompt generated by the plugin, which will ingest the file content and prompt you to confirm the summaries.

The expected output is a confirmation that lists the number of chunks ingested and the summaries of the content.

At the end of the output, you can accept the summaries for ingestion into the dataset or reject them.

The options are:

A) Ingest the data into the dataset.

/yes

B) Skip. You are then prompted either to re-run the summarization plugin on the summarized results or to stop the process.

/skip

If you choose to skip, you can re-run the summarization plugin on the summarized results and then ingest the summaries into the dataset.

C) Stop the process. The summarized results are not ingested into the dataset.

/stop
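
The three replies above can be pictured as a small lookup from command to outcome. The sketch below is an illustrative model only: the `Outcome` enum and `interpret_reply` function are hypothetical names, and rejecting any other reply is an assumption about behavior that this page does not describe.

```python
from enum import Enum

class Outcome(Enum):
    INGEST = "ingest the summaries into the dataset"
    SKIP = "skip, then choose to re-run summarization or stop"
    STOP = "stop without ingesting the summaries"

# Commands as they appear in the options above.
COMMANDS = {"/yes": Outcome.INGEST, "/skip": Outcome.SKIP, "/stop": Outcome.STOP}

def interpret_reply(reply: str) -> Outcome:
    """Map a confirmation reply to its outcome; reject anything else."""
    command = reply.strip()
    if command not in COMMANDS:
        raise ValueError(f"Unrecognized reply: {reply!r}")
    return COMMANDS[command]

for reply in ("/yes", "/skip", "/stop"):
    print(reply, "->", interpret_reply(reply).value)
```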

Plain/text content

The plain/text content plugin is used to ingest plain/text content, split it into chunks, train the chunks into a dataset, summarize the content, and ingest the summaries.

The main difference between this plugin and the Content into Dataset plugin is that this plugin can ingest very large text content, because it splits the content into chunks before training.

Step 1 - Navigate to the Ask Sage Prompt Settings section, select Prompt Templates, and then select the plain/text content plugin.

Step 2 - Enter the text content you want to ingest into the dataset.

Step 3 - Provide a short description of the text content you are ingesting.

Step 4 - Enter the number of tokens you want to ingest per chunk. (Max 2,000 tokens per chunk for training)

Step 5 - Enter the prompt you want to use to summarize the content. (Keep default if unsure)

Step 6 - Select the dataset you want to ingest the text content into.

Step 7 - Click on the Submit button.

Step 8 - Execute the prefilled prompt generated by the plugin, which will ingest the text content and prompt you to confirm the summaries.

As with the File plugin, you can then accept the summaries for ingestion into the dataset or reject them.



Copyright © 2024 Ask Sage Inc. All Rights Reserved.