Ingesting Data into Ask Sage
In this section, we will guide you step-by-step through the process of ingesting data into Ask Sage. Understanding this process is crucial to generating accurate results that are relevant to your use case.
The GenAI models are trained on a diverse range of open-source datasets rather than on your own data, so the more specific and relevant the data you ingest, the better the results you will get from them.
1) Ask Sage allows you to ingest data in a wide range of formats, including text, images, and audio. You can then generate text from the data you ingest, which is useful for a variety of use cases.
2) Ask Sage users only have to ingest data once. The same data can then be leveraged across multiple GenAI models on the platform.
Define a Dataset
The first step is to create a dataset in Ask Sage. A dataset is equivalent to a folder where you can store all the data you want to ingest into Ask Sage. You can create multiple datasets to organize your data based on specific use cases or projects.
To create a dataset, follow these steps:
- In the `Advanced Settings` section, click on `Upload New Files`.
- Click on the `Create New Dataset` button.
  - Enter a dataset name. Only alphanumeric characters and hyphens are allowed; spaces and special characters are not (e.g., `my-dataset12345`).
  - Click on the `Create Dataset` button. (If successful, you will see `Dataset created`.)
After creating a dataset, you can start ingesting data into Ask Sage.
1) As a best practice, we recommend using a clear naming convention for your datasets so you can easily identify them when ingesting data.
2) On your local machine, you can create a folder with the same name as the dataset you created in Ask Sage. This will help you organize your data locally and easily upload it to Ask Sage.
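The naming rule and the local-folder tip above can be sketched in a few lines of Python. This is only an illustration that runs on your own machine; it does not call Ask Sage. The function names `is_valid_dataset_name` and `make_local_dataset_dir` are hypothetical, and the sketch assumes "alphanumeric" means ASCII letters and digits and that there is no length limit, neither of which is confirmed here.

```python
import re
from pathlib import Path

# Rule from the steps above: only alphanumeric characters and hyphens.
# Assumption: "alphanumeric" means ASCII letters and digits.
_DATASET_NAME = re.compile(r"[A-Za-z0-9-]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, and hyphens."""
    return _DATASET_NAME.fullmatch(name) is not None


def make_local_dataset_dir(name: str, base: Path = Path(".")) -> Path:
    """Create a local folder named after the dataset (hypothetical helper)."""
    if not is_valid_dataset_name(name):
        raise ValueError(f"invalid dataset name: {name!r}")
    folder = base / name
    folder.mkdir(parents=True, exist_ok=True)
    return folder


if __name__ == "__main__":
    for candidate in ("my-dataset12345", "my dataset", "my_dataset"):
        print(candidate, is_valid_dataset_name(candidate))
```

Checking the name locally first avoids discovering a rejected name only after typing it into the `Create New Dataset` dialog.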
Upload/Ingest Data
After creating a dataset, you can upload/ingest data into Ask Sage. You can ingest data in the formats listed in the table below:
Data Type | File Format | Example | Max Size Per File |
---|---|---|---|
Text | .txt, .docx, .pdf, .pptx, .ppt, .csv, .cc, .sql, .cs, .hh, .c, .php, .js, .py, .html, .xml, .msg, .odt, .epub, .eml, .rtf, .doc, .json, .md, .tsv, .yaml, .yml, .java, .rb, .sh, .bat, .ps1 | example.txt | 50MB |
Image | .jpg, .jpeg, .png | example.jpg | 50MB |
Audio | .wav, .mp3, .mp4, .mpeg, .mpga, .m4a, .webm | example.wav | 500MB |
Compressed | .zip | example.zip | 50MB |
Spreadsheet | .xlsx, .tsv | example.xlsx | 50MB |
Presentation | .pptx, .ppt | example.pptx | 50MB |
Code | .cc, .sql, .cs, .hh, .c, .php, .js, .py, .java, .rb, .sh, .bat, .ps1 | example.py | 50MB |
E-book | .epub | example.epub | 50MB |
Email | .eml, .msg | example.eml | 50MB |
Rich Text | .rtf | example.rtf | 50MB |
Markup | .md, .html, .xml | example.html | 50MB |
Data Interchange | .json, .yaml, .yml | example.json | 50MB |
Be aware that images embedded in text documents will not be extracted. You will need to upload the images separately.
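Before uploading, you can check files locally against the formats and size limits in the table above. The following is a minimal sketch, not part of Ask Sage: `check_upload` is a hypothetical helper, the extension sets are copied from the table, and it assumes that "MB" means 1024 × 1024 bytes and that a file exactly at the limit is accepted.

```python
from pathlib import Path
from typing import Optional, Tuple

MB = 1024 * 1024  # assumption: the table's "MB" is 1024 * 1024 bytes

# Extensions copied from the table above.
AUDIO_EXTS = {".wav", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".webm"}
OTHER_EXTS = {
    ".txt", ".docx", ".pdf", ".pptx", ".ppt", ".csv", ".cc", ".sql", ".cs",
    ".hh", ".c", ".php", ".js", ".py", ".html", ".xml", ".msg", ".odt",
    ".epub", ".eml", ".rtf", ".doc", ".json", ".md", ".tsv", ".yaml", ".yml",
    ".java", ".rb", ".sh", ".bat", ".ps1", ".jpg", ".jpeg", ".png", ".zip",
    ".xlsx",
}


def max_size_bytes(filename: str) -> Optional[int]:
    """Return the per-file limit for this extension, or None if unsupported."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTS:
        return 500 * MB
    if ext in OTHER_EXTS:
        return 50 * MB
    return None


def check_upload(filename: str, size_bytes: int) -> Tuple[bool, str]:
    """Check one file against the table (hypothetical pre-upload check)."""
    limit = max_size_bytes(filename)
    if limit is None:
        return False, "unsupported file format"
    if size_bytes > limit:  # assumption: the limit itself is allowed
        return False, f"file exceeds {limit // MB}MB limit"
    return True, "ok"


if __name__ == "__main__":
    print(check_upload("report.pdf", 12 * MB))
    print(check_upload("meeting.wav", 600 * MB))
```

To check a real file, pass `path.stat().st_size` as `size_bytes`.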
To upload data into Ask Sage, navigate to the `Ingest Files` section and follow these steps:
- Select the dataset you created from the dropdown list.
- In the box, drag and drop the files you want to upload, or click in the box to select files from your local machine.
- After selecting the files, you will see the file names listed in the box. Review the files to ensure they are correct and delete any files you do not want to upload via the garbage bin icon.
- Click on the `Ingest Files` button to start uploading the files.
- If successful, you will see a white checkmark and `Successfully Imported` text for each file that is uploaded.
1) You can upload multiple files at once by selecting multiple files from your local machine.
Using the Dataset with GenAI Models
After ingesting data into Ask Sage, you can now use the data with any of the GenAI models available on the platform.
To use the data with the GenAI models, follow these steps:
- Navigate to the `Advanced Settings` section.
- Update the `Advanced Settings` to use the dataset(s) you created.
  - Update any other settings as needed (e.g., `Model`, `Persona`, `Temperature`, `Personality`, etc.)
- Enter a prompt and submit your prompt.
Here is an example of how to use the data with the GenAI models on Ask Sage:
The dataset(s) selected will appear in the `Advanced Settings` section and also under the prompt window, so users can easily identify the dataset(s) used with the prompt.
For optimal results with the ingested data, we recommend keeping the `Temperature` setting at its default value of `0.0` and ensuring that the `Live` setting is turned `off`. Incorrect settings may result in subpar outcomes or data contamination.
The inference/response generated by the GenAI model utilizes the dataset assigned to the prompt.
Ask Sage users benefit from the `Show Explainability` feature, which provides a detailed reference to the data used to generate the text when using a dataset and/or the `Live` feature. This is useful for understanding the context of the generated text and ensuring the text is relevant and not a hallucination.
Summary
In this section, we guided you through the process of ingesting data into Ask Sage. Understanding this process is crucial to generating accurate results that are relevant to your work/organization.
Now that you have a better understanding of how to ingest data into Ask Sage, you are ready to start utilizing the platform and leveraging the power of GenAI!
Proceed to the next sections to learn more about Ask Sage! 🚀