Platform onboarding / cohort guide

The operator's field guide to Ask Sage.

A five-module walkthrough for new users landing on the platform. Starts with LLM fundamentals for users who've never thought about what's under the hood, then moves into the questions people actually ask in week one on Ask Sage itself. Built from a full read of docs.asksage.ai.

Format: 5 modules · 15–20 min each
Audience: New operators, mixed technical depth
Prereq: Active Ask Sage account
Compiled: April 2026
Module 00

LLM Basics

Before Ask Sage. Before prompts, personas, datasets, or agents. A grounding in what a Large Language Model actually is, how the modern AI systems built on top of it retrieve real information, and when the two behave differently. If your users have never thought about this, everything in the next four modules will feel like magic — and magic is hard to operate safely.

By the end of this module
Users can explain what an LLM is and how it differs from the full AI system around it, distinguish open-book from closed-book answers, understand what a token and a context window are, and appreciate why prompt quality directly affects answer quality.
00 What is a Large Language Model, actually? +

Two answers, and you need both.

The model at the core is a very large statistical pattern-matcher trained on enormous quantities of text. Given the words so far, it predicts the most likely next word. Then the next. Then the next. That's the whole engine. Left to its own devices, a raw LLM doesn't have live internet access, doesn't query your databases, and can't tell you what happened this morning — it only knows patterns it absorbed during training.
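The predict-append-repeat loop can be caricatured in a few lines. This is a toy bigram counter over invented text, nothing like a real neural model, but the core engine described above — pick a likely next word, append it, repeat — is the same shape:

```python
from collections import Counter, defaultdict

# Toy illustration of "predict the most likely next word".
# Real LLMs use neural networks over tokens, not word counts;
# only the pick-append-repeat loop is faithful to the real thing.
training_text = (
    "the model predicts the next word and the model repeats "
    "the loop until the answer is complete"
).split()

# Count which word follows which in the training text.
following = defaultdict(Counter)
for current, nxt in zip(training_text, training_text[1:]):
    following[current][nxt] += 1

def generate(start, steps=4):
    words = [start]
    for _ in range(steps):
        options = following.get(words[-1])
        if not options:
            break
        # Greedy: always take the statistically most common follower.
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))
```

Note what the toy shares with the real thing: it can only ever emit patterns it saw during "training", and it has no way to check whether the sentence it produces is true.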

The AI system you actually use on Ask Sage is much more than that raw model. Modern platforms wrap the model with tools that let it retrieve real information in the moment: search your ingested documents (RAG), query the live web (Live and Deep Agent), pull from connected systems via plugins and MCP, and call external APIs. So when you ask a question in Ask Sage, the answer you get may absolutely be based on real lookups happening in real time — if you've set things up that way.

mental model The LLM is the brain. Everything around it — RAG, plugins, agents, MCP, web search — is the hands, eyes, and reference library. A modern AI system is the brain plus all the tools it can reach for. Knowing which tools are in play for a given answer is the core operator skill.

01 Is Ask Sage an LLM? +

No. Ask Sage is a platform that gives you secure access to LLMs. The LLMs themselves — GPT-4o, Claude, Gemini, Llama, and others — are made by different companies (OpenAI, Anthropic, Google, Meta). Ask Sage sits in front of them, adds enterprise security, lets you attach your own data, and controls which models your organization can use.

This is why Ask Sage calls itself "model-agnostic" — you can pick any of the 25+ models from inside the same interface. It's also why the platform can add new models over time without changing your workflow.

02 Does the AI actually "look things up" when I ask a question? +

It depends — and this is the single most important thing to understand about modern AI systems.

In any given answer, the system is operating in one of two modes:

  • Closed-book mode — the model is relying only on patterns from its training data. No tools active, no dataset attached, no web search. Whatever it says comes from what it absorbed during training, which has a cutoff date. Good for general knowledge, writing, reasoning. Prone to hallucination on specific facts or recent events.
  • Open-book mode — the model is actively retrieving and reading real source material before answering. This is what happens when you attach a dataset (RAG), turn on Live web search, run Deep Agent, use a plugin that calls an external API, or invoke MCP to query a connected system. The model still composes the sentences, but the facts come from real retrieved documents or live API responses.

When Ask Sage's Deep Agent searches the web, it's genuinely searching the web. When a plugin calls SAM.gov, it's genuinely calling SAM.gov. When RAG retrieves from your dataset, those are real chunks of your documents. These are not hallucinations of lookups — they are real lookups.

operator skill Every answer you see has a mode. Before you trust a specific fact, know which mode produced it. Use the Explainability feature (Module 2) to see exactly what was retrieved. If nothing was retrieved and you're asking about something that could have changed since the model's training cutoff, treat the answer as informed speculation, not verified truth.

03 What's a token? Why does everyone keep talking about them? +

A token is the unit of text the model reads and writes, one piece at a time. Tokens are often whole words but can be parts of words, punctuation, or spaces.

Rough rule: 1 token ≈ ¾ of a word. So 100 words is about 130 tokens. A dense page of text is roughly 500 tokens. A 50-page PDF is roughly 25,000 tokens.

Tokens matter for three reasons:

  • Cost — you're billed in tokens, on both sides of the conversation: what you send and what the model sends back.
  • Limits — every model has a maximum number of tokens it can "see" at once (its context window).
  • Speed — more tokens means slower responses.
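The rough rule above is enough for back-of-the-envelope planning. A minimal sketch — the constants come straight from the rule of thumb, and real tokenizers vary by model, so treat these as estimates only:

```python
# Back-of-the-envelope token math from the rough rule (1 token ≈ 3/4 of a word).
# Real tokenizers differ by model; these are planning estimates, not exact counts.
WORDS_PER_TOKEN = 0.75

def estimate_tokens(word_count: int) -> int:
    return round(word_count / WORDS_PER_TOKEN)

def estimate_pdf_tokens(pages: int, tokens_per_page: int = 500) -> int:
    # "A dense page of text is roughly 500 tokens."
    return pages * tokens_per_page

print(estimate_tokens(100))     # about 130 tokens for 100 words
print(estimate_pdf_tokens(50))  # roughly 25,000 tokens for a 50-page PDF
```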
04 Inference tokens vs. training tokens — what's the difference? +

Two different kinds of work the AI does, two separate budgets you'll see on most enterprise platforms.

  • Inference tokens — spent when the model reads your prompt and generates a response. Every normal chat turn burns inference tokens.
  • Training tokens — spent when you ingest a file into a dataset so the model can retrieve from it later. The file gets converted into vector embeddings, and that conversion is what costs training tokens. It's a one-time cost per file, not a per-query cost.

This confuses people because "training" in this context doesn't mean the big LLM itself is being retrained on your data — that's not happening. It means your content is being processed and indexed so retrieval can find it later. The word "training" is a holdover from how vector databases were originally marketed.

rule of thumb Chat = inference. Ingesting a file into a dataset = training. They're separate wallets, they reset on different cycles, and neither covers the other. Uploading a 50MB PDF can burn a substantial slice of your training allowance — check your balance before a big ingestion.

on Ask Sage View both balances under Settings → Tokens. On enterprise accounts, your admin distributes tokens from a shared pool and can grant increases via token requests.
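The two-wallet idea is worth making concrete. A sketch of the budgeting logic — the balances and costs below are invented for illustration, not real platform numbers:

```python
# Two separate wallets, two separate drains. Balances and costs here are
# made up for illustration; check Settings → Tokens for your real numbers.
balances = {"inference": 50_000, "training": 100_000}

def spend(wallet: str, amount: int) -> bool:
    """Deduct from one wallet; neither wallet covers the other."""
    if balances[wallet] < amount:
        return False  # not enough -- request more via a token request
    balances[wallet] -= amount
    return True

spend("inference", 1_200)   # a normal chat turn: prompt + response
spend("training", 25_000)   # one-time cost to ingest a 50-page PDF
print(balances)
```

The point of the sketch: ingesting one big file costs as much as many chat turns, but from a different wallet, which is why a large ingestion can surprise you even when your inference balance looks healthy.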

05 What's a context window? +

The context window is the model's working memory for a single conversation. It's the total number of tokens the model can hold in view at once — your prompt, the attached files, the chat history, any retrieved dataset chunks, and the model's response all have to fit inside.

Modern models have context windows ranging from ~8,000 tokens (small) to 200,000+ tokens (huge). If you exceed the window, something gets cut off — usually the oldest chat history, silently. This is why long conversations start to feel like the model "forgot" something.

rule of thumb If you attach five 40-page PDFs to one prompt, you'll probably blow past the context window for most models. Either choose a large-context model or ingest into a dataset where only the relevant chunks are retrieved.
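The rule of thumb is just addition. A rough pre-flight check — the 32,000-token window below is an illustrative mid-sized model, not any specific Ask Sage model, and the per-page estimate is the same rough rule as before:

```python
# Will everything fit in the context window? Everything -- prompt, history,
# attachments, and the reply -- shares one budget. The 32,000 limit is an
# illustrative mid-sized model; check your chosen model's actual spec.
CONTEXT_WINDOW = 32_000

def fits(prompt_tokens, history_tokens, attachment_tokens, reply_budget=2_000):
    used = prompt_tokens + history_tokens + sum(attachment_tokens) + reply_budget
    return used <= CONTEXT_WINDOW, used

# Five 40-page PDFs at ~500 tokens/page = 20,000 tokens each.
ok, used = fits(prompt_tokens=300, history_tokens=1_000,
                attachment_tokens=[20_000] * 5)
print(ok, used)  # the attachments alone are 100,000 tokens -- far past the window
```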

06 Why do LLMs sometimes make things up? +

Because the raw model is trained to produce plausible-sounding text, not verified text. In closed-book mode (no tools, no retrieval), when the model doesn't have solid training-time patterns for your specific question, it generates words that sound like a correct answer — right shape, right tone, right specificity — even when the facts are fabricated.

This is called hallucination. It's not the model lying. It's the model doing exactly what it's designed to do, in a situation where it doesn't have the patterns to answer accurately and has no tool to verify.

Hallucinations are most common with:

  • Specific quotes, names, and dates it wasn't trained on
  • Recent events after its training cutoff (no web access)
  • Your organization's internal information (no dataset attached)
  • Niche technical details in specialized fields

the modern defense Switch the system out of closed-book mode. Attach the relevant dataset (RAG). Turn on Live search for current events. Use a plugin for domain-specific lookups. With the right tools active, the model is grounding its answer in real retrieved content — and Explainability lets you verify exactly what it retrieved.

07 What does "temperature" mean? +

Temperature is a dial that controls how random the model's word choices are. At every step, the model has many possible next words ranked by likelihood. Temperature decides how often it picks the top choice versus a less likely alternative.

  • Temperature 0.0 — always pick the most likely next word. Deterministic. Same prompt → same output. Best for factual tasks.
  • Temperature 0.7 — introduce meaningful randomness. More creative, more varied, less predictable. Good for brainstorming and writing.
  • Temperature 1.0+ — quite random. Creative but can become incoherent.

in practice For any task where accuracy matters — RAG, summarization, extraction, analysis — stay at 0.0. Raise it only when you explicitly want variety or creativity.
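What the dial is doing under the hood can be sketched with a softmax over candidate scores. The candidate words and scores below are invented — real models score an entire vocabulary — but the reshaping effect of temperature is faithful:

```python
import math

# How temperature reshapes next-word probabilities. The candidates and
# scores are invented; real models score a whole vocabulary this way.
candidates = {"Paris": 5.0, "London": 3.0, "Berlin": 2.5}

def probabilities(scores, temperature):
    if temperature == 0:
        # Degenerate case: always the top choice, probability 1. Deterministic.
        top = max(scores, key=scores.get)
        return {w: (1.0 if w == top else 0.0) for w in scores}
    # Softmax with temperature: higher T flattens the distribution.
    exps = {w: math.exp(s / temperature) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

print(probabilities(candidates, 0))     # deterministic: top word gets all the mass
print(probabilities(candidates, 0.7))   # top word dominates, others get a shot
print(probabilities(candidates, 2.0))   # much flatter: noticeably more random
```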

08 What is a prompt, and why does it matter so much? +

A prompt is the text you send the model. That's it. But because LLMs have no long-term memory between conversations and no sense of who you are or what you're trying to do, the prompt is the entire context the model has to work with.

Better prompts give the model more to work with:

  • Role — "You are a security compliance analyst..."
  • Task — "Summarize the following policy document..."
  • Constraints — "Use bullet points. Maximum 200 words. No jargon."
  • Examples — show it what good output looks like
  • Source material — attach files or ingest datasets

This is why Ask Sage has personas (pre-built role definitions), prompt templates (reusable skeletons), and Enhance Prompt (automatic rewriting). All three are scaffolding to help you write better prompts without having to become a prompt engineer.
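Since a prompt is just text, "better prompting" can be as mechanical as assembling the pieces deliberately. A sketch — the field contents are illustrative, and this is a habit-former, not anything the platform requires:

```python
# A prompt is just text, so assemble it deliberately: role, task,
# constraints, examples, source material. All field contents below
# are illustrative placeholders.
def build_prompt(role="", task="", constraints=(), examples=(), source=""):
    parts = []
    if role:
        parts.append(f"You are {role}.")
    if task:
        parts.append(task)
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    for i, ex in enumerate(examples, 1):
        parts.append(f"Example {i}:\n{ex}")
    if source:
        parts.append(f"Source material:\n{source}")
    return "\n\n".join(parts)

prompt = build_prompt(
    role="a security compliance analyst",
    task="Summarize the following policy document.",
    constraints=["Use bullet points", "Maximum 200 words", "No jargon"],
    source="[paste the document here]",
)
print(prompt)
```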

09 Does the model remember me between sessions? +

No. Each chat session starts fresh. The model has no persistent memory of who you are, what you discussed yesterday, or anything outside the current context window.

What does persist inside a single session is the chat history — as the conversation grows, earlier messages stay in the context window until it overflows. This is why the docs tell you to start a new chat when switching topics: the prior context is still being fed into every new response, influencing it subtly.

platform vs. model Ask Sage the platform remembers your account, settings, datasets, and chat history. The LLM itself, behind the scenes, remembers nothing. Don't confuse the two.

10 Is my data used to train the model? +

On Ask Sage, no — with one important asterisk.

Models marked with an asterisk (*) are CUI-compliant and guarantee your data is never used for model training. The platform markets this as "Fire & Forget" — your prompt goes to the model, the answer comes back, nothing about your query gets stored by the model provider for training.

Models without the asterisk are research-only. Their providers may use prompts for future training. These are blocked when a CUI dataset is attached but are still available for non-sensitive use.

This is the central differentiator between Ask Sage and consumer ChatGPT. On ChatGPT free, your conversations can be used to improve the model. On Ask Sage's compliant models, they never are.

11 What's the difference between these model names? +

Model names usually follow a pattern: provider + family + generation + size/variant. A few examples:

  • gpt-4.1-mini — OpenAI, GPT-4.1 family, "mini" variant (smaller and faster than full 4.1)
  • claude-sonnet-4-6 — Anthropic, Claude Sonnet 4.6 (mid-sized Claude variant)
  • gemini-2.5-pro — Google, Gemini 2.5, "pro" variant
  • llama-3-70b — Meta, Llama 3, 70 billion parameters

Within a family, you'll usually see small/medium/large tiers. Smaller = faster and cheaper, larger = smarter on hard problems. For most work, mid-tier models are the right default. Reasoning-specialized models (marked as such in the filter) are better at analysis but slower.

don't overthink it Start with GPT Auto mode. Let the platform route. Learn the models only when you have a specific reason to pick one.

12 Why does the same prompt sometimes give different answers? +

Three reasons, in order of frequency:

  • Temperature above zero — the model is sampling probabilistically. Same prompt, different dice roll, different answer.
  • Different chat context — if you're in an existing chat session, prior messages are influencing the response. Start a new chat to reset.
  • Different datasets or Live on/off — changing what the model has access to changes what it can draw from.

At temperature 0.0 with the same prompt in a fresh chat with no datasets, you should get identical answers every time. If you don't, one of the three variables moved.

13 When should I NOT use an LLM? +

Important question. LLMs are remarkable but they are not the right tool for everything.

Avoid or verify carefully:

  • Precise numerical calculations — LLMs are notoriously bad at arithmetic. Use a calculator, spreadsheet, or the platform's spreadsheet-attach path.
  • Real-time or very recent facts — without Live or RAG, the model only knows what was in its training data.
  • Legal, medical, or safety-critical decisions — the model produces plausible text, not verified advice. Always have a qualified human review.
  • Source attribution — without RAG and Explainability, the model cannot cite where it learned something. Don't trust unsourced claims.
  • Anything where "probably right" isn't good enough — use the model as a starting draft, not the final word.

defense context In your work domain especially, treat LLM output as a junior analyst's first draft. Useful speed multiplier. Never the final product.

Module 00 takeaway

Brain plus tools. Know which mode you're in.

The LLM at the core is a prediction engine — it composes language based on training patterns. But the AI system around it, on Ask Sage, has real hands and eyes: RAG retrieves your documents, Live searches the web, plugins call APIs, agents chain tool calls together. Any answer you see is either closed-book (model memory only) or open-book (something was actually looked up). Hallucinations happen almost exclusively in closed-book mode. The operator skill is knowing which mode each answer came from — and using Explainability to verify when it matters.

Module 01

Your First Prompt

Before we get into Ask Sage's dials and switches, a reality check: most bad answers are bad prompts. This module covers the fundamentals of prompting that carry across every LLM — the anatomy of a good prompt, when to show examples, and how to iterate — and then layers on the Ask Sage-specific pieces (model pick, personas, attachments, session hygiene) you won't find in consumer tools.

By the end of this module
Users can write a specific, well-structured prompt, iterate when the first answer isn't right, and choose an appropriate model and persona for the task at hand.
01 What actually makes a prompt work? +

Four parts, in roughly this order: role, task, context, format. You don't need all four every time, but missing the wrong one is why most prompts flop.

  • Role — who the model should be. "Act as a security assistance policy analyst." Sets vocabulary, tone, assumptions.
  • Task — the single concrete thing you want done. A verb. "Summarize", "compare", "draft", "extract", "rewrite". Vague verbs get vague output.
  • Context — what the model needs to know to do the task well. The source material, the audience, the constraints. "This is for a 2-star who has 90 seconds." "The audience is skeptical of LLMs."
  • Format — what you want back. A three-bullet executive summary? A table? A 200-word paragraph? Markdown? JSON? If you don't specify, you'll get the model's default, which is usually too long and too listy.

worked example Bad: "tell me about FMS." Better: "Act as a security cooperation planner. In 150 words and plain prose — no bullets — explain how FMS differs from FMF for a congressional staffer who knows neither acronym. End with one sentence on why the distinction matters for appropriations."

02 Why does being specific matter more than being clever? +

The internet is full of "magic prompts" — "You are a world-class expert," "Take a deep breath," "This is very important to my career." Some help on some models. None beat a clearly scoped task.

The two biggest levers are almost always:

  • Constrain the output. Word count, number of items, tone, audience, what to exclude. "3 bullets, max 15 words each, no jargon" outperforms "make it concise" every time.
  • Give the model something to work with. Paste the actual document, email, or data. Don't describe it. A model reasoning over real source material beats a model guessing what your source probably says.

common failure The kitchen-sink prompt — asking for 5 different things in one shot ("summarize this, find the risks, draft a response, and format as a brief"). Split it. One task per turn produces better results on every task.

03 Zero-shot, one-shot, few-shot, chain-of-thought — what are these, and when do I use each? +

These are the four named techniques you'll hear everywhere. They're not exotic — they're just different amounts of scaffolding you give the model before the real ask. The taxonomy matters because picking the right one saves tokens and gets better answers.

  • Zero-shot — just the task, no examples. "Summarize this report in three bullets." This is what 90% of chat usage is. Works when the task is common and the format is obvious.
  • One-shot — you give one example of what you want before the real ask. "Here's a sample executive summary in our house style: [EXAMPLE]. Now write one for this report." Use when the format or tone is specific but not bizarre.
  • Few-shot — two to five examples before the real ask. Massively underused. Best for structured outputs, consistent style, or extraction tasks where you want specific fields. "Here are three examples of how we format threat assessments: [EX 1] [EX 2] [EX 3]. Now do the same for: [NEW INPUT]."
  • Chain-of-thought — you tell the model to reason step by step before giving the final answer. "Think through this step by step, then give your recommendation." Use for analysis, math, multi-step reasoning, or anything where the model tends to jump to a wrong conclusion. Note: most reasoning models (o1, o3, Claude with extended thinking) do this internally already — you don't need to prompt it.

decision rule Start zero-shot. If the output format is off, move to one-shot. If it's still inconsistent, few-shot. If the model is getting the reasoning wrong (not the format), add chain-of-thought. Don't stack them all at once — add scaffolding only when you see the specific failure it addresses.

picking examples One good example beats three mediocre ones. Pick examples that match what you actually want, not aspirational ones. The model will match the style of your examples more than your instructions.
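Mechanically, few-shot is nothing more than prepending worked input/output pairs before the real ask. A sketch with invented placeholder examples — the extraction format here is made up, not a platform convention:

```python
# Few-shot in practice: worked input -> output pairs, then the real ask.
# The examples and field labels below are invented placeholders.
def few_shot_prompt(instruction, examples, new_input):
    blocks = [instruction]
    for given, wanted in examples:
        blocks.append(f"Input: {given}\nOutput: {wanted}")
    blocks.append(f"Input: {new_input}\nOutput:")  # model completes from here
    return "\n\n".join(blocks)

examples = [
    ("Contract awarded to Acme Corp for $2M on 2025-03-01.",
     "vendor=Acme Corp; value=$2M; date=2025-03-01"),
    ("Beta LLC won a $500K task order on 2025-06-15.",
     "vendor=Beta LLC; value=$500K; date=2025-06-15"),
]
print(few_shot_prompt("Extract vendor, value, and date as key=value pairs.",
                      examples,
                      "Gamma Inc received $1.2M on 2025-09-30."))
```

Ending the prompt with a bare "Output:" is the trick: the model's strong instinct to continue a pattern does the rest.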

04 The first answer wasn't great. Refine or start over? +

Depends on what went wrong. Two different failure modes, two different responses:

  • Close but not quite — wrong tone, too long, missed a constraint, weird phrasing. Refine in-thread. "Shorter. Cut the intro. Drop the bullets, use prose." The model has the context already.
  • Fundamentally off-target — wrong task, wrong framing, hallucinated the whole thing. Start a new chat. Prior context is now polluting every next answer, and patching a bad foundation usually costs more turns than rewriting.

rule of thumb If your third refinement isn't landing, stop. The prompt itself is wrong. Open a new chat and rebuild it with more constraints — you'll save tokens and get a better answer.

pro move When something does work, save the prompt as a template (see below) so you don't have to reconstruct it next time.

05 There are 25+ models in the dropdown. Which one do I pick? +

default Start with GPT Auto mode — the platform routes your prompt to the best-fit model. Pick a specific model only when you know why.

Use the filter to narrow by task: Reasoning for analysis, MCP for tool-using models, Image Generation and Video Generation for media.

critical Models marked with an asterisk * are CUI-compliant and safe for sensitive data. Models without the asterisk are research-only and may use your prompts for future training. The UI shows compliance status at the bottom of the prompt window — look there before sending anything sensitive.

06 What's a persona and how is it different from a prompt template? +

A persona is a system-prompt guardrail — it tells the model who to be and how to behave. Roughly 37 personas ship built-in, plus you can create custom ones. Think "Data Science Expert" or "Prompt Engineer" — the model takes on that role before answering.

A prompt template is a reusable prompt skeleton with fillable slots. Think of it as a saved starter prompt for a recurring task ("Explain this DevSecOps term...").

tip New to prompt writing? Select the Prompt Engineer persona and ask it to help you write better prompts. It's the platform's built-in prompt coach.

07 What does "Enhance Prompt" actually do? +

Takes whatever you typed and rewrites it for clarity, context, and specificity before sending. It's a prompt-rewriting pass that runs before inference. Useful when you're in a hurry or new to prompting. Don't rely on it for anything subtle — it costs inference tokens and can paraphrase away intent.

08 How many files can I attach to a prompt? +

Up to five files in a single prompt. You're still bound by the chosen model's context window — five large PDFs will truncate.

when to attach vs. ingest Attach for one-off analysis of a file you don't need again. Ingest into a dataset when the file is part of a reference corpus you'll query repeatedly.

09 Should I start a new chat session or keep going in this one? +

Start new when the topic changes. Prior chat history influences model responses — continuing a session after switching subjects causes subtle drift and worse answers. The best-practice guidance in the docs is explicit: start fresh for a new topic, and start fresh as a debugging step if the model starts misbehaving.

Chat titles are auto-generated from the first prompt, so name your opening question well. Search only looks across the last five chats loaded in memory.

Module 01 takeaway

Specific prompts, safe models.

Two things to leave this module with. First: a good prompt has a role, a task, context, and a format; skip any of the four and you're leaving quality on the table, and specific beats clever every time. Second: only asterisked models are safe for sensitive data, and the UI tells you which is which if you look. Everything in Module 02 assumes you've internalized both.

Module 02

Bring Your Own Data

Datasets are where Ask Sage becomes more than a chat wrapper. This module covers creating a dataset, ingesting files, the classification labeling that gates CUI work, and the three settings that will silently ruin your RAG results if you don't touch them.

By the end of this module
Users can create and label a dataset, ingest files successfully, query against it without data contamination, and use the Explainability feature to audit the answer.
10 What is a dataset and how do I create one? +

A dataset is a named container for ingested content, stored as vector embeddings in Ask Sage's vector database. The original files are not retained once ingestion completes.

Create one from Prompt Tools → Data & Settings → Upload New Files → Create New Dataset. Naming rules: alphanumeric and hyphens only, no spaces or special characters. Choose the classification (Unclassified or CUI) at creation — this sticks.
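What "stored as vector embeddings" buys you is retrieval: a query pulls back only the most similar chunks. Here is the idea in miniature — a deliberately tiny sketch using word-count vectors over invented chunks; real ingestion uses learned embedding models, not word counts:

```python
import math
import re
from collections import Counter

# The retrieval idea behind a dataset, in miniature: chunks become vectors,
# and a query pulls back only the most similar chunks. Real ingestion uses
# learned embedding models, not word counts -- this is a concept sketch.
chunks = [
    "The ATO package must be renewed every three years.",
    "Travel reimbursement requires receipts within 30 days.",
    "Continuous monitoring supports the ATO renewal process.",
]

def embed(text):
    # Toy "embedding": a bag-of-words count vector.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b))

def retrieve(query, top_k=2):
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]  # only the most relevant chunks reach the model

print(retrieve("when is the ATO renewed?"))
```

This is also why only the relevant chunks count against the context window at query time, instead of the whole corpus.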

11 Why can't I label my dataset CUI? +

access gate CUI labeling is restricted to users authenticated with a CAC or PIV card. If you don't have one, you cannot mark a dataset as CUI through the UI.

If you need CUI labeling and don't have a CAC/PIV, email support@asksage.ai to request activation. Access is granted case-by-case, not automatically. You'll get follow-on instructions by email.

12 What file types can I ingest? +

Broad coverage at 50MB/file (500MB for audio):

  • Documents — PDF, DOCX, DOC, ODT, RTF, PPTX, PPT, TXT, MD
  • Email & web — EML, MSG, HTML, XML
  • Images — JPG, JPEG, PNG (uploaded separately — images inside documents are NOT extracted)
  • Audio — WAV, MP3, MP4, MPEG, MPGA, M4A, WebM (transcribed on ingest)
  • Code — PY, JS, JAVA, CS, C, CC, HH, PHP, RB, SH, BAT, PS1, SQL
  • Data — JSON, YAML, YML, TSV, CSV, EPUB, ZIP

watch out Images embedded in Word/PDF documents are not extracted. If diagrams and maps matter to your corpus, upload them as separate image files.

13 Should I ingest my spreadsheet into a dataset? +

No. Vector embeddings are bad at tabular data. The docs explicitly recommend against it.

Instead, attach the spreadsheet to your prompt. Ask Sage will use Python libraries to analyze it directly — a code-interpreter path separate from the RAG pipeline. You get accurate numerical analysis without wasting training tokens on useless embeddings.
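Why the code path wins for tables: it does exact arithmetic over rows, which embeddings cannot guarantee. A sketch with an invented three-row spreadsheet — Ask Sage's own analysis pipeline is similar in spirit (Python over the raw table), not necessarily in implementation:

```python
import csv
import io

# Why code beats embeddings for tabular data: exact arithmetic over rows.
# The spreadsheet below is invented for illustration.
raw = io.StringIO(
    "program,obligated\n"
    "Alpha,120000\n"
    "Bravo,80000\n"
    "Charlie,50000\n"
)
rows = list(csv.DictReader(raw))
total = sum(int(r["obligated"]) for r in rows)
print(total)  # an exact sum, not a plausible-sounding guess
```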

14 My RAG answers are wrong or nonsensical. What happened? +

Nine times out of ten, one of three settings:

  • Temperature above 0.0 — the docs call this out explicitly: for dataset queries, keep temp at 0.0.
  • Live (web search) enabled — pulls in web content that contaminates your dataset-grounded answer.
  • Multiple unrelated datasets selected — RAG retrieves chunks across all selected datasets. Turn off ones that aren't relevant to your question.

they call this "data contamination" The platform docs warn: incorrect settings "may result in subpar outcomes or data contamination." Translation: turn off Live, set temp to 0, select only the dataset you actually need.
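The three-setting habit can be encoded as a pre-flight checklist. The field names below are illustrative, not the platform's actual API — this sketch captures the habit, not an integration:

```python
# The three-setting rule as a pre-flight check before any dataset query.
# Field names are illustrative, not Ask Sage's actual API.
def rag_preflight(settings):
    problems = []
    if settings.get("temperature", 0.0) != 0.0:
        problems.append("temperature above 0.0: set it to 0.0 for dataset queries")
    if settings.get("live", False):
        problems.append("Live web search is on: turn it off to avoid contamination")
    datasets = settings.get("datasets", [])
    if not datasets:
        problems.append("no dataset selected")
    elif len(datasets) > 1:
        problems.append("multiple datasets selected: drop any that aren't relevant")
    return problems  # empty list = clean settings

print(rag_preflight({"temperature": 0.7, "live": True, "datasets": ["a", "b"]}))
print(rag_preflight({"temperature": 0.0, "live": False, "datasets": ["policy-docs"]}))
```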

15 Can I use Live (web search) with a CUI dataset? +

no Live is not CUI-compliant. The platform cannot control what a search engine logs about your query, so Live is blocked when a CUI-labeled dataset is attached.

Same constraint applies to Deep Agent when it's configured to search the web.

16 The model cited something that wasn't in my data. How do I verify? +

Turn on Show Explainability. It renders the exact source chunks that fed into the response — the audit trail for every dataset-grounded answer. This is your hallucination-detection tool and the thing that separates enterprise RAG from consumer chat.

If Explainability shows chunks that don't match the answer, the model hallucinated. If it shows chunks outside the dataset you expected, you have Live on or have extra datasets selected.

17 Who can see my dataset? +

By default, only you. Datasets can be shared organization-wide through dataset management. Dataset permissions are managed via the /update-permission-dataset flow (UI or API) with granular controls.

Chat history is private by default too — users can share individual chats inside their organization, but never across organizations.

Module 02 takeaway

The three-setting rule for clean RAG

For every dataset query: Temperature = 0.0, Live = off, only the relevant dataset(s) selected. Turn on Show Explainability to audit the answer. This is muscle memory users need to build in week one, because the defaults do not enforce it — and silent data contamination looks identical to a correct answer until someone checks.

Module 03

Beyond Chat

Ask Sage ships with six user-facing surfaces. They overlap enough to confuse new users — "I want to write a memo, which one do I use?" is the most common Module 3 question. This section gives them a decision matrix.

By the end of this module
Users can choose the right workspace for the task (chat, workbook, canvas, compare, in-a-box) without defaulting to chat for everything.
  • Ask Sage Chat (general purpose). Default entry point. Q&A, brainstorming, RAG queries, attached-file analysis. Start here unless you have a reason not to.
  • Workbook (multi-prompt). When you need to run several related prompts in a structured workspace. Better than chat for iterative analytical work. Note: manage workbook data inside the workbook UI, not via dataset management.
  • Code Canvas (software development). Write and edit code with LLM-assisted modifications. IDE-like workflow. Use for building, debugging, refactoring.
  • Prose Canvas (long-form writing). Same interaction model as Code Canvas, but optimized for documents — memos, reports, articles. Use when you're producing something someone will read.
  • Model Compare (evaluation). Run the same prompt across multiple models side-by-side. Use when picking a model for a recurring task, or when answer quality matters enough to A/B.
  • In-a-Box (templated documents). Generate structured deliverables like compliance packages. Start here when the output is a known document type with known sections — not free-form writing.
18 I want to write a memo. Chat, Prose Canvas, or In-a-Box? +

Prose Canvas is the default answer. It's built for long-form writing with iterative LLM edits — closest to what you'd want for a drafting session.

Use In-a-Box if the memo fits a templated format (a specific compliance document, ATO package, etc.). It saves time when the structure is fixed.

Use Chat for a quick 2-paragraph response you'll paste elsewhere. Anything longer — move to Prose Canvas.

19 When should I use Model Compare? +

Two scenarios:

  • Picking a default — you're deciding which model to set as your go-to for a recurring task (summarizing intel reports, drafting code).
  • Quality-critical answers — when getting it right matters enough that you want a second opinion.

Costs inference tokens on every model you include in the comparison, so don't use it as the daily driver.

20 What exactly is In-a-Box? +

A workspace for generating specific document types using pre-built templates and workflows. Strong compliance/cybersecurity leaning — the documented flow covers adding a system and generating compliance packages (think RMF documentation, ATO artifacts).

Also supports organization-specific templated outputs. If your team has a recurring document they produce, In-a-Box is where you'd build or use the template.

21 How do I customize my default settings so I don't reconfigure every prompt? +

Settings → Customization → Prompt Settings. Set default dataset, persona, and model. Saves time when you're doing the same type of work repeatedly.

Also in Customization: choose theme (dark/light), toggle the chat UI view (modern vs. advanced chat fields), and control follow-up prompt display.

Module 04

Automation & Agents

Five overlapping agentic concepts live on Ask Sage: plugins, agents, Deep Agent, Agent Builder, and MCP. Users don't need to understand all five in week one — they need to understand the ladder, so they reach for the right tool instead of over-engineering.

By the end of this module
Users understand the complexity ladder (plugin → agent → Deep Agent → Agent Builder → MCP), recognize the token cost of Deep Agent runs, and know to route MCP whitelisting requests to their enterprise admin.
The five flavors of automation, simplest to most complex
  • Plugin. What it is: a single-task GenAI tool; ~33 built-in (SAM.gov search, USA-Spending lookups, SBIR assessment, content generation, connectors, ingestions, medical, audio). When to use: you want one specific thing done: look up a contract, pull award data, generate a specific document type. Watch for: some are free, some require a paid subscription; check the directory for the FREE/PAID badge.
  • Agent. What it is: the plugin's more autonomous cousin; higher decision-making, handles multi-step tasks. When to use: you need a task completed that involves multiple decisions, but a custom workflow would be overkill. Watch for: lives in the same directory as plugins; mostly paid. Treat it as the next tier up from a plugin.
  • Deep Agent. What it is: an iterative research agent that queries both the web and your private datasets, generating multiple sub-prompts per run. When to use: research and synthesis across public + private sources; report-style outputs. Watch for: token-hungry. A single run may generate dozens of prompts and consume 500–1,000 tokens. Not CUI-compliant when web search is active. Currently in beta.
  • Agent Builder. What it is: a visual no-code workflow composer; drag nodes onto a canvas, connect them, execute the workflow. When to use: you have a recurring multi-step process that no existing plugin covers and you want a reusable version. Watch for: beta, and the first workflow takes time to learn. Don't build one if a plugin already exists — check the directory first.
  • MCP (Model Context Protocol). What it is: a framework for connecting AI models to external tools and data sources; supports Microsoft 365, GitHub, and custom MCPs. When to use: integrating Ask Sage with tools your team already uses — reading email, pulling from GitHub, calling internal APIs. Watch for: admins whitelist which MCP servers users can add. If a needed integration isn't available, route to the enterprise admin, not support.
22 I want to automate a task. Where do I start? +

Walk up the ladder until something fits:

  • Step 1: Check the plugin directory. ~33 plugins across acquisition, automation, coding, connectors, content, ingestions, medical, and audio categories. If one already does what you want, use it. (Agents live in the same directory, one tier up, for tasks with more decisions.)
  • Step 2: If it's a research task across public + private data, try Deep Agent (but budget the tokens).
  • Step 3: If you need a reusable custom multi-step workflow, Agent Builder.
  • Step 4: If you need to connect to an external system, set up an MCP.

The most common mistake: users skip straight to Agent Builder when a plugin already exists.
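The ladder reduces to a few ordered checks. A sketch only: the function and predicate names below are illustrative, not anything in the platform API.

```python
def pick_automation_tier(plugin_exists: bool,
                         research_across_sources: bool,
                         reusable_workflow_needed: bool,
                         external_system_needed: bool) -> str:
    """Walk the Module 04 ladder in order; stop at the first rung that fits."""
    if plugin_exists:
        return "plugin"
    if research_across_sources:
        return "Deep Agent"
    if reusable_workflow_needed:
        return "Agent Builder"
    if external_system_needed:
        return "MCP"
    return "re-check the plugin directory, or ask your admin"

# A one-off contract lookup with an existing plugin never climbs the ladder:
print(pick_automation_tier(True, False, False, False))  # → plugin
```

Note the order of the checks is the whole point: a plugin match short-circuits everything above it.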

23 How much does a Deep Agent query cost me? +

A typical run generates dozens of sub-prompts. A single session analyzing both web and RAG data may consume 500–1,000 inference tokens.

Default model is GPT-4o, chosen for cost/performance balance. Switch to a cheaper model if you're running many queries; switch to a reasoning model if precision matters.

Burn rate: running three or four Deep Agent sessions casually can exhaust a day's inference budget on smaller plans.
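A back-of-the-envelope sketch of that burn rate, using the 500–1,000 tokens-per-run range above (the function name is illustrative):

```python
def deep_agent_burn(runs: int, per_run: tuple[int, int] = (500, 1000)) -> tuple[int, int]:
    """Rough low/high inference-token estimate for a batch of Deep Agent runs."""
    low, high = per_run
    return runs * low, runs * high

# Four casual sessions already sit in the 2,000-4,000 token range:
print(deep_agent_burn(4))  # → (2000, 4000)
```

On a plan budgeted in the low thousands of daily inference tokens, that is the whole day gone.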

24 How do I connect Ask Sage to GitHub / Microsoft 365 / my internal tools? +

MCP (Model Context Protocol). Built-in support for Microsoft 365 and GitHub, plus custom MCP servers for internal integrations.

Add and manage MCPs via Settings → Manage your MCPs. On enterprise accounts, admins can whitelist which MCP servers are allowed. If you try to add one and get blocked, route to your admin.

25 I'm a developer. How do I authenticate against the API? +

Three steps, and non-standard; budget a little time for this:

  • Step 1: Generate an API key under Settings → Account → Manage your API Keys.
  • Step 2: POST your API key + email to /user/get-token-with-api-key to receive an access token.
  • Step 3: Pass the token as an x-access-tokens header on every subsequent call.

Not Authorization: Bearer. Custom header name. This trips up every developer once.
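A minimal sketch of that flow using only the standard library. The base URL and the request/response field names are assumptions drawn from typical REST conventions; confirm both against the User API Swagger before relying on them.

```python
import json
import urllib.request

USER_API = "https://api.asksage.ai/user"  # placeholder base URL; check your tenant's docs

def get_access_token(email: str, api_key: str) -> str:
    """Step 2: POST email + API key, receive a short-lived access token.
    The JSON field names here ("email", "api_key", "access_token") are
    assumptions; verify the exact schema in the User API Swagger."""
    body = json.dumps({"email": email, "api_key": api_key}).encode()
    req = urllib.request.Request(
        f"{USER_API}/get-token-with-api-key",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]  # response key is an assumption

def auth_headers(token: str) -> dict:
    # Step 3: every subsequent call carries the token in a custom
    # x-access-tokens header, NOT the usual Authorization: Bearer scheme.
    return {"x-access-tokens": token}
```

The `auth_headers` helper is the part worth internalizing: a standard HTTP client configured for Bearer auth will silently send the wrong header.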

26 Can I use my existing OpenAI or Anthropic SDK? +

Yes. Ask Sage publishes compatibility guides for OpenAI-style, Anthropic-style, and Gemini-style endpoints. Point your existing client at the Ask Sage base URL and most code works unchanged.

This is the cleanest migration path if your team already has code written against those SDKs. The full native API is richer (personas and datasets as first-class parameters), but the compat layer unblocks day one.
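As a sketch of what "unchanged" means on the wire: with the OpenAI Python SDK you would only swap `base_url` (and the API key) in the client constructor. The helper below builds the equivalent raw request with the standard library. The base URL and the Bearer-style auth header are assumptions to verify against the compatibility guide.

```python
import json

COMPAT_BASE = "https://api.asksage.ai/server/v1"  # placeholder; see the compat guide

def openai_style_chat_request(token: str, model: str, user_msg: str):
    """Build an OpenAI-schema chat completion request aimed at the compat layer."""
    url = f"{COMPAT_BASE}/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",  # assumption: compat layer mirrors OpenAI auth
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
    })
    return url, headers, body
```

Because the payload is the standard OpenAI schema, existing client code only needs the endpoint (and credentials) repointed, which is exactly the claim the compatibility guides make.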

27 Where's the Swagger / API reference? +

Two Swaggers, because there are two APIs:

  • Server API (v1.56) — inference, datasets, plugins, agents, MCP.
  • User API (v1.21) — auth, user management, dataset CRUD, API keys.

Both are linked from the API documentation page, viewable in Swagger UI or Redocly. Python client available for rapid prototyping.

Module 4 takeaway

Climb the ladder. Don't skip to the top.

Plugin first. Deep Agent when you need research. Agent Builder when you need a reusable workflow. MCP when you need external integration. Users who skip to Agent Builder on day one build fragile custom workflows that a plugin could have handled in one call. When in doubt, check the plugin directory.



Copyright © 2026 Ask Sage Inc. All Rights Reserved. Ask Sage is a BigBear.ai company.