MindsDB — Semantic Search via SQL
MindsDB lets you build semantic search pipelines entirely in SQL. Connect your data sources, shape them with views and tables, index unstructured content into knowledge bases, and expose everything through a natural-language agent — all without leaving your SQL client.
Quickstart
Five SQL statements to go from zero to a working semantic search agent:
-- 1. Connect your data source
CREATE DATABASE my_pg WITH ENGINE = 'postgres',
PARAMETERS = {"host":"localhost","port":5432,"user":"user","password":"pass","database":"mydb"};
-- 2. Create a project to organize work
CREATE PROJECT search_project;
-- 3. Create a knowledge base (semantic index)
CREATE KNOWLEDGE_BASE search_project.docs_kb
USING
embedding_model = {"provider":"openai","model_name":"text-embedding-3-large","api_key":"sk-..."},
content_columns = ['body'],
metadata_columns = ['title','category'],
id_column = 'doc_id';
-- 4. Index your data
INSERT INTO search_project.docs_kb
SELECT doc_id, title, category, body FROM my_pg.documents;
-- 5. Build the agent
CREATE AGENT search_project.my_agent
USING
model = {"provider":"openai","model_name":"gpt-4o","api_key":"sk-..."},
data = {"knowledge_bases":["search_project.docs_kb"]},
prompt_template = 'docs_kb contains product documentation. Answer user questions using it.';
-- Query it
SELECT answer FROM search_project.my_agent
WHERE question = 'How do I reset my password?';
Setup
# Docker
The fastest way to run MindsDB locally. Choose the image that fits your integrations:
| Image | Includes |
|---|---|
mindsdb/mindsdb:latest | mysql, postgresql, snowflake, bigquery, mssql, salesforce |
mindsdb/mindsdb:huggingface | All of the above + HuggingFace |
docker run --name mindsdb_container \
-e MINDSDB_APIS=http,mysql \
-p 47334:47334 -p 47335:47335 \
mindsdb/mindsdb
Access the MindsDB editor at http://127.0.0.1:47334. The MySQL API is available at port 47335.
Persist data: add -v $(pwd)/mdb_data:/root/mdb_storage to keep models and configs across restarts.
With auth: pass -e MINDSDB_USERNAME=admin -e MINDSDB_PASSWORD=password to enable login.
Local databases: use http://host.docker.internal instead of localhost when the target DB runs in another container.
Container management
docker stop mindsdb_container # stop
docker start mindsdb_container # restart (preserves state)
docker logs -f mindsdb_container # follow logs
Install integration dependencies
Once the container is running, install extra integrations directly from the editor (Settings → Manage Integrations), or via shell:
docker exec mindsdb_container pip install mindsdb[pgvector]
# pip install
pip install mindsdb
# with extras
pip install mindsdb[pgvector,openai,postgres]
python -m mindsdb
Editor: http://127.0.0.1:47334 · MySQL API: port 47335 · PostgreSQL API: port 47336
# MindsDB Cloud
Sign up at cloud.mindsdb.com — no installation required. The SQL editor is available immediately. All SQL statements in this reference work identically on Cloud.
# Connect Clients
MindsDB exposes a MySQL-compatible wire protocol. Any MySQL client can connect:
| Client | Host | Port | User | Password |
|---|---|---|---|---|
| MySQL CLI / DBeaver / TablePlus | 127.0.0.1 | 47335 | mindsdb | (empty) |
| MindsDB Editor | http://127.0.0.1:47334 | |||
| SQLAlchemy | mysql+pymysql://mindsdb@127.0.0.1:47335/mindsdb | |||
mysql -h 127.0.0.1 --port 47335 -u mindsdb -p
from sqlalchemy import create_engine
engine = create_engine("mysql+pymysql://mindsdb@127.0.0.1:47335/mindsdb")
with engine.connect() as conn:
result = conn.execute("SELECT answer FROM my_agent WHERE question = 'hello'")
Databases
Databases are connections to external data sources — your Postgres, MySQL, S3, Snowflake, MongoDB, etc. MindsDB never copies your data; it queries it live each time.
# CREATE DATABASE
CREATE DATABASE [IF NOT EXISTS] datasource_name
[WITH] [ENGINE [=] engine_name] [,]
[PARAMETERS [=] {
"key": "value",
...
}];
'postgres', 'mysql', 'mongodb', 'snowflake', 's3').Examples
CREATE DATABASE my_postgres
WITH ENGINE = 'postgres',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"user": "postgres",
"password": "password",
"database": "mydb"
};
CREATE DATABASE my_mysql
WITH ENGINE = 'mysql',
PARAMETERS = {
"host": "127.0.0.1",
"port": 3306,
"user": "root",
"password": "password",
"database": "mydb"
};
CREATE DATABASE my_mongo
WITH ENGINE = 'mongodb',
PARAMETERS = {
"host": "mongodb+srv://user:pass@cluster.mongodb.net",
"database": "mydb"
};
Query all available data handlers: SELECT * FROM information_schema.handlers WHERE type = 'data';
# DROP DATABASE
DROP DATABASE [IF EXISTS] database_name;
Dropping a database removes the connection only. No data in the external source is affected.
# SHOW DATABASES
-- List all databases and projects
SHOW DATABASES;
SHOW FULL DATABASES;
-- Filter to data sources only
SHOW FULL DATABASES WHERE type = 'data';
-- Via information_schema
SELECT * FROM information_schema.databases;
-- List tables in a database
SHOW TABLES FROM datasource_name;
# USE
Switch context to a database or project. Subsequent queries can omit the database prefix.
USE datasource_name;
-- Now you can query directly
SELECT * FROM table_name LIMIT 10;
-- Switch to a project
USE my_project;
# Native Queries
Pass database-native syntax through MindsDB without translation. Useful for database-specific functions, MongoDB-QL, Snowflake SQL extensions, etc.
SELECT * FROM my_postgres (
SELECT
model,
year,
price,
ROUND(CAST((mpg / 2.3521458) AS numeric), 1) AS kml,
COUNT(*) OVER (PARTITION BY model, year) AS units_to_sell
FROM demo_data.used_cars
);
SELECT * FROM my_mongo (
db.products.find({"category": "electronics"}).limit(50)
);
CREATE VIEW enriched_cars FROM my_postgres (
SELECT *, ROUND(CAST((mpg / 2.3521458) AS numeric), 1) AS kml
FROM demo_data.used_cars
);
Projects
Projects are namespaces that hold knowledge bases, agents, views, and jobs. They let you organize your AI pipeline by use case (e.g. search_prod, search_staging).
# CREATE PROJECT
CREATE PROJECT [IF NOT EXISTS] project_name;
CREATE PROJECT search_project;
CREATE PROJECT IF NOT EXISTS analytics;
# DROP PROJECT
DROP PROJECT [IF EXISTS] project_name;
# SHOW PROJECTS
-- List all projects
SHOW DATABASES WHERE type = 'project';
SHOW FULL DATABASES WHERE type = 'project';
-- List objects in a project
SHOW TABLES FROM project_name;
SHOW VIEWS FROM project_name;
SHOW KNOWLEDGE_BASES FROM project_name;
Tables & Views
Use tables and views to shape and materialize data before indexing it into a knowledge base. Views are saved SELECT statements (virtual); tables are materialized results (physical).
# CREATE TABLE
Create an empty table or materialize a query result into an integration.
-- Empty table
CREATE TABLE integration_name.table_name (
column_name data_type,
...
);
-- Filled from query
CREATE TABLE integration_name.table_name (SELECT ...);
-- Replace if exists
CREATE OR REPLACE TABLE integration_name.table_name (SELECT ...);
-- Clean and materialize support tickets for KB ingestion
CREATE OR REPLACE TABLE my_pg.clean_tickets (
SELECT
id,
category,
TRIM(LOWER(subject)) AS subject,
body,
created_at
FROM my_pg.raw_support_tickets
WHERE body IS NOT NULL
AND LENGTH(body) > 20
);
# DROP TABLE
DROP TABLE table_name;
DROP TABLE files.uploaded_file;
# CREATE VIEW
A view is a saved SELECT that executes on every access. Perfect for data preparation before sending to a knowledge base.
CREATE VIEW [IF NOT EXISTS] [project_name.]view_name AS (
SELECT columns
FROM integration_name.table_name
WHERE ...
);
-- View joining tickets with product metadata for richer indexing
CREATE VIEW search_project.enriched_tickets AS (
SELECT
t.id,
t.body,
t.created_at,
p.name AS product_name,
p.category AS product_category
FROM my_pg.support_tickets t
JOIN my_pg.products p ON t.product_id = p.id
WHERE t.status = 'closed'
);
SHOW VIEWS;
SHOW FULL VIEWS;
# ALTER VIEW
-- Standard syntax
ALTER VIEW view_name [AS] (
SELECT * FROM integration_name.table_name
);
-- With explicit source
ALTER VIEW view_name
FROM integration_name (
SELECT * FROM table_name
);
# DROP VIEW
DROP VIEW [IF EXISTS] view_name;
SQL Reference
# SELECT
MindsDB supports standard SELECT with all standard clauses. Queries that reference one integration are pushed down to that engine. Cross-integration queries are executed in MindsDB's DuckDB-backed engine.
SELECT location, MAX(sqft)
FROM my_pg.home_rentals
GROUP BY location
LIMIT 5;
-- Wrap in subquery when integration doesn't support GROUP BY
SELECT type, MAX(bedrooms), LAST(price)
FROM my_mongo (
db.house_sales.find().limit(300)
) GROUP BY 1;
SELECT id, content, 'tickets' AS source FROM my_pg.support_tickets
UNION ALL
SELECT id, body AS content, 'forums' AS source FROM my_pg.forum_posts;
# INSERT INTO
Insert rows into an integration table from a subquery. The destination table must already exist.
INSERT INTO integration_name.table_name
(SELECT ...);
-- Archive processed tickets into a separate table
INSERT INTO my_pg.archived_tickets (
SELECT * FROM my_pg.support_tickets
WHERE resolved_at < '2024-01-01'
);
# UPDATE
UPDATE my_pg.support_tickets
SET status = 'archived'
WHERE resolved_at < '2024-01-01';
UPDATE my_pg.products
ON product_id
FROM (
SELECT product_id, new_description AS description
FROM my_pg.product_updates
);
# DELETE
-- Simple delete
DELETE FROM my_pg.table_name
WHERE column_name = 'value';
-- Delete with subquery
DELETE FROM my_pg.support_tickets
WHERE id IN (
SELECT id FROM my_pg.resolved_tickets
WHERE resolved_at < '2023-01-01'
);
# JOIN
Standard SQL JOINs work across tables within the same integration or after bridging with a view/subquery. Use JOINs to denormalize and enrich data before indexing.
-- Enrich tickets with user and product info before KB insert
SELECT
t.id,
t.body,
u.name AS user_name,
u.plan AS user_plan,
p.name AS product_name,
p.category AS product_category
FROM my_pg.support_tickets t
JOIN my_pg.users u ON t.user_id = u.id
JOIN my_pg.products p ON t.product_id = p.id
WHERE t.body IS NOT NULL;
SELECT pg_data.id, mongo_data.tags
FROM (SELECT id, title FROM my_pg.articles) AS pg_data
JOIN (SELECT article_id, tags FROM my_mongo.article_tags) AS mongo_data
ON pg_data.id = mongo_data.article_id;
# CASE WHEN
Standard conditional logic in SELECT, WHERE, and other clauses.
SELECT
id,
body,
CASE
WHEN priority = 1 THEN 'critical'
WHEN priority BETWEEN 2 AND 3 THEN 'high'
WHEN priority = 4 THEN 'medium'
ELSE 'low'
END AS priority_label
FROM my_pg.support_tickets;
# CTEs (WITH)
Common Table Expressions create named temporary result sets for modular, readable queries.
WITH
-- Step 1: Get recent tickets
recent AS (
SELECT id, user_id, product_id, body
FROM my_pg.support_tickets
WHERE created_at > '2024-01-01'
),
-- Step 2: Join product info
enriched AS (
SELECT
r.id,
r.body,
p.name AS product_name,
p.category AS category
FROM recent r
JOIN my_pg.products p ON r.product_id = p.id
)
-- Final: Insert into KB
INSERT INTO search_project.tickets_kb
SELECT * FROM enriched;
Knowledge Bases
A knowledge base is the semantic index at the heart of MindsDB's search capabilities. It combines an embedding model, an optional reranking model, and a vector store to enable context-aware retrieval over any data you load into it.
Knowledge bases match content by meaning, not keywords. "reset credentials" and "forgot password" return the same document even though no words overlap.
How it works
Create
Register the KB with an embedding model, optional reranking model, storage backend, and column mapping.
Insert
Feed rows from any table or view. Each row is chunked, embedded, and written to the vector store.
Query
Use WHERE content = '...' for semantic search, metadata columns for filtering, and relevance to threshold results.
Connect to Agent
Reference the KB in a CREATE AGENT statement — the agent reasons over it automatically.
# CREATE KNOWLEDGE_BASE
CREATE KNOWLEDGE_BASE [project_name.]kb_name
USING
embedding_model = { "provider": "...", "model_name": "...", "api_key": "..." },
reranking_model = { "provider": "...", "model_name": "...", "api_key": "..." },
storage = vector_db_conn.storage_table,
metadata_columns = ['col1', 'col2', ...],
content_columns = ['col_a', 'col_b', ...],
id_column = 'id_col';
-- First connect PGVector as the vector store
CREATE DATABASE my_pgvector
WITH ENGINE = 'pgvector',
PARAMETERS = {
"host": "127.0.0.1",
"port": 5432,
"database": "postgres",
"user": "user",
"password": "password",
"distance": "cosine"
};
-- Create the knowledge base
CREATE KNOWLEDGE_BASE search_project.support_kb
USING
embedding_model = {
"provider": "openai",
"model_name": "text-embedding-3-large",
"api_key": "sk-..."
},
reranking_model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-...",
"method": "multi-class"
},
storage = my_pgvector.support_kb_store,
metadata_columns = ['product_name', 'priority', 'created_at'],
content_columns = ['subject', 'body'],
id_column = 'ticket_id';
# Parameters
embedding_model
Required. Converts text into vector representations for similarity search.
| Provider | Required params | Optional params |
|---|---|---|
openai | model_name, api_key | base_url, api_version |
openai_azure | model_name, api_key, base_url, api_version | — |
google | model_name, api_key | — |
bedrock | model_name, aws_access_key_id, aws_region_name, aws_secret_access_key | aws_session_token |
snowflake | model_name, api_key, account_id | — |
ollama | model_name, base_url | — |
Define default_embedding_model in config.json to avoid specifying it on every CREATE KNOWLEDGE_BASE.
reranking_model
Optional. Scores result chunks for relevance using an LLM. Supports same providers as embedding_model. Set to false to disable.
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = { ... },
reranking_model = false,
...;
| method | Description |
|---|---|
multi-class (default) | Classifies each chunk into 4 relevance levels (0.25 / 0.5 / 0.75 / 1.0). Relevance = weighted sum of class probabilities. |
binary | Relevant / not relevant. Uses log probability of the positive class. |
storage
The vector database to store embeddings in. Connect it first with CREATE DATABASE.
Recommended: PGVector ≥ 0.8.0 for best performance and hybrid search support.
MindsDB Docker Desktop Extension includes a built-in PGVector — storage is optional when using it.
metadata_columns
Array of column names used as metadata. Metadata enables fast pre-filtering before or alongside semantic search.
A column cannot be in both metadata_columns and content_columns.
content_columns
Array of column names whose text gets chunked and embedded. Multiple columns are concatenated. Defaults to a column named content if not specified.
id_column
Column that uniquely identifies each source row. Optional — defaults to the MD5 hash of content columns. Used for upsert logic when re-inserting data.
INSERT INTO my_kb (
SELECT ROW_NUMBER() OVER (ORDER BY created_at) AS id, *
FROM my_pg.raw_documents
);
# ALTER KNOWLEDGE_BASE
Modify an existing KB configuration. The storage backend and embedding model type cannot be changed (would break existing embeddings), but you can rotate API keys, swap reranking models, and update column mappings.
ALTER KNOWLEDGE_BASE kb_name
USING
param_name = value,
...;
ALTER KNOWLEDGE_BASE support_kb
USING
embedding_model = { "api_key": "sk-new-key" };
ALTER KNOWLEDGE_BASE support_kb
USING
reranking_model = {
"provider": "google",
"model_name": "gemini-2.0-flash",
"api_key": "AIza..."
};
-- Add new fields while keeping existing ones filterable
ALTER KNOWLEDGE_BASE support_kb
USING
metadata_columns = ['product_name', 'priority', 'created_at', 'region'];
Changing metadata_columns doesn't remove old stored metadata. Only the columns listed in the most recent ALTER can be used in WHERE filters going forward.
# INSERT INTO Knowledge Base
Feed data into the KB. Rows are chunked, embedded, and stored in the vector database.
INSERT INTO search_project.support_kb
SELECT ticket_id, subject, body, product_name, priority, created_at
FROM my_pg.support_tickets;
INSERT INTO search_project.support_kb
SELECT * FROM search_project.enriched_tickets;
INSERT INTO search_project.support_kb
SELECT ticket_id, subject, body, product_name, priority, created_at
FROM my_pg.support_tickets
WHERE created_at > (
SELECT MAX(created_at) FROM search_project.support_kb
);
Performance options
-- Faster inserts when you know there are no duplicates
INSERT INTO my_kb
SELECT * FROM my_pg.documents
USING kb_no_upsert = true;
Track insert progress: SELECT * FROM information_schema.queries;
# Query Knowledge Base / Semantic Search
Query a knowledge base using the content pseudo-column for semantic search, metadata columns for structured filtering, and relevance for scoring.
Output columns
| Column | Description |
|---|---|
id | Source row identifier (from id_column) |
chunk_id | Chunk identifier: <id>:<n>of<total>:<start>to<end> |
chunk_content | The actual text of this chunk |
metadata | JSON blob with all stored metadata fields |
| metadata cols | Individual metadata columns (e.g. product_name, priority) |
distance | Raw vector distance (lower = more similar) |
relevance | Score 0–1 from reranking model (or 1/(1+distance) if no reranker) |
Semantic search
SELECT *
FROM search_project.support_kb
WHERE content = 'how do I reset my password'
LIMIT 5;
SELECT id, chunk_content, product_name, relevance
FROM search_project.support_kb
WHERE content = 'billing issue'
AND relevance >= 0.6
LIMIT 10;
Metadata filtering
-- Only search within a specific product and priority
SELECT id, chunk_content, relevance
FROM search_project.support_kb
WHERE content = 'cannot connect'
AND product_name = 'DataSync Pro'
AND priority <= 2
AND relevance >= 0.5;
SELECT *
FROM search_project.support_kb
WHERE product_name = 'DataSync Pro'
AND created_at BETWEEN '2024-01-01' AND '2024-12-31';
Supported filtering operators
| Type | Operators |
|---|---|
| Semantic (content col) | = 'query', LIKE 'query', NOT LIKE, IN ('q1','q2'), NOT IN, OR, AND (intersection) |
| Metadata | =, !=, <>, >, <, >=, <=, BETWEEN, LIKE, IN, NOT IN, AND, OR, NOT |
| Exclusion | id != x, id NOT IN (SELECT id FROM kb WHERE content = '...') |
Default LIMIT is 10. Default relevance threshold is ≥ 0 (no filtering). Specify both to control result count and quality independently.
# Hybrid Search
Hybrid search combines semantic similarity (vector embeddings) with exact keyword matching (BM25 full-text index). Use it when your queries include specific identifiers, acronyms, product codes, or technical terms that embeddings might miss.
Hybrid search requires PGVector as the knowledge base storage backend.
SELECT *
FROM search_project.support_kb
WHERE content = 'ACME-213 error'
AND hybrid_search = true; -- alpha defaults to 0.5
-- hybrid_search_alpha: 0 = pure keyword, 1 = pure semantic
SELECT *
FROM search_project.support_kb
WHERE content = 'ticket ERR-4421'
AND hybrid_search_alpha = 0.3; -- lean toward exact keyword match
-- Uses alpha-weighted average of BM25 + embedding scores instead
SELECT *
FROM search_project.support_kb
WHERE content = 'ERR-4421'
AND hybrid_search_alpha = 0.2
AND reranking = false;
How it works
When you trigger hybrid search, both paths run in parallel:
| Path | Method | Best for |
|---|---|---|
| Semantic | Embedding vector similarity | Conceptual queries, natural language, paraphrases |
| Keyword | BM25 full-text index | Exact terms, product codes, acronyms, ticket IDs |
Results from both paths are merged and reranked (via the KB's reranking model if available, or via alpha-weighted averaging if not).
When to use hybrid search: any time users search for specific identifiers, technical terms, model numbers, or internal terminology alongside natural language queries.
# DESCRIBE KNOWLEDGE_BASE
DESCRIBE KNOWLEDGE_BASE support_kb;
-- List all knowledge bases
SHOW KNOWLEDGE_BASES;
SHOW KNOWLEDGE_BASES WHERE project = 'search_project';
The DESCRIBE output includes: NAME, PROJECT, STORAGE, PARAMS (embedding/reranking config), INSERT_STARTED_AT, INSERT_FINISHED_AT, PROCESSED_ROWS, ERROR.
# DROP KNOWLEDGE_BASE
DROP KNOWLEDGE_BASE [IF EXISTS] kb_name;
DROP KNOWLEDGE_BASE [IF EXISTS] project_name.kb_name;
This permanently removes all embeddings and metadata from the vector store. Cannot be undone.
Agents
An agent is the conversational interface over your data. It combines an LLM with access to knowledge bases and database tables, enabling natural language queries over structured and unstructured data alike. This is the final piece of the semantic search pipeline.
How agents work
Input Processing
Builds a real-time data catalog from 5-row samples of each connected object. Extracts the question and structures LLM input.
Planning
Determines which knowledge bases and tables are relevant. Prepares SQL queries as needed.
Exploration Loop
Executes queries, collects results, adjusts if needed. Up to 20 queries per request.
Synthesis
Aggregates results and synthesizes a natural language or structured response.
Performance tip: keep connected objects to ≤ 10. Create views to pre-aggregate and simplify data before connecting to the agent. The clearer your prompt_template, the more accurate the responses.
# CREATE AGENT
CREATE AGENT [project_name.]agent_name
USING
model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-...",
"base_url": "https://...", -- optional
"api_version": "2024-02-01" -- optional, required for Azure
},
data = {
"knowledge_bases": ["project.kb_name", ...],
"tables": ["datasource.table_name", ...]
},
prompt_template = 'describe your data here',
timeout = 30,
mode = 'text'; -- 'text' or 'sql'
CREATE AGENT search_project.support_agent
USING
model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-..."
},
data = {
"knowledge_bases": ["search_project.support_kb"],
"tables": [
"my_pg.products",
"my_pg.users"
]
},
prompt_template = '
search_project.support_kb contains customer support tickets with fields:
- chunk_content: ticket text
- product_name: product the ticket is about
- priority: 1 (critical) to 4 (low)
- created_at: submission date
my_pg.products contains product catalog.
my_pg.users contains user account data.
Answer questions accurately and cite which tickets are relevant.
',
timeout = 60;
Supported LLM providers
| Provider | Notable models |
|---|---|
openai | gpt-4o, gpt-4.1, gpt-4.1-mini, o3-mini, o4-mini |
anthropic | claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307 |
google | gemini-2.5-pro-preview-03-25, gemini-2.0-flash, gemini-1.5-pro |
bedrock | All AWS Bedrock models (requires aws_region_name, aws_access_key_id, aws_secret_access_key) |
ollama | llama2, mistral, mixtral, codellama, gemma, phi, qwen, and more |
nvidia_nim | meta/llama-3_1-70b-instruct, mistralai/mistral-large, and more |
writer | palmyra-x5, palmyra-x4 |
Connect all objects from a source at once
CREATE AGENT my_agent
USING
model = { ... },
data = {
"knowledge_bases": ["search_project.*"], -- all KBs in project
"tables": ["my_pg.*"] -- all tables in datasource
},
prompt_template = '...';
Use default model from config
{
"default_llm": {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": "sk-..."
}
}
When default_llm is set, omit the model parameter from CREATE AGENT.
# SELECT FROM AGENT
Query the agent with a natural language question. The agent returns either a free-text answer or structured columns depending on how you write the SELECT.
SELECT answer
FROM search_project.support_agent
WHERE question = 'What are the most common issues with DataSync Pro?';
-- Agent formats its response to match the requested columns
SELECT issue_type, ticket_count, example_ticket_id
FROM search_project.support_agent
WHERE question = 'What are the top 5 issue types for DataSync Pro this month?';
-- Test with a different model without changing the agent definition
SELECT answer
FROM search_project.support_agent
WHERE question = 'Summarize open critical tickets'
USING
model = {
"provider": "anthropic",
"model_name": "claude-3-5-sonnet-20241022",
"api_key": "sk-ant-..."
};
# ALTER AGENT
ALTER AGENT agent_name
USING
model = { ... },
data = { "knowledge_bases": [...], "tables": [...] },
prompt_template = '...';
ALTER AGENT search_project.support_agent
USING
data = {
"knowledge_bases": [
"search_project.support_kb",
"search_project.docs_kb" -- new KB added
],
"tables": ["my_pg.products"]
};
ALTER AGENT search_project.support_agent
USING
model = {
"provider": "openai",
"model_name": "gpt-4.1",
"api_key": "sk-..."
};
# DROP / SHOW AGENTS
-- Drop an agent
DROP AGENT agent_name;
DROP AGENT project_name.agent_name;
-- List agents
SHOW AGENTS;
SHOW AGENTS WHERE project = 'search_project';
SHOW AGENTS WHERE name = 'support_agent';
Jobs
Jobs schedule any SQL statement (or sequence of statements) to run automatically — once at a future time, or repeatedly on an interval. Use jobs to keep your knowledge bases up to date as new data arrives.
# CREATE JOB
CREATE JOB [IF NOT EXISTS] [project_name.]job_name [AS] (
<statement_1>[; <statement_2>][; ...]
)
[START <date>]
[END <date>]
[EVERY [number] <period>]
[IF (<condition_statement>)];
| Clause | Description |
|---|---|
START <date> | When to begin. Defaults to now if omitted. |
END <date> | When to stop recurring. Omit for indefinite repetition. |
EVERY [n] <period> | Repetition frequency. Omit to run once. Period values: minute, hour, day, week, month. |
IF (...) | Only execute if the condition query returns rows. |
Date format: 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DD'. Timezone: UTC.
CREATE JOB search_project.refresh_support_kb (
INSERT INTO search_project.support_kb
SELECT ticket_id, subject, body, product_name, priority, created_at
FROM my_pg.support_tickets
WHERE created_at > LAST
)
EVERY hour;
CREATE JOB search_project.conditional_kb_refresh (
INSERT INTO search_project.support_kb
SELECT ticket_id, subject, body, product_name, priority, created_at
FROM my_pg.support_tickets
WHERE created_at > LAST
)
EVERY 30 minutes
IF (
SELECT * FROM my_pg.support_tickets
WHERE created_at > LAST
);
CREATE JOB search_project.daily_snapshot (
CREATE TABLE my_pg.`kb_snapshot_{{START_DATE}}` (
SELECT id, chunk_content, product_name, relevance
FROM search_project.support_kb
WHERE content = 'critical errors'
AND relevance >= 0.7
)
)
EVERY day;
Available template variables: {{START_DATETIME}}, {{START_DATE}}, {{PREVIOUS_START_DATETIME}}.
# LAST keyword
LAST stores the maximum value seen in the previous run. Use it to process only new rows on each execution — turning any data source into a stream.
-- First run: returns nothing (no prior state)
-- Second run: returns rows inserted since the first run
SELECT id, body
FROM my_pg.support_tickets
WHERE id > LAST;
-- First run uses 1000 as the seed; subsequent runs use LAST
SELECT id, body
FROM my_pg.support_tickets
WHERE id > COALESCE(LAST, 1000);
To reset the LAST context in the editor: SET context = 0; or SET context = null;
# DROP JOB
DROP JOB [IF EXISTS] [project_name.]job_name;
# SHOW / Query JOBS
-- All jobs
SHOW JOBS;
-- Jobs in a project
SHOW JOBS WHERE project = 'search_project';
-- Full details
SELECT * FROM search_project.jobs;
SELECT * FROM information_schema.jobs;
-- Execution history (includes errors)
SELECT * FROM log.jobs_history
WHERE project = 'search_project'
ORDER BY run_start DESC
LIMIT 20;
The jobs table has columns: NAME, PROJECT, RUN_START, RUN_END, NEXT_RUN_AT, SCHEDULE_STR, QUERY. The history table adds ERROR.
Functions
# Standard Functions
MindsDB runs queries on DuckDB internally — all DuckDB functions are available. MySQL-style functions are adapted automatically.
DuckDB function categories
| Category | Examples |
|---|---|
| Aggregate | SUM, AVG, COUNT, MIN, MAX, LAST, LIST |
| Text | LOWER, UPPER, TRIM, REPLACE, REGEXP_MATCHES, STRING_SPLIT |
| Date / Time | NOW, DATE_TRUNC, DATE_DIFF, STRFTIME, EPOCH |
| Numeric | ROUND, FLOOR, CEIL, ABS, RANDOM, POW |
| JSON | JSON_EXTRACT, JSON_OBJECT, JSON_ARRAY |
| Window | ROW_NUMBER, RANK, LAG, LEAD, FIRST_VALUE |
| List / Array | LIST_AGG, UNNEST, ARRAY_LENGTH, LIST_DISTINCT |
MySQL-compatible functions
| Function | Description |
|---|---|
CHAR(), FORMAT(), INSTR(), LOCATE() | String manipulation |
LENGTH(), SUBSTRING_INDEX(), UNHEX() | String utilities |
ADDDATE(), DATE_ADD(), DATE_SUB(), DATEDIFF() | Date arithmetic |
DATE_FORMAT(), FROM_UNIXTIME(), CURDATE(), CURTIME() | Date formatting |
TIMESTAMPDIFF(), CONVERT_TZ() | Timezone & diff |
REGEXP_SUBSTR(), SHA2() | Regex & crypto |
# Variables
Store API keys and reusable values as session variables. Reference them in CREATE KNOWLEDGE_BASE, CREATE AGENT, and other statements.
-- From an environment variable (must start with MDB_)
SET @openai_key = from_env('MDB_OPENAI_API_KEY');
-- Directly
SET @pgvector_host = '127.0.0.1';
-- Use in CREATE statements
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name": "text-embedding-3-large",
"api_key": @openai_key
},
...;
# FROM_ENV()
Pull values directly from environment variables into SQL statements. Only variables prefixed with MDB_ can be accessed.
-- Inline in a CREATE statement
CREATE KNOWLEDGE_BASE my_kb
USING
embedding_model = {
"provider": "openai",
"model_name": "text-embedding-3-large",
"api_key": from_env('MDB_OPENAI_API_KEY')
},
...;
CREATE AGENT my_agent
USING
model = {
"provider": "openai",
"model_name": "gpt-4o",
"api_key": from_env('MDB_OPENAI_API_KEY')
},
...;
Only environment variables with names starting with MDB_ are accessible via from_env().
MindsDB SQL Reference · Built with the Anton dark design system