SQL Reference
GitHub
SQL Reference

MindsDB — Semantic Search via SQL

MindsDB lets you build semantic search pipelines entirely in SQL. Connect your data sources, shape them with views and tables, index unstructured content into knowledge bases, and expose everything through a natural-language agent — all without leaving your SQL client.

🗄️ Connect Data
──→
🧠 Index in KB
──→
🤖 Query via Agent
optional
🔧 Views
+
📁 Projects

Quickstart

Five SQL statements to go from zero to a working semantic search agent:

SQL
-- 1. Connect your data source
CREATE DATABASE my_pg WITH ENGINE = 'postgres',
PARAMETERS = {"host":"localhost","port":5432,"user":"user","password":"pass","database":"mydb"};

-- 2. Create a project to organize work
CREATE PROJECT search_project;

-- 3. Create a knowledge base (semantic index)
CREATE KNOWLEDGE_BASE search_project.docs_kb
USING
  embedding_model = {"provider":"openai","model_name":"text-embedding-3-large","api_key":"sk-..."},
  content_columns = ['body'],
  metadata_columns = ['title','category'],
  id_column = 'doc_id';

-- 4. Index your data
INSERT INTO search_project.docs_kb
  SELECT doc_id, title, category, body FROM my_pg.documents;

-- 5. Build the agent
CREATE AGENT search_project.my_agent
USING
  model = {"provider":"openai","model_name":"gpt-4o","api_key":"sk-..."},
  data = {"knowledge_bases":["search_project.docs_kb"]},
  prompt_template = 'docs_kb contains product documentation. Answer user questions using it.';

-- Query it
SELECT answer FROM search_project.my_agent
WHERE question = 'How do I reset my password?';

Setup

# Docker

The fastest way to run MindsDB locally. Choose the image that fits your integrations:

ImageIncludes
mindsdb/mindsdb:latestmysql, postgresql, snowflake, bigquery, mssql, salesforce
mindsdb/mindsdb:huggingfaceAll of the above + HuggingFace
BASH
docker run --name mindsdb_container \
  -e MINDSDB_APIS=http,mysql \
  -p 47334:47334 -p 47335:47335 \
  mindsdb/mindsdb

Access the MindsDB editor at http://127.0.0.1:47334. The MySQL API is available at port 47335.

💡

Persist data: add -v $(pwd)/mdb_data:/root/mdb_storage to keep models and configs across restarts.

With auth: pass -e MINDSDB_USERNAME=admin -e MINDSDB_PASSWORD=password to enable login.

Local databases: use http://host.docker.internal instead of localhost when the target DB runs in another container.

Container management

BASH
docker stop mindsdb_container    # stop
docker start mindsdb_container   # restart (preserves state)
docker logs -f mindsdb_container # follow logs

Install integration dependencies

Once the container is running, install extra integrations directly from the editor (Settings → Manage Integrations), or via shell:

BASH
docker exec mindsdb_container pip install mindsdb[pgvector]

# pip install

BASH
pip install mindsdb

# with extras
pip install mindsdb[pgvector,openai,postgres]
BASH — start
python -m mindsdb

Editor: http://127.0.0.1:47334  ·  MySQL API: port 47335  ·  PostgreSQL API: port 47336

# MindsDB Cloud

Sign up at cloud.mindsdb.com — no installation required. The SQL editor is available immediately. All SQL statements in this reference work identically on Cloud.

# Connect Clients

MindsDB exposes a MySQL-compatible wire protocol. Any MySQL client can connect:

ClientHostPortUserPassword
MySQL CLI / DBeaver / TablePlus127.0.0.147335mindsdb(empty)
MindsDB Editorhttp://127.0.0.1:47334
SQLAlchemymysql+pymysql://mindsdb@127.0.0.1:47335/mindsdb
BASH — MySQL CLI
mysql -h 127.0.0.1 --port 47335 -u mindsdb -p
PYTHON — SQLAlchemy
from sqlalchemy import create_engine
engine = create_engine("mysql+pymysql://mindsdb@127.0.0.1:47335/mindsdb")
with engine.connect() as conn:
    result = conn.execute("SELECT answer FROM my_agent WHERE question = 'hello'")

Databases

Databases are connections to external data sources — your Postgres, MySQL, S3, Snowflake, MongoDB, etc. MindsDB never copies your data; it queries it live each time.

# CREATE DATABASE

CREATE DATABASEConnect an external data source
CREATE DATABASE [IF NOT EXISTS] datasource_name
[WITH] [ENGINE [=] engine_name] [,]
[PARAMETERS [=] {
  "key": "value",
  ...
}];
datasource_nameUnique identifier for this connection within MindsDB.
engine_nameThe handler to use (e.g. 'postgres', 'mysql', 'mongodb', 'snowflake', 's3').
PARAMETERSJSON object of connection parameters specific to the engine (host, port, user, password, database, etc.).

Examples

PostgreSQL
CREATE DATABASE my_postgres
WITH ENGINE = 'postgres',
PARAMETERS = {
  "host": "127.0.0.1",
  "port": 5432,
  "user": "postgres",
  "password": "password",
  "database": "mydb"
};
MySQL
CREATE DATABASE my_mysql
WITH ENGINE = 'mysql',
PARAMETERS = {
  "host": "127.0.0.1",
  "port": 3306,
  "user": "root",
  "password": "password",
  "database": "mydb"
};
MongoDB
CREATE DATABASE my_mongo
WITH ENGINE = 'mongodb',
PARAMETERS = {
  "host": "mongodb+srv://user:pass@cluster.mongodb.net",
  "database": "mydb"
};
💡

Query all available data handlers: SELECT * FROM information_schema.handlers WHERE type = 'data';

# DROP DATABASE

DROP DATABASERemove a connected data source
DROP DATABASE [IF EXISTS] database_name;

Dropping a database removes the connection only. No data in the external source is affected.

# SHOW DATABASES

SQL
-- List all databases and projects
SHOW DATABASES;
SHOW FULL DATABASES;

-- Filter to data sources only
SHOW FULL DATABASES WHERE type = 'data';

-- Via information_schema
SELECT * FROM information_schema.databases;

-- List tables in a database
SHOW TABLES FROM datasource_name;

# USE

Switch context to a database or project. Subsequent queries can omit the database prefix.

SQL
USE datasource_name;

-- Now you can query directly
SELECT * FROM table_name LIMIT 10;

-- Switch to a project
USE my_project;

# Native Queries

Pass database-native syntax through MindsDB without translation. Useful for database-specific functions, MongoDB-QL, Snowflake SQL extensions, etc.

SQL — PostgreSQL native query
SELECT * FROM my_postgres (
  SELECT
    model,
    year,
    price,
    ROUND(CAST((mpg / 2.3521458) AS numeric), 1) AS kml,
    COUNT(*) OVER (PARTITION BY model, year) AS units_to_sell
  FROM demo_data.used_cars
);
SQL — MongoDB-QL native query
SELECT * FROM my_mongo (
  db.products.find({"category": "electronics"}).limit(50)
);
SQL — create view from native query
CREATE VIEW enriched_cars FROM my_postgres (
  SELECT *, ROUND(CAST((mpg / 2.3521458) AS numeric), 1) AS kml
  FROM demo_data.used_cars
);

Projects

Projects are namespaces that hold knowledge bases, agents, views, and jobs. They let you organize your AI pipeline by use case (e.g. search_prod, search_staging).

# CREATE PROJECT

CREATE PROJECT
CREATE PROJECT [IF NOT EXISTS] project_name;
SQL
CREATE PROJECT search_project;
CREATE PROJECT IF NOT EXISTS analytics;

# DROP PROJECT

DROP PROJECT
DROP PROJECT [IF EXISTS] project_name;

# SHOW PROJECTS

SQL
-- List all projects
SHOW DATABASES WHERE type = 'project';
SHOW FULL DATABASES WHERE type = 'project';

-- List objects in a project
SHOW TABLES FROM project_name;
SHOW VIEWS FROM project_name;
SHOW KNOWLEDGE_BASES FROM project_name;

Tables & Views

Use tables and views to shape and materialize data before indexing it into a knowledge base. Views are saved SELECT statements (virtual); tables are materialized results (physical).

# CREATE TABLE

Create an empty table or materialize a query result into an integration.

CREATE TABLEEmpty table or filled from query
-- Empty table
CREATE TABLE integration_name.table_name (
  column_name data_type,
  ...
);

-- Filled from query
CREATE TABLE integration_name.table_name (SELECT ...);

-- Replace if exists
CREATE OR REPLACE TABLE integration_name.table_name (SELECT ...);
SQL — materialize cleaned data
-- Clean and materialize support tickets for KB ingestion
CREATE OR REPLACE TABLE my_pg.clean_tickets (
  SELECT
    id,
    category,
    TRIM(LOWER(subject))  AS subject,
    body,
    created_at
  FROM my_pg.raw_support_tickets
  WHERE body IS NOT NULL
    AND LENGTH(body) > 20
);

# DROP TABLE

DROP TABLE
DROP TABLE table_name;
DROP TABLE files.uploaded_file;

# CREATE VIEW

A view is a saved SELECT that executes on every access. Perfect for data preparation before sending to a knowledge base.

CREATE VIEW
CREATE VIEW [IF NOT EXISTS] [project_name.]view_name AS (
  SELECT columns
  FROM integration_name.table_name
  WHERE ...
);
SQL — create filtered view for KB ingestion
-- View joining tickets with product metadata for richer indexing
CREATE VIEW search_project.enriched_tickets AS (
  SELECT
    t.id,
    t.body,
    t.created_at,
    p.name    AS product_name,
    p.category AS product_category
  FROM my_pg.support_tickets t
  JOIN my_pg.products p ON t.product_id = p.id
  WHERE t.status = 'closed'
);
SQL — list views
SHOW VIEWS;
SHOW FULL VIEWS;

# ALTER VIEW

ALTER VIEW
-- Standard syntax
ALTER VIEW view_name [AS] (
  SELECT * FROM integration_name.table_name
);

-- With explicit source
ALTER VIEW view_name
FROM integration_name (
  SELECT * FROM table_name
);

# DROP VIEW

DROP VIEW
DROP VIEW [IF EXISTS] view_name;

SQL Reference

# SELECT

MindsDB supports standard SELECT with all standard clauses. Queries that reference one integration are pushed down to that engine. Cross-integration queries are executed in MindsDB's DuckDB-backed engine.

SQL — basic select
SELECT location, MAX(sqft)
FROM my_pg.home_rentals
GROUP BY location
LIMIT 5;
SQL — subquery on integration data
-- Wrap in subquery when integration doesn't support GROUP BY
SELECT type, MAX(bedrooms), LAST(price)
FROM my_mongo (
  db.house_sales.find().limit(300)
) GROUP BY 1;
SQL — UNION ALL
SELECT id, content, 'tickets' AS source FROM my_pg.support_tickets
UNION ALL
SELECT id, body  AS content, 'forums'  AS source FROM my_pg.forum_posts;

# INSERT INTO

Insert rows into an integration table from a subquery. The destination table must already exist.

INSERT INTO
INSERT INTO integration_name.table_name
  (SELECT ...);
SQL
-- Archive processed tickets into a separate table
INSERT INTO my_pg.archived_tickets (
  SELECT * FROM my_pg.support_tickets
  WHERE resolved_at < '2024-01-01'
);

# UPDATE

SQL — standard update
UPDATE my_pg.support_tickets
SET status = 'archived'
WHERE resolved_at < '2024-01-01';
SQL — update from select
UPDATE my_pg.products
ON product_id
FROM (
  SELECT product_id, new_description AS description
  FROM my_pg.product_updates
);

# DELETE

SQL
-- Simple delete
DELETE FROM my_pg.table_name
WHERE column_name = 'value';

-- Delete with subquery
DELETE FROM my_pg.support_tickets
WHERE id IN (
  SELECT id FROM my_pg.resolved_tickets
  WHERE resolved_at < '2023-01-01'
);

# JOIN

Standard SQL JOINs work across tables within the same integration or after bridging with a view/subquery. Use JOINs to denormalize and enrich data before indexing.

SQL — cross-table join for data prep
-- Enrich tickets with user and product info before KB insert
SELECT
  t.id,
  t.body,
  u.name       AS user_name,
  u.plan        AS user_plan,
  p.name        AS product_name,
  p.category    AS product_category
FROM my_pg.support_tickets t
JOIN my_pg.users    u ON t.user_id    = u.id
JOIN my_pg.products p ON t.product_id = p.id
WHERE t.body IS NOT NULL;
SQL — multi-source join via subqueries
SELECT pg_data.id, mongo_data.tags
FROM (SELECT id, title FROM my_pg.articles) AS pg_data
JOIN (SELECT article_id, tags FROM my_mongo.article_tags) AS mongo_data
  ON pg_data.id = mongo_data.article_id;

# CASE WHEN

Standard conditional logic in SELECT, WHERE, and other clauses.

SQL
SELECT
  id,
  body,
  CASE
    WHEN priority = 1              THEN 'critical'
    WHEN priority BETWEEN 2 AND 3 THEN 'high'
    WHEN priority = 4              THEN 'medium'
    ELSE                               'low'
  END AS priority_label
FROM my_pg.support_tickets;

# CTEs (WITH)

Common Table Expressions create named temporary result sets for modular, readable queries.

SQL
WITH
-- Step 1: Get recent tickets
recent AS (
  SELECT id, user_id, product_id, body
  FROM my_pg.support_tickets
  WHERE created_at > '2024-01-01'
),
-- Step 2: Join product info
enriched AS (
  SELECT
    r.id,
    r.body,
    p.name     AS product_name,
    p.category AS category
  FROM recent r
  JOIN my_pg.products p ON r.product_id = p.id
)
-- Final: Insert into KB
INSERT INTO search_project.tickets_kb
  SELECT * FROM enriched;

Knowledge Bases

A knowledge base is the semantic index at the heart of MindsDB's search capabilities. It combines an embedding model, an optional reranking model, and a vector store to enable context-aware retrieval over any data you load into it.

🧠

Knowledge bases match content by meaning, not keywords. "reset credentials" and "forgot password" return the same document even though no words overlap.

How it works

1

Create

Register the KB with an embedding model, optional reranking model, storage backend, and column mapping.

2

Insert

Feed rows from any table or view. Each row is chunked, embedded, and written to the vector store.

3

Query

Use WHERE content = '...' for semantic search, metadata columns for filtering, and relevance to threshold results.

4

Connect to Agent

Reference the KB in a CREATE AGENT statement — the agent reasons over it automatically.

# CREATE KNOWLEDGE_BASE

CREATE KNOWLEDGE_BASE
CREATE KNOWLEDGE_BASE [project_name.]kb_name
USING
  embedding_model  = { "provider": "...", "model_name": "...", "api_key": "..." },
  reranking_model  = { "provider": "...", "model_name": "...", "api_key": "..." },
  storage          = vector_db_conn.storage_table,
  metadata_columns = ['col1', 'col2', ...],
  content_columns  = ['col_a', 'col_b', ...],
  id_column        = 'id_col';
SQL — full example
-- First connect PGVector as the vector store
CREATE DATABASE my_pgvector
WITH ENGINE = 'pgvector',
PARAMETERS = {
  "host": "127.0.0.1",
  "port": 5432,
  "database": "postgres",
  "user": "user",
  "password": "password",
  "distance": "cosine"
};

-- Create the knowledge base
CREATE KNOWLEDGE_BASE search_project.support_kb
USING
  embedding_model = {
    "provider":   "openai",
    "model_name": "text-embedding-3-large",
    "api_key":    "sk-..."
  },
  reranking_model = {
    "provider":   "openai",
    "model_name": "gpt-4o",
    "api_key":    "sk-...",
    "method":     "multi-class"
  },
  storage          = my_pgvector.support_kb_store,
  metadata_columns = ['product_name', 'priority', 'created_at'],
  content_columns  = ['subject', 'body'],
  id_column        = 'ticket_id';

# Parameters

embedding_model

Required. Converts text into vector representations for similarity search.

ProviderRequired paramsOptional params
openaimodel_name, api_keybase_url, api_version
openai_azuremodel_name, api_key, base_url, api_version
googlemodel_name, api_key
bedrockmodel_name, aws_access_key_id, aws_region_name, aws_secret_access_keyaws_session_token
snowflakemodel_name, api_key, account_id
ollamamodel_name, base_url
💡

Define default_embedding_model in config.json to avoid specifying it on every CREATE KNOWLEDGE_BASE.

reranking_model

Optional. Scores result chunks for relevance using an LLM. Supports same providers as embedding_model. Set to false to disable.

SQL — disable reranking
CREATE KNOWLEDGE_BASE my_kb
USING
  embedding_model  = { ... },
  reranking_model  = false,
  ...;
methodDescription
multi-class (default)Classifies each chunk into 4 relevance levels (0.25 / 0.5 / 0.75 / 1.0). Relevance = weighted sum of class probabilities.
binaryRelevant / not relevant. Uses log probability of the positive class.

storage

The vector database to store embeddings in. Connect it first with CREATE DATABASE.

Recommended: PGVector ≥ 0.8.0 for best performance and hybrid search support.

💡

MindsDB Docker Desktop Extension includes a built-in PGVector — storage is optional when using it.

metadata_columns

Array of column names used as metadata. Metadata enables fast pre-filtering before or alongside semantic search.

A column cannot be in both metadata_columns and content_columns.

content_columns

Array of column names whose text gets chunked and embedded. Multiple columns are concatenated. Defaults to a column named content if not specified.

id_column

Column that uniquely identifies each source row. Optional — defaults to the MD5 hash of content columns. Used for upsert logic when re-inserting data.

SQL — auto-generate ID when none exists
INSERT INTO my_kb (
  SELECT ROW_NUMBER() OVER (ORDER BY created_at) AS id, *
  FROM my_pg.raw_documents
);

# ALTER KNOWLEDGE_BASE

Modify an existing KB configuration. The storage backend and embedding model type cannot be changed (would break existing embeddings), but you can rotate API keys, swap reranking models, and update column mappings.

ALTER KNOWLEDGE_BASE
ALTER KNOWLEDGE_BASE kb_name
USING
  param_name = value,
  ...;
SQL — rotate API key
ALTER KNOWLEDGE_BASE support_kb
USING
  embedding_model = { "api_key": "sk-new-key" };
SQL — swap reranking model
ALTER KNOWLEDGE_BASE support_kb
USING
  reranking_model = {
    "provider": "google",
    "model_name": "gemini-2.0-flash",
    "api_key": "AIza..."
  };
SQL — update metadata columns
-- Add new fields while keeping existing ones filterable
ALTER KNOWLEDGE_BASE support_kb
USING
  metadata_columns = ['product_name', 'priority', 'created_at', 'region'];

Changing metadata_columns doesn't remove old stored metadata. Only the columns listed in the most recent ALTER can be used in WHERE filters going forward.

# INSERT INTO Knowledge Base

Feed data into the KB. Rows are chunked, embedded, and stored in the vector database.

SQL — insert from table
INSERT INTO search_project.support_kb
  SELECT ticket_id, subject, body, product_name, priority, created_at
  FROM my_pg.support_tickets;
SQL — insert from view (enriched data)
INSERT INTO search_project.support_kb
  SELECT * FROM search_project.enriched_tickets;
SQL — incremental insert (new rows only)
INSERT INTO search_project.support_kb
  SELECT ticket_id, subject, body, product_name, priority, created_at
  FROM my_pg.support_tickets
  WHERE created_at > (
    SELECT MAX(created_at) FROM search_project.support_kb
  );

Performance options

SQL — skip duplicate check for bulk loads
-- Faster inserts when you know there are no duplicates
INSERT INTO my_kb
  SELECT * FROM my_pg.documents
USING kb_no_upsert = true;
💡

Track insert progress: SELECT * FROM information_schema.queries;

# Query Knowledge Base / Semantic Search

Query a knowledge base using the content pseudo-column for semantic search, metadata columns for structured filtering, and relevance for scoring.

Output columns

ColumnDescription
idSource row identifier (from id_column)
chunk_idChunk identifier: <id>:<n>of<total>:<start>to<end>
chunk_contentThe actual text of this chunk
metadataJSON blob with all stored metadata fields
metadata colsIndividual metadata columns (e.g. product_name, priority)
distanceRaw vector distance (lower = more similar)
relevanceScore 0–1 from reranking model (or 1/(1+distance) if no reranker)

Semantic search

SQL — basic semantic search
SELECT *
FROM search_project.support_kb
WHERE content = 'how do I reset my password'
LIMIT 5;
SQL — with relevance threshold
SELECT id, chunk_content, product_name, relevance
FROM search_project.support_kb
WHERE content = 'billing issue'
  AND relevance >= 0.6
LIMIT 10;

Metadata filtering

SQL — combine semantic + metadata filter
-- Only search within a specific product and priority
SELECT id, chunk_content, relevance
FROM search_project.support_kb
WHERE content = 'cannot connect'
  AND product_name = 'DataSync Pro'
  AND priority <= 2
  AND relevance >= 0.5;
SQL — metadata-only filter (no vector search)
SELECT *
FROM search_project.support_kb
WHERE product_name = 'DataSync Pro'
  AND created_at BETWEEN '2024-01-01' AND '2024-12-31';

Supported filtering operators

TypeOperators
Semantic (content col)= 'query', LIKE 'query', NOT LIKE, IN ('q1','q2'), NOT IN, OR, AND (intersection)
Metadata=, !=, <>, >, <, >=, <=, BETWEEN, LIKE, IN, NOT IN, AND, OR, NOT
Exclusionid != x, id NOT IN (SELECT id FROM kb WHERE content = '...')
💡

Default LIMIT is 10. Default relevance threshold is ≥ 0 (no filtering). Specify both to control result count and quality independently.

# Hybrid Search

Hybrid search combines semantic similarity (vector embeddings) with exact keyword matching (BM25 full-text index). Use it when your queries include specific identifiers, acronyms, product codes, or technical terms that embeddings might miss.

Hybrid search requires PGVector as the knowledge base storage backend.

SQL — enable hybrid search (default alpha)
SELECT *
FROM search_project.support_kb
WHERE content = 'ACME-213 error'
  AND hybrid_search = true;  -- alpha defaults to 0.5
SQL — tune the semantic/keyword balance
-- hybrid_search_alpha: 0 = pure keyword, 1 = pure semantic
SELECT *
FROM search_project.support_kb
WHERE content = 'ticket ERR-4421'
  AND hybrid_search_alpha = 0.3;  -- lean toward exact keyword match
SQL — disable reranker for hybrid search
-- Uses alpha-weighted average of BM25 + embedding scores instead
SELECT *
FROM search_project.support_kb
WHERE content = 'ERR-4421'
  AND hybrid_search_alpha = 0.2
  AND reranking = false;

How it works

When you trigger hybrid search, both paths run in parallel:

PathMethodBest for
SemanticEmbedding vector similarityConceptual queries, natural language, paraphrases
KeywordBM25 full-text indexExact terms, product codes, acronyms, ticket IDs

Results from both paths are merged and reranked (via the KB's reranking model if available, or via alpha-weighted averaging if not).

When to use hybrid search: any time users search for specific identifiers, technical terms, model numbers, or internal terminology alongside natural language queries.

# DESCRIBE KNOWLEDGE_BASE

SQL
DESCRIBE KNOWLEDGE_BASE support_kb;

-- List all knowledge bases
SHOW KNOWLEDGE_BASES;
SHOW KNOWLEDGE_BASES WHERE project = 'search_project';

The DESCRIBE output includes: NAME, PROJECT, STORAGE, PARAMS (embedding/reranking config), INSERT_STARTED_AT, INSERT_FINISHED_AT, PROCESSED_ROWS, ERROR.

# DROP KNOWLEDGE_BASE

DROP KNOWLEDGE_BASERemoves the KB and all stored embeddings
DROP KNOWLEDGE_BASE [IF EXISTS] kb_name;
DROP KNOWLEDGE_BASE [IF EXISTS] project_name.kb_name;

This permanently removes all embeddings and metadata from the vector store. Cannot be undone.

Agents

An agent is the conversational interface over your data. It combines an LLM with access to knowledge bases and database tables, enabling natural language queries over structured and unstructured data alike. This is the final piece of the semantic search pipeline.

How agents work

1

Input Processing

Builds a real-time data catalog from 5-row samples of each connected object. Extracts the question and structures LLM input.

2

Planning

Determines which knowledge bases and tables are relevant. Prepares SQL queries as needed.

3

Exploration Loop

Executes queries, collects results, adjusts if needed. Up to 20 queries per request.

4

Synthesis

Aggregates results and synthesizes a natural language or structured response.

💡

Performance tip: keep connected objects to ≤ 10. Create views to pre-aggregate and simplify data before connecting to the agent. The clearer your prompt_template, the more accurate the responses.

# CREATE AGENT

CREATE AGENT
CREATE AGENT [project_name.]agent_name
USING
  model = {
    "provider":    "openai",
    "model_name":  "gpt-4o",
    "api_key":     "sk-...",
    "base_url":    "https://...",   -- optional
    "api_version": "2024-02-01"    -- optional, required for Azure
  },
  data = {
    "knowledge_bases": ["project.kb_name", ...],
    "tables":          ["datasource.table_name", ...]
  },
  prompt_template = 'describe your data here',
  timeout = 30,
  mode    = 'text';  -- 'text' or 'sql'
SQL — semantic search agent
CREATE AGENT search_project.support_agent
USING
  model = {
    "provider":   "openai",
    "model_name": "gpt-4o",
    "api_key":    "sk-..."
  },
  data = {
    "knowledge_bases": ["search_project.support_kb"],
    "tables": [
      "my_pg.products",
      "my_pg.users"
    ]
  },
  prompt_template = '
    search_project.support_kb contains customer support tickets with fields:
      - chunk_content: ticket text
      - product_name:  product the ticket is about
      - priority:      1 (critical) to 4 (low)
      - created_at:    submission date
    my_pg.products contains product catalog.
    my_pg.users contains user account data.
    Answer questions accurately and cite which tickets are relevant.
  ',
  timeout = 60;

Supported LLM providers

ProviderNotable models
openaigpt-4o, gpt-4.1, gpt-4.1-mini, o3-mini, o4-mini
anthropicclaude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
googlegemini-2.5-pro-preview-03-25, gemini-2.0-flash, gemini-1.5-pro
bedrockAll AWS Bedrock models (requires aws_region_name, aws_access_key_id, aws_secret_access_key)
ollamallama2, mistral, mixtral, codellama, gemma, phi, qwen, and more
nvidia_nimmeta/llama-3_1-70b-instruct, mistralai/mistral-large, and more
writerpalmyra-x5, palmyra-x4

Connect all objects from a source at once

SQL — wildcard data connection
CREATE AGENT my_agent
USING
  model = { ... },
  data = {
    "knowledge_bases": ["search_project.*"],  -- all KBs in project
    "tables":          ["my_pg.*"]            -- all tables in datasource
  },
  prompt_template = '...';

Use default model from config

JSON — config.json
{
  "default_llm": {
    "provider":   "openai",
    "model_name": "gpt-4o",
    "api_key":    "sk-..."
  }
}

When default_llm is set, omit the model parameter from CREATE AGENT.

# SELECT FROM AGENT

Query the agent with a natural language question. The agent returns either a free-text answer or structured columns depending on how you write the SELECT.

SQL — natural language answer
SELECT answer
FROM search_project.support_agent
WHERE question = 'What are the most common issues with DataSync Pro?';
SQL — structured output
-- Agent formats its response to match the requested columns
SELECT issue_type, ticket_count, example_ticket_id
FROM search_project.support_agent
WHERE question = 'What are the top 5 issue types for DataSync Pro this month?';
SQL — override params at query time
-- Test with a different model without changing the agent definition
SELECT answer
FROM search_project.support_agent
WHERE question = 'Summarize open critical tickets'
USING
  model = {
    "provider":   "anthropic",
    "model_name": "claude-3-5-sonnet-20241022",
    "api_key":    "sk-ant-..."
  };

# ALTER AGENT

ALTER AGENTUpdate model, data, or prompt — any combination
ALTER AGENT agent_name
USING
  model           = { ... },
  data            = { "knowledge_bases": [...], "tables": [...] },
  prompt_template = '...';
SQL — add a new KB to an agent
ALTER AGENT search_project.support_agent
USING
  data = {
    "knowledge_bases": [
      "search_project.support_kb",
      "search_project.docs_kb"     -- new KB added
    ],
    "tables": ["my_pg.products"]
  };
SQL — upgrade the model
ALTER AGENT search_project.support_agent
USING
  model = {
    "provider":   "openai",
    "model_name": "gpt-4.1",
    "api_key":    "sk-..."
  };

# DROP / SHOW AGENTS

SQL
-- Drop an agent
DROP AGENT agent_name;
DROP AGENT project_name.agent_name;

-- List agents
SHOW AGENTS;
SHOW AGENTS WHERE project = 'search_project';
SHOW AGENTS WHERE name = 'support_agent';

Jobs

Jobs schedule any SQL statement (or sequence of statements) to run automatically — once at a future time, or repeatedly on an interval. Use jobs to keep your knowledge bases up to date as new data arrives.

# CREATE JOB

CREATE JOB
CREATE JOB [IF NOT EXISTS] [project_name.]job_name [AS] (
  <statement_1>[; <statement_2>][; ...]
)
[START <date>]
[END   <date>]
[EVERY [number] <period>]
[IF (<condition_statement>)];
ClauseDescription
START <date>When to begin. Defaults to now if omitted.
END <date>When to stop recurring. Omit for indefinite repetition.
EVERY [n] <period>Repetition frequency. Omit to run once. Period values: minute, hour, day, week, month.
IF (...)Only execute if the condition query returns rows.

Date format: 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DD'. Timezone: UTC.

SQL — refresh KB every hour
CREATE JOB search_project.refresh_support_kb (
  INSERT INTO search_project.support_kb
    SELECT ticket_id, subject, body, product_name, priority, created_at
    FROM my_pg.support_tickets
    WHERE created_at > LAST
)
EVERY hour;
SQL — conditional job (only if new data exists)
CREATE JOB search_project.conditional_kb_refresh (
  INSERT INTO search_project.support_kb
    SELECT ticket_id, subject, body, product_name, priority, created_at
    FROM my_pg.support_tickets
    WHERE created_at > LAST
)
EVERY 30 minutes
IF (
  SELECT * FROM my_pg.support_tickets
  WHERE created_at > LAST
);
SQL — multi-step job with dynamic table name
CREATE JOB search_project.daily_snapshot (
  CREATE TABLE my_pg.`kb_snapshot_{{START_DATE}}` (
    SELECT id, chunk_content, product_name, relevance
    FROM search_project.support_kb
    WHERE content = 'critical errors'
      AND relevance >= 0.7
  )
)
EVERY day;

Available template variables: {{START_DATETIME}}, {{START_DATE}}, {{PREVIOUS_START_DATETIME}}.

# LAST keyword

LAST stores the maximum value seen in the previous run. Use it to process only new rows on each execution — turning any data source into a stream.

SQL — basic LAST usage
-- First run: returns nothing (no prior state)
-- Second run: returns rows inserted since the first run
SELECT id, body
FROM my_pg.support_tickets
WHERE id > LAST;
SQL — LAST with seed value for first run
-- First run uses 1000 as the seed; subsequent runs use LAST
SELECT id, body
FROM my_pg.support_tickets
WHERE id > COALESCE(LAST, 1000);
💡

To reset the LAST context in the editor: SET context = 0; or SET context = null;

# DROP JOB

DROP JOB
DROP JOB [IF EXISTS] [project_name.]job_name;

# SHOW / Query JOBS

SQL
-- All jobs
SHOW JOBS;

-- Jobs in a project
SHOW JOBS WHERE project = 'search_project';

-- Full details
SELECT * FROM search_project.jobs;
SELECT * FROM information_schema.jobs;

-- Execution history (includes errors)
SELECT * FROM log.jobs_history
WHERE project = 'search_project'
ORDER BY run_start DESC
LIMIT 20;

The jobs table has columns: NAME, PROJECT, RUN_START, RUN_END, NEXT_RUN_AT, SCHEDULE_STR, QUERY. The history table adds ERROR.

Functions

# Standard Functions

MindsDB runs queries on DuckDB internally — all DuckDB functions are available. MySQL-style functions are adapted automatically.

DuckDB function categories

CategoryExamples
AggregateSUM, AVG, COUNT, MIN, MAX, LAST, LIST
TextLOWER, UPPER, TRIM, REPLACE, REGEXP_MATCHES, STRING_SPLIT
Date / TimeNOW, DATE_TRUNC, DATE_DIFF, STRFTIME, EPOCH
NumericROUND, FLOOR, CEIL, ABS, RANDOM, POW
JSONJSON_EXTRACT, JSON_OBJECT, JSON_ARRAY
WindowROW_NUMBER, RANK, LAG, LEAD, FIRST_VALUE
List / ArrayLIST_AGG, UNNEST, ARRAY_LENGTH, LIST_DISTINCT

MySQL-compatible functions

FunctionDescription
CHAR(), FORMAT(), INSTR(), LOCATE()String manipulation
LENGTH(), SUBSTRING_INDEX(), UNHEX()String utilities
ADDDATE(), DATE_ADD(), DATE_SUB(), DATEDIFF()Date arithmetic
DATE_FORMAT(), FROM_UNIXTIME(), CURDATE(), CURTIME()Date formatting
TIMESTAMPDIFF(), CONVERT_TZ()Timezone & diff
REGEXP_SUBSTR(), SHA2()Regex & crypto

# Variables

Store API keys and reusable values as session variables. Reference them in CREATE KNOWLEDGE_BASE, CREATE AGENT, and other statements.

SQL — set variables
-- From an environment variable (must start with MDB_)
SET @openai_key = from_env('MDB_OPENAI_API_KEY');

-- Directly
SET @pgvector_host = '127.0.0.1';

-- Use in CREATE statements
CREATE KNOWLEDGE_BASE my_kb
USING
  embedding_model = {
    "provider":   "openai",
    "model_name": "text-embedding-3-large",
    "api_key":    @openai_key
  },
  ...;

# FROM_ENV()

Pull values directly from environment variables into SQL statements. Only variables prefixed with MDB_ can be accessed.

SQL
-- Inline in a CREATE statement
CREATE KNOWLEDGE_BASE my_kb
USING
  embedding_model = {
    "provider":   "openai",
    "model_name": "text-embedding-3-large",
    "api_key":    from_env('MDB_OPENAI_API_KEY')
  },
  ...;

CREATE AGENT my_agent
USING
  model = {
    "provider":   "openai",
    "model_name": "gpt-4o",
    "api_key":    from_env('MDB_OPENAI_API_KEY')
  },
  ...;

Only environment variables with names starting with MDB_ are accessible via from_env().

MindsDB SQL Reference · Built with the Anton dark design system