Commit 8c617e3

add vector search (#81)
* update * update * update title
1 parent 49a5a8c commit 8c617e3

3 files changed

Lines changed: 115 additions & 4 deletions

sqlite-cloud/_nav.ts

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ const sidebarNav: SidebarNavStruct = [
2525
{ title: "Edge Functions", filePath: "edge-functions", type: "inner", level: 0 },
2626
{ title: "Webhooks", filePath: "webhooks", type: "inner", level: 0 },
2727
{ title: "Pub/Sub", filePath: "pub-sub", type: "inner", level: 0 },
28+
{ title: "Vector", filePath: "vector", type: "inner", level: 0 },
2829
{ title: "Scaling", type: "inner", filePath: "scaling", level: 0 },
2930
{ title: "Security and Access Control", filePath: "security", type: "inner", level: 0 },
3031
{ title: "Backups", filePath: "backups", type: "inner", level: 0 },

sqlite-cloud/platform/extensions.mdx

Lines changed: 5 additions & 4 deletions
@@ -10,9 +10,10 @@ SQLite Cloud comes with the following pre-installed SQLite extensions. These ext
1010

1111
## Extensions
1212
- **[Full-text Search 5](https://www.sqlite.org/fts5.html)**: Full-text search engine that allows you to search for text in a database.
13-
- **[JSON1](https://www.sqlite.org/json1.html)**: Extension that allows you to store, query, and manipulate JSON data.
14-
- **[Math](https://www.sqlite.org/lang_mathfunc.html)**: Extension that provides mathematical functions.
15-
- **[RTree](https://www.sqlite.org/rtree.html)**: Extension that provides an R-Tree index for storing and querying spatial data.
16-
- **[Geopoly](https://www.sqlite.org/geopoly.html)**: Extension that provides functions for working with geospatial data.
13+
- **[JSON1](https://www.sqlite.org/json1.html)**: Allows you to easily store, query, and manipulate JSON data.
14+
- **[Math](https://www.sqlite.org/lang_mathfunc.html)**: Mathematical functions.
15+
- **[RTree](https://www.sqlite.org/rtree.html)**: R-Tree index for storing and querying spatial data.
16+
- **[Geopoly](https://www.sqlite.org/geopoly.html)**: Functions for working with geospatial data.
17+
- **[sqlite-vec](/docs/vector-search)**: Vector storage extension for similarity search.
1718

1819
In the future, we plan to allow users to install their own extensions. If you have a specific extension you would like to use, please let us know by [adding to this issue](https://github.com/sqlitecloud/docs/issues/34).

sqlite-cloud/platform/vector.mdx

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
1+
---
2+
title: SQLite Cloud Vector Search
3+
description: Vector storage extension for similarity search in SQLite Cloud.
4+
category: platform
5+
status: publish
6+
slug: vector
7+
---
8+
Every SQLite Cloud database comes with the `sqlite-vec` extension pre-installed. This allows you to store and query vectors in your database, which enables similarity search functionality.
9+
10+
## Overview
11+
`sqlite-vec` is a no-dependency SQLite extension for vector search, written entirely in a single C file. It's extremely portable, runs on most operating systems and environments, and is MIT/Apache-2 dual licensed.
12+
13+
Using `sqlite-vec` is similar to using full-text search in SQLite. Declare a "virtual table" with vector columns, insert data with normal `INSERT INTO` statements, and query with normal `SELECT` statements.
14+
15+
`sqlite-vec` is currently built and optimized for brute-force vector search. No approximate nearest neighbor (ANN) search is available, so each query compares the query vector against every stored vector. This limits the number of vectors that can be searched in a reasonable amount of time.
16+
17+
## Usage
18+
### Create a vector table
19+
20+
To create a virtual vector table, use `vec0` and the following syntax:
21+
22+
```sql
23+
create virtual table vec_table_name using vec0(
24+
id integer primary key autoincrement,
25+
embedding float[384]
26+
-- other columns like:
27+
-- text text,
28+
-- metadata blob,
29+
);
30+
```
31+
32+
### Insert vectors
33+
Insert vectors as you would with any other data:
34+
35+
```sql
36+
insert into vec_table_name(embedding) values
37+
('[0.1, 0.2, ...]'),
38+
('[0.3, 0.4, ...]'),
39+
('[0.5, 0.6, ...]');
40+
```
41+
### Execute a similarity search query
42+
To search for similar vectors, use the following syntax:
43+
44+
```sql
45+
select
46+
rowid,
47+
distance
48+
from vec_table_name
49+
where embedding match <your query embedding>
50+
and k = 20;
51+
```
52+
53+
The value of `k` sets the number of nearest neighbors to return. For more on nearest neighbor searches, check out our article on the topic.
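
To make the role of `k` concrete, here is a minimal Python sketch of what a brute-force nearest neighbor search computes. It is an illustration only, not how `sqlite-vec` is implemented: the function names are hypothetical, and Euclidean (L2) distance is an assumption about the metric in use.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_brute_force(rows, query, k):
    """Score every (rowid, vector) pair against the query and keep the k closest.

    Returns a list of (rowid, distance) tuples, nearest first.
    """
    scored = ((rowid, l2_distance(vec, query)) for rowid, vec in rows)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Hypothetical toy data: three 2-dimensional vectors.
rows = [(1, [0.0, 0.0]), (2, [3.0, 4.0]), (3, [1.0, 0.0])]
nearest = knn_brute_force(rows, [0.0, 0.0], k=2)
print(nearest)
```

Because every row is scored on every query, the cost grows linearly with the number of stored vectors, which is the trade-off described in the overview.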
54+
55+
## Quantization
56+
Vector quantization is a category of techniques to compress the individual elements inside of a floating point vector. In a float vector, each element is stored as a 32-bit floating point number. For longer vectors, this will quickly require a large amount of storage.
57+
58+
To reduce storage requirements with minimal loss of accuracy, we recommend using bit vectors as a method of quantization. With bit vectors, each dimension in the vector takes up 1 bit instead of 32. This method delivers up to a 32x reduction in storage requirements.
59+
60+
When using bit vectors, we recommend using embedding models that are trained on binary quantization loss. This will help maintain accuracy even after converting to binary.
61+
62+
To convert a float vector to a binary vector, use the vec_quantize_binary() function:
63+
64+
```sql
65+
create virtual table vec_table using vec0(
66+
embedding float[1536]
67+
);
68+
69+
-- slim because "embedding_coarse" is quantized 32x to a bit vector
70+
create virtual table vec_table_slim using vec0(
71+
embedding_coarse bit[1536]
72+
);
73+
74+
insert into vec_table_slim
75+
select rowid, vec_quantize_binary(embedding) from vec_table;
76+
```
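
The storage arithmetic behind the 32x figure, and the idea of binary quantization itself, can be sketched in plain Python. This is a conceptual illustration under stated assumptions, not the extension's code: the helper names are hypothetical, the rule "positive values become 1, everything else becomes 0" and the bit order within each byte are assumptions, and Hamming distance is shown as a typical way to compare bit vectors.

```python
def quantize_binary(vec):
    """Pack a float vector into bytes, one bit per dimension.

    Assumption: a dimension maps to 1 when its value is > 0, else 0.
    The bit order inside each byte is arbitrary in this sketch.
    """
    if len(vec) % 8 != 0:
        raise ValueError("dimension count must be a multiple of 8")
    out = bytearray(len(vec) // 8)
    for i, x in enumerate(vec):
        if x > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def hamming_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


dims = 1536
float_bytes = dims * 4   # 32-bit floats: 4 bytes per dimension
bit_bytes = dims // 8    # bit vector: 1 bit per dimension
print(float_bytes, bit_bytes, float_bytes // bit_bytes)
```

For a 1536-dimension embedding, this works out to 6,144 bytes as floats versus 192 bytes as bits.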
77+
## Matryoshka embeddings
78+
`sqlite-vec` also supports Matryoshka embeddings, a technique in some embedding models that allows you to "truncate" excess dimensions of a given vector without a significant loss in quality.
79+
80+
Matryoshka embeddings save on storage and result in faster queries.
81+
82+
To create a Matryoshka embedding, use the vec_slice() function:
83+
84+
```sql
85+
86+
create virtual table vec_items using vec0(
87+
embedding float[1536]
88+
);
89+
90+
-- slim because "embedding_coarse" is a truncated version of the full vector
91+
create virtual table vec_items_slim using vec0(
92+
embedding_coarse float[512]
93+
);
94+
95+
insert into vec_items_slim
96+
select
97+
rowid,
98+
vec_normalize(vec_slice(embedding, 0, 512))
99+
from vec_items;
100+
```
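
As a rough illustration of what the slice-then-normalize step above does to a single vector, here is a small Python sketch. The function name is hypothetical, and it assumes the slice takes elements from `start` up to but not including `end`, and that normalization rescales to unit L2 length.

```python
import math


def slice_and_normalize(vec, start, end):
    """Keep vec[start:end], then rescale the result to unit L2 length."""
    head = vec[start:end]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


# Hypothetical 3-dimensional vector truncated to its first 2 dimensions.
truncated = slice_and_normalize([3.0, 4.0, 12.0], 0, 2)
print(truncated)
```

Re-normalizing after truncation matters because dropping dimensions changes the vector's length, and distances between truncated vectors are only comparable when they share the same scale.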
101+
102+
## Performance considerations
103+
Free SQLite Cloud plans are not optimized for vector workloads. To speak to the team about upgrading your plan, [please reach out](https://www.sqlitecloud.io/support).
104+
105+
## Next Steps
106+
Combined with [edge functions](/docs/edge-functions), SQLite Cloud's vector search capabilities make it a great choice for serverless RAG applications.
107+
108+
109+
