Skip to content

Latest commit

 

History

History
107 lines (86 loc) · 11.6 KB

File metadata and controls

107 lines (86 loc) · 11.6 KB
id connect_to_query_engines
title Connect to Query Engines
sidebar_label Connect to Query Engines

Overview

Querybook supports all Sqlalchemy compatible query engines by default. Basic functionalities such as query execution, table metadata, and auto-completion are provided out of the box. However, more advanced integrations would require customized code. Overall, the query engines can be categorized into 3-tiers:

Tier Tier 3 Tier 2 Tier 1
Summary Not tested Tested w/ DB Used in Production
Library Sqlalchemy Custom/SqlAlchemy Custom
Run Queries
Paginated Result Fetch
Syntax highlight & Autocomplete
Query Progress x ?
Query Logs x ?
Query Metadata x ?
Cancel Query x ?
User Authentication x x
Syntax Error Parsing x ?
Service discovery x x
Language Specific Autocomplete x x

Tier 1 does not mean engines can be used in production everywhere since different companies/org require different kinds of integrations. However, tier 1 databases provide an excellent foundation to extend additional functionalities. Use them as a reference or subclass them via the query engine plugin.

If you have tried any of the tier 3 databases and confirmed it works, please update this doc to let others know.

Query Engine Support

Querybook only supports a few of the Tier 1 & 2 databases by default. When Querybook is launched, it checks with SqlAlchemy to see if any of the databases below are available. If so, the query engine would be automatically available to set up in the Admin UI. Please see the step by step guide below to see an working example.

Step by step guide

In this guide, we will go through adding Amazon Redshift query engine to Querybook. This serves as an example to adding all sqlalchemy-compatible query engines.

  1. Clone and download the repo
git clone git@github.com:pinterest/querybook.git
cd querybook
  1. Create a local.txt under requirements/ folder in the project's root directory
touch requirements/local.txt
  1. Add the required packages
echo -e "sqlalchemy-redshift\nredshift_connector" > requirements/local.txt
  1. Start the container
make
  1. Register as a new user and use the demo setup.
  2. Visit https://localhost:10001/admin/query_engine/ and create a new query engine. Put redshift as the language and generic-sqlalchemy as the executor. In the Executor Params, put the connection string (as specified by SqlAlchemy) in the Connection_string field.
  3. Go to https://localhost:10001/admin/environment/1/ and add the Redshift engine under the demo_environment.
  4. Now you can run queries against the new Redshift engine in https://localhost:10001/demo_environment/adhoc/.
  5. To include table metadata and autocompletion, you would need to add a metastore. Visit https://localhost:10001/admin/metastore/ and create a new metastore. Use SqlAlchemyMetastoreLoader with the exact connection string used for the query engine. Click on Save -> CREATE SCHEDULE -> Create Task. Now click on Run Task to sync. You can view the progress in the History tab. Wait until it is completed (Should be done in seconds if the number of tables is small).
  6. Go to your query engine page on https://localhost:10001/admin/query_engine/, in the Metastore field, choose the metastore you just created and click Save.
  7. Visit https://localhost:10001/demo_environment/adhoc/ again and the auto complete feature should be available. You can also view all tables by clicking on the Tables button on the left sidebar and select the specific metastore.

All Query Engines

Note: If the query engine is not included below, but it does have a Sqlalchemy integration, you can still use it in Querybook. Follow the step by step guide with 1 additional step before step 4. Visit <project_root>/querybook/server/lib/query_executor/sqlalchemy.py and add the query engine to the list variable SQLALCHEMY_SUPPORTED_DIALECTS, and continue to step 4. If it works, please contribute to Querybook by submitting a PR of your changes.

Query Engine Tier Package
Apache Drill 3 sqlalchemy-drill
Apache Hive 1 pyhive OR -r engines/hive.txt
Apache Kylin 3 kylinpy
Apache Solr 3 sqlalchemy-solr
Amazon Athena 3 pyathena
Amazon Redshift 2 sqlalchemy-redshift
redshift_connector OR -r engines/redshift.txt
BigQuery 2 google-cloud-bigquery
OR -r engines/bigquery.txt
ClickHouse 3 clickhouse-sqlalchemy
clickhouse-driver
CockroachDB 3 sqlalchemy-cockroachdb
psycopg2
CrateDB 3 crate
Dremio 3 sqlalchemy-dremio
Druid 2 pydruid OR -r engines/druid.txt
ElasticSearch 3 elasticsearch-dbapi
EXASolution 3 sqlalchemy-exasol
Firebird 3 sqlalchemy-firebird
Google Spreasheets 3 gsheetsdb
IBM DB2 3 ibm-db-sa
Microsoft Access 3 sqlalchemy-access
Microsoft SQL Server 3 Included by default
MySQL 1 Included by default
MonetDB 3 sqlalchemy_monetdb
Oracle 3 Included by default
PostgreSQL 2 Included by default
Presto 1 pyhive OR -r engines/presto.txt
SAP Hana 3 sqlalchemy-hana
Snowflake 2 snowflake-sqlalchemy OR -r engines/snowflake.txt
SQLite 2 Included by default
Teradata Vantage 3 teradatasqlalchemy
Trino 2 trino OR -r engines/trino.txt
Vertica 3 sqlalchemy-vertica-python