chDB AWS Lambda Fun(ction)
Running ClickHouse queries in AWS Lambdas with chDB.io
Let's run chDB as an AWS Lambda function using Docker and Python!
If you're not yet familiar with chDB, here's a quick recap for you:
chDB is an embedded SQL OLAP Engine powered by ClickHouse
Features
In-process SQL OLAP Engine, powered by ClickHouse
Serverless. No need to install ClickHouse
Minimized data copy from C++ to Python with Python memoryview
Input and output support Parquet, CSV, JSON, Arrow, ORC and 60+ more formats
Supports the Python DB API 2.0
The chdb library runs in-process, which makes it easy to run ClickHouse SQL queries from a variety of languages. It can also be used to mimic ClickHouse query APIs.
TL;DR: Here's a Live Demo
⚡ Let's OLAP on Lambdas!
Let's assemble our chDB Lambda. This article describes how to:
Create and test an AWS Lambda function running chDB in Docker.
Use chDB to query any supported data source.
Use SQL to filter, process and downsample cloud data.
Write the data back to ClickHouse, S3 or any supported destination.
Use CloudWatch or EventBridge to trigger your functions automatically.
Create an AWS Lambda Function for chDB
To create an AWS Lambda function, log into your AWS console.
➡ Build & Push the latest chDB-lambda container image to your ECR storage.
➡ Search for AWS Lambda and select the service. Then, click Create Function.
➡ Choose Container Image and enter the URI of the chDB image you pushed to ECR.
➡ Click Create Function at the bottom right when you’re done.
Validation
Let's test our new chDB Lambda using a simple query.
The Lambda expects JSON requests with a query key and an optional default_format key:
{
  "query": "SELECT version()",
  "default_format": "CSV"
}
The response looks like this (or uses whichever format you requested):
22.12.1.1
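To make the request contract concrete, here is a minimal sketch of a handler that accepts such an event. It is an illustration, not the actual chDB-lambda source: the names `handler` and `run_chdb`, the `runner` parameter, and the choice to fall back to CSV when `default_format` is missing are all assumptions. It also assumes a recent chDB release where `str()` on a query result returns the formatted text.

```python
def run_chdb(sql, fmt):
    """Run a query in-process with chDB and return the result as text."""
    # Imported lazily so this module can be loaded (and unit tested)
    # on machines where chDB is not installed.
    import chdb

    # Assumption: str() on the result yields the formatted output,
    # as it does on recent chDB releases.
    return str(chdb.query(sql, fmt))


def handler(event, context, runner=run_chdb):
    """Hypothetical Lambda entry point for {"query": ..., "default_format": ...}."""
    query = event.get("query")
    if not query:
        raise ValueError("event must contain a non-empty 'query' key")

    # Assumption: CSV is used when the caller does not pick a format.
    fmt = event.get("default_format") or "CSV"
    return runner(query, fmt)
```

The `runner` argument exists only so the handler can be exercised locally with a stand-in function, for example `handler({"query": "SELECT 1"}, None, runner=lambda sql, fmt: "1\n")`.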
You can also generate test events from your browser using the AWS Console.
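The same test event can also be sent from a script. The sketch below uses the boto3 `invoke` call on the Lambda client. The function name `chdb-lambda` in the usage note and the helper name `invoke_chdb_lambda` are placeholders, and the `client` parameter is there only so the helper can be tried with a stub instead of real AWS credentials.

```python
import json


def invoke_chdb_lambda(function_name, query, fmt="CSV", client=None):
    """Send a chDB query event to a Lambda function and return the raw response text."""
    if client is None:
        # Imported lazily so the helper can be tested without boto3 installed.
        import boto3

        client = boto3.client("lambda")

    # Same event shape as the test event shown above.
    payload = json.dumps({"query": query, "default_format": fmt}).encode("utf-8")
    response = client.invoke(FunctionName=function_name, Payload=payload)

    # "Payload" is a stream holding the JSON-encoded return value of the function.
    return response["Payload"].read().decode("utf-8")
```

A call would look like `invoke_chdb_lambda("chdb-lambda", "SELECT version()")`. Because Lambda JSON-encodes the return value, a handler that returns a plain string comes back wrapped in quotes, and `json.loads` on the result recovers the original text.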
👏 Well Done!
Once your Lambda is validated, the sky's the limit! Move your existing ClickHouse workflows into your lambdas and use them to downsample, forecast and report.
Connect to any dataset on S3/R2/MinIO using Parquet, Arrow, etc.
Connect to any ClickHouse Server securely to pull/insert data
Use your favorite programming language to work with ClickHouse
Save money by running ClickHouse queries with Lambda workers
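As an illustration of the first use case, the sketch below builds a downsampling query over Parquet files on S3 and wraps it in the event format the Lambda expects. The bucket URL, the column names `ts` and `value`, and the helper `downsample_query` are made up for the example. It assumes the objects are readable with the credentials available to the function, and that the arguments come from trusted configuration, since they are interpolated into the SQL text as-is.

```python
def downsample_query(url, ts_col, value_col, interval="1 HOUR", fmt="Parquet"):
    """Build a ClickHouse query that averages a value column per time bucket."""
    # Arguments are inserted verbatim: pass trusted values only.
    return (
        f"SELECT toStartOfInterval({ts_col}, INTERVAL {interval}) AS bucket, "
        f"avg({value_col}) AS avg_value, count() AS samples "
        f"FROM s3('{url}', '{fmt}') "
        "GROUP BY bucket ORDER BY bucket"
    )


# Hypothetical event: hourly averages from a placeholder bucket, returned as JSON lines.
event = {
    "query": downsample_query(
        "https://example-bucket.s3.amazonaws.com/metrics/*.parquet", "ts", "value"
    ),
    "default_format": "JSONEachRow",
}
```

Keeping the query construction in a small pure function makes it easy to inspect the generated SQL before it ever reaches the Lambda.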
💡 For additional security, use AWS Secrets Manager and environment variables to manage sensitive fields (such as authentication IDs, tokens, etc.) in your ClickHouse scripts.
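A minimal sketch of the environment-variable approach: credentials are read at call time and placed into a `remoteSecure()` table expression instead of being hard-coded in the script. The variable names `CLICKHOUSE_HOST`, `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD` and the helpers `quote` and `remote_table` are assumptions for this example. How the variables get populated (for instance from Secrets Manager at deploy time) is left out.

```python
import os


def quote(value):
    """Escape a value for use inside a single-quoted ClickHouse string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def remote_table(table, env=os.environ):
    """Build a remoteSecure() table expression from environment variables."""
    # A missing variable raises KeyError, so a misconfigured function fails early.
    host = env["CLICKHOUSE_HOST"]  # e.g. "ch.example.com:9440" (placeholder)
    user = env["CLICKHOUSE_USER"]
    password = env["CLICKHOUSE_PASSWORD"]
    return (
        f"remoteSecure('{quote(host)}', '{quote(table)}', "
        f"'{quote(user)}', '{quote(password)}')"
    )
```

A query can then be assembled as `f"SELECT count() FROM {remote_table('db.events')}"`. Note that the password still ends up in the query text, so avoid logging the generated SQL.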
Automation
If you’re looking to perform a downsampling task, you’ll need to run your Lambda script on a schedule. You can use CloudWatch or EventBridge to create a rule and target your AWS Lambda function to run chDB scripts on a user-defined schedule.
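For readers who prefer scripting this over clicking through the console, here is a rough sketch using boto3 and EventBridge. It creates a scheduled rule, allows EventBridge to invoke the function, and attaches the function as a target with a fixed query event. The helper name, the rule name, the target id `chdb-lambda` and the hourly default are placeholders, and the two client parameters exist only so the helper can be tried with stubs.

```python
import json


def schedule_chdb_lambda(rule_name, function_arn, query,
                         schedule="rate(1 hour)", events=None, lambda_client=None):
    """Create an EventBridge rule that invokes a chDB Lambda on a schedule."""
    if events is None or lambda_client is None:
        # Imported lazily so the helper can be tested without boto3 installed.
        import boto3

        events = events or boto3.client("events")
        lambda_client = lambda_client or boto3.client("lambda")

    # 1. Create (or update) the scheduled rule.
    rule_arn = events.put_rule(
        Name=rule_name, ScheduleExpression=schedule, State="ENABLED"
    )["RuleArn"]

    # 2. Allow EventBridge to invoke the function for this rule.
    #    Note: this call fails if the same StatementId was already added.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # 3. Attach the function as the target, with the query as a constant input event.
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "chdb-lambda",
            "Arn": function_arn,
            "Input": json.dumps({"query": query, "default_format": "CSV"}),
        }],
    )
    return rule_arn
```

`ScheduleExpression` also accepts cron syntax such as `cron(0 * * * ? *)` when a fixed wall-clock schedule is needed.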
Use the following documentation depending on your preferred service:
Conclusion
Just a few clicks and you are all set with a low-cost, high-power chDB Lambda function ready to perform simple and complex data processing tasks at any scale.