chDB AWS Lambda Fun(ction)

Running ClickHouse queries in AWS Lambdas with chDB.io

Let's run chDB as an AWS Lambda function using Docker and Python!

If you're not yet familiar with chDB, here's a quick recap for you:

chDB is an embedded SQL OLAP Engine powered by ClickHouse

Features

  • In-process SQL OLAP Engine, powered by ClickHouse

  • Serverless. No need to install ClickHouse

  • Minimized data copy from C++ to Python with python memoryview

  • Input & Output support for Parquet, CSV, JSON, Arrow, ORC and 60+ more formats

  • Support for the Python DB API 2.0

  • Library bindings for Python, Go, Rust, NodeJS, Bun

The chdb library runs in-process, making it easy to run ClickHouse SQL queries from a variety of languages, and it can be used to mimic ClickHouse query APIs.
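For example, a minimal in-process query with the Python binding looks like this (a quick sketch; the Parquet file name is purely illustrative):

# pip install chdb
import chdb

# Run a query entirely in-process; no ClickHouse server is required.
print(chdb.query("SELECT version()", "CSV"))

# Query a local Parquet file directly (the file name is illustrative).
print(chdb.query("SELECT count(*) FROM file('events.parquet', 'Parquet')", "Pretty"))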

TLDR; Here's a Live Demo

☝️☝️☝️ This is not an image. Click RUN for results ☝️☝️☝️

⚡ Let's OLAP on Lambdas!

Let's assemble our chDB Lambda. This article describes how to:

  1. Create and test an AWS Lambda function running chDB in Docker.

  2. Use chDB to query any supported data source.

  3. Use SQL to filter, process and downsample cloud data.

  4. Write the data back to ClickHouse, S3 or any supported destination.

  5. Use CloudWatch or EventBridge to trigger your functions automatically.

Create an AWS Lambda Function for chDB

To create an AWS Lambda function, log into your AWS console.

Build and push the latest chDB-lambda container image to your ECR registry.

➡ Search for AWS Lambda and select the service. Then, click Create Function.


➡ Choose Container Image and use the chDB ECR image URI you created.

➡ Click Create Function at the bottom right when you’re done.

Validation

Let's test our new chDB Lambda using a simple query.

The Lambda expects JSON requests with a query key:

{
   "query": "SELECT version()",
   "default_format": "CSV"
}

The response will look like this (or in whatever output format you requested):

22.12.1.1
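For reference, the handler inside the container looks roughly like this — a sketch of the request/response shape described above, not the exact code shipped in the chDB-lambda image:

import json
import chdb

def handler(event, context):
    # Test events and API Gateway may wrap the payload in a "body" string.
    body = event.get("body", event)
    if isinstance(body, str):
        body = json.loads(body)

    query = body["query"]
    fmt = body.get("default_format", "CSV")

    # Run the query in-process and return the formatted result
    # (the exact result accessor may depend on your chdb version).
    result = chdb.query(query, fmt)
    return {"statusCode": 200, "body": str(result)}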

You can also use the AWS Console in your browser to generate test events:

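If you would rather test from a script than from the console, you can also invoke the function with boto3 — the function name and region below are assumptions, so substitute your own:

import json
import boto3

client = boto3.client("lambda", region_name="us-east-1")

payload = {"query": "SELECT version()", "default_format": "CSV"}
response = client.invoke(
    FunctionName="chdb-lambda",  # use the name you gave your function
    Payload=json.dumps(payload).encode("utf-8"),
)
print(response["Payload"].read().decode("utf-8"))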

👏 Well Done!

Once your Lambda is validated, the sky's the limit! Move your existing ClickHouse workflows into your Lambdas and use them to downsample, forecast and report.

  • Connect to any dataset on S3/R2/MinIO using Parquet, Arrow, etc (see the query sketch after this list)

  • Connect to any ClickHouse Server securely to pull/insert data

  • Use your favorite programming language to work with ClickHouse

  • Save money by running ClickHouse queries with Lambda workers
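As a sketch of such a workload inside the Lambda — the bucket, path and columns below are purely illustrative:

import chdb

# Downsample Parquet data straight from S3 with ClickHouse's s3() table function.
sql = """
SELECT toDate(event_time) AS day, count(*) AS hits
FROM s3('https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet')
GROUP BY day
ORDER BY day
"""
print(chdb.query(sql, "Pretty"))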

💡 For additional security, use AWS Secrets Manager and ENV variables to control sensitive fields (such as authentication IDs, tokens, etc) in your ClickHouse scripts.
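For example, one way to load ClickHouse credentials at runtime instead of hard-coding them — the secret name and environment variable below are assumptions:

import json
import os
import boto3

def get_clickhouse_credentials():
    # The secret is assumed to be a JSON object, e.g. {"user": "...", "password": "..."}.
    secret_id = os.environ.get("CLICKHOUSE_SECRET_ID", "clickhouse/credentials")
    client = boto3.client("secretsmanager")
    secret = client.get_secret_value(SecretId=secret_id)
    return json.loads(secret["SecretString"])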

Automation

If you’re looking to perform a downsampling task, you’ll need to run your Lambda script on a schedule. You can use CloudWatch Events or EventBridge to create a scheduled rule that targets your AWS Lambda function and runs your chDB scripts on a user-defined schedule.

Refer to the AWS documentation for CloudWatch Events or EventBridge scheduled rules, depending on your preferred service.
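If you prefer to set the schedule up programmatically, here's a sketch using boto3 — the rule name, schedule expression and function ARN are placeholders:

import boto3

events = boto3.client("events")

# Create (or update) a rule that fires every hour.
events.put_rule(
    Name="chdb-downsample-hourly",
    ScheduleExpression="rate(1 hour)",
)

# Point the rule at the chDB Lambda function.
events.put_targets(
    Rule="chdb-downsample-hourly",
    Targets=[{
        "Id": "chdb-lambda",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:chdb-lambda",
    }],
)

# Note: EventBridge also needs permission to invoke the function,
# e.g. via lambda.add_permission with principal "events.amazonaws.com".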

Conclusion

Just a few clicks and you are all set with a low-cost, high-power chDB Lambda function ready to perform simple and complex data processing tasks at any scale.

Pair it with logs, metrics and traces for end-to-end visibility through qryn's polyglot API, working with data from Loki, Prometheus, OpenTelemetry, InfluxDB, Elastic and many more.
