ClickHouse 🀝 DuckDB = OLAP²

ClickHouse 🀝 DuckDB = OLAP²

DuckDB as a ClickHouse UDF Function

Β·

4 min read

ClickHouse 🀝 DuckDB

If you follow this blog you're familiar with the OLAP ecosystem, our passion for ClickHouse and our involvement in developing the chDB in-memory database.

The ClickHouse community is strong and proud but there's a small taboo 🀫

πŸ¦† The bipedal in the room is DuckDB and its exciting features and small size!

πŸ˜‚ Jokes aside, there are already lots of comparisons around making a case for either or both platforms being the best or the fastest at something. We leave that to the benchmark experts, while we focus on what we love: integrations without borders!

Enemies with Benefits

Can ClickHouse and DuckDB become enemies with benefits (for users)?

To make this little bromance happen we'll use a magical combo:

πŸ’‘ Quackpipe runs a DuckDB core and supports the Clickhouse query API and several native data formats - essentially impersonating a basic ClickHouse and including a variety of ClickHouse Macro Aliases to make familiar commands and functions available out of the box for ClickHouse clients and users.

It can run standalone, embedded or accept input and commands via -stdin from scripts. It even includes a fully working clickhouse play user interface! 😎

☝️☝️☝️ This is a Live demo, not just an image. Pick a Preset and click RUN ☝️☝️☝️

That's the definition of a ClickHouse UDF-ready toolset! Let's go OLAPΒ²

OLAPΒ² Experiment

Install Quackpipe UDF

Download the latest quackpipe binary to your clickhouse-server user_scripts directory to allow safe execution from within your ClickHouse queries:

curl -fsSL github.com/metrico/quackpipe/releases/latest/download/quackpipe-amd64 --output quackpipe \
&& chmod +x quackpipe

mv quackpipe /var/lib/clickhouse/user_scripts/quackpipe

⚑ Once ready, we can leverage our Quackpipe integration with two methods:

Table UDF:

This method can be used to return multiple columns and rows from DuckDB into ClickHouse using any of the supported formats. Table schema MUST be known.

:) SELECT * FROM executable('quackpipe -stdin -format TSV', TSV, 'id UInt32, num UInt32', (SELECT 'SELECT 1, 2'))
:) SELECT *
FROM executable('quackpipe -stdin -format TSV', TSV, 'id UInt32, num UInt32', (
    SELECT 'SELECT 1, 2'
))
Query id: dd878948-bec8-4abe-9e06-2f5813653c3a
β”Œβ”€id─┬─num─┐
β”‚  1 β”‚   2 β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
1 rows in set. Elapsed: 0.268 sec.

What is this? Think of it as a SELECT within a SELECT with a different syntax.

The data comes from DuckDB right into ClickHouse columns. You can read and pass data to UDF functions ad-hoc per query, or by using a dedicated engine table.

:) CREATE TABLE onlyone
(
    `id` UInt64
    `num` UInt64
)
ENGINE = Executable('quackpipe -stdin -format TSV', TSV, (
    SELECT 'SELECT 1, 2'
));

:) SELECT * FROM onlyone

Query id: 45aa7267-e0f4-4fdc-a52e-5a7a4bf2c0a8
β”Œβ”€id─┬─num─┐
β”‚  1 β”‚   2 β”‚
β””β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
1 rows in set. Elapsed: 0.272 sec.

Function UDF:

NOTE: ONLY WORKS AS FUNCTION, cannot return multiple rows or columns!

  1. Configure the UDF function in /etc/clickhouse-server/duckdb_function.xml
  • the return_type is String with the full results

  • the actual content can be JSON, CSV, TSV with or without Column Names

<functions>
    <function>
        <type>executable</type>
        <name>duckdb</name>
        <return_type>String</return_type>
        <argument>
            <type>String</type>
        </argument>
        <format>TSV</format>
        <command>quackpipe -stdin -format TSV</command>
    </function>
</functions>
  1. Restart Clickhouse and make sure the function is loaded
clickhouse-udf | Processing configuration file '/etc/clickhouse-server/duckdb_function.xml'.
clickhouse-udf | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/duckdb_function.xml'.
  1. Test with Clickhouse client.
b5c4f004e862 :) SELECT duckdb('SELECT 1') as t1

SELECT duckdb('SELECT 1') AS t1

Query id: 68c4fbd0-75a5-466d-9720-3937bcd96898

β”Œβ”€t1─┐
β”‚ 1  β”‚
β””β”€β”€β”€β”€β”˜

1 row in set. Elapsed: 0.047 sec.

BONUS CONTENT

Upon request of Alexey Milovidov, here's DuckDB querying data out of ClickHouse and chdb using nothing more than the HTTP API and URL query parameters πŸ”₯πŸ”₯πŸ”₯

SELECT * FROM 
read_json_auto("https://chdb.fly.dev/?default_format=JSONEachRow&query=SELECT number as once, number *2 as twice FROM numbers(10)")

The string format is not the prettiest (we'll work on that!) but the query returns steaming hot columns and column formats just fine! It even works on MotherDuck:

Conclusion

Where do we go from here? Can this combination be used to build something amazing and revolutionary, or is it just a useless hack/flex for both platforms?

That's up to the users and community - and hopefully, You too!

Want more? Discover all of our projects on the Metrico github repository

Did you find this article valuable?

Support qryn by becoming a sponsor. Any amount is appreciated!

Β