ClickHouse π€ DuckDB
If you follow this blog you're familiar with the OLAP ecosystem, our passion for ClickHouse and our involvement in developing the chDB in-memory database.
The ClickHouse community is strong and proud but there's a small taboo π€«
π¦ The bipedal in the room is DuckDB and its exciting features and small size!
π Jokes aside, there are already lots of comparisons around making a case for either or both platforms being the best or the fastest at something. We leave that to the benchmark experts, while we focus on what we love: integrations without borders!
Enemies with Benefits
Can ClickHouse and DuckDB become enemies with benefits (for users)?
To make this little bromance happen we'll use a magical combo:
π‘ Quackpipe runs a DuckDB core and supports the Clickhouse query API and several native data formats - essentially impersonating a basic ClickHouse and including a variety of ClickHouse Macro Aliases to make familiar commands and functions available out of the box for ClickHouse clients and users.
It can run standalone, embedded or accept input and commands via -stdin
from scripts. It even includes a fully working clickhouse play user interface! π
That's the definition of a ClickHouse UDF-ready toolset! Let's go OLAPΒ²
OLAPΒ² Experiment
Install Quackpipe UDF
Download the latest quackpipe binary to your clickhouse-server user_scripts
directory to allow safe execution from within your ClickHouse queries:
curl -fsSL github.com/metrico/quackpipe/releases/latest/download/quackpipe-amd64 --output quackpipe \
&& chmod +x quackpipe
mv quackpipe /var/lib/clickhouse/user_scripts/quackpipe
β‘ Once ready, we can leverage our Quackpipe integration with two methods:
Table UDF:
This method can be used to return multiple columns and rows from DuckDB into ClickHouse using any of the supported formats. Table schema MUST be known.
:) SELECT * FROM executable('quackpipe -stdin -format TSV', TSV, 'id UInt32, num UInt32', (SELECT 'SELECT 1, 2'))
:) SELECT *
FROM executable('quackpipe -stdin -format TSV', TSV, 'id UInt32, num UInt32', (
SELECT 'SELECT 1, 2'
))
Query id: dd878948-bec8-4abe-9e06-2f5813653c3a
ββidββ¬βnumββ
β 1 β 2 β
ββββββ΄ββββββ
1 rows in set. Elapsed: 0.268 sec.
What is this? Think of it as a SELECT within a SELECT with a different syntax.
The data comes from DuckDB right into ClickHouse columns. You can read and pass data to UDF functions ad-hoc per query, or by using a dedicated engine table.
:) CREATE TABLE onlyone
(
`id` UInt64
`num` UInt64
)
ENGINE = Executable('quackpipe -stdin -format TSV', TSV, (
SELECT 'SELECT 1, 2'
));
:) SELECT * FROM onlyone
Query id: 45aa7267-e0f4-4fdc-a52e-5a7a4bf2c0a8
ββidββ¬βnumββ
β 1 β 2 β
ββββββ΄ββββββ
1 rows in set. Elapsed: 0.272 sec.
Function UDF:
NOTE: ONLY WORKS AS FUNCTION, cannot return multiple rows or columns!
- Configure the UDF function in
/etc/clickhouse-server/duckdb_function.xml
the
return_type
isString
with the full resultsthe actual content can be JSON, CSV, TSV with or without Column Names
<functions>
<function>
<type>executable</type>
<name>duckdb</name>
<return_type>String</return_type>
<argument>
<type>String</type>
</argument>
<format>TSV</format>
<command>quackpipe -stdin -format TSV</command>
</function>
</functions>
- Restart Clickhouse and make sure the function is loaded
clickhouse-udf | Processing configuration file '/etc/clickhouse-server/duckdb_function.xml'.
clickhouse-udf | Saved preprocessed configuration to '/var/lib/clickhouse/preprocessed_configs/duckdb_function.xml'.
- Test with Clickhouse client.
b5c4f004e862 :) SELECT duckdb('SELECT 1') as t1
SELECT duckdb('SELECT 1') AS t1
Query id: 68c4fbd0-75a5-466d-9720-3937bcd96898
ββt1ββ
β 1 β
ββββββ
1 row in set. Elapsed: 0.047 sec.
BONUS CONTENT
Upon request of Alexey Milovidov, here's DuckDB querying data out of ClickHouse and chdb using nothing more than the HTTP API and URL query parameters π₯π₯π₯
SELECT * FROM
read_json_auto("https://chdb.fly.dev/?default_format=JSONEachRow&query=SELECT number as once, number *2 as twice FROM numbers(10)")
The string format is not the prettiest (we'll work on that!) but the query returns steaming hot columns and column formats just fine! It even works on MotherDuck:
Conclusion
Where do we go from here? Can this combination be used to build something amazing and revolutionary, or is it just a useless hack/flex for both platforms?
That's up to the users and community - and hopefully, You too!
Want more? Discover all of our projects on the Metrico github repository