Tracing ClickHouse with qryn
Self-Instrumented Traces and Logs using MVs
ClickHouse is one of our favorite databases. But what makes queries so fast (or not, when there's an issue) is often hard to tell without looking under the hood.
Luckily ClickHouse creates detailed trace spans for each query and some of the query execution stages, such as query planning or distributed queries.
This priceless troubleshooting material sits safely in the opentelemetry_span_log table but as pretty as tables are it can be kind of hard to navigate and intimidating to use with regular SQL queries (unless you're Alexey Milovidov). But fear not!
Using materialized views the internal spans can easily be translated to Zipkin JSON spans and sent to a tracing collector such as qryn and its Tempo-like API.
Let's Trace ClickHouse with ClickHouse!
Requirements
ClickHouse with opentelemetry settings in
config.xml
qryn or qryn.cloud deployment
Send Spans from ClickHouse
Query Spans will not be generated unless we tell ClickHouse we want them.
Let's enable tracing probability as you in our clickhouse-client session;
Use a value between 0 (turned off) and 1 (100% of queries create traces)
clickhouse-client -h 127.0.0.1 --opentelemetry_start_trace_probability=0.1 --query_id "a0c9f73a1c684e0ce66333477a3841bf-200" --query "select 1"
clickhouse-client -h 127.0.0.1 --query "system flush logs"
You can also set your preference using SET parameters while querying:
SET opentelemetry_start_trace_probability = 1.0;
SELECT 1;
system flush logs;
Create Traces Materialized View
The following Materialized View will push traces from the opentelemetry_span_log table table to qryn as Zipkin spans via HTTP leveraging the ClickHouse URL engine and JSON formatting functionality.
CREATE MATERIALIZED VIEW default.zipkin_spans
ENGINE = URL('http://qryn:3100/tempo/api/push', 'JSONEachRow')
SETTINGS output_format_json_named_tuples_as_objects = 1,
output_format_json_array_of_rows = 1 AS
SELECT
lower(hex(reinterpretAsFixedString(trace_id))) AS traceId,
lower(hex(parent_span_id)) AS parentId,
lower(hex(span_id)) AS id,
operation_name AS name,
start_time_us AS timestamp,
finish_time_us - start_time_us AS duration,
cast(tuple('clickhouse'), 'Tuple(serviceName text)') AS localEndpoint,
cast(tuple(
attribute.values[indexOf(attribute.names, 'db.statement')]),
'Tuple("db.statement" text)') AS tags
FROM system.opentelemetry_span_log
Replace the qryn URL from the example to match your actual deployment!
The ZipKin JSON events will be pushed to the qryn tempo api. Here's a sample:
[{
traceID: '97a156e95095e0e4c0f0f4dbb921244d',
spanID: '19ce8d73f029e649',
operationName: 'TCPHandler',
references: [],
startTime: '1641319910142576',
startTimeUnixNano: 1641319910142576000,
endTimeUnixNano: 1,
duration: 12,
tags: [ { key: 'db.statement', value: '', type: 'string' } ],
logs: [],
processID: 'p1',
warnings: null,
localEndpoint: { serviceName: 'clickhouse' },
parentSpanID: '9654a4d5336d4c3e'
}]
Time to Search Spans
Your data should be ready to be queried. Use Grafana Tempo or qryn-view to search, filter and navigate your spans and traces. It's fun and easy!
What about Logs?
Right! ClickHouse populates the system.text_log
table with its internal logs.
Let's use the same materialized view method to ship everything into qryn.
Create Logs Materialized View
The following Materialized View will push logs to the qryn using the ClickHouse URL engine and the InfluxDB line protocol API (/influx/api/v2/write)
CREATE MATERIALIZED VIEW default.influx_log_send
ENGINE = URL('http://qryn:3100/influx/api/v2/write', 'LineAsString')
AS SELECT format('syslog,level={0},logger_name={1},type=clickhouse message="{2}" {3}',
toString(level),
replaceRegexpAll(toString(logger_name), '[^a-zA-Z0-9_]', '_'),
replaceAll(replaceRegexpAll(message, '["\\\\]', '\x00\\0'), '\x00', '\\'), toString(toUnixTimestamp64Nano(event_time_microseconds))) FROM system.text_log;
Replace the qryn URL from the example to match your actual deployment!
Done? Just wait a few seconds for the logs to flush and they will appear in qryn.
Look for the {type="clickhouse"}
label to get started with your logs.
You can now do fun things such as linking Logs and Traces using derived fields
That's it!
You're now troubleshooting ClickHouse using ClickHouse! (well, qryn really!)
In our next episodes, we'll go full APM on ClickHouse. Stay Tuned!