Tracing ClickHouse with qryn

Tracing ClickHouse with qryn

Self-Instrumented Traces and Logs using MVs

ยท

3 min read

ClickHouse is one of our favorite databases. But what makes queries so fast (or not, when there's an issue) is often hard to tell without looking under the hood.

Luckily ClickHouse creates detailed trace spans for each query and some of the query execution stages, such as query planning or distributed queries.

This priceless troubleshooting material sits safely in the opentelemetry_span_log table but as pretty as tables are it can be kind of hard to navigate and intimidating to use with regular SQL queries (unless you're Alexey Milovidov). But fear not!

Using materialized views the internal spans can easily be translated to Zipkin JSON spans and sent to a tracing collector such as qryn and its Tempo-like API.

Let's Trace ClickHouse with ClickHouse!

Requirements

Send Spans from ClickHouse

Query Spans will not be generated unless we tell ClickHouse we want them.

Let's enable tracing probability as you in our clickhouse-client session;

Use a value between 0 (turned off) and 1 (100% of queries create traces)

clickhouse-client -h 127.0.0.1 --opentelemetry_start_trace_probability=0.1 --query_id "a0c9f73a1c684e0ce66333477a3841bf-200" --query "select 1"
clickhouse-client -h 127.0.0.1 --query "system flush logs"

You can also set your preference using SET parameters while querying:

SET opentelemetry_start_trace_probability = 1.0;
SELECT 1;
system flush logs;

Create Traces Materialized View

The following Materialized View will push traces from the opentelemetry_span_log table table to qryn as Zipkin spans via HTTP leveraging the ClickHouse URL engine and JSON formatting functionality.

CREATE MATERIALIZED VIEW default.zipkin_spans
ENGINE = URL('http://qryn:3100/tempo/api/push', 'JSONEachRow')
SETTINGS output_format_json_named_tuples_as_objects = 1,
    output_format_json_array_of_rows = 1 AS
SELECT
    lower(hex(reinterpretAsFixedString(trace_id))) AS traceId,
    lower(hex(parent_span_id)) AS parentId,
    lower(hex(span_id)) AS id,
    operation_name AS name,
    start_time_us AS timestamp,
    finish_time_us - start_time_us AS duration,
    cast(tuple('clickhouse'), 'Tuple(serviceName text)') AS localEndpoint,
    cast(tuple(
        attribute.values[indexOf(attribute.names, 'db.statement')]),
        'Tuple("db.statement" text)') AS tags
FROM system.opentelemetry_span_log

Replace the qryn URL from the example to match your actual deployment!

The ZipKin JSON events will be pushed to the qryn tempo api. Here's a sample:

[{
 traceID: '97a156e95095e0e4c0f0f4dbb921244d',
 spanID: '19ce8d73f029e649',
 operationName: 'TCPHandler',
 references: [],
 startTime: '1641319910142576',
 startTimeUnixNano: 1641319910142576000,
 endTimeUnixNano: 1,
 duration: 12,
 tags: [ { key: 'db.statement', value: '', type: 'string' } ],
 logs: [],
 processID: 'p1',
 warnings: null,
 localEndpoint: { serviceName: 'clickhouse' },
 parentSpanID: '9654a4d5336d4c3e'
}]

Time to Search Spans

Your data should be ready to be queried. Use Grafana Tempo or qryn-view to search, filter and navigate your spans and traces. It's fun and easy!

image

What about Logs?

Right! ClickHouse populates the system.text_log table with its internal logs.

Let's use the same materialized view method to ship everything into qryn.

Create Logs Materialized View

The following Materialized View will push logs to the qryn using the ClickHouse URL engine and the InfluxDB line protocol API (/influx/api/v2/write)

CREATE MATERIALIZED VIEW default.influx_log_send
ENGINE = URL('http://qryn:3100/influx/api/v2/write', 'LineAsString')
AS SELECT format('syslog,level={0},logger_name={1},type=clickhouse message="{2}" {3}', 
  toString(level), 
  replaceRegexpAll(toString(logger_name), '[^a-zA-Z0-9_]', '_'), 
  replaceAll(replaceRegexpAll(message, '["\\\\]', '\x00\\0'), '\x00', '\\'), toString(toUnixTimestamp64Nano(event_time_microseconds))) FROM system.text_log;

Replace the qryn URL from the example to match your actual deployment!

Done? Just wait a few seconds for the logs to flush and they will appear in qryn.

Look for the {type="clickhouse"} label to get started with your logs.

You can now do fun things such as linking Logs and Traces using derived fields

That's it!

You're now troubleshooting ClickHouse using ClickHouse! (well, qryn really!)

In our next episodes, we'll go full APM on ClickHouse. Stay Tuned!

Did you find this article valuable?

Support qryn by becoming a sponsor. Any amount is appreciated!

ย