SELECT query gets executed. To illustrate it, let’s add some data in a table in ClickHouse:
EXPLAIN query. These steps are summarized in the chart below:
Let’s look at each entity in action during query execution. We’re going to take a few queries and then examine them using the EXPLAIN statement.
Parser
The goal of a parser is to transform the query text into an AST (Abstract Syntax Tree). This step can be visualized usingEXPLAIN AST:
Analyzer
ClickHouse currently has two architectures for the Analyzer. You can use the old architecture by setting:enable_analyzer=0. The new architecture is enabled by default. We’re going to describe only the new architecture here, given the old one is going to be deprecated once the new analyzer is generally available.
The new architecture should provide us with a better framework to improve ClickHouse’s performance. However, given it is a fundamental component of the query processing steps, it also might have a negative impact on some queries and there are known incompatibilities. You can revert back to the old analyzer by changing the
enable_analyzer setting at the query or user level.Planner
The planner takes a query tree and builds a query plan out of it. The query tree tells us what we want to do with a specific query, and the query plan tells us how we will do it. Additional optimizations are going to be done as part of the query plan. You can useEXPLAIN PLAN or EXPLAIN to see the query plan (EXPLAIN will execute EXPLAIN PLAN).
minimum_date, maximum_date and percentage), but you might also want to have the details of all the actions that need to be executed. You can do so by setting actions=1.
Query pipeline
A query pipeline is generated from the query plan. The query pipeline is very similar to the query plan, with the difference that it’s not a tree but a graph. It highlights how ClickHouse is going to execute a query and what resources are going to be used. Analyzing the query pipeline is very useful to see where the bottleneck is in terms of inputs/outputs. Let’s take our previous query and look at the query pipeline execution:graph we can set to 1 and specify the output format to be TSV:
x followed by a number corresponds to the number of inputs/outputs that are being used. If you don’t want to see them in a compact form, you can always add compact=0:
EXPLAIN query again:
Executor
Finally the last step of the query execution is done by the executor. It will take the query pipeline and execute it. There are different types of executors, depending if you’re doing aSELECT, an INSERT, or an INSERT SELECT.