Introduction
In the previous post, I explained the Incremental View Maintenance (IVM) feature that is under development. In this article, I will show more about the performance evaluation of our IVM implementation.
Incrementally Maintainable Materialized Views on the TPC-H queries
In the previous post, I showed that the IVM feature enables materialized views
to be updated automatically and more rapidly than the
REFRESH
command by using
pgbench
tables.
Here, as a more practical example, I will evaluate the performance using
queries of TPC-H benchmark.
TPC-H queries
Firstly, we checked how many of the 22 queries could be used as a definition of an incrementally maintainable materialized view in our implementation. As the results, 9 of 22 queries are supported. Other queries are not supported in the current implementation, due to subqueries including aggregation, etc.
Among these queries, we used Q01 and Q09 here, which took a long time to execute the query. Q01 is a query that aggregates one large table,
select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-12-01' - interval '78' day
group by
l_returnflag,
l_linestatus
and Q09 is a query that joins six tables and aggregates them.
select
nation,
o_year,
sum(amount) as sum_profit
rom
(
select
n_name as nation,
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
from
part,
supplier,
lineitem,
partsupp,
orders,
nation
where
s_suppkey = l_suppkey
and ps_suppkey = l_suppkey
and ps_partkey = l_partkey
and p_partkey = l_partkey
and o_orderkey = l_orderkey
and s_nationkey = n_nationkey
and p_name like '%sandy%'
) as profit
group by
nation,
o_year
In both queries, ORDER BY
and LIMIT
are removed
because these clauses are not supported in the view definition. The scale
factor of TPC-H was 1.
Performance of Updating View
I created an incremental maintainable materialized view on each query, and
compare the execution times of 1) direct SELECT
to the query, 2)
SELECT
to the view, 3) REFRESH
of the view, and 4)
updating a row of a table (lineitem
). This table update triggers
IVM and the view is updated automatically and incrementally.
Here is the results.
SELECT query | SELECT view | REFRESH command | IVM | |
---|---|---|---|---|
Q01 | 10.311 s | 1.585 ms | 36.811 s | 17.104 ms |
Q09 | 3.124 s | 2.331 ms | 6.153 s | 29.301 ms |
Issuing SELECT
directly to a query took about 10 seconds for Q01
and 3 seconds for Q09. On the other hand, SELECT
to a view took
only 1.6 ms for Q01 and 2.3ms for Q09. These fast response time is the
original benefit of materialized views.
However, REFRESH
of the view took a long time, that is, about 37
seconds for Q01 and 6 seconds for Q09. This shows that On the other hand, IVM
took only 17ms for Q01 and 29ms for Q09. This execution time includes both
update of the table and update of the view contents, so it is very rapid. It
was more than 2000 times faster for Q01, and more than 200 times faster than
REFRESH
command.
Performance of Updating Tables
As shown above, the IVM feature allows materialized views to be updated automatically and quickly. However, it obviously affects table update performance because a view is updated for each table update. In addition, the table update performance is significantly degraded in the simultaneous update by concurrent transactions because an exclusive lock is acquired when the view is updated, which is required to avoid inconsistent view update. This exclusive lock is not necessary if the view has only one table in the definition, though.
I evaluated the table update performance by letting pgbench run a custom
script, which updated a tuple selected randomly from
lineitem
table in a transaction. I compared the performance
between conditions where there was no materialized view and where an
incrementally maintainable materialized view for Q01 was created.
This graph shows the result. The vertical line is TPS (transaction per second) and the horizontal axis is the number of connections. The blue line is the performance measured when there was no materialized view ,that is, the performance of simple table updates. The red line is the performance including IVM of the view defined on Q01 query. This is from two to five times slower than the simple table updates due to the overhead of updating the view. In addition, the yellow line is the performance of IVM where an exclusive lock is forced. In this case, as you can see, the performance doesn’t scale if the connection is increased. The exclusive lock is actually not required for the Q01 view because the view has only one table, but this is required if there are more than one table in the view definition like Q09. Note that this evaluation was performed under the condition where there was only one incrementally maintainable materialized view. When there are more than one views, we could see additional overhead on table update performance.
From these results, the current implementation of IVM is not suitable for write-heavy workload. However, It will be useful when table update is not so frequent but users would like know the analysis results immediately after a table update.
Conclusion
The benefit of IVM that enables rapid and automatic update of materialized views was shown for the practical views like TPC-H queries. On the other hand, it was also shown that it was unsuitable when table update is frequent due to the overhead of the view update.
The approach that update the view in the same transaction which updates a table like our IVM implementation is called “immediate maintenance”. On the other hand, there is another approach called “deferred maintenance” that updates a table after the transaction committed, for example, when a user command is executed, when the view is accessed, or when the maintenance is triggered periodically in background. I would like to implement also deferred approach to resolve the table update performance degression in future.
In the next post, I will explain the implementation of IVM and how it works in more detailed.