In the previous post, I explained the Incremental View Maintenance (IVM) feature that is under development. In this article, I will show more about the performance evaluation of our IVM implementation.
Incrementally Maintainable Materialized Views on the TPC-H queries
In the previous post, I showed that the IVM feature enables materialized views
to be updated automatically and more rapidly than the
REFRESHcommand by using
Here, as a more practical example, I will evaluate the performance using queries of TPC-H benchmark.
Firstly, we checked how many of the 22 queries could be used as a definition of an incrementally maintainable materialized view in our implementation. As the results, 9 of 22 queries are supported. Other queries are not supported in the current implementation, due to subqueries including aggregation, etc.
Among these queries, we used Q01 and Q09 here, which took a long time to execute the query. Q01 is a query that aggregates one large table,
select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate <= date '1998-12-01' - interval '78' day group by l_returnflag, l_linestatus
and Q09 is a query that joins six tables and aggregates them.
select nation, o_year, sum(amount) as sum_profit rom ( select n_name as nation, extract(year from o_orderdate) as o_year, l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%sandy%' ) as profit group by nation, o_year
In both queries,
ORDER BY and
LIMIT are removed
because these clauses are not supported in the view definition. The scale
factor of TPC-H was 1.
Performance of Updating View
I created an incremental maintainable materialized view on each query, and
compare the execution times of 1) direct
SELECT to the query, 2)
SELECT to the view, 3)
REFRESH of the view, and 4)
updating a row of a table (
lineitem). This table update triggers
IVM and the view is updated automatically and incrementally.
Here is the results.
|SELECT query||SELECT view||REFRESH command||IVM|
|Q01||10.311 s||1.585 ms||36.811 s||17.104 ms|
|Q09||3.124 s||2.331 ms||6.153 s||29.301 ms|
SELECT directly to a query took about 10 seconds for Q01
and 3 seconds for Q09. On the other hand,
SELECTto a view took
only 1.6 ms for Q01 and 2.3ms for Q09. These fast response time is the
original benefit of materialized views.
REFRESH of the view took a long time, that is, about 37
seconds for Q01 and 6 seconds for Q09. This shows that On the other hand, IVM
took only 17ms for Q01 and 29ms for Q09. This execution time includes both
update of the table and update of the view contents, so it is very rapid. It
was more than 2000 times faster for Q01, and more than 200 times faster than
Performance of Updating Tables
As shown above, the IVM feature allows materialized views to be updated automatically and quickly. However, it obviously affects table update performance because a view is updated for each table update. In addition, the table update performance is significantly degraded in the simultaneous update by concurrent transactions because an exclusive lock is acquired when the view is updated, which is required to avoid inconsistent view update. This exclusive lock is not necessary if the view has only one table in the definition, though.
I evaluated the table update performance by letting pgbench run a custom
script, which updated a tuple selected randomly from
lineitem table in a transaction. I compared the performance
between conditions where there was no materialized view and where an
incrementally maintainable materialized view for Q01 was created.
This graph shows the result. The vertical line is TPS (transaction per second) and the horizontal axis is the number of connections. The blue line is the performance measured when there was no materialized view ,that is, the performance of simple table updates. The red line is the performance including IVM of the view defined on Q01 query. This is from two to five times slower than the simple table updates due to the overhead of updating the view. In addition, the yellow line is the performance of IVM where an exclusive lock is forced. In this case, as you can see, the performance doesn’t scale if the connection is increased. The exclusive lock is actually not required for the Q01 view because the view has only one table, but this is required if there are more than one table in the view definition like Q09. Note that this evaluation was performed under the condition where there was only one incrementally maintainable materialized view. When there are more than one views, we could see additional overhead on table update performance.
From these results, the current implementation of IVM is not suitable for write-heavy workload. However, It will be useful when table update is not so frequent but users would like know the analysis results immediately after a table update.
The benefit of IVM that enables rapid and automatic update of materialized views was shown for the practical views like TPC-H queries. On the other hand, it was also shown that it was unsuitable when table update is frequent due to the overhead of the view update.
The approach that update the view in the same transaction which updates a table like our IVM implementation is called “immediate maintenance”. On the other hand, there is another approach called “deferred maintenance” that updates a table after the transaction committed, for example, when a user command is executed, when the view is accessed, or when the maintenance is triggered periodically in background. I would like to implement also deferred approach to resolve the table update performance degression in future.
In the next post, I will explain the implementation of IVM and how it works in more detailed.