2021-04-30

Implementing Incremental View Maintenance for PostgreSQL (Part I)

Introduction

A materialized view is a view that stores results of SELECT query in the database, and this enables you to get the query results quickly comparing to executing the query every time. On the other hand, when a table underlying the view is updated, the data in the materialized view gets out of date, so you need to update the materialized view.

PostgreSQL provides REFRESH MATERIALIZED VIEW command as a way to update materialized views . This command refreshes the materialized view by re-running the query.

However, this is not very efficient because it recalculates all the data. If only a small part of the table is updated and only a part of the materialized view can be updated, this must be more efficient than recalculation.

This method is called Incremental View Maintenance (IVM). However, this feature is not implemented in the current PostgreSQL. So we are developing this feature in PostgreSQL and proposing it to the development community.

In this article, I will explain the IVM feature under development.

Note that the feature descried in this article is about the “under development” function, and not available in the current PostgreSQL . If you have any comments or requests for this feature, please give us your feedback!

Features

Overview

The proposed IVM feature allows materialized views to be updated automatically and incrementally just after a underlying table is modified. Because this is an incremental update, it is faster than running a regular REFRESH. And, of course, you don’t have to write trigger functions for your self.

Currently, the SQL queries used to define views are SELECT ... FROM ... WHERE ..., joins (inner joins, outer joins, self-joins), some built-in aggregate functions (count, sum, avg, min, max), GROUP BY clause, DISTINCT clause, simple sub-queries in FROM clause, simple CTEs ( WITH clauses), and EXISTS sub-queries in WHEREclause, where “simple” means that a sub-query or CTE does not include aggregates, outer joins, or DISTINCT. The view can contain multiple tuples with the same content (duplicate tuple support).

On the other hand, aggregates of other than the above (such as user-defined aggregate), complex subqueries or CTEs, window functions, ORDER BY, LIMIT, OFFSET, set operations ( UNION, EXCEPT, INTERSECT) are not supported.

Creating a View

Use CREATE INCREMENTAL MATERIALIZED VIEW command to create a materialized view that is updated automatically and incrementally . This INCREMENTAL is the new keyword added to CREATE MATERIALIZED VIEW in this implementation.

You can also check the syntax with the psql command.

postgres=# \h CREATE MATERIALIZED VIEW 
Command:     CREATE MATERIALIZED VIEW
Description: define a new materialized view
Syntax:
CREATE [ INCREMENTAL ] MATERIALIZED VIEW [ IF NOT EXISTS ] table_name
    [ (column_name [, ...] ) 
    [ USING method ]
    [ WITH ( storage_parameter [= value] [, ... ] ) ]
    [ TABLESPACE tablespace_name ]
    AS query
    [ WITH [ NO ] DATA ]

For example, we can create a materialized view using tables of pgbench as follows. Here, the scale factor is 100.

test=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm AS
       SELECT a.aid, b.bid, abalance, bbalance
       FROM pgbench_accounts a JOIN pgbench_branches b ON a.bid = b.bid:
NOTICE:  created index "mv_ivm_index" on materialized view "mv_ivm"
SELECT 10000000

This materialized view was created with IVM support because INCREMENTAL option was specified. In addition, an index was automatically created on this materialized view that is necessary for efficient IVM. These can be confirmed by \d meta command of psql command. We can see in the result “Incremental view maintenance: yes” and an unique index defined on aid and bid, which are primary key columns of the base tables.

test=# \d+ mv_ivm
                                    Materialized view "public.mv_ivm"
  Column  |  Type   | Collation | Nullable | Default | Storage | Compression | Stats target | Descrip
tion 
----------+---------+-----------+----------+---------+---------+-------------+--------------+--------
-----
 aid      | integer |           |          |         | plain   |             |              | 
 bid      | integer |           |          |         | plain   |             |              | 
 abalance | integer |           |          |         | plain   |             |              | 
 bbalance | integer |           |          |         | plain   |             |              | 
Indexes:
    "mv_ivm_index" UNIQUE, btree (aid, bid)
View definition:
 SELECT a.aid,
    b.bid,
    a.abalance,
    b.bbalance
   FROM pgbench_accounts a
     JOIN pgbench_branches b ON a.bid = b.bid;
Access method: heap
Incremental view maintenance: yes

Updating a View Incrementally

Let’s see the effect of IVM. First, when I executed REFRESH command on this view, it took about 33 seconds.

test=# REFRESH MATERIALIZED VIEW mv_ivm ;
REFRESH MATERIALIZED VIEW
Time: 32863.279 ms (00:32.863)

On the other hand, when I updated a row in pgbench_accounts, it took about 16ms, which was more than 1000 times faster than the REFRESH command. In addition, of course, the contents of the view was updated correctly.

test=# SELECT * FROM mv_ivm WHERE aid = 1;
 aid | bid | abalance | bbalance 
-----+-----+----------+----------
   1 |   1 |      100 |     1000
(1 row)

Time: 1.583 ms
test=# UPDATE pgbench_accounts SET abalance = 11111 WHERE aid = 1;
UPDATE 1
Time: 16.202 ms
test=# SELECT * FROM mv_ivm WHERE aid = 1;
 aid | bid | abalance | bbalance 
-----+-----+----------+----------
   1 |   1 |    11111 |     1000
(1 row)

Time: 1.465 ms

Aggregate View

Next, let’s create a materialized view on a aggregate query . Here, pgbench's scale factor is 1000.

test2=# CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm2 AS
        SELECT bid, count(abalance), sum(abalance), avg(abalance)
        FROM pgbench_accounts GROUP BY bid;
NOTICE:  created index "mv_ivm2_index" on materialized view "mv_ivm2"
SELECT 1000

It took more than 40 seconds for REFRESH of this view .

test2=# REFRESH MATERIALIZED VIEW mv_ivm2;
REFRESH MATERIALIZED VIEW
Time: 42427.666 ms (00:42.428)

On the other hand, when I updated a row in the table, the view was automatically updated due to the IVM function. It took about 18ms, which is more than 2000 times faster than REFRESH.

test2=# SELECT * FROM mv_ivm2 WHERE bid = 1;
 bid | count  | sum  |          avg           
-----+--------+------+------------------------
   1 | 100000 | 1000 | 0.01000000000000000000
(1 row)

Time: 2.199 ms
test2=# UPDATE pgbench_accounts SET abalance = abalance + 5000 WHERE aid = 1;
UPDATE 1
Time: 17.951 ms
test2=# SELECT * FROM mv_ivm2 WHERE bid = 1;
 bid | count  | sum  |          avg           
-----+--------+------+------------------------
   1 | 100000 | 6000 | 0.06000000000000000000
(1 row)

Time: 2.152 ms

Conclusion

In this article, I showed that the proposed IVM feature enables materialized views to be updated automatically and more rapidly than the REFRESH command. In the next post, I will explain the performance of IVM in more detailed.

9 comments:

  1. Nice work! Looking forward to see this merged to PostgreSQL core. I hope you can get rid of the locking issues somehow. I started working on a general purpose version of IVM a while ago that ought to be independent of the database. The PoC implementation idea of it is to generate DDL and triggers that do the IVM. The analysis parts are implemented with Apache Calcite which offers many nice out of the box utilities. I didn't do much in that area recently though as I am working on other projects as well. In case you are interested in that approach as well, I would love to work together on that: https://github.com/Blazebit/blaze-ivm

    Looking forward to read your next post!

    ReplyDelete
    Replies
    1. > I hope you can get rid of the locking issues somehow.
      I also hope so, and I am trying to avoid the strong locking if possible.

      >I started working on a general purpose version of IVM a while ago that ought to be independent of the database. The PoC implementation idea of it is to generate DDL and triggers that do the IVM. The analysis parts are implemented with Apache Calcite which offers many nice out of the box utilities. I didn't do much in that area recently though as I am working on other projects as well. In case you are interested in that approach as well, I would love to work together on that: https://github.com/Blazebit/blaze-ivm

      Thank you for your information. I think implementing IVM on PostgreSQL as a built-in feature would be beneficial for PostgreSQL users, but I am interested in your project, too. I'll look into your github.

      Delete
  2. I appreciate the effort. IMHO I'd prefer to see the Sql Server approach where a Materialized View must be expressible as an index, and the engine monitors the index and updates the Materialized View automatically whenever a change to the index is detected.

    ReplyDelete
    Replies
    1. I am not familiar with the Sql Server approach, so I would like learn it. Thanks!

      Delete
    2. We are moving from Oracle to PG and this is one of the pains of transition. In Oracle we had FAST Refresh of MV that was based on 'MATERIALIZED VIEW LOG'. In PG we have FULL refresh, so the idea of using MV to generate BI/BA reports (not complex) is worthless. Now we are back to ugly triggers, due to lack if any alternatives in PG.

      Delete
  3. We're also holding on to MS SQL Server because of the indexed (materialised) views. Having this in PG would mean there's nothing left in MS SQL that we can't do in PG, and this would open the way to migrate our enterprise system to PG.

    ReplyDelete
  4. PS: Thank you for all your work on this feature!

    ReplyDelete
  5. Thank you for all your work indeed! In my organization, we are creating materialized view selecting from foreign tables (using a foreign data wrapper). Do you think your patch will work on those views ?

    ReplyDelete
    Replies
    1. I also would like to support foreign tables in IVM, but for this purpose we need a way to extract changes that occurred on a foreign table. I don't have the good idea for now. I'll investigate it more any way for the future support of foreign tables.

      Delete

pg_ivm 1.0 released!

pg_ivm v1.0 was officially released! pg_ivm is an extension module that provides Incremental View Maintenance (IVM) feature, which is a w...