Introduction
In the previous post, I showed some performance evaluation of Incremental View Maintenance (IVM) feature that is under development. In this article, I will explain the implementation of IVM and how it works.
Simple Example of Incremental View Maintenance
First, I will explain the process of IVM using a simple example.
Suppose you have a view V defined by natural join on tables R and S as shown below.
Then, suppose that table R is updated as follows, and (1, one, I) is changed to (1, first, I). In the figure, ∇R is the set of tuples deleted from R, and ΔR is the set of tuples inserted into R. Using these delta tables (∇R and ΔR) , the contents of table S, and the view definition of V, we can calculate the change that will occur on the view, that is, ∇V and ΔV.
Finally, we can update the view incrementally by applying the delta tables ∇V and ΔV to the view V.
Steps in Incremental View Maintenance
In our implementation, Incremental view Maintenance is performed in the three steps:
- Extract the changes that occurred on the updated table
- Calculate the changes that will occurs on the view
- Apply the changes to the view
Extract the table change
In our implementation, we use statement-level AFTER triggers and transition tables to extract the change that occurred in the table.
AFTER triggers are automatically created on all base tables contained in the view definition query when the view is created by CREATE INCREMENTAL MATERIALIZED VIEW
command. The triggers are created for INSERT, DELETE, and UPDATE commands, but the TRUNCATE command is not currently supported.
The Transition Table is a features of AFTER trigger, which allows trigger functions to refer to the changes that occurred on the table like normal tables. There are two transition tables; one contains “tuples deleted from the table” and another contains “tuples inserted into the table”, which correspond to the delta tables∇R and ΔR explained above, respectively.
Calculate the view delta
Changes that occurs in views are calculated using the extracted table changes and the view definition queries, as described above. More specifically, we rewrite the view definition query and replace the update table in the query with the transition table. Actually, what is rewritten is not the query string written in SQL but the internal structure (struct Query) which represents the parsed query.
By executing These rewritten queries, we can get the delta tables that contains changes which will occur on the views. These view delta tables are corresponds to ∇V, ΔV.
Apply deltas to the view
The calculated view deltas are applied to the materialized view using a SQL query. Specifically, we use DELETE statement to search for tuples to be deleted (contents of ∇V) from the materialized view and delete them, and use INSERT statement to insert tuples to be inserted (contents of ΔV) into the materialized view.
For views on aggregate query, the aggregated values in the view are updated using UPDATE statement. The way to update depends on aggregate functions. For example, with regard to count() and sum(), we can calculate the new aggregated value by adding an aggregated value calculated from the view delta to the old aggregated value in the view. For calculating the new avg() value, we need to know new sum() and count() values beforehand, so these values are stored in the view as hidden columns.
Recall that a unique index is automatically created if possible when the view is defined. This index is necessary to identify tuples that should be removed from the view efficiently. Without the index, all tuples in the view would be scanned sequentially to search tuples to be deleted, which would be inefficient.
In our implementation, if all the primary key columns of all the tables appear in the target list of the view definition query, a unique index on these columns are created. In the case of a view on aggregate query, a unique index is created on the columns used in GROUP BY clause. We also create a unique index if the view has DISTINCT clause. Otherwise, a index is not automatically created, so you would need to create an appropriate index manually for efficient IVM.
Conclusion
In this article, I explained the overview of our IVM implementation. This feature is under discussion and we are continuously developing targeting PostgreSQL 15. If you want to learn more details, please see the GitHub repository and the discussion. in pgsql-hackers. Also, if you would like to test this feature, you could get a Docker image from Docker Hub.
We would appreciate any your suggestion, question, and feedback.
great work, I would really love to get this!
ReplyDeleteWill it works if the mview use any aggregation?
ReplyDeleteCurrently only built-in count,sum,avg,min and max are supported. I would like to support any aggregates including user-defined-aggregates in future, which will need discussions to change the system catalog in PostgerSQL, pg_aggregate.
DeleteExcellent, hope it gets done for v15, this would be the most exciting and powerful new feature in Postgres for many many years, great work.
ReplyDeleteHello, sorry to bother you here with this, but I've been experimenting with the current build of postgres-ivm and I have a question regarding table partitioning and the maintenance of a materialized view. I currently have a table in my database that is partitioned. I also created an incremental materialized on which the supporting query uses partitions of the previous described table and joins them together.
ReplyDeleteWhenever I make an insert on the master table, the view itself is not updates. Is this expected behavior?
Thank you for your reporting.
DeleteAs I marked it in GitHub issue (https://github.com/sraoss/pgsql-ivm/issues/132), it seems a bug that views on partitions could not work well... If so, we will have to prohibit views on partitioned tables and partitions.
Anyway, I'll investigate this issue more.
FYI, there is a similar report in the pgsql-hackers.
Deletehttps://www.postgresql.org/message-id/OS0PR01MB5682576A59A1C765F7AEDE6B82F09%40OS0PR01MB5682.jpnprd01.prod.outlook.com