2021-11-29

Transition Tables in Incremental View Maintenance (Part II): Multliple Tables Modification case

Introduction

In a previous post, I explained how we use transition tables in our implementation of Incremental View Maintenance (IVM) on PostgreSQL. Transition table is a features of AFTER trigger which allows trigger functions to refer to the changes of a table that occurred in a statement. We are using transition tables in order to extract table changes needed to calculate changes to be applied on views .

In this article I describes a more complicated situation, specifically how we handle transition tables when multiple tables are modified in a statement.

Single Table Modification

In a case where a single table is modified in a statement, the view maintenance process is simple. For example, suppose we have three tables R, S, and T. We also define a materialized view V = R ⨝ S ⨝ T that joins these tables as bellow:

SELECT x,y,z FROM R,S,T WHERE R.i=S.i AND S.j=T.j;

Then, suppose that table R was modified in a statement. This operation can be written as R ← R ∸ ∇R ⊎ ΔR, where ∇R is a bag of tuples deleted from R, and ΔR is a bag of tuples inserted into R in this statement. In this case, the changes are calculated as ∇V = ∇R ⨝ S ⨝ T and ΔV = ΔR ⨝ S ⨝ T, and we can update the view as V ← V ∸ ∇V ⊎ ΔV. The SQL representation of these calculations is as follows:

-- ∇V: tuples to be deleted from the view 
SELECT x,y,z FROM R_old,S,T WHERE R_old.i=S.i AND S.j=T.j;
-- ΔV: tuples to be inserted into the view
SELECT x,y,z FROM R_new,S,T WHERE R_new.i=S.i AND S.j=T.j;

where R_old and R_new are transition tables corresponding to ∇R and ΔR, respectively.

Multiple Tables Modification

Now, let’s see cases where multiple tables are modified in a statement. You can observe it when you use modifying CTEs (WITH clause), like:

WITH i1 AS (INSERT INTO R VALUES(1,10) RETURNING 1), 
     i2 AS (INSERT INTO S VALUES(1,100) RETURNING 1) 
SELECT;

In addition, multiple tables can be updated when you use triggers, or foreign key constraint.

Pre-Update State of Tables

At that time, we need the state of tables before the modification. For example, when some tuples, ΔR, ΔS, and ΔT are inserted into R, S and T respectively, the tuples that will be inserted into the view are calculated by the following three queries:

SELECT x,y,z FROM R_new,S_pre,T_pre WHERE R_new.i=S_pre.i AND S_pre.j=T_pre.j;
SELECT x,y,z FROM R,S_new,T_pre WHERE R.i=S_new.i AND S_new.j=T_pre.j;
SELECT x,y,z FROM R,S,T_new WHERE R.i=S.i AND S.j=T_new.j;

where R_new, S_new, T_new are transition tables corresponding to ΔR, ΔS and, ΔT respectively. S_pre, T_pre are the table states before the modification, R, S are the current states of tables, that is, after the modification.

In our implementation, the “pre-update state of table” is calculated by using some system columns. Specifically, tuples inserted into the table are filtered by cmin/xmin system columns. Also, tuples deleted from the table are put back by appending tuples stored in the old transition table.

Collecting Multiple Transition Tables

As shown above, we need transition tables as many as table modifications for view maintenance. Therefore, transition tables for each modification are collected in each AFTER trigger function call. Then, the view maintenance is performed in the last call of the trigger.

However, in the original PostgreSQL, lifespan of transition tables is not enough long to preserve all transition tables during multiple modifications. Therefore, in order to implement IVM feature, I fixed the trigger code in PostgreSQL to allow to prolong a transition table’s lifespan and prevent it from being freed from memory early.

Summary

In this article, I explained how transition table is used when multiple tables are modified simultaneously in our IVM implementation. Transition tables are useful feature to extract table changes, but when we wanted to use it to implement IVM, we had to devise methods for calculating pre-update state and prolonging transition tables lifespan. If there would be cases where these transition table’s features are useful other than IVM, implementing them in PostgreSQL core as an independent proposal might make sense. However, we don’t have good idea about use cases other than IVM for now, so they are just for IVM.

4 comments:

  1. Is your solution/algorithm and design friendly for other hardware in future e.g. FPGA or GPUs which I assume could / would have a much higher performance.

    Looking forward to seeing this committed, where is it currently at? needing committer review?

    ReplyDelete
    Replies
    1. > Looking forward to seeing this committed, where is it currently at? needing committer review?

      It will not in PG15, and the current target is PG16.
      Yes, as you say, we would like more reviews and discussions from committers and other major contributors.

      Delete
  2. Would be good to see a benchmark against readyset.io on linux 5.19 when it becomes available :) I much prefer your solution :) but would be nice to see a write-up discussing the differences between your solution and readyset.io (based on noria I believe / which you should be able to easily find his PhD paper online on Noria that he created)

    ReplyDelete
    Replies
    1. Thank you for your information!
      I'll search and read the paper.

      Delete