Redshiftのマテリアライズド・ビュー機能(プレビュー)を試してみる

Redshiftでマテビュー機能が利用できるようになったので、試してみました。2020年2月時点ではプレビューの状態であるものの、申請なしで誰でも利用可能となっています。

こちらは利用マニュアル。

Creating Materialized Views in Amazon Redshift

事前準備

利用前提条件

Clusterのバージョンと、Maintenace Trackの設定に条件があります。

To use materialized views, you must confirm that your Amazon Redshift cluster is using the cluster maintenance track for Preview with the track named preview_features. The current cluster maintenance version for the preview track is 1.0.11746.

条件を満たすClusterを作成しました。

f:id:goodbyegangster:20200301030321p:plain

ベースのテーブルを用意

マテリアライズド・ビューのベースとなるテーブルを用意します。下記のタイタニック乗客リストの情報を利用したテーブルを作成します。

Titanic Passengers

利用SQLとデータソースを、下記のGithubに纏めています。

https://github.com/goodbyegangster/sample-data

マテリアライズド・ビューの作成

マテリアライズド・ビューを作成します。

CREATE MATERIALIZED VIEW public.mv_titanic
BACKUP YES
DISTSTYLE EVEN
SORTKEY(passenger_id) AS
  SELECT passenger_id, name, age FROM public.titanic WHERE survived IS TRUE;

細かいオプションは、こちらを参照。

CREATE MATERIALIZED VIEW

利用可能SQLクエリーの条件は、こちらの When using materialized views in Amazon Redshift, be aware of the following limitations: を参照。

Limitations and Usage Notes for Materialized Views

ちゃんとSELECTできます。

# SELECT * from public.mv_titanic limit 5;
 passenger_id |                        name                         | age
--------------+-----------------------------------------------------+------
            2 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | 38.0
            4 | Futrelle, Mrs. Jacques Heath (Lily May Peel)        | 35.0
           11 | Sandstrom, Miss. Marguerite Rut                     |  4.0
           12 | Bonnell, Miss. Elizabeth                            | 58.0
           20 | Masselmani, Mrs. Fatima                             |
(5 rows)

マテビュー一覧を表示する管理ビューはこちら。

# \x
# SELECT * FROM stv_mv_info;
-[ RECORD 1 ]----+---------------------------------------------------------------------------------------------------------------------------------
db_name          | dev
schema           | public
name             | mv_titanic
updated_upto_xid | 2696
is_stale         | f
owner_user_name  | awsuser
state            | 0

STV_MV_INFO

マテリアライズド・ビューの更新

マテビューを更新するには、明示的に REFRESH コマンドを実行する必要があります。以下、動作を確認します。

ベースのテーブルに、レコードを挿入します。

INSERT INTO public.titanic VALUES (999,true,1,'Zunko Tohoku','female',17.0,0,0,999999,50.0,NULL,'S');

更新前なので、SELECTしても取得できません。

# SELECT * FROM public.mv_titanic WHERE name = 'Zunko Tohoku';
 passenger_id | name | age
--------------+------+-----
(0 rows)

リフレッシュコマンドを実行。

REFRESH MATERIALIZED VIEW public.mv_titanic;

REFRESH MATERIALIZED VIEW

取得できるようなりました。

# SELECT * FROM public.mv_titanic WHERE name = 'Zunko Tohoku';
 passenger_id |     name     | age
--------------+--------------+------
          999 | Zunko Tohoku | 17.0

更新方法は an incremental refresh と refreshes by full recomputation とがあり、どちらが利用されるかはマテビュー作成時のSQLクエリーに依るとのことです。

REFRESH MATERIALIZED VIEW attempts to perform an incremental refresh. However, if the underlying SQL query for the materialized view can't be incrementally refreshed, then Amazon Redshift refreshes by full recomputation of the materialized view.

下記ページの A materialized view can't support incremental refreshes if it uses any of the following SQL elements: の条件に当てはまるクエリーは、フル更新になるみたいですね。

Limitations and Usage Notes for Materialized Views

その他に気になったこと

`auto vaccum delete` と `auto vaccum sort` されるのかどうか

マテリアライズド・ビューの場合、どうなのだろうと気になりましたが、以下の記述あるので、結果としてされるのだろうと思いました。

Background vacuum operations might be blocked if materialized views aren't refreshed. After an internally defined threshold period, a vacuum operation is allowed to run. When this vacuum operation happens, any dependent materialized views are marked for recomputation upon the next refresh (even if they are incremental).

Some user-initiated operations on base tables force a materialized view to be fully recomputed next time that a REFRESH operation is run. Examples of such operations are a manually invoked VACUUM, a classic resize, an ALTER DISTKEY operation, an ALTER SORTKEY operation, and a truncate operation.

ベーステーブルでvacuumされた場合、次回リフレッシュ時に fully recomputed されるとあるので（つまり全データを再ロードしている訳で）、結果としてvacuum fullと同じ処理になっているのだろうと考えました。

`auto analyze` されるのかどうか

analyzeについて情報なかったのです、これはちょっと分からなかったです。

何れにしろ、正確なところは、機能がGAされた後に確認する必要ありますね。