About MongoDB indexes - goodbyegangster's blog

This post creates an index in MongoDB and walks through how to read the resulting query plan.

Environment

MongoDB Community Edition v4.2.7

Sample data

The sample data is a data set published by the official MongoDB site.

The data is stored in a collection named zips. Its document count and schema are as follows.

> db.zips.aggregate([{$count: "count"}])
{ "count" : 29353 }
>
> db.zips.findOne()
{
        "_id" : "01001",
        "city" : "AGAWAM",
        "loc" : [
                -72.622739,
                42.070206
        ],
        "pop" : 15338,
        "state" : "MA"
}

The collection is a little over 1 MB in size.

Creating an index

This section covers how to create an index.

Index types

According to the documentation, MongoDB supports the following index types.

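Single Field Index
- An index on a single field
- The sort order is specified as ascending or descending
Compound Index
- An index on multiple fields
  - The fields are listed together, as in { userid: 1, score: -1 }
Multikey Index
- An index on the values stored in an array field
Geospatial Index
- An index on geospatial coordinate data
Text Index
- A full-text search index
Hashed Index
- An index built from the hash of a field's value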
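To make the meaning of 1 and -1 in a compound key pattern concrete, here is a minimal sketch in plain JavaScript (runnable with Node, no MongoDB required). It is only a toy model of the ordering and not MongoDB's actual index implementation. The helper name compareByKeyPattern and the sample documents are made up for illustration.

```javascript
// Toy model: order documents the way a key pattern such as
// { city: 1, pop: -1 } would order its index entries.
// compareByKeyPattern is a hypothetical helper, not a MongoDB API.
function compareByKeyPattern(keyPattern) {
  const fields = Object.entries(keyPattern); // e.g. [["city", 1], ["pop", -1]]
  return (a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -1 * direction;
      if (a[field] > b[field]) return 1 * direction;
    }
    return 0; // equal on every indexed field
  };
}

// Hypothetical sample documents.
const docs = [
  { city: "BOSTON", pop: 100 },
  { city: "AGAWAM", pop: 200 },
  { city: "BOSTON", pop: 300 },
];

const ordered = [...docs].sort(compareByKeyPattern({ city: 1, pop: -1 }));
console.log(ordered.map((d) => `${d.city}:${d.pop}`).join(", "));
// prints "AGAWAM:200, BOSTON:300, BOSTON:100"
```

Cities come out in ascending order, and within the same city the larger pop comes first.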

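The following properties can be specified when creating an index.

Unique Index
- Rejects any insert or update that would give the indexed field a value already held by another document
Partial Index
- Indexes only the documents that match a filter expression
Sparse Index
- Leaves out of the index any document that lacks the indexed field
TTL Index
- Deletes a document from the collection once the specified number of seconds has elapsed since the time stored in the indexed field
- The index key must be a field of type Date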
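As a rough sketch of how these properties are expressed, each one corresponds to an option in the second argument of db.collection.createIndex(). The snippet below only builds and prints the option documents in plain JavaScript, so it runs under Node without a server. The option names (unique, partialFilterExpression, sparse, expireAfterSeconds) are the real ones, while the fields email and createdAt, the filter threshold, and the TTL of 3600 seconds are assumptions chosen for illustration.

```javascript
// Option documents for the four index properties listed above.
// Fields "email" and "createdAt" are hypothetical and do not exist in zips.
const indexDefinitions = [
  { keys: { email: 1 }, options: { unique: true } },
  { keys: { city: 1 }, options: { partialFilterExpression: { pop: { $gt: 10000 } } } },
  { keys: { state: 1 }, options: { sparse: true } },
  { keys: { createdAt: 1 }, options: { expireAfterSeconds: 3600 } },
];

// Print the equivalent shell commands instead of running them.
for (const { keys, options } of indexDefinitions) {
  console.log(`db.<collection>.createIndex(${JSON.stringify(keys)}, ${JSON.stringify(options)})`);
}
```

In the mongo shell, each printed line could be run against a real collection by replacing the placeholder with the collection name.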

Index Types

Creating the index

Create a Compound Index on the city and pop fields.

> db.zips.createIndex( { city: 1, pop: -1 }, { name: "i_city" } )
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

db.collection.createIndex()

Check the index that was created. The _id field has an index by default.

> db.zips.getIndexes()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "sample.zips"
        },
        {
                "v" : 2,
                "key" : {
                        "city" : 1,
                        "pop" : -1
                },
                "name" : "i_city",
                "ns" : "sample.zips"
        }
]

db.collection.getIndexes()
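When a collection has many indexes, the raw getIndexes() output gets long. The following is a small sketch that condenses it to one line per index. It works on a plain array with the same shape as the output above, so it runs under Node as written. The helper name describeIndex is made up for this post, and in the shell the array would come from db.zips.getIndexes() instead.

```javascript
// Same shape as the getIndexes() output shown above.
const indexes = [
  { v: 2, key: { _id: 1 }, name: "_id_", ns: "sample.zips" },
  { v: 2, key: { city: 1, pop: -1 }, name: "i_city", ns: "sample.zips" },
];

// describeIndex is a hypothetical helper: "name: field(asc|desc), ..."
// Non-numeric key types such as "text" or "hashed" are printed as-is.
function describeIndex(index) {
  const fields = Object.entries(index.key)
    .map(([field, dir]) => `${field}(${dir === 1 ? "asc" : dir === -1 ? "desc" : dir})`)
    .join(", ");
  return `${index.name}: ${fields}`;
}

const summary = indexes.map(describeIndex);
console.log(summary.join("\n"));
// prints:
// _id_: _id(asc)
// i_city: city(asc), pop(desc)
```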

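Getting the query plan

Run a find query and look at its query plan. In MongoDB, the plan that the query optimizer selects is called the winning plan. The explain() method returns this information.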

> db.zips.explain("queryPlanner").find({ city: "NEW YORK" }).sort({ pop: -1 })
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "sample.zips",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "city" : {
                                "$eq" : "NEW YORK"
                        }
                },
                "queryHash" : "A2226FF0",
                "planCacheKey" : "083D7CD3",
                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "city" : 1,
                                        "pop" : -1
                                },
                                "indexName" : "i_city",
                                "isMultiKey" : false,
                                "multiKeyPaths" : {
                                        "city" : [ ],
                                        "pop" : [ ]
                                },
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 2,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "city" : [
                                                "[\"NEW YORK\", \"NEW YORK\"]"
                                        ],
                                        "pop" : [
                                                "[MaxKey, MinKey]"
                                        ]
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "localhost.localdomain",
                "port" : 27017,
                "version" : "4.2.7",
                "gitVersion" : "51d9fe12b5d19720e72dcd7db0f2f17dd9a19212"
        },
        "ok" : 1
}

db.collection.explain()

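explain() has three verbosity modes.

queryPlanner
- Returns the plan selection information (the winning plan and any rejected plans) without executing the query
executionStats
- Executes the winning plan and also returns its execution statistics
allPlansExecution
- In addition to the winning plan's statistics, returns execution information for the other candidate plans that were considered and rejected

For how to read explain results, see the following page.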

Explain Results

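The explain output above shows that the IXSCAN (index scan) stage uses the i_city index created earlier. Because the index already stores pop in descending order within each city, the plan contains no separate SORT stage.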
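Explain output nests stages through inputStage, so finding a particular stage by eye gets tedious in deeper plans. Here is a minimal sketch that searches a plan for a stage by name. It operates on a trimmed copy of the winningPlan above as a plain object and runs under Node. The helper name findStage is an assumption of this post, not part of MongoDB.

```javascript
// Trimmed copy of the winningPlan from the explain output above.
const winningPlan = {
  stage: "FETCH",
  inputStage: {
    stage: "IXSCAN",
    keyPattern: { city: 1, pop: -1 },
    indexName: "i_city",
    direction: "forward",
    indexBounds: {
      city: ['["NEW YORK", "NEW YORK"]'],
      pop: ["[MaxKey, MinKey]"],
    },
  },
};

// findStage is a hypothetical helper: depth-first search through
// inputStage (single child) or inputStages (several children).
function findStage(plan, stageName) {
  if (!plan) return null;
  if (plan.stage === stageName) return plan;
  const children = plan.inputStages || (plan.inputStage ? [plan.inputStage] : []);
  for (const child of children) {
    const found = findStage(child, stageName);
    if (found) return found;
  }
  return null;
}

const ixscan = findStage(winningPlan, "IXSCAN");
console.log(ixscan.indexName, JSON.stringify(ixscan.indexBounds));
```

In the shell, the same function could be given the queryPlanner.winningPlan field of a real explain result.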

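For comparison, create a collection named zips_no_index that holds the same data but has no index other than the default one on _id, and get the explain output for the same query: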

> db.zips_no_index.explain("queryPlanner").find({ city: "NEW YORK" }).sort({ pop: -1 })
{
        "queryPlanner" : {
                "plannerVersion" : 1,
                "namespace" : "sample.zips_no_index",
                "indexFilterSet" : false,
                "parsedQuery" : {
                        "city" : {
                                "$eq" : "NEW YORK"
                        }
                },
                "queryHash" : "A2226FF0",
                "planCacheKey" : "A2226FF0",
                "winningPlan" : {
                        "stage" : "SORT",
                        "sortPattern" : {
                                "pop" : -1
                        },
                        "inputStage" : {
                                "stage" : "SORT_KEY_GENERATOR",
                                "inputStage" : {
                                        "stage" : "COLLSCAN",
                                        "filter" : {
                                                "city" : {
                                                        "$eq" : "NEW YORK"
                                                }
                                        },
                                        "direction" : "forward"
                                }
                        }
                },
                "rejectedPlans" : [ ]
        },
        "serverInfo" : {
                "host" : "localhost.localdomain",
                "port" : 27017,
                "version" : "4.2.7",
                "gitVersion" : "51d9fe12b5d19720e72dcd7db0f2f17dd9a19212"
        },
        "ok" : 1
}

Without the index, the plan runs a COLLSCAN (collection scan) and needs a separate SORT stage to order the results in memory.
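The difference between the two plans can be summarized with a short sketch like the one below. It uses stage-only copies of the two winning plans above and runs under Node. The helper names listStages and summarize are made up for this post, and the sketch assumes each stage has at most one inputStage, which holds for both plans here but not for every plan MongoDB can produce.

```javascript
// Stage-only copies of the two winning plans shown above.
const indexedPlan = { stage: "FETCH", inputStage: { stage: "IXSCAN" } };
const unindexedPlan = {
  stage: "SORT",
  inputStage: { stage: "SORT_KEY_GENERATOR", inputStage: { stage: "COLLSCAN" } },
};

// Hypothetical helper: collect stage names from the outermost stage inward.
// Only follows a single inputStage chain (an assumption of this sketch).
function listStages(plan) {
  const stages = [];
  for (let node = plan; node; node = node.inputStage) {
    stages.push(node.stage);
  }
  return stages;
}

// Hypothetical helper: flag the two things this post looks for.
function summarize(plan) {
  const stages = listStages(plan);
  return {
    stages: stages.join(" <- "),
    collectionScan: stages.includes("COLLSCAN"),
    inMemorySort: stages.includes("SORT"),
  };
}

console.log(summarize(indexedPlan));
console.log(summarize(unindexedPlan));
```

The indexed plan reports neither a collection scan nor an in-memory sort, while the plan for zips_no_index reports both.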