
What is an access method?
An Access Method in PostgreSQL refers to the set of routines and functions that define how the database interacts with an index. It provides the means for storing, retrieving, and manipulating indexed data efficiently. The access method is responsible for how data is organized in an index, how queries can access it, and how operations like insertions, deletions, and updates are performed on the index data.
PostgreSQL includes several built-in access methods for indexing, such as:
- B-Tree (default index type for most use cases)
- Hash
- GiST (Generalized Search Tree)
- GIN (Generalized Inverted Index)
- BRIN (Block Range Index)
Each of these index types has its own Access Method implementation that dictates how data is stored and queried.
How does the access method interface work?
The Access Method interface in PostgreSQL provides a standard set of functions that each index type must implement. These functions define the behavior of the index during different stages of query processing. Below are the key components of the interface.
Index creation
When an index is created, PostgreSQL calls the corresponding access method to define how the index is physically stored. A structure called IndexAMRoutine (detailed below) holds pointers to the functions that manage index operations for that specific index type.
In the context of index access methods, a routine consists of a set of related functions that collectively define the behavior of a particular index type.
typedef struct IndexAmRoutine
{
NodeTag type;
/*
* Total number of strategies (operators) by which we can traverse/search
* this AM. Zero if AM does not have a fixed set of strategy assignments.
*/
uint16 amstrategies;
/* total number of support functions that this AM uses */
uint16 amsupport;
/* opclass options support function number or 0 */
uint16 amoptsprocnum;
/* does AM support ORDER BY indexed column's value? */
bool amcanorder;
/* interface functions */
ambuild_function ambuild;
ambuildempty_function ambuildempty;
aminsert_function aminsert;
aminsertcleanup_function aminsertcleanup;
ambulkdelete_function ambulkdelete;
amvacuumcleanup_function amvacuumcleanup;
amcanreturn_function amcanreturn; /* can be NULL */
amcostestimate_function amcostestimate;
amgettreeheight_function amgettreeheight; /* can be NULL */
amoptions_function amoptions;
amproperty_function amproperty; /* can be NULL */
ambuildphasename_function ambuildphasename; /* can be NULL */
amvalidate_function amvalidate;
amadjustmembers_function amadjustmembers; /* can be NULL */
ambeginscan_function ambeginscan;
amrescan_function amrescan;
amgettuple_function amgettuple; /* can be NULL */
amgetbitmap_function amgetbitmap; /* can be NULL */
amendscan_function amendscan;
ammarkpos_function ammarkpos; /* can be NULL */
amrestrpos_function amrestrpos; /* can be NULL */
/* interface functions to support parallel index scans */
amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
aminitparallelscan_function aminitparallelscan; /* can be NULL */
amparallelrescan_function amparallelrescan; /* can be NULL */
} IndexAmRoutine;
Index scanning
Scan functions retrieve matching index entries during query execution. PostgreSQL uses these functions to scan the index and identify relevant tuples that satisfy the query conditions.
Insertion, deletion, and updates
The Access Method defines how data is inserted, deleted, and updated within the index. For instance, in a B-tree index, new entries are inserted in sorted order, while a GIN index stores data as key-value pairs, representing terms in a full-text search.
Index rebuilding and vacuuming
To maintain index consistency, the Access Method includes routines for rebuilding and vacuuming indexes. These processes optimize storage and ensure the index remains efficient.
Index scan walkthrough
Understanding how PostgreSQL scans an index is crucial for optimizing query performance. PostgreSQL’s extensibility relies on metadata to define how indexes work, with catalogs storing information about index handlers and associated operations.
Catalog metadata and index lookup:
- The pg_class catalog table stores metadata about all database objects, including indexes.
- A column called relam (Relation Access Method) stores the OID of the Access Method associated with each index.
- By joining pg_class with pg_am (Access Method table) and pg_opclass (Operator Class table), we can retrieve details about index handlers and operator classes.
Query example to retrieve metadata:
SELECT c.relname AS index_name, a.amname AS access_method, o.opcname AS operator_class
FROM pg_class c
JOIN pg_am a ON c.relam = a.oid
JOIN pg_opclass o ON o.opcmethod = a.oid;
This allows us to determine that the OID for the B-Tree handler is 403, with multiple operators available for this handler type.
We can then use this as a predicate to see all operators associated with all B-Tree indexes.
postgres=# SELECT c.relname, c.reltype, a.oid, a.amhandler, o.opcmethod, o.opcname FROM pg_class c
postgres-# JOIN pg_am a ON c.relam = a.oid JOIN pg_opclass o ON a.oid = o.opcmethod AND a.oid = 403;
-[ RECORD 1 ]--------------------------------------- relname | pg_toast_1255_index reltype | 0 oid | 403 amhandler | bthandler opcmethod | 403 opcname | array_ops -[ RECORD 2 ]--------------------------------------- relname | pg_toast_1255_index reltype | 0 oid | 403 amhandler | bthandler opcmethod | 403 opcname | bit_ops -[ RECORD 3 ]--------------------------------------- relname | pg_toast_1255_index reltype | 0 oid | 403 amhandler | bthandler opcmethod | 403 opcname | bool_ops -[ RECORD 4 ]--------------------------------------- relname | pg_toast_1255_index reltype | 0 oid | 403 amhandler | bthandler opcmethod | 403 opcname | bytea_ops
Querying metadata and initializing function pointers every time a query is planned and executed would be inefficient. PostgreSQL optimizes this with a caching system that avoids repeated catalog lookups for commonly used relations (tables, indexes, sequences, etc.). The key cache components include:
- Relation cache
- Cache initialization
- Relation cache lookup and access
- Cache invalidation
Relation cache
A cache of metadata for all relations, including tables, indexes, and other types of relations. This improves query performance by reducing the need to access the system catalog every time a relation is referenced.
Cache initialization
PostgreSQL initializes the relation cache when the backend process starts. This process loads metadata for all relations into the cache, so it can be quickly accessed later during query execution.
Relation cache lookup and access
Once the cache is initialized, various parts of PostgreSQL (like the query planner and executor) can access relation metadata without needing to repeatedly consult the system catalog. Cached relation objects are used to lookup the necessary details about tables, indexes, columns, and constraints.
Cache invalidation
When changes are made to relations (e.g., an index is dropped or modified), PostgreSQL must invalidate the relevant cache entries to ensure that it does not use stale metadata. This ensures that the database always operates with the most up-to-date relation information.
Query planning and index selection
When the query planner analyzes a query, it looks for indexes that could optimize execution. For example:
SELECT * FROM boat WHERE name = 'Summer Drift';
For each table in the query, the planner has access to the available indexes via the cached system catalog table pg_index (see notes on the cache system above). The indexes are related to specific columns or expressions, and each index has an associated Access Method (such as B-Tree, Hash, GIN, etc), which in turn has an associated Access Method handler.
The planner examines the relevant cached index metadata to gather details about:
- The columns indexes
- The index type (B-Tree, hash, etc)
- The unique nature or partial nature of the index
- The availability of multi-column indexes
Cost estimation
The planner evaluates the cost of using each available index to access the data. Cost is influenced by several factors including:
- Sequential scans if no index is particularly helpful
- Bitmap scans if multiple indexes might be used together for filtering
- Index scans if a single index provides the best cost
Eventually, the planner generates an execution plan that includes the chosen index, if any. We can use the EXPLAIN command to understand which indexes a planner has chosen.
EXPLAIN ANALYZE SELECT * FROM boat WHERE name = 'Summer Drift';
When the planner decides to use an index, it retrieves a structure (routine in PostgreSQL terminology) called IndexAmRoutine, which contains pointers to the appropriate Access Methods. It does this via a call to GetIndexAMRoutine(OID) passing the OID of the handler (403 in this example for B-Tree). The OidFunctionCall0 macro will call the appropriate Access Method handler (in this case, bthandler) to return the structure containing pointers to the appropriate functions for B-Tree.
/* * GetIndexAmRoutine - call the specified access method handler routine to get * its IndexAmRoutine struct, which will be palloc'd in the caller's context. * * Note that if the amhandler function is built-in, this will not involve * any catalog access. It's therefore safe to use this while bootstrapping * indexes for the system catalogs. relcache.c relies on that. */ IndexAmRoutine * GetIndexAmRoutine(Oid amhandler) { Datum datum; IndexAmRoutine *routine; datum = OidFunctionCall0(amhandler); routine = (IndexAmRoutine *) DatumGetPointer(datum); if (routine == NULL || !IsA(routine, IndexAmRoutine)) elog(ERROR, "index access method handler function %u did not return an IndexAmRoutine struct", amhandler); return routine; }
The planner passes the plan to the executor for execution. It will prepare for an index scan based on the plan. The executor calls index_open to open the index relation. The OID of our boat_idx1 index is passed to index open, which internally opens the index and locks it in the specified lock mode. Now the executor can interact with it.
The executor will then call index_beginscan to begin the process of scanning an index to locate matching rows/tuples in a table based on the query predicates.
IndexScanDesc
index_beginscan(Relation heapRelation,
Relation indexRelation,
Snapshot snapshot,
int nkeys, int norderbys)
{
IndexScanDesc scan;
Assert(snapshot != InvalidSnapshot);
scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot, NULL, false);
This calls an internal version of itself called index_beginscan_internal, which starts by setting up the necessary structures to perform an index scan for our query. It is the first step in using an index to retrieve data. The actual search through the index occurs after this is called, during the execution of the scan using index_gettuple, index_fetch_tuple, and similar.
The beginscan function is passed the relation object for the table (heap) being scanned (boat), the relation object for the index being used (e.g., btree), the number of search key being used in the scan, and the number of columns the scan will use for ordering results (if applicable).
Our earlier initialization of the IndexAMRoutine structure means the ambeginscan function pointer will refer to our B-Tree begin scan function.
amroutine->ambeginscan = btbeginscan;
The function will then use this information to set up structures, allocate memory, and return an index scan descriptor which holds the state information for the scan. The descriptor is then passed to subsequent functions in the index scan process like index_gettuple which uses it to track and maintain the state of the scan as it navigates the index.
static IndexScanDesc index_beginscan_internal(Relation indexRelation, int nkeys, int norderbys, Snapshot snapshot, ParallelIndexScanDesc pscan, bool temp_snap) { IndexScanDesc scan; RELATION_CHECKS; CHECK_REL_PROCEDURE(ambeginscan);
... /* * Tell the AM to open a scan. */ scan = indexRelation->rd_indam->ambeginscan(indexRelation, nkeys, norderbys); /* Initialize information for parallel scan. */ scan->parallel_scan = pscan; scan->xs_temp_snap = temp_snap; return scan; }
Above we can see where the call is made via the Access Method interface to the B-Tree-specific begin scan function.
In this bit of code, we have:
- indexRelation: A pointer to the relation structure of the index being used for the scan
- indexRelation->rd_indam: Refers to the Index Access Method routine for the specific index type (B-Tree in our case) associated with the index
- ambeginscan: Pointer to the B-Tree-specific function, which can be seen in the code snippet below
/* * btbeginscan() -- start a scan on a btree index */ IndexScanDesc btbeginscan(Relation rel, int nkeys, int norderbys) { IndexScanDesc scan; BTScanOpaque so; /* no order by operators allowed */ Assert(norderbys == 0); /* get the scan */ scan = RelationGetIndexScan(rel, nkeys, norderbys); /* allocate private workspace */ so = (BTScanOpaque) palloc(sizeof(BTScanOpaqueData)); BTScanPosInvalidate(so->currPos); BTScanPosInvalidate(so->markPos); if (scan->numberOfKeys > 0) so->keyData = (ScanKey) palloc(scan->numberOfKeys * sizeof(ScanKeyData)); else so->keyData = NULL;
Below is part of the B-Tree-specific gettuple function. We can see the Index Descriptor first parameter called scan, which is used to maintain state. We can see this function actually returns a row/tuple from the index that meets our predicate.
/* * btgettuple() -- Get the next tuple in the scan. */ bool btgettuple(IndexScanDesc scan, ScanDirection dir) { BTScanOpaque so = (BTScanOpaque) scan->opaque; bool res; /* btree indexes are never lossy */ scan->xs_recheck = false; /* Each loop iteration performs another primitive index scan */ do {
... /* If we have a tuple, return it ... */ if (res) break; /* ... otherwise see if we need another primitive index scan */ } while (so->numArrayKeys && _bt_start_prim_scan(scan, dir)); return res; }
Summary
PostgreSQL’s Access Method interface provides a robust framework for defining and managing index types. By abstracting the implementation details, it enables extensibility and optimization for diverse workloads.
Understanding the Access Method internals—index creation, scanning, caching, and query planning—empowers database administrators and developers to make informed indexing decisions. By leveraging built-in access methods and PostgreSQL’s efficient caching mechanisms, users can enhance query performance and ensure scalable database operations.