<img height="1" width="1" style="display:none;" alt="" src="https://px.ads.linkedin.com/collect/?pid=2826169&amp;fmt=gif">
Start  trial

    Start trial

      Discover PostgreSQL's Access Method interface for indexes, which empowers users to create custom index types for optimized data storage, retrieval, and query performance.
      PostgreSQL provides a powerful and flexible Access Method (AM) interface, which serves as the foundation for implementing and interacting with different index types in the database. The Access Method framework in PostgreSQL allows for the creation of custom index types, enabling users to tailor indexing strategies to meet specific performance or use case requirements.

      What is an access method?

      An Access Method in PostgreSQL refers to the set of routines and functions that define how the database interacts with an index. It provides the means for storing, retrieving, and manipulating indexed data efficiently. The access method is responsible for how data is organized in an index, how queries can access it, and how operations like insertions, deletions, and updates are performed on the index data.

      PostgreSQL includes several built-in access methods for indexing, such as:

      • B-Tree (default index type for most use cases)
      • Hash
      • GiST (Generalized Search Tree)
      • GIN (Generalized Inverted Index)
      • BRIN (Block Range Index)

      Each of these index types has its own Access Method implementation that dictates how data is stored and queried.

      How does the access method interface work?

      The Access Method interface in PostgreSQL provides a standard set of functions that each index type must implement. These functions define the behavior of the index during different stages of query processing. Below are the key components of the interface.

      Index creation

      When an index is created, PostgreSQL calls the corresponding access method to define how the index is physically stored. A structure called IndexAMRoutine (detailed below) holds pointers to the functions that manage index operations for that specific index type.

      In PostgreSQL, the term routine refers to functions or procedures that are part of an interface or API.
      In the context of index access methods, a routine consists of a set of related functions that collectively define the behavior of a particular index type.
      typedef struct IndexAmRoutine
      {
          NodeTag     type;
      
          /*
          * Total number of strategies (operators) by which we can traverse/search
          * this AM.  Zero if AM does not have a fixed set of strategy assignments.
          */
      
          uint16      amstrategies;
          /* total number of support functions that this AM uses */
          uint16      amsupport;
          /* opclass options support function number or 0 */
          uint16      amoptsprocnum;
          /* does AM support ORDER BY indexed column's value? */
          bool        amcanorder;
          /* interface functions */
          ambuild_function          ambuild;
          ambuildempty_function     ambuildempty;
          aminsert_function         aminsert;
          aminsertcleanup_function  aminsertcleanup;
          ambulkdelete_function     ambulkdelete;
          amvacuumcleanup_function  amvacuumcleanup;
          amcanreturn_function      amcanreturn;       /* can be NULL */
          amcostestimate_function   amcostestimate;
          amgettreeheight_function  amgettreeheight;   /* can be NULL */
          amoptions_function        amoptions;
          amproperty_function       amproperty;        /* can be NULL */
          ambuildphasename_function ambuildphasename;  /* can be NULL */
          amvalidate_function       amvalidate;
          amadjustmembers_function  amadjustmembers;   /* can be NULL */
          ambeginscan_function      ambeginscan;
          amrescan_function         amrescan;
          amgettuple_function       amgettuple;        /* can be NULL */
          amgetbitmap_function      amgetbitmap;       /* can be NULL */
          amendscan_function        amendscan;
          ammarkpos_function        ammarkpos;         /* can be NULL */
          amrestrpos_function       amrestrpos;        /* can be NULL */                   
      
          /* interface functions to support parallel index scans */
          amestimateparallelscan_function amestimateparallelscan; /* can be NULL */
          aminitparallelscan_function     aminitparallelscan;     /* can be NULL */
          amparallelrescan_function       amparallelrescan;       /* can be NULL */
      } IndexAmRoutine;

      Index scanning

      Scan functions retrieve matching index entries during query execution. PostgreSQL uses these functions to scan the index and identify relevant tuples that satisfy the query conditions.

      Insertion, deletion, and updates

      The Access Method defines how data is inserted, deleted, and updated within the index. For instance, in a B-tree index, new entries are inserted in sorted order, while a GIN index stores data as key-value pairs, representing terms in a full-text search.

      Index rebuilding and vacuuming

      To maintain index consistency, the Access Method includes routines for rebuilding and vacuuming indexes. These processes optimize storage and ensure the index remains efficient.

      Index scan walkthrough

      Understanding how PostgreSQL scans an index is crucial for optimizing query performance. PostgreSQL’s extensibility relies on metadata to define how indexes work, with catalogs storing information about index handlers and associated operations.

      Catalog metadata and index lookup:

      • The pg_class catalog table stores metadata about all database objects, including indexes.
      • A column called relam (Relation Access Method) stores the OID of the Access Method associated with each index.
      • By joining pg_class with pg_am (Access Method table) and pg_opclass (Operator Class table), we can retrieve details about index handlers and operator classes.

      Query example to retrieve metadata:

      SELECT c.relname AS index_name, a.amname AS access_method, o.opcname AS operator_class
      FROM pg_class c
      JOIN pg_am a ON c.relam = a.oid
      JOIN pg_opclass o ON o.opcmethod = a.oid;

      This allows us to determine that the OID for the B-Tree handler is 403, with multiple operators available for this handler type. 

      We can then use this as a predicate to see all operators associated with all B-Tree indexes.

      postgres=# SELECT c.relname, c.reltype, a.oid, a.amhandler, o.opcmethod, o.opcname FROM pg_class c 
      postgres-# JOIN pg_am a ON c.relam = a.oid JOIN pg_opclass o ON a.oid = o.opcmethod AND a.oid = 403;
      -[ RECORD 1 ]---------------------------------------
       relname    | pg_toast_1255_index
       reltype    | 0
       oid        | 403
       amhandler  | bthandler
       opcmethod  | 403
       opcname    | array_ops
      -[ RECORD 2 ]---------------------------------------
       relname    | pg_toast_1255_index
       reltype    | 0
       oid        | 403
       amhandler  | bthandler
       opcmethod  | 403
       opcname    | bit_ops
      -[ RECORD 3 ]---------------------------------------
       relname    | pg_toast_1255_index
       reltype    | 0
       oid        | 403
       amhandler  | bthandler
       opcmethod  | 403
       opcname    | bool_ops
      -[ RECORD 4 ]---------------------------------------
       relname    | pg_toast_1255_index
       reltype    | 0
       oid        | 403
       amhandler  | bthandler
       opcmethod  | 403
       opcname    | bytea_ops

      Querying metadata and initializing function pointers every time a query is planned and executed would be inefficient. PostgreSQL optimizes this with a caching system that avoids repeated catalog lookups for commonly used relations (tables, indexes, sequences, etc.). The key cache components include:

      • Relation cache
      • Cache initialization
      • Relation cache lookup and access
      • Cache invalidation
      We will check each component below.

      Relation cache

      A cache of metadata for all relations, including tables, indexes, and other types of relations. This improves query performance by reducing the need to access the system catalog every time a relation is referenced.

      Cache initializationill-isometric-cube-01 (2)

      PostgreSQL initializes the relation cache when the backend process starts. This process loads metadata for all relations into the cache, so it can be quickly accessed later during query execution.

      Relation cache lookup and access

      Once the cache is initialized, various parts of PostgreSQL (like the query planner and executor) can access relation metadata without needing to repeatedly consult the system catalog. Cached relation objects are used to lookup the necessary details about tables, indexes, columns, and constraints.

      Cache invalidation

      When changes are made to relations (e.g., an index is dropped or modified), PostgreSQL must invalidate the relevant cache entries to ensure that it does not use stale metadata. This ensures that the database always operates with the most up-to-date relation information.

      Query planning and index selection

      When the query planner analyzes a query, it looks for indexes that could optimize execution. For example:

      SELECT * FROM boat WHERE name = 'Summer Drift';

      For each table in the query, the planner has access to the available indexes via the cached system catalog table pg_index (see notes on the cache system above). The indexes are related to specific columns or expressions, and each index has an associated Access Method (such as B-Tree, Hash, GIN, etc), which in turn has an associated Access Method handler.

      The planner examines the relevant cached index metadata to gather details about:

      • The columns indexes
      • The index type (B-Tree, hash, etc)
      • The unique nature or partial nature of the index
      • The availability of multi-column indexes

      Cost estimation

      The planner evaluates the cost of using each available index to access the data. Cost is influenced by several factors including:

      • Sequential scans if no index is particularly helpful
      • Bitmap scans if multiple indexes might be used together for filtering
      • Index scans if a single index provides the best cost

      Eventually, the planner generates an execution plan that includes the chosen index, if any. We can use the EXPLAIN command to understand which indexes a planner has chosen.

      EXPLAIN ANALYZE SELECT * FROM boat WHERE name = 'Summer Drift';

      When the planner decides to use an index, it retrieves a structure (routine in PostgreSQL terminology) called IndexAmRoutine, which contains pointers to the appropriate Access Methods. It does this via a call to GetIndexAMRoutine(OID) passing the OID of the handler (403 in this example for B-Tree). The OidFunctionCall0 macro will call the appropriate Access Method handler (in this case, bthandler) to return the structure containing pointers to the appropriate functions for B-Tree.

      /*
       * GetIndexAmRoutine - call the specified access method handler routine to get
       * its IndexAmRoutine struct, which will be palloc'd in the caller's context.
       *
       * Note that if the amhandler function is built-in, this will not involve
       * any catalog access.  It's therefore safe to use this while bootstrapping
       * indexes for the system catalogs.  relcache.c relies on that.
       */
      IndexAmRoutine *
      GetIndexAmRoutine(Oid amhandler)
      {
         Datum           datum;
         IndexAmRoutine *routine;
      
         datum = OidFunctionCall0(amhandler);
         routine = (IndexAmRoutine *) DatumGetPointer(datum);
      
         if (routine == NULL || !IsA(routine, IndexAmRoutine))
             elog(ERROR, "index access method handler function %u did not return an IndexAmRoutine struct",
                  amhandler);
         return routine;
      }

      The planner passes the plan to the executor for execution. It will prepare for an index scan based on the plan. The executor calls index_open to open the index relation. The OID of our boat_idx1 index is passed to index open, which internally opens the index and locks it in the specified lock mode. Now the executor can interact with it.

      The executor will then call index_beginscan to begin the process of scanning an index to locate matching rows/tuples in a table based on the query predicates.

      IndexScanDesc
      index_beginscan(Relation heapRelation,
                      Relation indexRelation,
                      Snapshot snapshot,
                      int nkeys, int norderbys)
      {
          IndexScanDesc scan;
      
          Assert(snapshot != InvalidSnapshot);
      
          scan = index_beginscan_internal(indexRelation, nkeys, norderbys, snapshot, NULL, false);

      This calls an internal version of itself called index_beginscan_internal, which starts by setting up the necessary structures to perform an index scan for our query. It is the first step in using an index to retrieve data. The actual search through the index occurs after this is called, during the execution of the scan using index_gettuple, index_fetch_tuple, and similar.

      The beginscan function is passed the relation object for the table (heap) being scanned (boat), the relation object for the index being used (e.g., btree), the number of search key being used in the scan, and the number of columns the scan will use for ordering results (if applicable).

      Our earlier initialization of the IndexAMRoutine structure means the ambeginscan function pointer will refer to our B-Tree begin scan function.

      amroutine->ambeginscan = btbeginscan;

      The function will then use this information to set up structures, allocate memory, and return an index scan descriptor which holds the state information for the scan. The descriptor is then passed to subsequent functions in the index scan process like index_gettuple which uses it to track and maintain the state of the scan as it navigates the index.

      static IndexScanDesc
      index_beginscan_internal(Relation indexRelation,
                               int nkeys, int norderbys, Snapshot snapshot,
                               ParallelIndexScanDesc pscan, bool temp_snap)
      {
          IndexScanDesc scan;
      
          RELATION_CHECKS;
          CHECK_REL_PROCEDURE(ambeginscan);
      
      ... /* * Tell the AM to open a scan. */ scan = indexRelation->rd_indam->ambeginscan(indexRelation, nkeys, norderbys); /* Initialize information for parallel scan. */ scan->parallel_scan = pscan; scan->xs_temp_snap = temp_snap; return scan; }

      Above we can see where the call is made via the Access Method interface to the B-Tree-specific begin scan function.

      In this bit of code, we have:

      • indexRelation: A pointer to the relation structure of the index being used for the scan
      • indexRelation->rd_indam: Refers to the Index Access Method routine for the specific index type (B-Tree in our case) associated with the index
      • ambeginscan: Pointer to the B-Tree-specific function, which can be seen in the code snippet below
      /*
       * btbeginscan() -- start a scan on a btree index
       */
      IndexScanDesc
      btbeginscan(Relation rel, int nkeys, int norderbys)
      {
          IndexScanDesc scan;
          BTScanOpaque so;
      
          /* no order by operators allowed */
          Assert(norderbys == 0);
      
          /* get the scan */
          scan = RelationGetIndexScan(rel, nkeys, norderbys);
      
          /* allocate private workspace */
          so = (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));
      
          BTScanPosInvalidate(so->currPos);
          BTScanPosInvalidate(so->markPos);
      
          if (scan->numberOfKeys > 0)
              so->keyData = (ScanKey) palloc(scan->numberOfKeys * sizeof(ScanKeyData));
          else
              so->keyData = NULL;
      

      Below is part of the B-Tree-specific gettuple function. We can see the Index Descriptor first parameter called scan, which is used to maintain state. We can see this function actually returns a row/tuple from the index that meets our predicate.

      /*
       * btgettuple() -- Get the next tuple in the scan.
       */
      bool
      btgettuple(IndexScanDesc scan, ScanDirection dir)
      {
          BTScanOpaque so = (BTScanOpaque) scan->opaque;
          bool res;
      
          /* btree indexes are never lossy */
          scan->xs_recheck = false;
      
          /* Each loop iteration performs another primitive index scan */
          do
          {
      ... /* If we have a tuple, return it ... */ if (res) break; /* ... otherwise see if we need another primitive index scan */ } while (so->numArrayKeys && _bt_start_prim_scan(scan, dir)); return res; }

      Summary

      PostgreSQL’s Access Method interface provides a robust framework for defining and managing index types. By abstracting the implementation details, it enables extensibility and optimization for diverse workloads.

      Understanding the Access Method internals—index creation, scanning, caching, and query planning—empowers database administrators and developers to make informed indexing decisions. By leveraging built-in access methods and PostgreSQL’s efficient caching mechanisms, users can enhance query performance and ensure scalable database operations.

       

      Topics: PostgreSQL, Database management, Database indexing, Indexing techniques

      Receive our blog

      Search by topic

      Posts by Tag

      See all
      Fujitsu Enterprise Postgres
      leverages and extends the strength and reliability of PostgreSQL with additional enterprise features.
      Compare the list of features.
      photo-gary-evans-in-hlight-circle-cyan-to-blue-02
      Gary Evans
      Senior Offerings and Center of Excellence Manager
      Gary Evans heads the Center of Excellence team at Fujitsu Software, providing expert services for customers in relation to PostgreSQL and Fujitsu Enterprise Postgres.
      He previously worked in IBM, Cable and Wireless based in London and the Inland Revenue Department of New Zealand, before joining Fujitsu. With over 15 years’ experience in database technology, Gary appreciates the value of data and how to make it accessible across your organization.
      Gary loves working with organizations to create great outcomes through tailored data services and software.

      Receive our blog

      Fill the form to receive notifications of future posts

      Search by topic

      see all >