<img height="1" width="1" style="display:none;" alt="" src="https://px.ads.linkedin.com/collect/?pid=2826169&amp;fmt=gif">
Start  trial

    Start trial

      Fujitsu Enterprise Postgres and pg_bigm improve full-text search performance by using bigram indexes.

      Extending full-text search

      The pg_bigm module is an extension we use with Fujitsu Enterprise Postgres to improve full-text search performance by using 2-gram (bigram) indexes. A bigram is a pair of adjacent characters in a string. The extension helps find text patterns more quickly, especially for languages or datasets with lots of similar words or where the traditional full-text search is slow.

      Use cases

      • Fast text search

        When you need to search through large amounts of text data quickly, such as in chat logs, emails, or articles, pg_bigm can speed up the process by indexing bigrams.

      • Handling typo acceptability

        It's useful for scenarios where exact matches aren't always required, like searching for user-generated content where spelling mistakes are common.

      • Multilingual search

        pg_bigm can be particularly effective in languages where words can be split into meaningful parts or where inflection is common, improving search relevance.

      Examples

      • Searching in a product catalog

        Imagine you have an e-commerce site where users frequently search for products. If someone searches for smartfon instead of smartphone, pg_bigm can help return relevant results quickly by matching the bigrams.

      • Searching through large logs

        If you're managing logs that contain error messages, you might want to find specific errors quickly. With pg_bigm, you can search for the text 404 Not Found and find matches even if the logs contain slightly varied messages like 404 not found or 404 error not found.

      How it works

      1. Bigram index creation
        The extension creates indexes on bigrams, making it quicker to locate text patterns.
      2. Search query execution
        When a search query is executed, the extension uses the bigram index to quickly find matching records.

      Executing a text search without pg_bigm

      1. Add content to be searched

        CREATE TABLE documents (id SERIAL PRIMARY KEY, content TEXT );
        INSERT INTO documents (content) VALUES
        ('is All gold glitters that not'),
        ('that not All is glitters gold'),
        ('A database cluster is a collection of databases managed by a Fujitsu Enterprise Postgres server'),
        ('If you are hearing this definition for the first time, you might be wondering what it means'),
        ('The term database cluster in Fujitsu Enterprise Postgres does not mean ‘a group of database servers’'),
        ('A Fujitsu Enterprise Postgres server runs on a single host and manages a single database cluster'),
        ('Figure 1.1 shows the logical structure of a database cluster'), ('A database is a collection of database objects'),
        ('In the relational database theory, a database object is a data structure used to store or reference data'),
        ('A (heap) table is a typical example, and there are many others, such as indexes, sequences, views, functions'),
        ('In Fujitsu Enterprise Postgres, databases themselves are also database objects and are logically separated from each other'),
        ('All other database objects (e.g., tables, indexes, etc) belong to their respective databases'),
        ('All the database objects in Fujitsu Enterprise Postgres are internally managed by respective object identifiers (OIDs), which are unsigned 4-byte integers'),
        ('The relations between database objects and their respective OIDs are stored in appropriate system catalogs, depending on the type of objects'),
        ('For example, OIDs of databases and heap tables are stored in pg_database and pg_class respectively'),
        ('You can find out the OIDs you want to know by issuing the queries such as the following'),
        ('A database cluster is basically a single directory, referred to as base directory'),
        ('It contains some subdirectories and many files'),
        ('When you execute the initdb utility to initialize a new database cluster, a base directory will be created under the specified directory'),
        ('The path of the base directory is usually set to the environment variable PGDATA'),
        ('Figure 1.2 shows an example of database cluster in Fujitsu Enterprise Postgres'),
        ('A database is a subdirectory under the base subdirectory, and each of the tables and indexes is (at least) one file stored under the subdirectory of the database to which it belongs'),
        ('There are several subdirectories containing particular data and configuration files'),
        ('While Fujitsu Enterprise Postgres supports tablespaces, the meaning of the term is different from other RDBMSs'),
        ('A tablespace in Fujitsu Enterprise Postgres is a single directory that contains some data outside of the base directory'),
        ('In the following subsections, the layout of a database cluster, databases, files associated with tables and indexes, and tablespaces in Fujitsu Enterprise Postgres are described'),
        ('The layout of database cluster has been described in the official document'),
        ('Main files and subdirectories in a part of the document have been listed in Table 1.1'),
        ('jumps fox The dog brown the quick lazy over'),
        ('A database is a subdirectory under the base subdirectory'),
        ('The database directory names are identical to the respective OIDs'),
        ('For example, when the OID of the database ‘sampledb’ is 16384, its subdirectory name is 16384'),
        ('Each table or index whose size is less than 1GB is stored in a single file under the database directory to which it belongs'),
        ('Tables and indexes are internally managed by individual OIDs, while their data files are managed by the variable, relfilenode'),
        ('The relfilenode values of tables and indexes basically but not always match the respective OIDs, the details are described below');

      2. Create a GIN index

        CREATE INDEX gin_content_idx ON documents USING GIN (to_tsvector('english', content));

      3. Query with GIN index

        postgres=# EXPLAIN ANALYZE
        postgres-# SELECT id, content FROM documents
        postgres-# WHERE to_tsvector('english', content) @@ to_tsquery('fox & quick');

        QUERY PLAN
        ------------------------------------------------------------------------------------------------------------------------
        Bitmap Heap Scan on documents (cost=12.25..16.76 rows=1 width=36) (actual time=0.029..0.032 rows=1
        loops=1)
        Recheck Cond: (to_tsvector('english'::regconfig, content) @@ to_tsquery('fox & quick'::text))
        Heap Blocks: exact=1
        -> Bitmap Index Scan on gin_content_idx (cost=0.00..12.25 rows=1 width=0) (actual time=0.021..0.022
        rows=1 loops=1)
        Index Cond: (to_tsvector('english'::regconfig, content) @@ to_tsquery('fox & quick'::text))
        Planning Time: 0.140 ms
        Execution Time: 0.071 ms

        (7 rows)

      Searching with pg_bigm

      By using pg_bigm, you can significantly reduce the time taken for full-text searches, especially in large datasets or in applications where speed and accuracy are crucial.

      To implement pg_bigm in a Fujitsu Enterprise Postgres database, you need to install the extension and to follow these steps (the process may vary depending on your operating system and Fujitsu Enterprise Postgres version).

      1. Create the extension.

        CREATE EXTENSION pg_bigm;

        The pg_bigm module can be obtained here. Installation varies depending on the operating system and Fujitsu Enterprise Postgres version.

      2. Add content to be searched

        CREATE TABLE bigm_documents (id SERIAL PRIMARY KEY, content TEXT );
        INSERT INTO bigm_documents (content) VALUES…

      3. Create a pg_bigm index

        CREATE INDEX bigm_content_idx ON bigm_documents USING gin (content gin_bigm_ops);

      4. Query with pg_bigm index

        postgres=# EXPLAIN ANALYZE
        postgres-# SELECT id, content FROM bigm_documents WHERE content LIKE '%fox%quick%';
        QUERY PLAN

        Seq Scan on bigm_documents (cost=0.00..1.44 rows=1 width=36) (actual time=0.027..0.031 rows=1 loops=1)
        Filter: (content ~~ '%fox%quick%'::text) Rows Removed by Filter: 34
        Planning Time: 0.101 ms
         Execution Time: 0.057 ms

        (5 rows)

      Topics: Fujitsu Enterprise Postgres, Full-text search, pg_bigm, Bigram indexes, Search, Search optimization

      Receive our blog

      Search by topic

      Posts by Tag

      See all
      Fujitsu Enterprise Postgres
      The hybrid multi-cloud Postgres backed by Fujitsu
      Nishchay Kothari
      Technical Consultant, Fujitsu Enterprise Postgres Center of Excellence
      Nishchay Kothari is an outstanding technical consultant with over 13 years of expertise in relational database management systems (RDBMS). Nishchay has experience with a wide range of database technologies, including PostgreSQL, SQL Server, and Oracle.
      Nishchay has positioned himself as a go-to resource for organizations wanting to optimize their database infrastructure and architectural solutions driven by his passion for addressing complicated technological challenges.

      Receive our blog

      Fill the form to receive notifications of future posts

      Search by topic

      see all >