Start  trial

    Start trial

      I was recently asked where PostgreSQL actually stores data on disk. So it's time to share this...

      At the basic level, your data is simply stored as a set of files on disk.

      If you want a backup you can simply stop the database, copy all the database data files that are on your disk and you create a backup. There are also a number of other ways to back up your PostgreSQL data and I’ll go over that in another blog entry. For now though, back to where the data resides.

      Let’s assume that your PostgreSQL data directory is a directory called /database

      If you aren’t quite sure where the database directory is but you can log into the database using psql you can find where your database cluster’s directory is very easily by typing the following:

      show data_directory;

      and you’ll see something like:

      data_directory
      ------------------
      /database

      If you take a look in the /database directory you will see lots of files including:

      • postgresql.conf
      • pg_hba.conf and
      • various other files

      You’ll also see a directory called base.

      This is where all the database data for your whole cluster is held.

      You’ll see many directories in the base directory that are just numbers. Each one of these numbered directories is a single database in your cluster.

      If you want to know which directory number relates to which database, you can log into your database using psql and type the following:

      SELECT oid as object_id, datname as database_name FROM pg_database;

      Which returns:

      object_id | database_name
      ----------+----------------
              1 | template1
          14799 | template0
          14804 | postgres
          20886 | test

      You could alternatively use the supplied oid2name utility from a UNIX shell like this:

      oid2name

      Which returns:

      All databases:
          Oid  Database Name  Tablespace
      ----------------------------------
        14804       postgres  pg_default
        14799      template0  pg_default
            1      template1  pg_default
        20886           test  pg_default

      Now you have a list of databases and each object id which is used as the name of the directory within the /database/base directory.

      If you wanted to look at the data in my test database (which has an OID of 20886), you could cd /database/base/20886 then list the directory contents using ls -l , at which point you will see the following:

      -rw-------. 1 postgres postgres
      16384 Jun 2 13:10 112
      -rw-------. 1 postgres postgres
      16384 Jun 2 13:11 113
      -rw-------. 1 postgres postgres
      8192 Jun 19 16:39 12168
      -rw-------. 1 postgres postgres
      24576 Jun 19 17:01 12168_fsm
      -rw-------. 1 postgres postgres
      8192 Jun 19 17:01 12168_vm
      -rw-------. 1 postgres postgres
      8192 Jun 1 23:58 12172
      -rw-------. 1 postgres postgres
      73728 Aug 1 17:47 1247
      -rw-------. 1 postgres postgres
      24576 Jun 19 17:01 1247_fsm

      This is a small subset of the files in that directory. Notice anything about the file sizes? They all divide exactly by 8192 (8k). This is because PostgreSQL (by default) writes blocks of data (what PostgreSQL calls pages) to disk in 8k chunks. If you have a large table that has more than 1GB of data in it, you will see multiple files with the same number appended with .1 .2 .3 and so on like this:

      -rw-------. 1 postgres postgres
      1073741824 Jun 12 10:12 20211
      -rw-------. 1 postgres postgres
      1073741824 Jun 12 10:12 20211.1
      -rw-------. 1 postgres postgres
      1073741824 Jun 12 10:12 20211.2
      -rw-------. 1 postgres postgres
      1073741824 Jun 12 10:12 20211.3

      In this test database I have a table called test_data. If you want to see which file actually contains the table test_data’s data you can do the following:

      Start up psql and:

      \c test
      SELECT pg_relation_filepath('test_data');

      pg_relation_filepath
      ----------------------
      base/20886/186770

      You could alternatively use the oid2name utility again like this:

      oid2name -d test -t test_data

      Which would output this:

      From database "test":
      Filenode Table Name
      ----------------------
      186770 test_data

      So the file /database/base/20866/186770 contains the actual data for the table test_data
      If you run something like this (assuming you have hexdump installed):

      hexdump -C /database/master/base/20866/186770

      You will see the test_data’s data table as it’s stored in the file on the disk.

      So it’s time to test this all out:

      psql
      CREATE DATABASE test;

      \c test

      CREATE TABLE test_table my_id serial , mytext varchar;

      INSERT INTO test_table (mytext) VALUES ('hello there');

      Don’t forget to issue a checkpoint to make sure everything is written to the database tables from the WAL

      CHECKPOINT;

      SELECT oid as object_id, datname as database_name FROM pg_database where datname ='test';
      object_id | database_name
      -----------+---------------
      20886 | test

      SELECT pg_relation_filepath('test_table');
      pg_relation_filepath
      ----------------------
      base/20886/186770
      (1 row)

      \q

      hexdump -C /database/master/base/20886/186770

      00000000 a8 00 00 00 d0 2a d7 55 00 00 00 00 1c 00 d8 1f |.....*.U........|
      00000010 00 20 04 20 00 00 00 00 d8 9f 50 00 00 00 00 00 |. . ......P.....|
      00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
      *
      00001fd0 00 00 00 00 00 00 00 00 99 30 5b 02 00 00 00 00 |.........0[.....|
      00001fe0 00 00 00 00 00 00 00 00 01 00 02 00 02 09 18 00 |................|
      00001ff0 01 00 00 00 19 68 65 6c 6c 6f 20 74 68 65 72 65 |.....hello there|
      00002000

      And there it is.

      My inserted varchar string, 'hello there' is now stored and visible on my disk. You might consider this visibility on disk to be a problem. If you want to encrypt the data in the files on disk you can use the FUJITSU Enterprise Postgres transparent data encryption feature that I have described here.

      The files that have a number and have _fsm or _vm appended are the free space map and the visibility map for each page. This will be the subject of another post. Stay tuned.

      If you require any help at all with your PostgreSQL database, then feel free to review our Support services, customized Training, or Health Check Assessment.

      Topics: Fujitsu Enterprise Postgres, Enhanced enterprise open source database, PostgreSQL development

      Receive our blog

      Fill the form to receive notifications of future posts

      Search by topic

      see all >

      Read our latest blogs

      Read our most recent articles regarding all aspects of PostgreSQL and Fujitsu Enterprise Postgres.

      Receive our blog

      Fill the form to receive notifications of future posts

      Search by topic

      see all >