I was recently asked where PostgreSQL actually stores data on disk. So it's time to share this...

Where does PostgreSQL store my data?

At the basic level, your data is simply stored as a set of files on disk.

If you want a backup you can simply stop the database, copy all the database data files that are on your disk and you create a backup. There are also a number of other ways to back up your PostgreSQL data and I’ll go over that in another blog entry. For now though, back to where the data resides.

Let’s assume that your PostgreSQL data directory is a directory called /database

If you aren’t quite sure where the database directory is but you can log into the database using psql you can find where your database cluster’s directory is very easily by typing the following:

show data_directory;

and you’ll see something like:

data_directory
------------------
/database

If you take a look in the /database directory you will see lots of files including:

  • postgresql.conf
  • pg_hba.conf and
  • various other files

You’ll also see a directory called base.

This is where all the database data for your whole cluster is held.

You’ll see many directories in the base directory that are just numbers. Each one of these numbered directories is a single database in your cluster.

If you want to know which directory number relates to which database, you can log into your database using psql and type the following:

SELECT oid as object_id, datname as database_name FROM pg_database;

Which returns:

object_id | database_name
----------+----------------
        1 | template1
    14799 | template0
    14804 | postgres
    20886 | test

You could alternatively use the supplied oid2name utility from a UNIX shell like this:

oid2name

Which returns:

All databases:
    Oid  Database Name  Tablespace
----------------------------------
  14804       postgres  pg_default
  14799      template0  pg_default
      1      template1  pg_default
  20886           test  pg_default

Now you have a list of databases and each object id which is used as the name of the directory within the /database/base directory.

If you wanted to look at the data in my test database (which has an OID of 20886), you could cd /database/base/20886 then list the directory contents using ls -l , at which point you will see the following:

-rw-------. 1 postgres postgres
16384 Jun 2 13:10 112
-rw-------. 1 postgres postgres
16384 Jun 2 13:11 113
-rw-------. 1 postgres postgres
8192 Jun 19 16:39 12168
-rw-------. 1 postgres postgres
24576 Jun 19 17:01 12168_fsm
-rw-------. 1 postgres postgres
8192 Jun 19 17:01 12168_vm
-rw-------. 1 postgres postgres
8192 Jun 1 23:58 12172
-rw-------. 1 postgres postgres
73728 Aug 1 17:47 1247
-rw-------. 1 postgres postgres
24576 Jun 19 17:01 1247_fsm

This is a small subset of the files in that directory. Notice anything about the file sizes? They all divide exactly by 8192 (8k). This is because PostgreSQL (by default) writes blocks of data (what PostgreSQL calls pages) to disk in 8k chunks. If you have a large table that has more than 1GB of data in it, you will see multiple files with the same number appended with .1 .2 .3 and so on like this:

-rw-------. 1 postgres postgres
1073741824 Jun 12 10:12 20211
-rw-------. 1 postgres postgres
1073741824 Jun 12 10:12 20211.1
-rw-------. 1 postgres postgres
1073741824 Jun 12 10:12 20211.2
-rw-------. 1 postgres postgres
1073741824 Jun 12 10:12 20211.3

In this test database I have a table called test_data. If you want to see which file actually contains the table test_data’s data you can do the following:

Start up psql and:

\c test
SELECT pg_relation_filepath('test_data');

pg_relation_filepath
----------------------
base/20886/186770

You could alternatively use the oid2name utility again like this:

oid2name -d test -t test_data

Which would output this:

From database "test":
Filenode Table Name
----------------------
186770 test_data

So the file /database/base/20866/186770 contains the actual data for the table test_data
If you run something like this (assuming you have hexdump installed):

hexdump -C /database/master/base/20866/186770

You will see the test_data’s data table as it’s stored in the file on the disk.

So it’s time to test this all out:

psql
CREATE DATABASE test;

\c test

CREATE TABLE test_table my_id serial , mytext varchar;

INSERT INTO test_table (mytext) VALUES ('hello there');

Don’t forget to issue a checkpoint to make sure everything is written to the database tables from the WAL

CHECKPOINT;

SELECT oid as object_id, datname as database_name FROM pg_database where datname ='test';
object_id | database_name
-----------+---------------
20886 | test

SELECT pg_relation_filepath('test_table');
pg_relation_filepath
----------------------
base/20886/186770
(1 row)

\q

hexdump -C /database/master/base/20886/186770

00000000 a8 00 00 00 d0 2a d7 55 00 00 00 00 1c 00 d8 1f |.....*.U........|
00000010 00 20 04 20 00 00 00 00 d8 9f 50 00 00 00 00 00 |. . ......P.....|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001fd0 00 00 00 00 00 00 00 00 99 30 5b 02 00 00 00 00 |.........0[.....|
00001fe0 00 00 00 00 00 00 00 00 01 00 02 00 02 09 18 00 |................|
00001ff0 01 00 00 00 19 68 65 6c 6c 6f 20 74 68 65 72 65 |.....hello there|
00002000

And there it is.

My inserted varchar string, 'hello there' is now stored and visible on my disk. You might consider this visibility on disk to be a problem. If you want to encrypt the data in the files on disk you can use the FUJITSU Enterprise Postgres transparent data encryption feature that I have described here.

The files that have a number and have _fsm or _vm appended are the free space map and the visibility map for each page. This will be the subject of another post. Stay tuned.

If you require any help at all with your PostgreSQL database, then feel free to review our Support services, customised Training, or Health Check Assessment.

Topics: Fujitsu Enterprise Postgres, Enhanced Enterprise Open Source Database, PostgreSQL Development