I was recently asked where PostgreSQL actually stores data on disk. So it's time to share this...

Where does PostgreSQL store my data?

At the basic level, your data is simply stored as a set of files on disk.

If you want a backup you can simply stop the database, copy all the database data files that are on your disk and you create a backup.  There are also a number of other ways to back up your PostgreSQL data and I’ll go over that in another blog entry. For now though, back to where the data resides.  

Let’s assume that your PostgreSQL data directory is a directory called  /database

If you aren’t quite sure where the database directory is but you can log into the database using psql you can find where your database cluster’s directory is very easily by typing the following:

show data_directory;

and you’ll see something like :

 data_directory
------------------
/database

If you take a look in the /database directory you will see lots of files including 

  • postgresql.conf
  • pg_hba.conf and
  • various other files.

You’ll also see a directory called base.

This is where all the database data for your whole cluster is held.

You’ll see many directories in the base directory that are just numbers. Each one of these numbered directories is a single database in your cluster.

If you want to know which directory number relates to which database, you can log into your database using psql and type the following:

SELECT oid as object_id, datname as database_name FROM pg_database;
Which returns:
object_id | database_name
-------------+----------------
    1 | template1
14799 | template0
14804 | postgres
20886 | test

 

You could alternatively use the supplied oid2name utility from a UNIX shell like this:

oid2name
Which returns:
All databases:
Oid Database Name Tablespace
---------------------------------------------
14804 postgres pg_default
14799 template0 pg_default
1 template1 pg_default
20886 test pg_default

 Now you have a list of databases and each object id which is used as the name of the directory within the /database/base directory.

If you wanted to look at the data in my test database (which has an OID of 20886), you could cd /database/base/20886 then list the directory contents using ls -l , at which point you will see the following:

-rw-------. 1 postgres postgres16384 Jun 2 13:10 112
-rw-------. 1 postgres postgres16384 Jun 2 13:11 113
-rw-------. 1 postgres postgres8192 Jun 19 16:39 12168
-rw-------. 1 postgres postgres24576 Jun 19 17:01 12168_fsm
-rw-------. 1 postgres postgres8192 Jun 19 17:01 12168_vm
-rw-------. 1 postgres postgres8192 Jun 1 23:58 12172
-rw-------. 1 postgres postgres73728 Aug 1 17:47 1247
-rw-------. 1 postgres postgres24576 Jun 19 17:01 1247_fsm
This is a small subset of the files in that directory. Notice anything about the file sizes? They all divide exactly by 8192 (8k). This is because PostgreSQL (by default) writes blocks of data (what PostgreSQL calls pages) to disk in 8k chunks. If you have a large table that has more than 1GB of data in it, you will see multiple files with the same number appended with .1 .2 .3 and so on like this:
-rw-------. 1 postgres postgres1073741824 Jun 12 10:12 20211
-rw-------. 1 postgres postgres1073741824 Jun 12 10:12 20211.1
-rw-------. 1 postgres postgres1073741824 Jun 12 10:12 20211.2
-rw-------. 1 postgres postgres1073741824 Jun 12 10:12 20211.3

 

In this test database I have a table called test_data. If you want to see which file actually contains the table test_data’s data you  can do the following:

Start up psql then:
\c test
SELECT pg_relation_filepath('test_data');

pg_relation_filepath
----------------------
base/20886/186770

 

You could alternatively use the oid2name utility again like this:

oid2name -d test -t test_data
Which would output this:
From database "test":
Filenode Table Name
----------------------
186770 test_data

 

So the file /database/base/20866/186770 contains the actual data for the table test_data
If you run something like this (assuming you have hexdump installed) :
hexdump -C /database/master/base/20866/186770

You will see the test_data’s data table as it’s stored in the file on the disk.

So it’s time to test this all out:
psql
CREATE DATABASE test;

\c test

CREATE TABLE test_table my_id serial , mytext varchar;

INSERT INTO test_table (mytext) VALUES ('hello there');

-- Don’t forget to issue a checkpoint to make sure everything is written to the database tables from the WAL


CHECKPOINT;

SELECT oid as object_id, datname as database_name FROM pg_database where datname ='test';
object_id | database_name
-----------+---------------
20886 | test

SELECT pg_relation_filepath('test_table');
pg_relation_filepath
----------------------
base/20886/186770
(1 row)

\q

hexdump -C /database/master/base/20886/186770

00000000 a8 00 00 00 d0 2a d7 55 00 00 00 00 1c 00 d8 1f |.....*.U........|
00000010 00 20 04 20 00 00 00 00 d8 9f 50 00 00 00 00 00 |. . ......P.....|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001fd0 00 00 00 00 00 00 00 00 99 30 5b 02 00 00 00 00 |.........0[.....|
00001fe0 00 00 00 00 00 00 00 00 01 00 02 00 02 09 18 00 |................|
00001ff0 01 00 00 00 19 68 65 6c 6c 6f 20 74 68 65 72 65 |.....hello there|
00002000

 

And there it is.

My inserted varchar string, 'hello there' is now stored and visible on my disk. You might consider this visibility on disk to be a problem.  If you want to encrypt the data in the files on disk you can use the FUJITSU Enterprise Postgres transparent data encryption feature that I have described here

The files that have a number and have _fsm or _vm appended are the free space map and the visibility map for each page. This will be the subject of another post. Stay tuned. 

If you require any help at all with your PostgreSQL database, then feel free to review our Support services, customised Training, or Health Check Assessment

Topics: Fujitsu Enterprise Postgres, Enhanced Enterprise Open Source Database, PostgreSQL Development