At the basic level, your data is simply stored as a set of files on disk.
If you want a backup you can simply stop the database, copy all the database data files that are on your disk and you create a backup. There are also a number of other ways to back up your PostgreSQL data and I’ll go over that in another blog entry. For now though, back to where the data resides.
Let’s assume that your PostgreSQL data directory is a directory called /database
If you aren’t quite sure where the database directory is but you can log into the database using psql you can find where your database cluster’s directory is very easily by typing the following:
and you’ll see something like:
------------------
/database
If you take a look in the /database directory you will see lots of files including:
- postgresql.conf
- pg_hba.conf and
- various other files
You’ll also see a directory called base.
This is where all the database data for your whole cluster is held.
You’ll see many directories in the base directory that are just numbers. Each one of these numbered directories is a single database in your cluster.
If you want to know which directory number relates to which database, you can log into your database using psql and type the following:
Which returns:
----------+----------------
1 | template1
14799 | template0
14804 | postgres
20886 | test
You could alternatively use the supplied oid2name utility from a UNIX shell like this:
Which returns:
Oid Database Name Tablespace
----------------------------------
14804 postgres pg_default
14799 template0 pg_default
1 template1 pg_default
20886 test pg_default
Now you have a list of databases and each object id which is used as the name of the directory within the /database/base directory.
If you wanted to look at the data in my test database (which has an OID of 20886), you could cd /database/base/20886 then list the directory contents using ls -l , at which point you will see the following:
16384 Jun 2 13:10 112
-rw-------. 1 postgres postgres
16384 Jun 2 13:11 113
-rw-------. 1 postgres postgres
8192 Jun 19 16:39 12168
-rw-------. 1 postgres postgres
24576 Jun 19 17:01 12168_fsm
-rw-------. 1 postgres postgres
8192 Jun 19 17:01 12168_vm
-rw-------. 1 postgres postgres
8192 Jun 1 23:58 12172
-rw-------. 1 postgres postgres
73728 Aug 1 17:47 1247
-rw-------. 1 postgres postgres
24576 Jun 19 17:01 1247_fsm
This is a small subset of the files in that directory. Notice anything about the file sizes? They all divide exactly by 8192 (8k). This is because PostgreSQL (by default) writes blocks of data (what PostgreSQL calls pages) to disk in 8k chunks. If you have a large table that has more than 1GB of data in it, you will see multiple files with the same number appended with .1 .2 .3 and so on like this:
1073741824 Jun 12 10:12 20211
-rw-------. 1 postgres postgres
1073741824 Jun 12 10:12 20211.1
-rw-------. 1 postgres postgres
1073741824 Jun 12 10:12 20211.2
-rw-------. 1 postgres postgres
1073741824 Jun 12 10:12 20211.3
In this test database I have a table called test_data. If you want to see which file actually contains the table test_data’s data you can do the following:
Start up psql and:
SELECT pg_relation_filepath('test_data');
pg_relation_filepath
----------------------
base/20886/186770
You could alternatively use the oid2name utility again like this:
Which would output this:
Filenode Table Name
----------------------
186770 test_data
So the file /database/base/20866/186770 contains the actual data for the table test_data
If you run something like this (assuming you have hexdump installed):
You will see the test_data’s data table as it’s stored in the file on the disk.
So it’s time to test this all out:
CREATE DATABASE test;
\c test
CREATE TABLE test_table my_id serial , mytext varchar;
INSERT INTO test_table (mytext) VALUES ('hello there');
Don’t forget to issue a checkpoint to make sure everything is written to the database tables from the WAL
SELECT oid as object_id, datname as database_name FROM pg_database where datname ='test';
object_id | database_name
-----------+---------------
20886 | test
SELECT pg_relation_filepath('test_table');
pg_relation_filepath
----------------------
base/20886/186770
(1 row)
\q
hexdump -C /database/master/base/20886/186770
00000000 a8 00 00 00 d0 2a d7 55 00 00 00 00 1c 00 d8 1f |.....*.U........|
00000010 00 20 04 20 00 00 00 00 d8 9f 50 00 00 00 00 00 |. . ......P.....|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001fd0 00 00 00 00 00 00 00 00 99 30 5b 02 00 00 00 00 |.........0[.....|
00001fe0 00 00 00 00 00 00 00 00 01 00 02 00 02 09 18 00 |................|
00001ff0 01 00 00 00 19 68 65 6c 6c 6f 20 74 68 65 72 65 |.....hello there|
00002000
And there it is.
My inserted varchar string, 'hello there' is now stored and visible on my disk. You might consider this visibility on disk to be a problem. If you want to encrypt the data in the files on disk you can use the FUJITSU Enterprise Postgres transparent data encryption feature that I have described here.
The files that have a number and have _fsm or _vm appended are the free space map and the visibility map for each page. This will be the subject of another post. Stay tuned.
If you require any help at all with your PostgreSQL database, then feel free to review our Support services, customized Training, or Health Check Assessment.