<img height="1" width="1" style="display:none;" alt="" src="https://px.ads.linkedin.com/collect/?pid=2826169&amp;fmt=gif">
Start  trial

    Start trial

      img-blog-curtain-author-vincent-odea-blue-to-cyan
      A customer contacted me reporting problems with database archiving, with 1,000s of errors reported in the log file.

      A problem with archiving

      During a recent client issue, the customer reported to me as TAM that their database archiving was not working as expeted.

      The customer could see the build up of WAL files in the pg_wal folder, and the archive errors listed in the Fujitsu Enterprise Postgres log file. The archive_command was using pgBackRest with archive-push. The pgBackRest repo-type was using an s3 interface to write to an IBM Spectrum Protect Storage Container (formerly known as Tivoli Storage Manager aka TSM).

      The customer storage solution was considered highly specialized infrastructure, with a team of highly experienced engineers operating it, and who were reporting it as healthy, so they felt confident that any issues must be on the database side.

      The errors seen in the database log file were:

      archive command failed with exit code 42

      and

      ERROR: [042]: raised from local-1 protocol: timeout after 60000ms waiting for read from X.X.X.X:XXXX

      The issue was that the WAL archive process was constantly timing out, and asking the customer to run the below SQL showed that the failed count was in high 1000s.
      SELECT pg_walfile_name(pg_current_wal_lsn()),last_archived_wal, last_archived_time::timestamp(0),
      now()::timestamp(0) ,last_failed_wal, last_failed_time::timestamp(0),failed_count,
      ('x'||substring(pg_walfile_name(pg_current_wal_lsn()),9,8))::bit(32)::int*256 +
           ('x'||substring(pg_walfile_name(pg_current_wal_lsn()),17))::bit(32)::int -
           ('x'||substring(last_archived_wal,9,8))::bit(32)::int*256 -
           ('x'||substring(last_archived_wal,17))::bit(32)::int as lag
      FROM pg_stat_archiver;
      pg_walfile_name | last_archived_wal | last_archived_time | now | last_failed_wal | last_failed_time | failed# | lag
      -------------------------+--------------------------+---------------------+---------------------+--------------------------+--------------------+---------+----
      000000010000078200000067 | 000000010000078100000080 | 2024-01-22 13:36:56 | 2024-01-22 14:05:50 | 000000010000078100000081 |2024-01-22 14:06:39 | 8096 | 231

      Next step: Tweaking pgBackRest configuration

      To shed more light, I recommended increasing both the pgBackRest archive-timeout and the io- timeout options (both default to 60 seconds) to higher but different values, after which we were able to see a different error.

      For readability purposes, we broke long lines in the example below.
      SONAR_AUDIT=1;TIMESTAMP="2024-01-23 08:29:50.839 CET";APPNAME="";USER="";DB="";
      CLIENTHOSTPORT="";ERR="00000";SESSIONID="65af43e7.339e8";
      SESSIONSTART="2024-01-23 05:43:19 CET";PID="211432";VTRANSID="";TRANSID="0";
      LOG: archive command failed with exit code 101 (11263) SONAR_AUDIT=1;TIMESTAMP="2024-01-23 08:29:50.839 CET";APPNAME="";USER="";DB="";
      CLIENTHOSTPORT="";ERR="00000";SESSIONID="65af43e7.339e8";
      SESSIONSTART="2024-01-23 05:43:19 CET";PID="211432";VTRANSID="";TRANSID="0";
      DETAIL: The failed archive command was: pgbackrest --stanza=stanza_prodd7dx_20104
      --config=/var/lib/postgres_prodd7dx_20104/postgres_backup/pgbackrest/pgbackrest.conf
      archive-push pg_wal/0000000100000781000000E3 (14792) pgbackrest: /opt/fsepv12server64/lib/libpq.so.5: no version information available (required by pgbackrest)

      ERROR: [101]: raised from local-1 protocol: [507] Insufficient Storage

      Although insufficient storage usually means that the storage volume is full, it can also indicate that storage is offline or unavailable. However, the information above gave us evidence that something was not right with the TSM storage, and we were able to request the TSM team to take another look.

      So, what was the issue?

      The storage team raised a ticket with the vendor, who advised that the tool they were using to calculate utilization was flawed and reporting significantly more free space than was there - they then recommended another tool that showed the TSM storage was in fact nearly full. Given that, additional storage was added, and the archiving flowed normally.

      So after all, Fujitsu Enterprise Postgres archiving was operating as expected, which did not surprise me, considering how robust it is.

       

      In the latest versions of Fujitsu Enterprise Postgres, pgBackRest is not automatically installed during server and client installation.

      If you want to use pgBackRest on the same server as the database server, refer to Installation and Setup Guide for Server > Setup > pgBackRest 

      If you want to use pgBackRest on a different server, refer to Installation and Setup Guide for Client > Setup > pgBackRest.

      Topics: Fujitsu Enterprise Postgres, pgBackRest, Database archiving, Troubleshooting, Client issue resolution

      Receive our blog

      Search by topic

      Posts by Tag

      See all
      Learn more about the extended and unique features that
      Fujitsu Enterprise Postgres
      provides to harness your data.
      Click below to view the list of features.
      photo-vince-odea-in-hlight-circle-blue-cyan
      Vince O'Dea
      Senior Technical Consultant, Fujitsu Enterprise Postgres Center of Excellence
      Vince is a very experienced technical consultant with over 30 years industry experience as a Data Architect/Database Engineer, specialising in Oracle/Sybase/MSSQL and PostgreSQL. Vince previously spent many years working for the London Stock Exchange (LSEG), the New York Stock Exchange (NYSE), Hewlett Packard (HP), Nomura Investment Bank, Barclays De Zoete Wedd Investment Bank (BZW), and Andersen Consulting before joining Fujitsu. His Industry experience spans Financials/Manufacturing/Services & UK Government.
      Vince enjoys working with customers to help produce and implement solutions for all their technical challenges.

      Receive our blog

      Fill the form to receive notifications of future posts

      Search by topic

      see all >