This is great news for users of FUJITSU Enterprise Postgres, as it adds the capability to store time series data at massively high ingestion rates and then query it efficiently using the Structured Query Language (SQL) that we already know.
TimescaleDB supports all the SQL functionality that we are familiar with from your existing relational databases, and provides the advantage of not having to learn a whole new language like with other time series databases.
The ability to capture data as it changes over time and extract information from that data can provide a significant business advantage across market sectors such as:
- Internet of Things (IOT) – Industrial, Utilities, Oil & Gas, Healthcare
- Financial Services
TimescaleDB is an easy-to-learn extension that can be installed on one or all of the databases in your FUJITSU Enterprise Postgres instance.
TimescaleDB introduces the concept of hypertables to FUJITSU Enterprise Postgres, which provide an interface to many individual partitions or tables that store data in chunks. You essentially create an ordinary FUJITSU Enterprise Postgres table including a time column (usually with a type of timestamp) that will act as the primary axis.
The above screenshot demonstrates how an ordinary table that stores temperature information returned from sensors is converted to a hypertable by executing the function create_hypertable (tablename, timecolumn).
Creation of the hypertable causes a trigger to be created on the table which causes checks to be conducted on any inserts to the table, to ensure that the operation is conducted within hypertable rules.
Metadata about the hypertable is stored in specialised TimescaleDB catalog tables that are organised into a number of schemas.
Once we insert data into the hypertable, it will automatically add chunks, or specialised TimescaleDB partitions in which the data is stored. Organising data into chunks of column-oriented data is part of what greatly increases performance. These chunks can be seen in the table description below.
Querying time series data
Now that we have created the hypertable and have populated it with data, let’s look at some simple things we can do with it. The data in our tempsensor table is made up of readings taken every 5 minutes for each sensor that we store data for.
If we look at a very small sample of this data for sensor 2134, we can see the 5-minute increments.
In most cases, we don’t want to be dictated by our data when writing queries — for example, I might need to join temperature readings to other data that is provided at 30-minute intervals.
We can use the function called time_bucket_gapfill that is part of TimescaleDB to return results in buckets of 30 minutes and average all the sensor readings for that time period.
Now we have temperature readings being returned in 30-minute intervals. Unfortunately, for this sensor, data is missing for the period between 12:00 and 13:00. The missing data may be a result of the sensor being turned off, or a network failure. So TimescaleDB provides us with a way of filling in those gaps using 2 different methods:
- Last Observation Carried Forward (LOFC) — This method fills the gap with the last known value
- Interpolation — This method uses an algorithm to fill gaps between the last known and the next known value (that is, linear)
The example below uses the locf() function to fill in the values.
Now the query results can be used to join with other data at 30-minute intervals in order to find correlations between temperatures and of something like productivity/output.
This was a very simple example of using time series data, and further blogs will highlight some of the other functionality that TimescaleDB brings to FUJITSU Enterprise Postgres.
TimescaleDB requires customised settings to run optimally. The command-line tool timescaledb-tune can assist you in finding the correct settings for your FUJITSU Enterprise Instance. The utility will examine your existing postgresql.conf file and add recommended settings to the end of the file based on your existing settings and environment.
A benchmarking tool called Time Series Benchmark Suite (TSBS) is a tool useful for comparing and evaluating databases around time series data. It is also useful for evaluating your configuration changes and ensuring that you are getting the best of your installation.
A noteworthy feature of TimescaleDB is its columnar compression, which has achieved lossless compression rates of 90-95%. This level of compression represents significant disk savings, and considering the volumes often encountered with this type of data, literally has the potential to pay for itself.
The compression feature uses Gorilla compression for floats, DoD+simple-8B for timestamps and integer types, whole row dictionary compression (great for columns with repeating values), and LZ-based array compression for all of the other types. All FUJITSU Enterprise Postgres types are usable in TimescaleDB’s compression feature. More enhancements are planned to further improve JSON data types.
Not only does compression reduce the amount of I/O required to read the same amount of data (moving work to the processor), TimescaleDB utilises it effectively by leaving the more recent (typically more frequently accessed) data uncompressed, and only compressing older data which chunks are often not even accessed due to partition pruning.
Adding TimescaleDB Enterprise to FUJITSU Enterprise Postgres opens up many opportunities around time based data. I’ll certainly be writing some more blogs on time series data as I make more use of this exciting new feature.
If you want to extract more value from how your data changes over time and continue to leverage a proven database system with enterprise features such as high availability, backup/recovery, replication, and auditing, then TimescaleDB Enterprise on FUJITSU Enterprise Postgres is worth considering.
If you’d like to see how adopting an enterprise Postgres system such as FUJITSU Enterprise Postgres can help you increase your system availability, click here to make an enquiry.