How data portability benefits not only your customers but also your organization

The concept of data portability is relevant not only as part of the regulatory framework governing customer data privacy; it is also important for organizations moving data across applications and hybrid cloud environments. Let me show you how.

Significant consumer concern over the last few years has led to the creation of several regulations stipulating how organizations should collect, store, and manage personal data. Organizations are re-evaluating how they manage personal data, not only to comply with regulations like the General Data Protection Regulation (GDPR), but also as part of their digital transformation to improve efficiency and value.

Data portability is a key concept that organizations should understand if they wish to leverage the benefits provided by hybrid and multi-cloud architectures.

Defining data portability

Data portability is a term that describes a concept whereby:

Consumers have the right and ability to extract and move personal information between different applications, platforms, and entities
Organizations can move data between platforms to realize several benefits, from saving money to providing improved resilience

What is personal information?

Personally identifiable information (PII), also known as personal data or personal information, is challenging to define. Definitions vary depending on which privacy act or regulation you happen to be reading. From an organizational standpoint, taking a broad view of what counts as personal information is the safer option.

Data such as name, date of birth, tax file number, or address is easily recognizable as personal information. Privacy acts also define several specific types of information. For example, the Australian Privacy Act 1988 recognizes the following as types of personal information:

Sensitive information, such as information about a person's racial or ethnic origin, criminal record, religious beliefs, or sexual orientation
Health information
Credit information
Employee record information

However, information not explicitly mentioned can still constitute personal information, and this can extend to an opinion about a person or even the person's work activities.

Data portability in the cloud

As more organizations move large amounts of their data into the cloud, concerns around portability become even more pertinent. Many cloud service providers store data in a proprietary format that can only be read by specific software.

Issues can arise if that software cannot export data efficiently, or if exporting data incurs considerable cost. In these instances, an organization could find itself in a 'vendor lock-in' scenario or, worse, in breach of data regulations where personal data is concerned.

Other reasons organizations prioritize the portability of their data include:

Utilizing cheaper compute power – the ability to move data in an appropriate format to a provider that is offering discounted compute resources can provide a significant saving to an organization.
Added resilience – making data available on different cloud provider platforms offers a level of resilience, because the organization can switch to a different provider should issues arise with the current one.

Hybrid and multi-cloud – access considerations

Good data portability supports multi-cloud architectures, which enable organizations to build flexibility and resilience into their infrastructure.

However, the ability to move an application's data from one cloud platform to another isn't the only consideration. How that data is accessed or queried by the application is just as important.

Organizations dealing with large volumes of data require fast and accurate databases, with access to data via indexes and a range of other mechanisms.

Consistent feature implementation in the data storage layer, regardless of which cloud platform that layer resides on, avoids many issues that might otherwise create roadblocks to a beneficial multi-cloud architecture in which workloads and data can be moved between platforms.

Fujitsu Enterprise Postgres is available across several trusted cloud platforms, and on architectures that include x86, LinuxONE, and IBM Power, offering the same performance, security, and resilience features no matter where it resides.

This allows a consistent method to access data, implement security and other data-related policies, and administer and monitor storage across platforms, increasing reliability and also reducing risk and cost that might otherwise be associated with moving workloads between cloud platforms.

Fujitsu Enterprise Postgres also supports hybrid architectures in a similar way, as the same flexibility and features are available in on-premises bare metal or VM-based installs as are available in the cloud versions of the product.

The ability to easily move data from any database (regardless of where it is located) to any other without having to perform transformations or change the application layer opens many options for an organization.

This article focuses on the use of an open data format, as this allows data to be used by applications other than the one it was exported from (true data portability).

Open data formats

To facilitate data portability, services should make it easy to specify what data is to be exported and then export that data using an open data format. Two commonly used formats are JSON (JavaScript Object Notation) and XML (Extensible Markup Language). CSV (comma-separated values) is also commonly used; although it is not self-describing, its compact format can provide good performance, especially with database-oriented applications.

Using an open data format means that data can be easily understood by many applications, services, and people without error-prone transformation processes residing between parties.

Fujitsu Enterprise Postgres supports all of the above open data formats. Data can be exported from an instance on one cloud platform, such as Amazon, and imported into an instance running on another, such as Azure. Data exported from Fujitsu Enterprise Postgres can even be imported into OSS (open source software) Postgres, and vice versa.

Exporting data

There are two main methods of extracting data from a Fujitsu Enterprise Postgres table to a file on the file system:

Structured Query Language (SQL) provides a method of selecting specific data to be exported. When used in combination with psql command-line meta-commands such as \o (which sends query output to a file), or with the COPY command (designed to move the contents of a table or query result out to a file), it provides a flexible and easy-to-use method of exporting specific sets of data.

-- Export three columns as pipe-delimited CSV with a header row.
-- The file is written on the database server, so the path must be writable by the server process.
COPY (SELECT name, team, age FROM competitor) TO '/data/comp.dat'
WITH (FORMAT csv, DELIMITER '|', HEADER);

The COPY command on its own provides an easy, performance-oriented method of exporting all the data within a table out to a file.
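
As a minimal sketch of both methods, assuming the competitor table used above and hypothetical file paths, the commands below export a whole table, redirect the output of a query with \o, and load the exported file on another instance. COPY reads and writes files on the database server (which in community PostgreSQL requires superuser rights or membership of the pg_write_server_files / pg_read_server_files roles), whereas psql's \copy uses the file system of the client.

```sql
-- Whole-table export on the source instance (server-side file; path is hypothetical).
COPY competitor TO '/data/competitor.csv' WITH (FORMAT csv, HEADER);

-- Client-side alternative from psql: the file is written where psql runs.
\copy competitor TO 'competitor.csv' WITH (FORMAT csv, HEADER)

-- Redirect the output of an ordinary query to a file, then restore standard output.
\o /tmp/lions.txt
SELECT name, team, age FROM competitor WHERE team = 'Lions';
\o

-- On the target instance, load the file into a table with the same columns.
COPY competitor FROM '/data/competitor.csv' WITH (FORMAT csv, HEADER);
```

Because the exported file is plain CSV, the final step should work equally well against another Fujitsu Enterprise Postgres instance or OSS Postgres, provided the target table already exists with a compatible definition.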

Storage options

Fujitsu Enterprise Postgres lets you extend its type system to cater for whatever your needs are. By default, however, data types are already provided for the effective storage of both JSON and XML data in their native formats.

An extensive range of operators and functions is also available for efficiently accessing data persisted in these formats. This gives application developers the option of storing data in the format commonly used for communication between application servers and clients, eliminating the need to transform it into a different format for storage.

If data is stored directly in the database in an open data format, the process to extract data becomes simpler, as it is already in an appropriate format. If a more traditional relational structure is used, Fujitsu Enterprise Postgres provides an extensive range of functions and operators for transforming data to and from an open format. Some simple examples are provided in the following sections.
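
To illustrate the native-format option, here is a minimal sketch assuming a hypothetical competitor_doc table that stores each competitor as a JSON document. The table, column, and index names and the sample values are illustrative only; the jsonb type and the ->> and @> operators are standard PostgreSQL.

```sql
-- Hypothetical table: one JSON document per row, stored in the binary jsonb type.
CREATE TABLE competitor_doc (
    id  integer PRIMARY KEY,
    doc jsonb NOT NULL
);

INSERT INTO competitor_doc (id, doc) VALUES
    (2, '{"name": "John", "team": "Lions", "age": 42}'),
    (4, '{"name": "Mary", "team": "Seals", "age": 22}');

-- ->> extracts a field as text; cast it when a numeric comparison is needed.
SELECT doc ->> 'name' AS name
FROM competitor_doc
WHERE (doc ->> 'age')::int < 30;

-- @> tests containment: documents that include the given key/value pair.
SELECT id, doc
FROM competitor_doc
WHERE doc @> '{"team": "Lions"}';

-- An optional GIN index can speed up containment queries on larger tables.
CREATE INDEX competitor_doc_gin ON competitor_doc USING gin (doc);
```

With this layout, exporting is largely a matter of selecting the doc column, since each value is already a JSON document.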

JSON format

There are many JSON functions and operators available in Fujitsu Enterprise Postgres that can be found in the manual. Two useful functions to get started with for exporting data from your database in a JSON format are:

row_to_json ( record [, boolean ] ) → json
Converts an SQL composite value to a JSON object. The behavior is the same as to_json, except that line feeds will be added between top-level elements if the optional boolean parameter is true.

row_to_json(row(1,'foo')) → {"f1":1,"f2":"foo"}

SELECT row_to_json(competitor)
FROM competitor
WHERE age < 65;
--------------------------------------------------
{"id":2, "name":"John", "team":"Lions", "age":42}
{"id":4, "name":"Mary", "team":"Seals", "age":22}
{"id":5, "name":"Paul", "team":"Lions", "age":31}
{"id":7, "name":"Anne", "team":"Bears", "age":19}
(4 rows)

json_agg ( anyelement ) → json
Collects all the input values, including nulls, into a JSON array. Values are converted to JSON as per to_json or to_jsonb.

SELECT json_agg(row_to_json(competitor))
FROM competitor
WHERE age < 65;
---------------------------------------------------
[{"id":2, "name":"John", "team":"Lions", "age":42},
{"id":4, "name":"Mary", "team":"Seals", "age":22},
{"id":5, "name":"Paul", "team":"Lions", "age":31},
{"id":7, "name":"Anne", "team":"Bears", "age":19}]
(1 row)

Note: The output above was artificially broken into separate lines for readability only.
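
To write such output to a file and load it elsewhere, one possible approach, sketched here with a hypothetical file path, is to switch psql to tuples-only, unaligned output before redirecting with \o. On the receiving side, json_populate_recordset can turn a JSON array back into rows, assuming the target already has a competitor table whose column names match the JSON keys.

```sql
-- psql settings: tuples only (no header or row-count footer) and unaligned output,
-- so the file contains nothing but the JSON text.
\t on
\pset format unaligned
\o /tmp/competitor.json
SELECT json_agg(row_to_json(c))
FROM competitor AS c
WHERE age < 65;
\o
\t off
\pset format aligned

-- On the target instance: expand a JSON array into rows of the competitor table.
-- The array is inlined here for brevity; in practice it would come from the exported file.
INSERT INTO competitor
SELECT *
FROM json_populate_recordset(
    NULL::competitor,
    '[{"id":2,"name":"John","team":"Lions","age":42},
      {"id":4,"name":"Mary","team":"Seals","age":22}]'
);
```

The psql route is used instead of COPY here because COPY's text format applies its own escaping to backslashes, which can alter JSON string content.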

XML format

Fujitsu Enterprise Postgres also offers a range of XML functions. Some construct XML documents from data selected from database tables, providing a high level of control over the result; others export whole tables, schemas, or databases into an XML document along with the corresponding XML schema.

Below is an example of how simple it is to export data from a table in XML format.
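
A minimal sketch of such an export, assuming the competitor table from the JSON section and a server built with XML support. table_to_xml, query_to_xml, and table_to_xml_and_xmlschema are standard PostgreSQL functions; after the table or query, their arguments are whether to include NULL columns, whether to emit a forest of row elements rather than a single document, and a target namespace. The element names in the last query are illustrative choices.

```sql
-- Export the whole table as one XML document.
-- Arguments: table, include NULL columns, tableforest, target namespace.
SELECT table_to_xml('competitor', true, false, '');

-- Export only selected rows and columns.
SELECT query_to_xml(
    'SELECT name, team, age FROM competitor WHERE age < 65',
    true, false, ''
);

-- Export the table together with an XML Schema describing it.
SELECT table_to_xml_and_xmlschema('competitor', true, false, '');

-- Build a custom document shape with the SQL/XML constructor functions.
SELECT xmlelement(
           name competitors,
           xmlagg(
               xmlelement(
                   name competitor,
                   xmlattributes(c.id AS id),
                   xmlforest(c.name, c.team, c.age)
               )
           )
       )
FROM competitor AS c
WHERE c.age < 65;
```

The first three queries trade control for convenience, while the constructor functions let the document shape be tailored to whatever the receiving application expects.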

In summary

Data portability is a key concept that organizations should understand if they wish to leverage the benefits provided by hybrid and multi-cloud architectures. It assists organizations in complying with GDPR and other privacy regulations.

Fujitsu Enterprise Postgres not only supports data portability across on-premises and cloud platforms, but also provides a consistent set of features wherever your data is located, reducing risk and increasing resilience.

Many cloud service providers do not provide an acceptable level of data portability, which can lead to vendor lock-in and challenges in conforming to privacy regulations.

This article focused on open data formats. However, if data is being moved between instances of the same application running on different platforms, a proprietary (not open) format may offer advantages such as better performance and less susceptibility to errors.

Due consideration should be given to organizational and technical objectives and requirements to determine which approach or combination of approaches should be taken.

Topics: PostgreSQL, Fujitsu Enterprise Postgres, EU General Data Protection Regulation (GDPR), Data governance, Data portability