      Organizations handle vast amounts of data and face enormous pressure to harness that information effectively, so it can be used to make timely, informed decisions that keep them ahead of the competition.

      The volume of data being generated on a daily basis is truly impressive, and it is expected to continue growing at an exponential rate in the coming years. It is estimated that every day 2,500,000 TB of data are created all over the world1, generated from the many digital tools and platforms used for communication, entertainment, and business. This includes data from social media posts, online searches, emails, mobile phone usage, and IoT devices, as well as data generated by sensors and other specialized equipment in various industries.

       


      Apart from being generated at ever higher rates, data is also in high demand, consumed by numerous services daily - from medical appointments to online shopping to subscriptions to websites, blogs, and online periodicals. Do we really know where our data resides, or whether it is being shared? Mostly, we just hope and pray that it is protected.

      Studies show that the majority of data management decision makers believe their data is difficult to interpret and lacks observability, with challenges in fully understanding what data they currently hold, how it is used, and who owns it2.

      The high demand for data drives the trends in storage formats and repositories - should you choose a data lake, a data warehouse, or a Postgres database? The decision needs to be workload-specific, but you must first understand the differences between the options.

      What is a data lake?

      A data lake is a repository for raw, unstructured data collected from various sources, such as IoT devices and business applications, and stored without a predefined format. The data is typically used for full-text searches and analytics.

      Due to the various input sources, the data is usually not clean and can be very complex to consume. In most cases, it will take a data scientist to build the complex queries required to deliver reports. Business-level professionals are attracted to data lakes due to their data diversity and speed of data consumption capabilities, features that have led to their increased adoption, along with their ability to store extremely large data volumes.

      What is a data warehouse?

      A data warehouse is a repository of semi-structured data, where the primary use case is reporting.

      The data sets are traditionally composed of historical transactional data, which is queried to generate custom reports. The source data, being mainly operational in nature, is moved over via ETL tools that group it into semi-structured formats. A data warehouse is frequently referred to as a DSS, or decision support system, and its schema design typically relies on either denormalization or the third normal form (3NF). There are three types of data warehouses:

      Type | Description
      Enterprise data warehouse | A central repository combining data from multiple business units or departments.
      Operational data store | Contains operational reporting data and metrics used to maintain the business, such as personal time and office location details.
      Data mart | A divisional category within a warehouse, typically housing data related to a particular business unit.
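
      To make the distinction concrete, a data mart can often be carved out of the enterprise warehouse as a simple subset. The sketch below is a minimal, hypothetical PostgreSQL example - the edw schema, the sales fact table, and all column names are assumptions for illustration only.

        -- Hypothetical example: deriving a departmental data mart
        -- from an enterprise data warehouse (schema and table names assumed)
        CREATE SCHEMA IF NOT EXISTS retail_mart;

        CREATE MATERIALIZED VIEW retail_mart.monthly_sales AS
        SELECT date_trunc('month', s.sold_at) AS month,
               s.store_id,
               s.product_id,
               SUM(s.amount)                  AS total_sales
        FROM   edw.sales s                    -- central warehouse fact table (assumed)
        GROUP  BY 1, 2, 3;

        -- Refresh on a schedule as new transactional data arrives
        REFRESH MATERIALIZED VIEW retail_mart.monthly_sales;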

      What is a database?

      A database is a structured filing system to store data. When the term database is mentioned, it usually refers to a relational database management system (RDBMS), but there are many types of databases.

      Relational databases gained popularity in the 1980s, but were later followed by object-oriented and NoSQL databases. As database technology improves, new types keep gaining popularity, with the likes of cloud, graph, and - would you believe it - self-driving databases.

      Once you have decided on the use case and selected the database type of your choice, the true challenge becomes understanding who needs access to the data and how it will be secured.

      Accessing data | Securing data
      Query access for ad-hoc reporting | Read/write access requirements
      Reporting tools | Data masking requirements
      Third-party app integration | Encryption requirements
      Data exports | Auditing and compliance requirements
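
      As a minimal sketch of how some of these concerns translate into practice, the PostgreSQL example below creates a read-only reporting role and a simple masked view. All role, schema, table, and column names are assumptions, and view-based masking is just one common approach - Fujitsu Enterprise Postgres also ships a dedicated Data Masking feature.

        -- Hypothetical example: separating ad-hoc reporting access
        -- from write access, with simple view-based masking
        CREATE ROLE reporting_ro NOLOGIN;

        GRANT USAGE ON SCHEMA sales TO reporting_ro;            -- schema name assumed
        GRANT SELECT ON ALL TABLES IN SCHEMA sales TO reporting_ro;

        -- Expose customers through a view that masks contact details
        CREATE VIEW sales.customer_masked AS
        SELECT customer_id,
               left(full_name, 1) || '***'          AS full_name,
               '***-***-' || right(phone_number, 4) AS phone_number
        FROM   sales.customer;

        REVOKE SELECT ON sales.customer FROM reporting_ro;      -- hide the base table
        GRANT  SELECT ON sales.customer_masked TO reporting_ro;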

      As you begin to focus on the tasks required, it often becomes easier to harness your data by understanding its lifecycle and the business processing needed for strategic decision making.

      From an executive level, terms like roles, access, and masking may seem puzzling, as the audience may not all have the same technical skill - nor should they be expected to. This means we must discuss the tasks at a higher level, grouping them into more business-related categories.

      Business category | Examples
      Communication | Connectors, drivers, interfaces, tooling, languages
      Ingestion | Sources of data, workflows, unstructured data
      Analysis | Reporting, machine learning, predictive analytics and modeling
      Storage | Location (lake, warehouse, database), data standards, sizing
      Investment | Open source, open source-supported, closed source-supported

      How does your data impact the business?

      Fully understanding the business categories allows you to focus on the areas where your business is impacted, which is the foundation for being a data-driven organization.

      For example, the analysis category is leveraged for the decision-making process. Analyzing data results in increased customer satisfaction, focused sales campaigns, and improved operations.

      The local grocer is a great example of using analysis to drive sales. The weekly specials and coupons on items should be derived from data showing the markets in which each item typically sells. Analyzing existing data in this way generates sales and increases the volume of shipments to that store. Using data to retain customers by carrying specific products will also impact the supply chain.
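
      As a hedged illustration of that kind of analysis, the query below ranks products by units sold per store region as input for planning weekly specials. The line_item and store tables and their columns are hypothetical.

        -- Hypothetical example: which products sell best in which markets,
        -- as input for weekly specials and coupon planning
        SELECT st.region,
               li.product_id,
               SUM(li.quantity) AS units_sold
        FROM   line_item li
        JOIN   store     st ON st.store_id = li.store_id  -- tables assumed
        WHERE  li.sold_at >= now() - interval '90 days'
        GROUP  BY st.region, li.product_id
        ORDER  BY st.region, units_sold DESC;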

      The impact of data shown through customer insights can be shared with other manufacturers or business partners, from whom you may also receive data. The ingestion of the data may be in the form of flat files or unstructured data. This process needs to be well managed as you stream data in and out of your pipelines.

      The method and time it’ll take to move data through the pipeline are critical. Utilizing efficient front- and back-end tools will define success. Overall, ingestion methods should incorporate three main benefits that can aid you in both real-time and batch-based ingestions:

      • Continuous integration
      • Reduction in time
      • Flexible architecture

      The most important of the benefits may be flexible architecture, which needs to be defined before your process begins, along with the database type chosen for storing the data.
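
      For batch-based ingestion, plain PostgreSQL COPY is a common baseline. The sketch below stages a flat file into a raw table before cleansing it into a target orders table - the file path and all table and column names are assumptions. (The High-Speed Data Load feature mentioned below parallelizes this kind of load.)

        -- Hypothetical example: staging a flat file for cleansing before load
        CREATE UNLOGGED TABLE staging_orders (   -- unlogged: faster bulk load
            order_id text,
            store_id text,
            sold_at  text,                       -- kept as text until cleansed
            amount   text
        );

        -- COPY runs server-side; use \copy from psql for client-side files
        COPY staging_orders FROM '/data/incoming/orders.csv'
            WITH (FORMAT csv, HEADER true);

        -- Cleanse and cast into the target table, dropping malformed rows
        INSERT INTO orders (order_id, store_id, sold_at, amount)
        SELECT order_id::bigint, store_id::int, sold_at::timestamptz, amount::numeric
        FROM   staging_orders
        WHERE  amount ~ '^[0-9]+(\.[0-9]+)?$';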

      One well-known best practice for a data ingestion pipeline is the protection of critical data. Among the various open source-based solutions, Fujitsu Enterprise Postgres offers some key features to help harness and protect critical datasets.

      Have you implemented data governance?

      Achieving compliance is a critical aspect of your data governance strategy, and should be embedded into your framework. Fujitsu Enterprise Postgres provides the capabilities for data protection, allowing structured and unstructured data to be stored in encrypted tablespaces that utilize 256-bit encryption to assist in achieving PCI DSS compliance levels. Encryption and decryption are performed on whole data blocks rather than individual bits, resulting in minimal overhead.

      Complementing the benefit of a flexible architecture, the Transparent Data Encryption (TDE) functionality does not require additional storage areas, because the size of encrypted objects is not modified. It also ensures that backups and logs include the encrypted version, with no additional licensing required.
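
      As a brief sketch, creating an encrypted tablespace in Fujitsu Enterprise Postgres looks like standard PostgreSQL DDL with an extra encryption parameter. The paths and names below are illustrative, so check the product documentation for the exact syntax in your version.

        -- Sketch: an encrypted tablespace in Fujitsu Enterprise Postgres
        -- (location and names are illustrative; verify syntax for your version)
        CREATE TABLESPACE secure_ts
            LOCATION '/database/secure_ts'
            WITH (tablespace_encryption_algorithm = 'AES256');

        -- Tables created in this tablespace are encrypted transparently
        CREATE TABLE customer_payment (
            customer_id bigint,
            card_token  text
        ) TABLESPACE secure_ts;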

      Is your data out of control?

      In order to become data-driven, an organization must take control of its data through a cultural adoption of this strategy. It must take into account that in a typical environment, data is ingested from multiple sources, and control processes are mandatory.

      Fujitsu Enterprise Postgres can be part of this solution, as it provides, among others, the following relevant capabilities:

      • Full-text search, provided by the pg_bigm extension (see the sketch after this list).
      • Table reorganization - rebuilding bloated tables and indexes and reclaiming unused space - to aid in the control and flexibility of the data, provided by the pg_repack extension.
      • Increased performance when moving data from flat files, cleansing it prior to loading and building indexes, provided by High-Speed Data Load, which uses multiple parallel workers to simultaneously perform data conversion, table creation, and index creation.
      • Enhanced query performance through a columnar index structure, provided by the Vertical Clustered Index (also sketched below).
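
      A minimal sketch of the first and last items, assuming a documents table with a body text column and a sales table - all names here are hypothetical, and the VCI syntax should be verified against the product documentation.

        -- Sketch: full-text search with pg_bigm (2-gram GIN index)
        CREATE EXTENSION IF NOT EXISTS pg_bigm;
        CREATE INDEX idx_documents_body ON documents
            USING gin (body gin_bigm_ops);                    -- table/column assumed
        SELECT * FROM documents WHERE body LIKE '%postgres%'; -- served by the index

        -- Sketch: a Vertical Clustered Index in Fujitsu Enterprise Postgres
        CREATE EXTENSION IF NOT EXISTS vci;
        CREATE INDEX idx_sales_vci ON sales
            USING vci (store_id, product_id, amount);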

      Harnessing your data requires teamwork and collaboration so you can gain full knowledge of your data, its types, and where it comes from. Once your organization accomplishes that, it can decide how to best leverage and share that data both inside and outside of the organization.

      Imagine if a major grocer had no control over their data - no way to understand the typical shopper and how they discover new products. It would be difficult for them to drive targeted sales. This could be the difference between the major grocer and the average grocer.

      Without data management controls and the ability to coherently harness your data, your investments may end up just like your data - out of control.

      And one more thing

      An additional method you can use to harness your data is to utilize a powerful database management tool - I will be discussing this further in my blog post next week. Be sure to check it out.


      Sources used in this article

      1: https://explodingtopics.com/blog/big-data-stats

      2: https://www.capitalone.com/software/resources/data-management-trends


      Tim Steward
      Principal Data Enterprise Architect, Fujitsu
      Tim has more than 20 years of experience in the industry, with significant expertise in RDBMS - including but not limited to Postgres and Oracle - helping customers understand their architectural landscape and how they can leverage open-source database technology.
      An acknowledged and experienced technical leader, Tim has spoken frequently at conferences and written numerous papers and blogs.