Our blog section will be featuring Amit in a series of articles where he will be sharing his insights, hot from the front line of PostgreSQL.
In recent years, we have seen tremendous growth in data generation and consumption, with open source technology being a consistent source of innovation. One of the key contributors to innovation in relational databases has been the growth of PostgreSQL, with its focus on extensibility and standards compliance.
Fujitsu has been involved in Postgres for over 18 years, and has become an active part of this community. The transparency and feedback loop among community members have helped PostgreSQL become the 4th most popular RDBMS in the world†, used by businesses in finance, data center automation, government, education, application development, and more.
As part of our continuing commitment to the community, we are pleased to welcome a PostgreSQL committer to Fujitsu. Amit Kapila is a 20-year database expert focused on database internals, with deep expertise in SQL engines, storage engines, and replication. He has been actively engaged in PostgreSQL since 2012, and is recognized for his contributions to parallel query, performance, scalability, and durable hash indexes.
Driven by his passion and continuous efforts to learn, he has earned his position as Committer and Major Contributor in PostgreSQL. His development work in his focus areas includes improvements in parallelism, which continue to push PostgreSQL to the next level.
But enough from us, let's hear from Amit himself now.
Amit Kapila and PostgreSQL community
First, I would like to thank everyone at Fujitsu for their warm welcome – I have really enjoyed my first couple of months here.
During my career I have had several significant milestones, starting with my recognition in Computer Science Engineering at NIT Allahabad (India), which provided a great foundation for my work. My employment at Oracle then introduced me to databases, and my interest grew from there.
Perhaps the most significant milestone occurred with my introduction to the PostgreSQL community. This was a place where I immediately felt I belonged, and one that would provide a productive and engaging environment throughout my career. It later led to my becoming a Major Contributor and Committer, which has allowed me not only to meet personal goals but also to assist others in reaching their potential in their professional lives.
In the PostgreSQL community we are extremely fortunate to have some incredible people, such as Tom Lane and Bruce Momjian, with their work in the development and growth of the community. Naturally, I have also benefitted from working with other members, especially Robert Haas, Heikki Linnakangas, Andres Freund, Thomas Munro, and Simon Riggs. The PostgreSQL community continues to grow at a rapid pace, with many organizations looking to embrace open source.
Fujitsu and PostgreSQL development
I have only been at Fujitsu for a short time, and in that time I have confirmed my basic perception of the company. I respect that Fujitsu is a well-known contributor in the PostgreSQL community, with contributions that go back many years. Fujitsu has not only contributed small bug fixes and mid-size features, but was also instrumental in a recent major contribution: pluggable table access methods, which my colleague Pankaj Kapoor will soon discuss here in this blog space.
Fujitsu is also active in OSS conferences, whether as participant or sponsor, which helps in increasing the community’s reach and plays an active role in the continuing development of PostgreSQL.
The healthy nature of this development in the community continues, and I would like to share with you some of what I believe are the most useful features from the recent release of PostgreSQL 13:
- Deduplication of B-Tree indexes
This is an improvement in B-Tree index storage that allows duplicate keys to be stored only once. Say there is an index on the country column, with lots of rows containing the values "US" and "India". With this feature, keys with the same value will be stored only once. I have seen claims that in some specific cases, this results in an index size reduction of 40% or more. So, this feature will help users or workloads that have a lot of non-unique indexes in their database.
- Parallel vacuum
I was directly involved in design, review, and code contributions for this feature. This development allows vacuum speed to be increased by up to 3x, so a vacuum can be completed very quickly where an eager vacuum is required. When the database reaches the stage of near freezing due to bloat caused by large transactional workloads, you can now get rid of the bloat quickly.
One of the things I really like about this feature is that it won't use more memory than a single-process vacuum, and it won't incur more I/O; basically, it respects the cost-based vacuum parameters. We designed this feature carefully, drawing on our past experience with parallelism.
- Improved query performance in specific areas where partitioning or hash aggregation is involved.
- Improved statistics that let users see how much WAL data I/O is happening, which can help them tune their systems.
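To make the B-Tree deduplication idea from the list above more concrete, here is a minimal sketch in Python. It is only a toy model of the concept, not PostgreSQL's actual on-disk format or code: the function names and the (row id, country) sample rows are made up for illustration. It compares an index that keeps one entry per row with one that keeps each distinct key once, together with a list of the matching row ids.

```python
from collections import defaultdict


def build_plain_entries(rows):
    """One (key, row_id) entry per row: every duplicate key is stored again."""
    return sorted((key, row_id) for row_id, key in rows)


def build_deduplicated_entries(rows):
    """One entry per distinct key, carrying a sorted list of row ids."""
    row_ids_by_key = defaultdict(list)
    for row_id, key in rows:
        row_ids_by_key[key].append(row_id)
    return sorted((key, sorted(ids)) for key, ids in row_ids_by_key.items())


# Hypothetical sample data: (row id, value of the country column).
rows = [(1, "US"), (2, "India"), (3, "US"), (4, "India"), (5, "US")]

plain = build_plain_entries(rows)
dedup = build_deduplicated_entries(rows)

print("key copies without deduplication:", len(plain))
print("key copies with deduplication:", len(dedup))
```

In this toy example the key is stored five times without deduplication and only twice with it. How much space is saved in a real index depends on the data and on details this sketch leaves out.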
It is in this area that I believe there is enormous potential for Fujitsu to further expand its contribution to the development of PostgreSQL, especially in areas such as logical replication, parallelism, sharding (scale-out), general performance, and scalability improvements. Fujitsu can play a more expansive role in focusing on major feature enhancements to PostgreSQL.
I see this as an opportunity to help PostgreSQL not only continue to grow, but also to enable it to be used for larger applications. I would also like this to evolve into Fujitsu building a strong team of PostgreSQL contributors, which would assist both the community and Fujitsu in expanding their footprint and maximizing their potential.
Amit outside PostgreSQL and Fujitsu
Outside of work, I like to spend time with my family and friends. In my spare time I love reading biographies and books on leadership and, of course, material related to software engineering.
I look forward to communicating my thoughts about PostgreSQL on an ongoing basis.
† Source: https://db-engines.com/en/ranking