The blog of makandra, a Ruby on Rails development team

Investigating slow PostgreSQL index only scans

In July 2018 one of our hosting customers was confronted with slow PostgreSQL queries: statements that usually take under 1 ms took multiple seconds. Our customer had already found the slow statements and started to investigate the issue.

We jumped in later and started with our usual procedure for analysing slow PostgreSQL statements: EXPLAIN ANALYZE. The following output was shown for one of the problematic queries:

Production database (Primary):

Aggregate  (cost=32988.28..32988.29 rows=1 width=8) (actual time=2877.870..2877.870 rows=1 loops=1)
   ->  Index Only Scan using index_shipments_on_state on shipments  (cost=0.43..32905.89 rows=32955 width=0) (actual time=0.048..2877.089 rows=6423 loops=1)
         Index Cond: (state = ANY     ('{init,waiting_for_manual_correction,unresolved,unassigned,price_not_calculatable,dispatching_unconfirmed,dispatching_confirmed,pending_check_out,dispatched,picked_up,undeliverable,passed_on}'::text[]))
         Heap Fetches: 1339084
 Planning time: 0.281 ms
 Execution time: 2877.928 ms
(6 rows)

The problem did not occur when sourcing a production dump on a local development machine. A restored PITR backup on our PITR host did not suffer from this problem either:

Local database of the customer:

 Aggregate  (cost=24312.53..24312.54 rows=1 width=8) (actual time=6.948..6.948 rows=1 loops=1)
   ->  Index Only Scan using index_shipments_on_state on shipments  (cost=0.43..24265.94 rows=18637 width=0) (actual time=0.031..5.988 rows=6733 loops=1)
         Index Cond: (state = ANY     ('{init,waiting_for_manual_correction,unresolved,unassigned,price_not_calculatable,dispatching_unconfirmed,dispatching_confirmed,pending_check_out,dispatched,picked_up,undeliverable,passed_on}'::text[]))
         Heap Fetches: 6771
 Planning time: 0.322 ms
 Execution time: 6.999 ms
(6 rows)

We saw that an index only scan was used, which should be fast. But we also saw a high count of Heap Fetches on the production server. This is a bad sign and explains why the query was slow. To understand why, you need to know a bit about how PostgreSQL stores data and handles index only scans. I'll try a simple explanation:

Imagine a transaction running a read-only query on a table. The query runs for a very long time and will return multiple rows of data. If another transaction needs to update rows which will be returned by the running read query, you don't want it to wait for the read query to finish. On the other hand you don't want your read query to return inconsistent data. MVCC solves this problem:

  1. To return a consistent state of your database, the running query only sees a snapshot of the data.
  2. If a row gets updated, a new row version is created with the updated data.
  3. Now there are two row versions, an old one and a new one.
  4. PostgreSQL keeps track of which row versions should be seen by which transactions.
  5. The old row version gets marked as "valid until X".
  6. The new row version gets marked as "valid from X" (see the short sketch after this list).
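
Under the hood, these "valid from/until" markers are the xmin and xmax system columns that every row version carries. As a small illustration (using the shipments table and its state column from the query plan above; the column list is just an example):

-- xmin: ID of the transaction that created this row version ("valid from")
-- xmax: ID of the transaction that deleted or replaced it ("valid until", 0 while no transaction has done so)
SELECT xmin, xmax, state FROM shipments LIMIT 5;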

Our still running read query fetches the old row versions while newer queries get the new row versions. But there is a problem: if we continued like this, our database size would grow endlessly. To prevent this there is VACUUM. VACUUM regularly checks whether there are row versions which can be removed. This is the case when there is no transaction left that still needs these old row versions. So once our read query finishes and we end our transaction, VACUUM can remove the old row versions that were replaced by the updates of our second transaction.

VACUUM doesn't need to check every row version (called a tuple in PostgreSQL terminology) every time it runs. It maintains a Visibility Map which (briefly said) stores information about tuples that are visible to all current transactions and all transactions in the future. The map doesn't contain data for each tuple but for all tuples stored in a page. If VACUUM checks a page for "dead" tuples and all of them are visible to all transactions, it flips the corresponding bit in the Visibility Map; next time it runs, it doesn't need to check this page again. If an update or delete statement changes a tuple in this page, the bit is flipped back and the next VACUUM run will check the tuples in this page again.

This is a really short explanation of how this works. If you want to know more, you should continue with the linked PostgreSQL documentation. Emanuel's blog post about PostgreSQL query optimization can give you a deeper understanding, too.
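
To get a feeling for how much cleanup work is pending for a table, you can for example look at the dead tuple statistics, or (on PostgreSQL 9.6+) inspect the visibility map itself via the pg_visibility extension. A sketch, again using the shipments table from above:

-- Dead row versions waiting for cleanup, and the last (auto)vacuum runs:
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'shipments';

-- How many pages are currently marked all-visible in the visibility map?
-- (requires: CREATE EXTENSION pg_visibility;)
SELECT all_visible, count(*)
FROM pg_visibility_map('shipments')
GROUP BY all_visible;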

So you may ask yourself what all this has to do with the slow index only scan.

You run a query using an index only scan, which returns ALL matching row versions. Each matching row must be checked to determine whether it should be seen by the current transaction (MVCC). Index only scans use the Visibility Map, which gets updated by VACUUM runs: PostgreSQL checks whether the page holding the returned row version is visible to all active transactions. If yes, it can continue with the next matching row version. If no, it must fetch the row from the table heap (the main data storage area of the table) to check whether this row version is visible to the current transaction. This is comparatively slow, so a high amount of heap fetches is bad. A constantly high (or growing) amount of heap fetches has the following possible explanations:

  • more updates/deletes in the table than auto VACUUM is capable of scanning / cleaning up
  • no VACUUM for that table
  • VACUUM is not working

We checked the amount of updates and transactions on the table and didn't see any problematic behaviour. A simple test run of VACUUM on the table showed that it did not change anything about the heap fetches. We also checked whether autovacuum is executed for the affected table:

SELECT schemaname, relname, last_autovacuum, last_autoanalyze, autovacuum_count, autoanalyze_count FROM pg_stat_all_tables;

Finally a VACUUM VERBOSE showed us the issue:

...
DETAIL:  1349534 dead row versions cannot be removed yet.
...

This means that there were 1349534 old row versions, but VACUUM refused to remove them because it thinks they are still needed. The visibility map gets no updates for these rows because these row versions need to be checked again in the next VACUUM run. As a result our index only scan cannot benefit from fast visibility map lookups. We needed to find out why these row versions were not removed by VACUUM. One reason can be a long-running transaction (either active or idle) which still requires very old row versions. So we searched for active or idle transactions:

SELECT pid, usename, application_name, state, xact_start, query FROM pg_stat_activity WHERE xact_start IS NOT NULL;

Of course the query returned some transactions, but they were only a few seconds old and disappeared shortly afterwards. Because we could not believe this, we started digging around with locks, searched for broken visibility map (VM) files and searched again for transactions with old xmin values:

SELECT pid, datname, usename, state, backend_xmin FROM pg_stat_activity WHERE backend_xmin IS NOT NULL ORDER BY age(backend_xmin) DESC;

This took us a lot of time. At last it occurred to us that the Replica might be responsible for the issue (even though the issue appeared on the Primary):

Our customer was using the Replica not just for failover scenarios but also to execute expensive read queries on it. These queries ran for a long time, so they got canceled when the VACUUM clean-up replicated from the Primary to the Replica. PostgreSQL's binary replication works sequentially: newer changes will only be applied if the changes before them have been applied successfully. Because it is a binary replication and the Replica needs to represent the same state as the Primary, all changes including the clean-up initiated by VACUUM must be replicated. If a transaction on the Replica uses old row versions which have already been deleted on the Primary, this causes a replication conflict.

By default PostgreSQL will wait a short time for transactions on the Replica to finish so that it can apply the VACUUM changes afterwards. But to prevent a long replication lag, it will kill the transaction on the Replica after a configured time period. The parameter is max_standby_streaming_delay, its default is 30 seconds. We could have increased the time-out to accommodate the long queries on the Replica, but since this Replica's main purpose is to be an automatic failover instance, that wouldn't have been a good idea.
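
For illustration, this is how you could check the time-out on the Replica and what raising it would look like (the value below is only an example, not a recommendation):

-- On the Replica:
SHOW max_standby_streaming_delay;
-- Raising it would mean setting e.g. the following in the Replica's postgresql.conf
--   max_standby_streaming_delay = 15min
-- and reloading the configuration:
SELECT pg_reload_conf();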

There is an alternative: we decided to enable hot_standby_feedback to prevent this. Briefly said, this parameter makes the Replica tell the Primary which row versions are still needed by transactions running on the Replica. We factored in a bit more disk usage, because VACUUM would sometimes be unable to clean up some old row versions.
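
Enabling it is a single setting on the Replica, roughly like this (hot_standby_feedback can be picked up with a configuration reload, no restart needed):

-- In the Replica's postgresql.conf:
--   hot_standby_feedback = on
-- then reload and verify:
SELECT pg_reload_conf();
SHOW hot_standby_feedback;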

Suspecting that the Replica caused the issue, we checked for long-running transactions on the Replica. We did not find any. We even stopped the Replica, but the issue still persisted. Just to be sure we disabled hot_standby_feedback, started a VACUUM VERBOSE and saw that the old row versions got cleaned up.

The query execution time was fine again:

 Aggregate  (cost=332.62..332.63 rows=1 width=8) (actual time=3.351..3.351 rows=1 loops=1)
   ->  Index Only Scan using index_shipments_on_state on shipments  (cost=0.43..316.34 rows=6509 width=0) (actual time=0.046..2.453 rows=6411 loops=1)
         Index Cond: (state = ANY     ('{init,waiting_for_manual_correction,unresolved,unassigned,price_not_calculatable,dispatching_unconfirmed,dispatching_confirmed,pending_check_out,dispatched,picked_up,undeliverable,passed_on}'::text[]))
         Heap Fetches: 476
 Planning time: 0.271 ms
 Execution time: 3.405 ms
(6 rows)

The question remains why weeks or months old dead rows were kept. If hot_standby_feedback is enabled, the Replica regularly tells the Primary the oldest transaction ID it still needs. If there is no transaction on the Replica needing any old row versions, the Replica wouldn't tell the Primary to keep them. We cannot verify our suspicion in retrospect, but it is likely that the cause was a dead replication slot as described here. As we can't check it after disabling hot_standby_feedback, we can't be sure. At the moment there is no replication slot on the Primary other than the active one for the hot standby (Replica). Maybe the slot got removed when disabling hot_standby_feedback. Otherwise we have no explanation for the behaviour of hot_standby_feedback.
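
If a forgotten replication slot is holding back VACUUM, it shows up as a slot whose xmin horizon lags far behind. A query like the following (run on the Primary) would have revealed it, together with the xmin that the Replica reports back via hot_standby_feedback:

-- Replication slots and the transaction horizon they pin:
SELECT slot_name, slot_type, active, xmin, catalog_xmin
FROM pg_replication_slots;

-- The xmin reported by the standby through hot_standby_feedback:
SELECT application_name, state, backend_xmin
FROM pg_stat_replication;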

Conclusion

We missed hot_standby_feedback at an early point of debugging and searched for transactions from clients instead of an issue caused by the replication configuration. Even though the parameter shouldn't behave like this, we should have taken it into account at an earlier stage of debugging (e.g. after seeing that there were no long-running transactions on the Primary).

Introductory interview with Stefan

Intro

Hi, I'm Stefan Langenmaier. I'm originally from the greater Augsburg area (somewhere between Augsburg, Ulm and Donauwörth). My spare time is split between sports, cooking and tinkering with software and the machines that are supposed to run it (whether they want to or not) :). Before joining makandra I worked as a consultant for Search & Analytics in the Solr ecosystem and as a consultant for Free Software with a focus on Java and the Web.

How did you first learn about makandra and what drew you to makandra originally?

I think the first time I heard about makandra was when they started the LTS support for Rails, but this was some time ago. I think I also landed once or twice on makandra cards for obscure web dev problems. But the connection was established through a mutual friend who said I should contact them.

What is your role at makandra?

I will be working as an operations engineer, keeping an eye on the infrastructure.

What are 3 words to describe makandra?

  • open
  • friendly
  • knowledgeable

What challenges are you looking forward to at makandra?

For the moment I think the challenge is to get a good grasp of the existing infrastructure. For the future I hope to make contributions to the reliability and performance.

Are you messy or organized?

In my opinion it's best to be and work in an organized state but if you want to find a new idea or approach it has to get messy from time to time… :)

What book did you read last?

Sechs Österreicher unter den ersten fünf. Roman einer Entpiefkenisierung. by Dirk Stermann. It's funny if you like morbid topics.

Do you recall any embarrassing moment at work?

Well there was this one time my boss thought I would try to set the office on fire…

Motto or personal mantra?

Only crazy people don't change their mind. (Not the original quote)

If you had to eat one meal, every day for the rest of your life, what would it be?

A very hard choice, but it would definitely be a Mehlspeise (I'd say this is also an established term), probably Griesschnitten with fruits.

If you could only drink one beer for the rest of your life, what would it be?

Blonde au chardonnay

What did you want to be when growing up?

Monk or architect.

Vim or Emacs? (Be careful here…)

Nano and Eclipse. Why settle for something in between?

What tools are you using that you don't want to miss anymore?

GNU/Linux, Wikipedia, OpenStreetMap

How custom elements will change web development

I wish we never explained web components by "well, it's actually four different things, but I won't tell you which one you care about."

Spoiler: It's custom elements. It's always been custom elements.

Since forever, library authors have felt the pain of needing to match JavaScript with DOM elements. For instance, if you're writing a date picker, someone needs to call YourDatePicker.init(element) for every <input class="date-picker"> that gets attached to the DOM.

There is no built-in or even standard way to do this. Every library has shipped its own solution, using a jQuery plugin, a global activation function, magic onload handlers or a framework-specific base class.

Custom elements make all of that go away. Just use the element in your HTML, or create it using the framework of your choice, and it will Just Work™. No JavaScript ceremony is required to make a DOM element come alive. It's a giant step forward for how we bootstrap web apps and how we share code between apps.

I think that leap is the reason we're seeing an increasing number of JavaScript libs switching to custom elements for script/element matching, although browser support is an unholy mess. Microsoft is dragging their feet with Edge and of course pretends IE11 doesn't exist anymore, while at the same time blocking upgrades from IE11 to Edge on Windows 7. Firefox hides it behind a feature flag. Safari mostly supports it with some caveats.

The polyfill from hell

"But we can polyfill it". Except you can't easily because it breaks assumptions of your build pipeline in interesting ways. You see, Chrome's native support for Custom Elements v1 requires native (not transpiled) ES6 classes because politics. And your app probably ships ES5 prototype spaghetti in production. So the Babel or TypeScript presets you use to support IE11 will break custom elements in Chrome.

Now you need TWO builds of every JavaScript file: One with native ES6 classes for Chrome and another with dumbed-down ES5 constructors for IE11. (hat tip to Lucas Dohmen)

Understanding and maintaining two build pipelines is totally doable for an experienced development team, but will probably keep custom elements out of the mainstream for now. An interesting date to note here is Jan 14th 2020, when Microsoft kills Windows 7 / IE11 and we might finally be able to ship ES6 code without transpilation.

Introductory interview with Robin

Introduce Yourself

Hey there! I'm Robin Kara, living in southern Augsburg with a nice view on my garden. After work I love tinkering, discussing, hanging out and having some beers with other tech-enthusiasts and makers at the OpenLab.

Before joining makandra I was helping cloud providers in the EMEA region to keep their storage backends running lightning fast, highly available and automatically provisioned. Even before I left school, both Linux and Ruby were already my tools of choice when it came to solving problems. With computers being my hobby and making some nice summer jobs possible, it was just natural for me to start an apprenticeship as an IT professional after graduation. Generally speaking, I would always prefer and advocate free software over closed source software, as long as it does the job well.

How did you first learn about makandra and what drew you to makandra originally?

Given my personal interests it's no surprise that I first learned about makandra at our local hackerspace. When I was looking for a new job where I wanted to do more Linux during the day, I was very glad to hear from Andi (who occasionally stays at the OpenLab too) that makandra could use some support with their server operations. The fact that I could also improve my Ruby skills at makandra was the icing on the cake. :)

What is your role at makandra?

I'm working as an operations engineer, keeping your servers happy and ready.

What are 3 words to describe makandra?

  • agile
  • reliable
  • ambitious

What challenges are you looking forward to at makandra?

For now, I'm looking forward to getting to know all the fancy projects you host and build on our OpsComplete infrastructure. Usually it's a lot of fun for me to understand or debug complex computer networks, so I hope I can find some potential for optimization in our infrastructure.

Are you messy or organized?

I'm desperately trying to formalize and organize everything but in the end I'm a chaotic nature and love to see how beautiful things emerge from chaos.

Best vacation you've been to?

Two weeks in Sweden's nature with some colleagues. Next to our holiday home in the forest we had a hot tub, a nice fireplace and a cold lake but absolutely no mobile internet connection. That was just the perfect setting to relax and experience nature.

Describe what you were like at age 10.

I was an easygoing but always bubbly guy, wondering about a different detail of our wondrous world every day. Often my parents had to hold me back when I got a new toy and immediately started taking it apart to understand its inner workings.

Do you recall any embarrassing moment at work?

Yes. Once I showed up at a customer's site in Czechia for some weekend maintenance work, but unfortunately "off by one" week. I only realized it when I gave my customer a call to ask when he would show up and he told me "next week"… I don't want that to ever happen again. ;-)

If you were an animal what would you be?

A robin, I guess.

People would be surprised if they knew:

I once used Microsoft Windows for a certain challenge, of my own free will.

What book did you read last?

Homo Deus, it's worth reading.

Vim or Emacs? (Be careful here…)

:set nocompatible and the party starts.

What tools are you using that you don't want to miss anymore?

Well, I probably couldn't live without my screw gun and my battery powered circular saw anymore. When it comes to software tools, I love my i3 window manager, taskwarrior, notmuch, vim and zsh running on a debian based distribution.


Welcome to makandra, Robin!

Rails 2.3 now supports Ruby 2.3

Rails 2.3 is 9 years old this week. And it's now compatible with Ruby 2.3.

While this is not a drop-in solution and will involve some manual work for those who want to upgrade to Ruby 2.3, it will allow Rails LTS 2.3 applications to benefit from the 2x - 4x performance improvements of Ruby 2.3 over 1.8.

If you don't want to or are unable to upgrade your Ruby version, there is no action required on your part. Rails 2.3 LTS will continue to work on Ruby 1.8.7, and we are committed to supporting 1.8.7 indefinitely.

If you have already added some monkey patches to make your Rails LTS app work with Ruby 2.x, we recommend removing them and following our upgrade guide.

Some assembly required

You can now upgrade your Rails 2.3 LTS application to Ruby 2.3 without a need for the numerous patches and workarounds related to Rails that were necessary for a Ruby upgrade in the past. It also means that we will test all new releases against Ruby 2.3 (as well as 1.8.7).

However, your application will very likely have some incompatibilities outside of Rails itself (either in your own code, or in third-party gems). Upgrading will require the attention of a Ruby developer, and a moderate amount of work and testing, depending on the size of your application.

We have managed to upgrade two medium-sized Rails 2.3 applications in about 2 days of effort each. Both are now running at roughly twice their former speed. Your mileage might vary.

If you want to attempt the upgrade, we have written a detailed upgrade guide.
