Thursday, August 18, 2011

Index-Only Scans: Now There's a Patch

In November of 2010, I blogged about a much-requested PostgreSQL feature: index-only scans.  We've made some progress!  In June of this year, I committed a patch (and then, after Heikki found some bugs, another patch) to make the visibility map crash-safe.  In previous releases, it was possible for the visibility map to become incorrect after a system crash, which means that it could not be relied on for anything very critical.  That should be fixed now.  Last week, I posted a patch for the main feature: index-only scans.

Ibrar Ahmed (also of EnterpriseDB) and I worked together on this patch, and it still needs more work, and more testing.  Heikki Linnakangas quickly spotted a bug, which fortunately should be easy to fix, and there's some other stuff that needs to be improved as well.  Greg Smith posted some nifty performance results, showing that the patch can produce a large performance boost when it avoids pulling lots of blocks into cache.

The test query I came up with to exercise this patch (which you can see in some of the emails linked above) generates lots of random I/O by scanning one table and probing a second table for a row with a key matching a value extracted from the first table.  This is, in some ways, the best case for this type of optimization, because it involves a single query generating lots and lots of index probes that the optimization can potentially manage to skip.


In theory, with this patch applied, it should also be possible to get a performance benefit by creating a so-called "covering index", where you index columns you're not filtering on so that the database will be able to find all the columns it needs in the index, without fetching the underlying table blocks.  I haven't tested this scenario yet, though.

Assuming all goes well, I expect some future version of this patch to be committed to PostgreSQL 9.2, due out next year.

7 comments:

  1. this is something I've been waiting for a long time! Thank you very much. I wonder if this can be applied to 8.4?

    ReplyDelete
  2. @Slava The PostgreSQL project only ever backports bug fixes. This is a major new feature, so no.

    ReplyDelete
  3. This is great. You don't really need to be so cautious about your theory on whether covering indexes are going to help. Thousands of people can verify that they work wonders for performance on lots of other databases. This is going to be a huge step forward for PostgreSQL's performance. If it isn't, then something is wrong with the implementation, it's that simple :-)

    ReplyDelete
  4. Wow, is it in the 09/2011 commitfest?

    ReplyDelete
  5. Does this also mean that there are no more heap hint bit writes during read-only queries?

    ReplyDelete
  6. @intgr: No, this has no impact on the setting of hint bits:

    @Baron: Thanks for the positive feedback. I agree that there SHOULD be a big benefit; but it's always better to test it out before drawing conclusions.

    @pabloj: I haven't added it to the CommitFest page yet, but will do so soon.

    ReplyDelete