Monday, December 20, 2010

PostgreSQL Performance vs. Microsoft SQL Server

A recent poster to the pgsql-performance mailing list enquired as to the relative performance of Microsoft SQL Server vs. PostgreSQL.  It's a reasonable question.  Switching databases can be a major project, and you certainly wouldn't want to do it and then find out at the end that you'd taken a huge performance hit and had to throw all your work away and switch back.  The good news is that this scenario is fairly unlikely.

Thursday, December 16, 2010

Two Hundred Commits

My first patch (add a separate TRUNCATE privilege) was committed to the PostgreSQL source code repository (at the time, CVS) by Tom Lane on September 8, 2008.  I became a committer a little over a year later, and the first commit I did myself was on December 10, 2009 (a patch from Marcin Mank to fix levenshtein_with_costs).  Of course, I screwed it up: the release team was in the middle of wrapping a minor release, and I back-patched the fix in the brief window after the release notes were written and before the release went out.

Wednesday, December 15, 2010

SE-Linux for PostgreSQL: Part 3

Back in September, I wrote two blog posts on SE-Linux integration for PostgreSQL.  In part 1, I discussed the work that had already been committed as of that time.  In part 2, I discussed what I learned at the meeting, and planned next steps.  Since then, a considerable amount of progress has been made, so it seems like a good time to revisit the issue.

Monday, December 13, 2010

Crunch Time for PostgreSQL 9.1

According to the PostgreSQL 9.1 development plan, the final CommitFest for PostgreSQL 9.1 development will begin in 33 days.  Approximately 30 days later, we'll stamp our final alpha release and begin preparations for PostgreSQL 9.1 beta.  This is exciting news, because I'm really looking forward to PostgreSQL 9.1.  It's also scary news, because there is a lot of work left to be done between now and then, and at least in the United States, Christmas is going to take a bite out of that time.

We have a number of very interesting, very significant features which were submitted to the 2010-11 CommitFest.  These include SQL/MED, extensions, synchronous replication, writeable CTEs, per-column collation, MERGE, checkpoint improvements, further infrastructure for SE-Linux integration, and my own work on unlogged tables.  While we've made significant progress on most of these features during this CommitFest, major work still remains to be done on nearly all of them, and none of them can be considered a sure thing for PostgreSQL 9.1.  It's possible - maybe even likely - that even more worthwhile features will be added to the queue between now and mid-January.

So it's crunch time.  We have about two months to define what PostgreSQL 9.1 will be.  Let's make the most of it.

Monday, December 06, 2010

The Cost of Page Faults

Over Thanksgiving, I wrote a bit about some work I did profiling PostgreSQL, and about squeezing a bit of overhead out of the backend shutdown process.  After making that change, I did some further profiling of connection startup/teardown, and was dismayed to see that the revised profile looked pretty mundane, with most of the time being taken up by functions like memset() and memcpy() that are typically hard to optimize.

As it turns out, this profile wasn't really showing what I thought it was showing.  Andres Freund and Tom Lane theorized that the reason why memset() and memcpy() showed up so high in the profile was not because those operations were intrinsically expensive, but because those functions were triggering page faults.  Page faults occur when a process attempts to access a portion of its address space it hasn't previously touched, and the kernel must arrange to map that chunk of address space to an actual chunk of physical memory.  It appears that Andres and Tom were right: processing a page fault is 2 or 3 times more expensive than zeroing a page of memory.
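If you want to see the effect for yourself, here's a minimal sketch (not from the original profiling work, and the region size is arbitrary) that compares the cost of writing to freshly mapped anonymous memory, where every page must be faulted in, against a second pass over the same memory once the pages are already mapped:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <sys/mman.h>

#define REGION_SIZE (256 * 1024 * 1024)   /* 256 MB, chosen arbitrarily */

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1000.0 +
           (b.tv_nsec - a.tv_nsec) / 1000000.0;
}

int main(void)
{
    struct timespec t0, t1, t2;
    char *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
    {
        perror("mmap");
        return 1;
    }

    /* First pass: every page is untouched, so each write incurs a page fault. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memset(p, 0xaa, REGION_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    /* Second pass: the pages are already mapped, so this is just the memset. */
    memset(p, 0x55, REGION_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t2);

    printf("first touch (page faults + memset): %.1f ms\n", elapsed_ms(t0, t1));
    printf("second pass (memset only):          %.1f ms\n", elapsed_ms(t1, t2));

    munmap(p, REGION_SIZE);
    return 0;
}

On a typical Linux box the first pass takes several times longer than the second, even though both execute exactly the same memset() call; the difference is the fault-handling work in the kernel.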

I found this a bit surprising, because I'm in the habit of thinking of process startup on UNIX-like systems as being very cheap, but it appears that in this case there's so little actual work going on that the page faults become the dominant cost.  This means that if we want to make a significant further reduction in our connection overhead, we're probably going to have to avoid starting a new process for each new connection.  I posted a few ideas on this topic, to which Tom Lane responded.  In short, there may be some benefit in making PostgreSQL follow a model more like Apache, where workers are spawned before they are actually needed, rather than on demand.  I don't presently have time to follow up on this, but I think it's got potential.
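For readers unfamiliar with the Apache prefork model, here's a rough sketch of the general idea - this is not PostgreSQL code, and the port number and worker count are arbitrary.  The parent opens the listening socket once and forks a pool of workers ahead of time; each worker blocks in accept() and serves connections as they arrive, so no fork() and none of the associated page faults happen on the connection path:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>

#define NUM_WORKERS 4          /* arbitrary pool size for illustration */

static void worker_loop(int listen_fd)
{
    for (;;)
    {
        int conn = accept(listen_fd, NULL, NULL);

        if (conn < 0)
            continue;
        /* A real server would hand the connection to its protocol logic here. */
        const char *msg = "hello from a preforked worker\n";
        write(conn, msg, strlen(msg));
        close(conn);
    }
}

int main(void)
{
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    struct sockaddr_in addr;

    setsockopt(listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(5433);          /* arbitrary example port */

    if (bind(listen_fd, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        listen(listen_fd, 128) < 0)
    {
        perror("bind/listen");
        return 1;
    }

    /* Fork the worker pool up front, before any client has connected. */
    for (int i = 0; i < NUM_WORKERS; i++)
    {
        if (fork() == 0)
        {
            worker_loop(listen_fd);       /* never returns */
            _exit(0);
        }
    }

    /* The parent just waits; a real server would also respawn dead workers. */
    for (;;)
        wait(NULL);
}

The trade-off, of course, is that preforked workers consume resources even when idle, and PostgreSQL backends carry per-connection state that would have to be reset between clients, which is part of why this is an idea to explore rather than an obvious win.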

Friday, December 03, 2010

Troubleshooting Database Unresponsiveness

From time to time, we get complaints on the pgsql-performance mailing list about a PostgreSQL database that - for the most part - performs reasonably well, but every once in a while becomes unresponsive for seconds or even minutes, and then eventually recovers.  What's going on?