Sunday, June 8, 2014

buildfarm vs vpath vs ccache

I think we've got more or less to the bottom of the ccache mystery I wrote about recently. It turns out that the near-100% cache miss rate occurs only when the buildfarm is doing a vpath build, and then only because the buildfarm script sets up a build directory whose name is different on each run ("pgsql.$$", where $$ is the process ID). There is actually no need for this. The locking code makes sure that we can't collide with ourselves, so a hardcoded name would do just as well. This was simply an easy choice I made, I suspect without much thought, 10 years ago or so, before the buildfarm even supported vpath builds.
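To see why a per-run directory name can defeat a compiler cache, here is a toy model in Python. It is not ccache's real hashing scheme, and the exact mechanism by which the build directory reaches ccache's hash is not spelled out above. The sketch simply assumes the build directory path ends up among the hashed inputs. The function name and the paths are made up for illustration.

```python
import hashlib


def toy_cache_key(build_dir: str, source_rel_path: str, source_text: str) -> str:
    """Toy stand-in for a compiler cache key. NOT ccache's actual algorithm."""
    h = hashlib.sha256()
    # Assumption: the build directory path is part of what gets hashed.
    h.update(build_dir.encode())
    h.update(b"\0")
    h.update(source_rel_path.encode())
    h.update(b"\0")
    h.update(source_text.encode())
    return h.hexdigest()


src = "int main(void) { return 0; }\n"

# PID-based names (hypothetical paths): the same source gets a new key each run.
run1 = toy_cache_key("/buildroot/HEAD/pgsql.12345", "src/main.c", src)
run2 = toy_cache_key("/buildroot/HEAD/pgsql.23456", "src/main.c", src)

# Fixed name: the key is stable across runs, so the second run can hit.
fixed1 = toy_cache_key("/buildroot/HEAD/pgsql.build", "src/main.c", src)
fixed2 = toy_cache_key("/buildroot/HEAD/pgsql.build", "src/main.c", src)

print("PID-named dirs share a key:  ", run1 == run2)
print("fixed-name dirs share a key: ", fixed1 == fixed2)
```

Under that assumption, nothing about the source has to change for every lookup to miss; the changing directory name alone is enough.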

It also turns out there is no great point in keeping a separate cache per branch. That was a bit of a thinko on my part.

So, on my lab machine ("crake") I have made these changes: the build directory name is hard-coded with a ".build" suffix rather than using the PID, and the machine keeps a single cache, not one per branch. After making these changes, warming the new cache, and zeroing the stats, I did fresh builds on each branch. Here's what the stats looked like (cache compression is turned on):
cache directory                     ccache
cache hit (direct)                  5988
cache hit (preprocessed)             132
cache miss                             0
called for link                     1007
called for preprocessing             316
compile failed                       185
preprocessor error                    69
bad compiler arguments                 6
autoconf compile/link                737
no input file                         25
files in cache                     12201
cache size                         179.8 Mbytes
max cache size                       1.0 Gbytes

So I will probably limit this cache to, say, 300MB or so. That will be a whole lot better than the gigabytes I was using previously.

As for the benefits: on HEAD "make -j 4" now runs in 13 seconds on crake, as opposed to 90 seconds or more previously.
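For anyone who wants the arithmetic behind those figures, here is a small Python sketch using only the numbers quoted above. The variable names are mine, and since the old build time was "90 seconds or more", the speedup is a lower bound.

```python
# Figures copied from the ccache stats and timings quoted in this post.
direct_hits = 5988
preprocessed_hits = 132
misses = 0

hits = direct_hits + preprocessed_hits
hit_rate = hits / (hits + misses)

cache_mb = 179.8          # cache size after fresh builds on each branch
proposed_limit_mb = 300   # the limit being considered
headroom_mb = proposed_limit_mb - cache_mb

old_seconds = 90          # "90 seconds or more", so a lower bound
new_seconds = 13
speedup = old_seconds / new_seconds

print(f"hit rate: {hit_rate:.1%}")
print(f"headroom under a 300MB limit: {headroom_mb:.1f} MB")
print(f"speedup: at least {speedup:.1f}x")
```

The hit rate here counts only compilations that were looked up in the cache (hits plus misses); the other rows in the stats, such as link and autoconf calls, are cases ccache does not cache at all.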

If we have a unified cache, it makes sense to disable the removal of the cache in failure cases, which is what started me looking at all this. We will just need to be a bit vigilant about failures, as many years ago there was at least some suspicion of persistent failures due to ccache problems.

All this should be coming to a buildfarm release soon, after I have let this test setup run for a week or so.

