Lessons Learned Upgrading Harvest to Ruby 1.9.3

We're thrilled to announce that all of our apps have been upgraded from REE to Ruby 1.9.3. We wanted to share some notes about what went well, what went wrong, and what we learned in the process.

The Payoff

NewRelic graph of average response time

The vertical red line marks our upgrade to Ruby 1.9.3, and as you can see, the results were impressive (lower is better). Our average response time dropped from around 150ms per request to around 50ms.

Our server loads took similar dips:

Non-core cluster load average
Main Harvest app load average

The first graph shows our server load for our non-core apps (Co-op, our forum, and some internal tools), and the second graph shows the load for our marketing site and for the main Harvest application.

Importantly, during the period shown in all the graphs above, our traffic volume was increasing steadily (and sometimes dramatically), and yet our resource usage still decreased with the upgrade.

Aside from those server-side gains, we enjoyed some local benefits as well. I did some benchmarking of our test suite for Harvest before and after the upgrade, and our suite runs 12.67% faster on Ruby 1.9.3, which saves us a few minutes on every run.

Procedure and Timelines

We upgraded six different apps from REE to 1.9.3. Our goal was to start with the smaller apps first and slowly learn our way up to the main Harvest application. Ultimately, I think this worked out well — we discovered a lot of the smaller gotchas earlier in the process on our simpler apps, and weren't sent on quite as many wild goose chases in the more complicated ones.

As we moved to each new app, our general procedure stayed roughly the same:

  1. Use RVM to jump to 1.9.3 and a clean gemset, fix any errors from bundle install (usually by upgrading gem versions), then attempt to run our test suite. (Note: We've since transitioned to rbenv and rbenv-gemset due to some compatibility issues with pow, but the process is the same.)

  2. Usually, our suite would crash, and we'd have to upgrade a handful of gems and plugins.

  3. Once our tests were running, we'd step through each error and failure and work our way towards a clean run.

  4. Once we had a clean run of our test suite, we did some local click testing (hitting what we thought would be pain points), and then checked that app off the list and moved on, saving formal QA for after all apps were upgraded.

Running through these steps for each app was surprisingly quick. This blog and our forum each took less than a day, our marketing site took less than two days, and Co-op and Harvest took about a week each, although that was with the full-time focus of two developers (myself and prime hacker Barry Hess).

We were able to upgrade Ruby on all of our application servers without any downtime by using Chef and the nginx Healthcheck module (special hat-tip to the dev-ops wizardry of our very own Warwick Poole).

Changes and Pain Points

On the whole, the upgrade was a smooth affair, but we still needed to make a fair number of updates and ran into a couple of problems along the way.

Method Changes

The majority of our test failures and errors were caused by assorted syntax changes, method behavior changes, and deprecations in 1.9.3. Most of these were pretty minor, but they were often hard to hunt down (like the changes to to_s for many, but not all, classes).

  • Array#to_s performed a join in 1.8. In 1.9, it became an alias for inspect. A similar change occurred with Hash#to_s.

  • String no longer includes Enumerable, so there's no more String#each. It's been replaced by #each_byte, #each_char, #each_codepoint, and #each_line, depending on what you're after.

  • String#starts_with? and String#ends_with? became String#start_with? and String#end_with?, which was a nice and easy find-and-replace fix.

  • No more colons after when in case statements: 1.8's when 1: "one" must become when 1 then "one" (or put the body on its own line).

  • Date#parse no longer plays nicely with MM/DD/YYYY-style dates:

    1.8.7 > Date.parse("12/14/1986")
    => Sun, 14 Dec 1986
    1.9.3 > Date.parse("12/14/1986")
    ArgumentError: invalid date
    
  • Rational#to_s no longer reduces fractions-over-1 to just their integer representation:

    1.8.7 > Rational(2,1).to_s
    => "2"
    1.9.3 > Rational(2,1).to_s
    => "2/1"
    

    This caused us to briefly inform customers that they had invoices that were "38/1 days late".
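Pulling those fixes together, here is a minimal sketch of 1.9-safe replacements, runnable on 1.9.3 and later. The helper names (list_to_s, count_lines, status_label, parse_us_date, format_days) are hypothetical and not code from our apps.

```ruby
require 'date'

# Array#to_s is an alias for inspect in 1.9, so join explicitly
# wherever 1.8-style output was expected.
def list_to_s(items)
  items.join
end

# String#each is gone; choose the iterator that matches the intent
# (each_line here, or each_char / each_byte / each_codepoint).
def count_lines(text)
  text.each_line.count
end

# 1.8's `when 1: "one"` form no longer parses; use `then` or a newline.
def status_label(status)
  case status
  when :paid then "Paid"
  when :late then "Late"
  else "Open"
  end
end

# 1.9's Date.parse does not read "12/14/1986" as MM/DD/YYYY,
# so state the format explicitly.
def parse_us_date(str)
  Date.strptime(str, "%m/%d/%Y")
end

# Rational#to_s keeps the "/1" in 1.9, so print whole numbers as integers.
def format_days(days)
  days.denominator == 1 ? days.numerator.to_s : days.to_s
end

puts list_to_s(%w[a b c])          # prints abc
puts status_label(:late)           # prints Late
puts parse_us_date("12/14/1986")   # prints 1986-12-14
puts format_days(Rational(38, 1))  # prints 38
```

Using Date.strptime also makes the expected input format explicit in the code, rather than leaving it to Date.parse's heuristics.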

This list is not exhaustive. Our pre-upgrade research turned up many more changes that didn't hit us (Hash#index was deprecated in favor of Hash#key, Hash#select now returns a Hash instead of an Array, Object#type is gone in favor of Object#class, etc.), so your mileage may vary.
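The Hash and Object changes in that list are easy to see side by side. A small sketch, with made-up data:

```ruby
# Made-up hourly rates, for illustration only.
rates = { "design" => 150, "dev" => 175, "qa" => 90 }

# Reverse lookup: Hash#key in 1.9 (Hash#index is deprecated).
task = rates.key(175)

# Hash#select returns a Hash in 1.9; 1.8 returned [key, value] pairs.
expensive = rates.select { |_task, rate| rate > 100 }

# Code that still expects the 1.8 shape can convert explicitly.
expensive_pairs = expensive.to_a

# Object#type is gone; use Object#class.
kind = task.class
```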

CSV Changes

FasterCSV has been brought into the 1.9 standard library and is now just CSV.

Most of the fixes to handle this were pretty easy: simply update the class name from FasterCSV to CSV, then make some straightforward updates to the new CSV reading and writing methods. That knocked out almost all of our issues.
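A minimal sketch of what the rename looks like in practice, assuming simple comma-separated data with a header row (the data here is invented). Apart from the constant name, calls like parse and generate kept largely the same shape they had in FasterCSV.

```ruby
require 'csv'

# 1.8 with the fastercsv gem: FasterCSV.parse(data, :headers => true)
# 1.9 standard library: same call, new constant name.
data = "name,hours\nAlice,7.5\nBob,6\n"

rows  = CSV.parse(data, :headers => true)
total = rows.inject(0.0) { |sum, row| sum + row["hours"].to_f }

# Writing works the same way: CSV.generate yields a writer to append rows to.
output = CSV.generate do |csv|
  csv << ["name", "hours"]
  rows.each { |row| csv << [row["name"], row["hours"]] }
end
```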

Two edge cases ended up taking the majority of the time spent on CSV fixes: properly handling imported CSVs with BOMs and with carriage returns. Rather than walk through those particular fixes, though, let's focus on why they're another great argument for an exhaustive test suite.
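We aren't reproducing the fixes we actually shipped, but one common approach is to normalize the raw upload before parsing it. This sketch assumes the file's bytes are UTF-8, and normalize_csv is a hypothetical name. An upload in a legacy encoding such as Windows-1252 would need String#encode rather than the relabelling shown here.

```ruby
require 'csv'

# Hypothetical helper: clean up an uploaded CSV string before parsing.
def normalize_csv(raw)
  # Relabel as UTF-8 (assumes the bytes really are UTF-8).
  text = raw.dup.force_encoding(Encoding::UTF_8)
  # Drop a leading byte order mark, which Excel often writes.
  text = text.sub(/\A\uFEFF/, "")
  # Turn CRLF (Windows) and bare CR (old Mac) line endings into LF.
  text.gsub(/\r\n?/, "\n")
end

upload = "\xEF\xBB\xBFname,hours\r\nAlice,7.5\rBob,6\r\n"
rows = CSV.parse(normalize_csv(upload), :headers => true)
```

Without the BOM removal, the first header would come back as "\uFEFFname", so lookups by "name" would silently return nil.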

We probably never would have thought to check for these edge cases in our QA, but luckily, they're covered by tests in our suite. If those tests weren't there, we probably wouldn't have known those problems existed until a customer unsuccessfully tried uploading an Excel-generated CSV, leading to a support ticket and wasted developer time to fix a bug that we had seemingly already fixed once before.

So write those tests.

Encoding

Encoding ended up being our biggest real-world problem, because it didn't bite us until we went to production with Co-op. We weren't the only ones to experience this pain.

If you're interested in the ins and outs of encoding in 1.9, check out James Edward Gray II's 11-part series on character encoding in 1.9.

Most of our problems in development were relatively minor and fixed with magic encoding comments (# encoding: utf-8 at the top of a source file).
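For reference, a magic comment is just a specially formatted first line. A minimal example:

```ruby
# encoding: utf-8
# The magic comment must be the first line (or the second, after a shebang).
# Without it, Ruby 1.9 reads source files as US-ASCII and rejects
# non-ASCII literals with "invalid multibyte char (US-ASCII)".
# Ruby 2.0 and later default to UTF-8, so the comment matters mainly on 1.9.

greeting = "Olá, Café"
puts __ENCODING__     # prints UTF-8 (the source encoding of this file)
puts greeting.encoding
```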

Our big problems came up in production with data that had been stored in one encoding but was now being read back and assumed to be UTF-8.
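A small sketch of the underlying problem, using in-memory strings rather than a real cache or database: bytes that are valid UTF-8 but labelled ASCII-8BIT can't be combined with UTF-8 strings until they are relabelled.

```ruby
# Simulate text written as UTF-8 bytes but read back labelled as binary.
stored = "Café".dup.force_encoding(Encoding::ASCII_8BIT)
local  = "über"   # an ordinary UTF-8 literal in the upgraded app

# Mixing the two raises, because Ruby can't tell what the bytes mean.
mixing_works =
  begin
    stored + local
    true
  rescue Encoding::CompatibilityError
    false
  end

# force_encoding relabels the bytes without changing them, which is
# correct only when they really are UTF-8.
fixed = stored.dup.force_encoding(Encoding::UTF_8)

# If the bytes were actually in another encoding (Latin-1 here),
# transcode with encode instead of relabelling.
latin1    = "Caf\xE9".dup.force_encoding(Encoding::ISO_8859_1)
converted = latin1.encode(Encoding::UTF_8)

puts mixing_works    # prints false
puts fixed + local   # prints Caféüber
puts converted       # prints Café
```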

  1. First, we had problems with encodings in the shared cache between Harvest and Co-op. Data was coming out of the shared cache in Co-op with an ASCII-8BIT encoding, which the upgraded Co-op, with its own strings all in UTF-8, wasn't expecting and couldn't handle well. Monkeypatching memcache-client allowed us to force-encode all strings coming out of the cache to UTF-8. Notably, this was only an issue while Co-op's and Harvest's Ruby versions were mismatched; once we upgraded Harvest, we removed the patch, and everything worked between the two apps just as before.

  2. Our next big encoding problem came from serialized YAML, just like Tobi said it would. As with the shared cache, data serialized after the upgrade came back out without trouble, so this only affected data serialized before the upgrade and read afterward. We considered a few fixes here (migrating the whole DB to fix the encoding, fixing the encoding as records were individually accessed, or monkeypatching ActiveRecord) and ended up going with the last one.

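    class ActiveRecord::Base
      # Wrap Rails' YAML deserialization so every string found in a
      # serialized attribute, including inside nested hashes and arrays,
      # is relabelled as UTF-8.
      def unserialize_attribute_with_utf8(attr_name)
        # Recursively walk hash values and array elements, calling block
        # on each leaf; returns the original object, mutated in place.
        traverse = lambda do |object, block|
          if object.kind_of?(Hash)
            object.each_value { |o| traverse.call(o, block) }
          elsif object.kind_of?(Array)
            object.each { |o| traverse.call(o, block) }
          else
            block.call(object)
          end
          object
        end
    
        # Relabel strings as UTF-8 (the bytes are unchanged); skip non-strings.
        force_encoding = lambda do |o|
          o.force_encoding(Encoding::UTF_8) if o.respond_to?(:force_encoding)
        end
    
        value = unserialize_attribute_without_utf8(attr_name)
        traverse.call(value, force_encoding)
      end
      # ActiveSupport's alias_method_chain renames the original method to
      # unserialize_attribute_without_utf8 and routes calls through the one above.
      alias_method_chain :unserialize_attribute, :utf8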
    end
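Both fixes come down to the same move: walk whatever came back and relabel its strings as UTF-8. Below is a standalone sketch of that idea, not the code we shipped. deep_force_utf8, Utf8Cache, and FakeCache are hypothetical names, and the wrapper assumes only that the cache client responds to get(key); it does not reproduce memcache-client's actual interface. Unlike the ActiveRecord patch, it builds new hashes, arrays, and strings instead of mutating in place, so frozen strings and hash keys are handled too.

```ruby
# Hypothetical helper: return a copy of obj with every String relabelled as UTF-8.
def deep_force_utf8(obj)
  case obj
  when Hash
    obj.each_with_object({}) { |(k, v), out| out[deep_force_utf8(k)] = deep_force_utf8(v) }
  when Array
    obj.map { |o| deep_force_utf8(o) }
  when String
    obj.dup.force_encoding(Encoding::UTF_8)
  else
    obj
  end
end

# Hypothetical wrapper around any cache client that responds to get(key).
class Utf8Cache
  def initialize(client)
    @client = client
  end

  def get(key)
    deep_force_utf8(@client.get(key))
  end
end

# Hypothetical in-memory stand-in for a real cache client.
class FakeCache
  def initialize(store)
    @store = store
  end

  def get(key)
    @store[key]
  end
end

# A value cached by the not-yet-upgraded app: UTF-8 bytes labelled as binary.
legacy = {
  "name" => "Café".dup.force_encoding(Encoding::ASCII_8BIT),
  "tags" => ["über".dup.force_encoding(Encoding::ASCII_8BIT)]
}
cache  = Utf8Cache.new(FakeCache.new({ "client:1" => legacy }))
result = cache.get("client:1")
```

Returning copies costs some extra allocation, and any Hash subclass comes back as a plain Hash, so a version used inside Rails would need to handle such types explicitly.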
    

Worth It?

We think so. There has been plenty written about the theoretical speed increases you'll see with Ruby 1.9.3, but we're glad to share that we've seen significant wins in our complex real-world applications and in our local environments with just a couple of weeks of development.
