YAML vs Marshal performance

January 29, 2008

A colleague of mine has built a quite sophisticated mechanism that allows for components to automatically reload if necessary in case a user interaction requires it. In order to do that though, it needs to store a significant amount of information for each reloadable component in a page. This information contains the different parameters needed to reload each component. This context information is associated to a particular browser window and is stored in a hash. This hash is persisted using the serialize method provided Rails. This method uses YAML for serialization.

We are currently working to improve the performance, and thanks to the excellent ruby-prof profiler I detected that an important amount of time was spent serializing the hash before persisting it. I decided to look for alternatives and the first one I came across was Marshal.dump.

I wrote a simple test case:


require ‘yaml’

hash = {:key1 => ‘value1’, :key2 => ‘value2’, :key3 => ‘value3’, :key4 => {:key41 => ‘value41’, :key41 => ‘value42’}}

iterations = 10000

serialized_hash = nil

start = Time.now
1.upto(iterations) { serialized_hash = Marshal.dump(hash) }
puts “Marshal hash: #{Time.now – start} seconds”

start = Time.now
1.upto(iterations) { reloaded_hash = Marshal.load(serialized_hash) }
puts “Reload marshalled hash: #{Time.now – start} seconds”

start = Time.now
1.upto(iterations) { serialized_hash = hash.to_yaml }
puts “YAMLize hash: #{Time.now – start} seconds”

start = Time.now
1.upto(iterations) { reloaded_hash = YAML::load(serialized_hash) }
puts “Reload YAMLlized hash: #{Time.now – start} seconds”

The results show that YAML is awfully slow. I will not put here the complete report, but here are the timings:

Marshal hash: 0.13829 seconds
Reload marshalled hash: 0.184913 seconds
YAMLize hash: 4.792248 seconds
Reload YAMLlized hash: 1.046568 seconds

In my tests, YAML is 34.65 times slower in serialization and 5.66 times slower in unserialization.

So be careful when serializing big objects with YAML as the performance impact can be significant .