YAML vs Marshal performance

January 29, 2008

A colleague of mine has built a fairly sophisticated mechanism that lets components reload automatically when a user interaction requires it. To do that, it needs to store a significant amount of information for each reloadable component in a page: the parameters needed to reload that component. This context information is associated with a particular browser window and is stored in a hash. The hash is persisted using the serialize method provided by Rails, which uses YAML for serialization.

We are currently working on improving performance, and thanks to the excellent ruby-prof profiler I found that a significant amount of time was spent serializing the hash before persisting it. I decided to look for alternatives, and the first one I came across was Marshal.dump.
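To make the alternative concrete, here is a minimal sketch of a Marshal round trip. The context hash and its keys are hypothetical, not the real structure used by the component mechanism. One thing worth knowing up front: Marshal.dump returns a binary string, so if the storage is text-only, an encoding step such as Base64 is one possible workaround (an assumption here, not something our application necessarily does).

```ruby
# Hypothetical context for one browser window; the keys are made up for illustration.
context = {
  :window_id  => 'w1',
  :components => { :sidebar => { :page => 2, :filter => 'open' } }
}

# Marshal.dump returns a binary (ASCII-8BIT) String.
blob = Marshal.dump(context)

# Marshal.load rebuilds an equal hash, symbols and nested hashes included.
restored = Marshal.load(blob)

# Assumption: if the column or store only accepts text, Base64 is one way
# to make the blob safe to persist ('m0' is strict Base64 without newlines).
encoded = [blob].pack('m0')
decoded = encoded.unpack1('m0')

puts "Binary blob: #{blob.bytesize} bytes, Base64 text: #{encoded.bytesize} bytes"
puts "Round trip ok: #{Marshal.load(decoded) == context}"
```

Only load Marshal data that your own application wrote: Marshal.load can instantiate arbitrary objects, so it is not meant for untrusted input.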

I wrote a simple test case:

#!/usr/bin/ruby

require 'yaml'

hash = {:key1 => 'value1', :key2 => 'value2', :key3 => 'value3', :key4 => {:key41 => 'value41', :key42 => 'value42'}}

iterations = 10000

serialized_hash = nil

# Serialize with Marshal
start = Time.now
1.upto(iterations) { serialized_hash = Marshal.dump(hash) }
puts "Marshal hash: #{Time.now - start} seconds"

# Deserialize the marshalled string
start = Time.now
1.upto(iterations) { reloaded_hash = Marshal.load(serialized_hash) }
puts "Reload marshalled hash: #{Time.now - start} seconds"

# Serialize with YAML
start = Time.now
1.upto(iterations) { serialized_hash = hash.to_yaml }
puts "YAMLize hash: #{Time.now - start} seconds"

# Deserialize the YAML string
start = Time.now
1.upto(iterations) { reloaded_hash = YAML::load(serialized_hash) }
puts "Reload YAMLlized hash: #{Time.now - start} seconds"

The results show that YAML is awfully slow. I will not include the complete report here, but these are the timings:

Marshal hash: 0.13829 seconds
Reload marshalled hash: 0.184913 seconds
YAMLize hash: 4.792248 seconds
Reload YAMLlized hash: 1.046568 seconds

In my tests, YAML is 34.65 times slower at serialization and 5.66 times slower at deserialization.
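For reference, those ratios are simply the YAML timing divided by the corresponding Marshal timing. A short sketch of the arithmetic, using the numbers from the report above (the variable names are mine):

```ruby
# Timings in seconds, copied from the report above.
timings = {
  :marshal_dump => 0.13829,
  :marshal_load => 0.184913,
  :yaml_dump    => 4.792248,
  :yaml_load    => 1.046568
}

# Ratio = YAML time / Marshal time for the same operation.
dump_ratio = timings[:yaml_dump] / timings[:marshal_dump]
load_ratio = timings[:yaml_load] / timings[:marshal_load]

puts format('Serialization:   YAML is %.2f times slower', dump_ratio)
puts format('Deserialization: YAML is %.2f times slower', load_ratio)
```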

So be careful when serializing big objects with YAML, as the performance impact can be significant.
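If you want to repeat the measurement on your own machine, here is a slightly more compact sketch of the same test. It differs from my original script in two ways that are my own choices: it uses a small helper (measure, a name I made up) and the monotonic clock instead of Time.now, so the result is not affected by system clock adjustments. Expect different absolute numbers depending on your Ruby version and hardware.

```ruby
require 'yaml'

hash = {
  :key1 => 'value1', :key2 => 'value2', :key3 => 'value3',
  :key4 => { :key41 => 'value41', :key42 => 'value42' }
}
iterations = 10_000

# Hypothetical helper: runs the block `iterations` times and returns
# the elapsed time in seconds, measured with the monotonic clock.
def measure(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Prepare the serialized forms once so the load timings have input.
marshalled = Marshal.dump(hash)
yamlized   = YAML.dump(hash)

results = {
  'Marshal.dump' => measure(iterations) { Marshal.dump(hash) },
  'Marshal.load' => measure(iterations) { Marshal.load(marshalled) },
  'YAML.dump'    => measure(iterations) { YAML.dump(hash) },
  'YAML.load'    => measure(iterations) { YAML.load(yamlized) }
}

results.each do |label, seconds|
  puts format('%-12s %.6f seconds', label, seconds)
end
```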

8 Responses to “YAML vs Marshal performance”


  1. I was curious as to how JSON would compare, so I added it to the tests:

    Marshal hash: 0.12498 seconds
    Reload marshalled hash: 0.145168 seconds
    YAMLize hash: 3.655557 seconds
    Reload YAMLize hash: 0.844616 seconds
    JSONize hash: 0.183414 seconds
    Reload JSONize hash: 0.211665 seconds

    Marshal is still the clear winner, but JSON holds its own and is portable, so the state can be stored on the client (browser) as well.

  2. Ramon Guiu Says:

    Thanks for sharing. I guess, however, that you cannot serialize an arbitrary object as JSON and then deserialize it, because you lose the type information and are therefore unable to reconstruct the original object.
    If you just need to serialize hashes, arrays and basic types, then it's probably a better choice because, as you say, it allows you to store the context on the client if needed.

  3. Ian Says:

    I tried your code with our YAML.dump replacement algorithm, code-named ZAML.dump[1]:

    YAMLize hash: 2.592796 seconds
    ZAMLize hash: 0.987091 seconds

    The link below details a 16x improvement with real-world data.

    [1] http://gnomecoder.wordpress.com/2008/09/27/yaml-dump-1600-percent-faster/

  4. Jake Says:

    Thanks!

    I am having exactly this problem. My yaml-serializing is taking up the bulk of the processing, which is of course not desirable at all.

    I will have to try Marshal, and thanks to your article I am pretty confident the improvement will be enough to justify switching.


  5. […] We optimized the way we save data in Redis. Instead of complex objects, we save only a hash with the essential information of the view, and we changed the serialization method from YAML to Marshal. […]



