YAML vs Marshal performance

January 29, 2008

A colleague of mine has built a fairly sophisticated mechanism that lets components reload automatically when a user interaction requires it. To do that, it needs to store a significant amount of information for each reloadable component in a page: the parameters needed to reload that component. This context information is associated with a particular browser window and is stored in a hash. The hash is persisted using the serialize method provided by Rails, which uses YAML for serialization.

We are currently working to improve performance, and thanks to the excellent ruby-prof profiler I found that a significant amount of time was spent serializing the hash before persisting it. I decided to look for alternatives, and the first one I came across was Marshal.dump.
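As a rough idea of what swapping YAML for Marshal could look like when the value has to live in a text column, here is a minimal sketch. MarshalTextCoder is a hypothetical helper of mine, not something Rails provides, and the Base64 step is an assumption so that the binary Marshal output survives being stored as text.

```ruby
# Hypothetical helper (not part of Rails): stores a Ruby object as
# Base64-encoded Marshal output so that it can live in a text column.
module MarshalTextCoder
  def self.dump(object)
    # 'm0' encodes as Base64 without inserting line breaks
    [Marshal.dump(object)].pack('m0')
  end

  def self.load(text)
    return nil if text.nil? || text.empty?
    # Only call Marshal.load on data your own application wrote
    Marshal.load(text.unpack('m0').first)
  end
end

# Made-up context data, just to show the round trip
context = { 'component' => 'sidebar', 'params' => { 'page' => 2 } }
stored = MarshalTextCoder.dump(context)
puts MarshalTextCoder.load(stored) == context
```

Unlike YAML, the stored value is not human readable, and Marshal output is tied to Ruby, so this only makes sense for data that nothing else needs to read.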

I wrote a simple test case:

#!/usr/bin/ruby

require 'yaml'

hash = {:key1 => 'value1', :key2 => 'value2', :key3 => 'value3', :key4 => {:key41 => 'value41', :key42 => 'value42'}}

iterations = 10000

serialized_hash = nil

start = Time.now
1.upto(iterations) { serialized_hash = Marshal.dump(hash) }
puts "Marshal hash: #{Time.now - start} seconds"

start = Time.now
1.upto(iterations) { reloaded_hash = Marshal.load(serialized_hash) }
puts "Reload marshalled hash: #{Time.now - start} seconds"

start = Time.now
1.upto(iterations) { serialized_hash = hash.to_yaml }
puts "YAMLize hash: #{Time.now - start} seconds"

start = Time.now
1.upto(iterations) { reloaded_hash = YAML::load(serialized_hash) }
puts "Reload YAMLlized hash: #{Time.now - start} seconds"

The results show that YAML is awfully slow. I will not reproduce the complete report here, but these are the timings:

Marshal hash: 0.13829 seconds
Reload marshalled hash: 0.184913 seconds
YAMLize hash: 4.792248 seconds
Reload YAMLlized hash: 1.046568 seconds

In my tests, YAML was 34.65 times slower at serialization and 5.66 times slower at deserialization.

So be careful when serializing big objects with YAML, as the performance impact can be significant.
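The ratios above are simply the YAML timing divided by the Marshal timing. If you want to repeat the comparison on your own data, something like the following sketch using Ruby's Benchmark module should do. SAMPLE, ITERATIONS, seconds_for and slowdown are names I made up for the example, and the numbers you get will depend on your Ruby version and hardware.

```ruby
require 'benchmark'
require 'yaml'

# Hypothetical sample data; string keys keep the YAML round trip simple.
SAMPLE = { 'key1' => 'value1', 'key2' => 'value2', 'key3' => { 'key31' => 'value31' } }
ITERATIONS = 1_000

# Seconds taken to run the given block ITERATIONS times.
def seconds_for
  Benchmark.realtime { ITERATIONS.times { yield } }
end

# How many times slower `slow` is than `fast`, rounded to two decimals.
def slowdown(slow, fast)
  (slow / fast).round(2)
end

marshalled = Marshal.dump(SAMPLE)
yamlized = SAMPLE.to_yaml

marshal_dump = seconds_for { Marshal.dump(SAMPLE) }
marshal_load = seconds_for { Marshal.load(marshalled) }
yaml_dump = seconds_for { SAMPLE.to_yaml }
yaml_load = seconds_for { YAML.load(yamlized) }

puts "YAML is #{slowdown(yaml_dump, marshal_dump)} times slower when serializing"
puts "YAML is #{slowdown(yaml_load, marshal_load)} times slower when deserializing"
```

Calling slowdown(4.792248, 0.13829) with the timings listed above gives the 34.65 figure, and slowdown(1.046568, 0.184913) gives 5.66.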


Ever wondered how to easily add image attachment support to your Rails application? Then you should definitely give attachment_fu a go, a very easy to use Rails plugin by Rick Olson.

(Note: This article would not have been possible without Mike Clark’s excellent attachment_fu tutorial.)

Step 1: Installation (on Ubuntu 6.10)

Installing the plugin is as easy as it gets:
script/plugin install http://svn.techno-weenie.net/projects/plugins/attachment_fu/

In order to do some image processing you need to install one of the following packages as well:

  • ImageScience
  • RMagick
  • minimagick

ImageScience is the simplest of the three and only supports resizing images. It depends on FreeImage and RubyInline.
This is the one I ended up using, as it is enough for my needs.
It is not available in the Ubuntu repositories, so I had to install it manually following the instructions on its website:

sudo gem install -y image_science

which also installs RubyInline, hoe and rubyforge gems.

Installing FreeImage required me to install cvs (to check out the sources) and g++ first:

sudo apt-get install cvs g++

cvs -z3 -d:pserver:anonymous@freeimage.cvs.sourceforge.net:/cvsroot/freeimage login (just type enter when asked for a password)
cvs -z3 -d:pserver:anonymous@freeimage.cvs.sourceforge.net:/cvsroot/freeimage co -P FreeImage
cd FreeImage
make
sudo make install

Step 2: Preparing your Rails application

In my application I have a Work model that I want to associate images with. Images are submitted by users and each one belongs to a single Work: a has_many / belongs_to association between a Work and its images. My application also has users, and I want to know who added a particular image (to prevent abuse).

In order to make use of the functionality provided by attachment_fu you need to create an ActiveRecord model with at least the following attributes:

  • content_type: what sort of content you are storing. This is used by web browsers to know how to present this information to users (open an external application, show embedded using a plugin, etc).
  • filename: a pointer to the image location
  • size: the size in bytes of the attachment

When you store images, attachment_fu makes use of some other useful fields:

  • parent_id: if you store thumbnails, this associates them with the parent image (this could actually be used for other types of content as well)
  • thumbnail: as you can have more than one thumbnail, this field contains the identifier assigned to each type of thumbnail.
  • width: the width of the image.
  • height: the height of the image.

In my case I have also added the following attributes:

  • work_id: the work that the image is associated to.
  • user_id: the user that added the image
  • default: whether this is the default image to be used when displaying the work
  • created_at: when the image was added

Let’s create the model:

script/generate model WorkImage

My migration file looks like this:



class CreateWorkImages < ActiveRecord::Migration

  def self.up

    create_table :work_images, :options => 'ENGINE=InnoDB DEFAULT CHARSET=utf8' do |t|
      t.column :work_id, :integer, :null => false
      t.column :user_id, :integer, :null => false
      t.column :default, :boolean, :null => false, :default => false
      t.column :created_at, :datetime, :null => false
      t.column :parent_id,  :integer, :null => true
      t.column :content_type, :string, :null => false
      t.column :filename, :string, :null => false
      t.column :thumbnail, :string, :null => true
      t.column :size, :integer, :null => false
      t.column :width, :integer, :null => true
      t.column :height, :integer, :null => true
    end
    execute "alter table work_images add constraint fk_wi_works foreign key (work_id) references works(id)"
    execute "alter table work_images add constraint fk_wi_user foreign key (user_id) references users(id)"
  end

  def self.down
    drop_table :work_images
  end
end

Let’s edit the WorkImage model to make use of the attachment_fu plugin:


class WorkImage < ActiveRecord::Base  
  has_attachment :content_type => :image,
                 :storage => :file_system,
                 :max_size => 100.kilobytes,
                 :resize_to => '200x200>',
                 :thumbnails => { :thumb => '50x50>' },
                 :processor => 'ImageScience'

  validates_as_attachment

  belongs_to :work
  belongs_to :user

  #The block will be executed just before the thumbnail is saved.
  #We need to set extra values in the thumbnail class as
  #we want it to have the same extra attribute values as the original image
  #except for the default flag that is always set to false
  before_thumbnail_saved do |record, thumbnail|
    thumbnail.user_id = record.user_id
    thumbnail.work_id = record.work_id
    thumbnail.default = false
  end
end

I wanted to be able to attach images by providing their URL, rather than asking the user to download the image and then upload it to the system. This can also be used when querying e-commerce APIs (like Amazon’s) to retrieve and store the images they return. So I enriched my WorkImage model with an extra method (which I guess would be a good feature to add to the attachment_fu plugin):


require 'net/http'
require 'uri'

# Downloads the image at the given url and fills in the attachment fields.
def source_url=(url)
  return nil if not url
  uri = URI.parse(url)
  # Only the path is requested, so any query string in the url is ignored
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.get(uri.path) }
  case response
  when Net::HTTPSuccess
    file_data = response.body
    return nil if file_data.nil? || file_data.size == 0
    self.content_type = response.content_type
    self.temp_data = file_data
    # Use the last path segment as the filename
    self.filename = uri.path.split('/')[-1]
  else
    return nil
  end
end
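One detail worth keeping in mind with the method above is how the filename and the request path are derived from the URL. The following sketch shows what Ruby's URI class returns for the different parts. filename_from_url and request_target are hypothetical helpers written for this example, and the URLs are made up.

```ruby
require 'uri'

# Hypothetical helper: picks a filename out of an image URL,
# or returns nil when the URL has no usable last path segment.
def filename_from_url(url)
  name = File.basename(URI.parse(url).path)
  (name.empty? || name == '/') ? nil : name
end

# Hypothetical helper: the path plus the query string, which is what
# you would request if the image URL carries parameters.
def request_target(url)
  URI.parse(url).request_uri
end

url = 'http://example.com/images/cover.jpg?size=large'
puts filename_from_url(url)   # cover.jpg
puts request_target(url)      # /images/cover.jpg?size=large
```

If the image URLs you deal with carry query parameters (some e-commerce APIs return such URLs), requesting request_uri instead of path keeps those parameters in the request.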

I also enriched my Work model to make it easy to retrieve the associated images. You can add further relationships in the same way for easy access to thumbnails.


class Work < ActiveRecord::Base
...
  has_many :images, :class_name => 'WorkImage', :conditions => ["work_images.parent_id is null"] #The condition avoids retrieving thumbnails
  #Easily retrieve the default image
  has_one  :default_image, :class_name => 'WorkImage', :conditions => ["work_images.default"]
...
end 

Step 3: Make use of the new model in the controller and view

In my controller, when I want to add an image to a model I do something like the following:


def add_image
...
  #Store the image if any
  if params[:image_source_url]
    image = WorkImage.new(:source_url => params[:image_source_url])
    image.work_id = @work.id
    image.user_id = self.current_user.id
    image.default = true if params[:is_default_image]
    image.save!
  end
...
end

Images will be saved in public/work_images using something that Jamis Buck from 37signals called id partitioning.
That way you can theoretically store 9999 * 10000 attachments (thumbnails are not counted as attachments), which is enough for standard purposes. This can easily be changed to support more files if you need to. Look for a method named partitioned_path in vendor/plugins/attachment_fu/lib/technoweenie/attachment_fu/backends/file_system_backend.rb.
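To give an idea of what id partitioning means in practice, here is a minimal sketch. It assumes the id is zero-padded to eight digits and split into two four-digit directories; the plugin's actual partitioned_path may differ in its details, so check the file mentioned above. full_filename and its base directory argument are names I chose for the example.

```ruby
# Sketch of id partitioning, assuming an eight-digit, zero-padded id
# split into two four-digit directory names.
def partitioned_path(id, *parts)
  ("%08d" % id).scan(/..../) + parts
end

# Hypothetical helper: builds the location of an attachment on disk.
def full_filename(id, filename, base = 'public/work_images')
  File.join(base, *partitioned_path(id, filename))
end

puts full_filename(1, 'cover.jpg')         # public/work_images/0000/0001/cover.jpg
puts full_filename(12345678, 'cover.jpg')  # public/work_images/1234/5678/cover.jpg
```

Each directory level holds at most 10000 entries, which keeps any single directory from growing too large. Supporting more files would mean padding the id to more digits and adding a directory level.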

In order to display the default image in a view I just need to do the following:

<%= image_tag(@work.default_image.public_filename()) %>

If what you want to display is the thumbnail, just pass the thumbnail identifier (in our case :thumb) to public_filename:

<%= image_tag(@work.default_image.public_filename(:thumb)) %>

And that should be it really. If you have questions, leave a comment.

Note:
I found a small bug in the plugin: it was not storing the sizes of resized images properly. I had to edit the vendor/plugins/attachment_fu/lib/technoweenie/attachment_fu/processors/image_science_processor.rb file and set the correct size just after the image is saved in the resize_image method:


...
img.save self.temp_path
self.size = File.size(self.temp_path)
...

I also noticed that something is done to images that do not need to be resized: their file size changes although the dimensions remain the same. I have a 5 KB file that ends up at 12 KB after the resizing process! The dimensions of the image are unchanged, so it should not have been modified. I am not sure what is going on here, but I guess this is an ImageScience issue.

When a request asks for an action in a controller that does not exist in your application, a not found error page is displayed. You can use routes to redirect these requests to a default page instead.

Just add the following line as the last rule in your config/routes.rb file:

map.connect '*path', :controller => 'main', :action => 'redirect_to_default'

Whenever a request asking for an action in a controller that you have not defined hits your application, Rails will call the action ‘redirect_to_default’ in the ‘main’ controller (you can obviously change the controller and the action to fit your needs).

The code for the redirect_to_default action is a simple Rails redirect:

def redirect_to_default
  redirect_to :action => 'index'
end
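The reason the catch-all rule has to be the last one is that routes are tried in order and the first match wins. The following plain Ruby sketch illustrates the idea with a toy route table; it is not how Rails implements routing, and ROUTES and recognize are names invented for the example.

```ruby
# Hypothetical, heavily simplified route table: patterns are tried in order
# and the first one that matches wins.
ROUTES = [
  [%r{\A/works/(\d+)\z}, { :controller => 'works', :action => 'show' }],
  # Plays the role of the '*path' rule: matches any path
  [%r{\A/.*\z}, { :controller => 'main', :action => 'redirect_to_default' }]
]

# Returns the target of the first route whose pattern matches the path.
def recognize(path, routes = ROUTES)
  routes.each do |pattern, target|
    return target if pattern =~ path
  end
  nil
end

puts recognize('/works/42')[:action]                  # show
puts recognize('/no/such/page')[:action]              # redirect_to_default
puts recognize('/works/42', ROUTES.reverse)[:action]  # redirect_to_default
```

The last call shows what would happen if the catch-all came first: it would shadow every other rule.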

If you want to pass specific options, like the table type and the charset, when creating tables through Rails migrations, pass an :options parameter to the create_table method:

create_table :my_table, :options => 'ENGINE=InnoDB DEFAULT CHARSET=utf8', :force => true do |t|
  t.column :column1, :string
  t.column :column2, :string
end

If you want a validation rule to be applied only when the corresponding attribute has a value, you can use the :allow_nil => true parameter. I don’t know whether this works with every Rails model validation rule, but it may well do. The Rails documentation at http://api.rubyonrails.org/ is not very clear in that respect. For each validation rule it lists the parameters it accepts, and for validates_format_of the :allow_nil parameter was not listed. It is listed at the beginning as a default parameter for all validation rules, though.

In my case I wanted a column in the table to either be null or have a value that matched a particular regular expression, so I ended up with something like

validates_format_of :email, :with => /^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/, :message => 'Invalid Email address', :allow_nil => true
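The behaviour I was after can be described in plain Ruby as follows. This is only a sketch of the intended logic, not the Rails implementation, and valid_email? is a hypothetical helper; the regular expression is the one used in the rule above.

```ruby
EMAIL_FORMAT = /^([^@\s]+)@((?:[-a-z0-9]+\.)+[a-z]{2,})$/

# Hypothetical helper: nil is accepted, anything else must match the format.
def valid_email?(value)
  return true if value.nil?   # the :allow_nil => true case
  !!(value =~ EMAIL_FORMAT)
end

puts valid_email?(nil)                 # true
puts valid_email?('user@example.com')  # true
puts valid_email?('not an email')      # false
puts valid_email?('')                  # false
```

Note that an empty string is not nil, so it still has to match the format and is rejected here.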

When you install a Rails application you need to initialize the DB tables where the data will be stored.

First, you need to create the database and give the correct privileges to a user to access the application. With mysql you will do the following:

  1. mysql -u root
  2. mysql> create database my_database;
  3. mysql> grant all privileges on my_database.* to my_user@localhost identified by 'my_password';
  4. mysql> flush privileges;

Then set these parameters in the Rails application:

  1. Edit the database.yml file in your application config folder
  2. You can configure three different environments there: development, test and production. Change the values for your environment:
    1. adapter: mysql
    2. database: my_database
    3. username: my_user
    4. password: my_password
    5. host: localhost
    6. socket should be set to the value specified in the mysql configuration file. On Ubuntu this file is /etc/mysql/my.cnf. In the [mysqld] section look for the value of the socket variable, which on my system is /var/run/mysqld/mysqld.sock
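Put together, the development section of database.yml would look roughly like the text embedded in the sketch below, which also parses it with Ruby's YAML library as a quick sanity check. The values are the placeholders used above, and DATABASE_YML and database_settings are names made up for the example.

```ruby
require 'yaml'

# The development section of config/database.yml, using the placeholder
# values from the steps above.
DATABASE_YML = <<-EOS
development:
  adapter: mysql
  database: my_database
  username: my_user
  password: my_password
  host: localhost
  socket: /var/run/mysqld/mysqld.sock
EOS

# Hypothetical helper: returns the settings hash for one environment.
def database_settings(environment)
  YAML.load(DATABASE_YML)[environment]
end

settings = database_settings('development')
puts "#{settings['adapter']}://#{settings['username']}@#{settings['host']}/#{settings['database']}"
```

The test and production sections follow the same layout, each under its own top-level key.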

Once you are done with the setup, creating the appropriate tables is damn easy with Rails migrations:

  1. cd into your Rails application folder
  2. Run ‘rake db:schema:load’
  3. You may also want to run ‘rake db:migrate’ in case you don’t have the latest version of the schema.rb file, which sometimes happens because developers forget to commit their changes.

You are done!

Note: In order to allow Rails to connect to the mysql database I had to enable the old password scheme in the mysql configuration file (/etc/mysql/my.cnf):

 old_passwords   = 1