Measuring Sidekiq with Librato
Here's what I recently put together so I could measure some Sidekiq behavior with Librato. This code makes use of the librato-metrics gem.
Measure the time taken to perform each type of job
I put this in my lib directory:
class SidekiqMetrics
  def call(worker, msg, queue)
    librato_queue = Librato::Metrics::Queue.new
    librato_queue.time "sidekiq.execution_time", source: worker.class.to_s.underscore.gsub("/", "-") do
      yield
    end
    librato_queue.submit
  end
end
And then added the middleware in config/initializers/sidekiq.rb:
Sidekiq.configure_server do |config|
  # ...other config...
  config.server_middleware do |chain|
    chain.add SidekiqMetrics
  end
end
Keeping track of queue size
I have a script which runs every hour that reports various app stats to various services. I stuck this in there:
librato_queue = Librato::Metrics::Queue.new
%w[auth_mailer mailer default low].each do |queue|
  librato_queue.add "sidekiq.queue_size" => {value: Sidekiq::Queue.new(queue).size, source: queue}
end
librato_queue.submit
Measuring only once per hour doesn't reveal anything interesting about app behavior. My initial purpose is just to keep track of outlier situations, so I can set up a trigger and be notified if a queue is growing abnormally large.
An alternative approach, one that would be very easy and not incur much resource strain, would be to report the queue size once per job, in the same middleware shown above that reports the execution time. A drawback is that when Sidekiq workers are broken and not processing any jobs, the queue size would not be reported at all, defeating the entire purpose of being notified about problem situations. I could report the size in both the middleware and the hourly script, but for the time being this redundancy rubs me the wrong way.
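For reference, here is a rough sketch of what that per-job alternative might look like. I haven't run this in production. The class name SidekiqQueueSizeMetrics and the size_of / new_batch keyword arguments are made up for illustration; they default to the same Sidekiq::Queue and Librato::Metrics::Queue calls used in the hourly script, and exist only so the middleware can be exercised without either service.

```ruby
# Hypothetical middleware: reports the size of the current job's queue
# once per job, before the job itself runs.
class SidekiqQueueSizeMetrics
  # size_of and new_batch are illustrative injection points (assumptions,
  # not part of Sidekiq or librato-metrics). Their defaults only touch the
  # real libraries when they are actually called.
  def initialize(size_of: ->(name) { Sidekiq::Queue.new(name).size },
                 new_batch: -> { Librato::Metrics::Queue.new })
    @size_of = size_of
    @new_batch = new_batch
  end

  def call(worker, msg, queue)
    batch = @new_batch.call
    # Same metric name and source convention as the hourly script.
    batch.add "sidekiq.queue_size" => { value: @size_of.call(queue), source: queue }
    batch.submit
    yield
  end
end
```

It would be registered with chain.add SidekiqQueueSizeMetrics, next to SidekiqMetrics. Note that it has the drawback described above: if no jobs are being processed, nothing is reported. Also, as written, a failure while submitting to Librato would raise before the job runs.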
In the future, if I have a more nimble scheduled job system (I'm currently using Heroku's clunky scheduler), I'll report the stats every minute or every second on schedule.