Benchmarking various methods of accumulating a set / unique array in ruby
I recently benchmarked a few ways to accumulate a set in ruby. Here are my findings and methodology - apologies for messiness, I'm just sharing what I came up.
This is with ruby 3.1. I also ran under 2.7, I believe the set case was actually faster, but I may be misremembering.
Results
Time
user system total real
set 0.095938 0.000404 0.096342 ( 0.096341)
array after 0.063998 0.002452 0.066450 ( 0.066508)
array in place 17.238500 0.939414 18.177914 ( 18.188532)
Memory Allocations
set 28.848k memsize ( 0.000 retained)
3.000 objects ( 0.000 retained)
0.000 strings ( 0.000 retained)
array after 8.040k memsize ( 0.000 retained)
1.000 objects ( 0.000 retained)
0.000 strings ( 0.000 retained)
array in place 8.072B memsize ( 0.000 retained)
2.000M objects ( 0.000 retained)
0.000 strings ( 0.000 retained)
Comparison:
array after: 8040 allocated
set: 28848 allocated - 3.59x more
array in place: 8071798800 allocated - 1003955.07x more
Process Memory
array_after
before (base): 42.95
memory above base:
largest during iteration: 869.63
average during iteration: 433.52
after final array is produced: 877.47
set
before (base): 43.41
memory above base:
largest during iteration: 865.65
average during iteration: 432.5
after final array is produced: 865.65
array_in_place
before (base): 42.98
memory above base:
largest during iteration: 3312.82
average during iteration: 1546.41
after final array is produced: 593.1
Discussion
Time
Array in place is very very slow, the others are very fast and in practice probably identical for most workloads, with array unique being the fastest, perhaps because it does less redundant work, at the expense of memory usage.
Memory Allocations
Array in place allocates 1 million times more objects than set because it needs to create the wrapper" array for each element. In practice the total impact on process memory would not be this much because GC would frequently run (which adds cost and is an additional reason this is a bad solution).
Set uses 3.6x more allocations than array after, I don't have a theory why, didn't look at the code.
Process Memory
I tried a variety of things but wasn't able to get consistent or sensible results out of this. Lack of control of the ruby memory allocator made it difficult. Hopefully allocations measurement above is sufficient to compare memory use.
Conclusions
array_after is the most common approach and is also the fastest and is probably Fine for most applications.
Set uses more total memory AND is slower than array_after. This does not make sense to me, I would have expected memory to be lower. Maybe benchmark-memory doesn't actually provide peak memory?
array_in_place uses the most memory and is also the slowest.
Of course array_after might use much much more memory than array_in_place depending on the data / application. (if building up a 100megabyte array and then after uniq!
it's only 0.1 megabyte, then it's using much more memory in the loop than it ultimately needs).
Code
require 'set'
# so we can control when GC happens, which I believe is:
# - valuable for the time measurements being more consistent, and realistic for a post-warmup app
# - irrelevant for the allocations measurements
# - chaotic for the process memory measurements
GC.disable
@data = []
1_000_000.times{ @data << rand(1..1_000) }
# Things we are not able to measure
# - average and max memory or allocation usage while uniq! is running
# - average and max memory or allocation usage while |= [d] or << d are happening (hopefully woudn't change conclusions... but maybe!?)
# - while @s.to_a is happening
############################
# The Three Implementations
#
# each ends with nil because without that, benchmark showed slight variations
# in certain scenarios when not expected (such as whether or not doing to_a at end of set)
# i'm guessing because under the hood benchmark is doing somethign with the return result, such as
# inspect
############################
# Typical simple implementation: build up an array, then call uniq on it afterward
def array_after
a2 = []
@data.each do |d|
yield if block_given?
a2 << d
end
a2.uniq!
nil
end
# Build in Array, unique on each iteration
# Array elements remain unique on each iteration, theoretically has
# total memory benefits over uniqueing after. But, cost of creating an Array on each iteration comes into play.
def array_in_place
a = []
@data.each do |d|
yield if block_given?
a |= [d]
end
nil
end
# Use Set, which is made for just this purpose
def set
s = Set.new
@data.each do |d|
yield if block_given?
s << d
end
set_a = @s.to_a
nil
end
#######
# Time
#######
puts "\nTime"
require 'benchmark'
5.times{GC.compact}
Benchmark.bmbm(15) do |x|
x.report("set") do
set
end
x.report("array after") do
array_after
end
x.report("array in place") do
array_in_place
end
end
#####################
# Memory Allocations
#####################
puts "\nMemory Allocations"
GC.enable
require 'benchmark-memory'
GC.disable
5.times{GC.compact}
Benchmark.memory do |x|
x.compare!
x.report("set") do
set
end
x.report("array after") do
array_after
end
x.report("array in place") do
array_in_place
end
end if false
exit
#################
# Process Memory
# To run these, i disabled all above tests, and only ran one at a time, sometimes with
# RUBY_GC_HEAP_GROWTH_FACTOR=1.1 ruby benchmark.rb
#################
puts "\nProcess Memory"
def print_mem_stats(m)
mem = GetProcessMem.new
mem_samples = []
puts "\n#{m}"
5.times{GC.compact}
base = mem.mb.round(2)
self.send(m){ mem_samples << mem.mb }
after = mem.mb.round(2)
puts "before (base): #{base}"
puts "memory above base:"
puts "\tlargest during iteration: #{(mem_samples.max - base).round(2)}"
puts "\taverage during iteration: #{((mem_samples.sum(0.0) / mem_samples.size) - base).round(2)}"
puts "\tafter final array is produced: #{(after - base).round(2)}"
end
require 'get_process_mem'
5.times{GC.compact}
print_mem_stats(:array_after)
print_mem_stats(:set)
print_mem_stats(:array_in_place)
puts @data.count # to trick GC into not thinking it's done, so that "after above base" comparison is more meaningful
# 5.times{GC.compact}
# puts "cleaned up: #{@mem.mb.round(2)}"
# puts "equal? #{@set_a.sort == @a.sort && @a.sort == @a2.sort}"