Saturday, June 11, 2016

Ruby Iterators, Enumerators, Enumerable, and Loops

Iterators


Iterator methods are available in collection objects such as arrays and hashes. The most widely used iterator method is each.


[ 1, 2, 3 ].each { |n| puts n }
Output:

1
2
3

In the example above, the each method is called on an array. It takes a block as an argument and runs the code within the block on each element of the array. At each iteration, the value of n (which is passed to the block as an argument) corresponds to one item of the array. The code inside the block prints each array value as it is received from the each method. Instead of printing the array values, we could do a number of other things within the block. In other words, the iterator's job is to deliver each array item to the block, and the block contains the code that will run on these items.

To iterate means to do the same thing multiple times. However, in Ruby, the term iterator is used in different ways. In this post, we will call any method that expects a block and iterates (loops) through the items in a collection an iterator.

As explained in this post about blocks, the { } is interchangeable with do..end. The above example can also be written like this:


[ 1, 2, 3 ].each do |n|
    puts n
end

Number (integer) iterators


The Integer class provides some useful numeric iterators. We won't go into details here as their names are pretty self-explanatory. The most widely used are the following:


3.times {  print "hello " }  # Output: hello hello hello

3.upto(10) { |n| print "#{n} " }  # Output: 3 4 5 6 7 8 9 10

10.downto(3) { |n| print "#{n} " }  # Output: 10 9 8 7 6 5 4 3

3.step(10, 2) { |n| print "#{n} " }  # Go from 3 to 10 in steps of 2. Output: 3 5 7 9

Note that calling the each and reverse_each methods on a range (an instance of the Range class) will yield the same result as the upto and downto methods. Example:


r = (3..10)

r.each { |n| print "#{n} " }  # Output: 3 4 5 6 7 8 9 10

r.reverse_each { |n| print "#{n} " }  # Output: 10 9 8 7 6 5 4 3

String Iterators


These method names are also self-explanatory.


s = "What is \nthe sound \nof silence?"  # Notice the line breaks (\n).

each_char

s.each_char { |x| puts x } 
Output:

W
h
a
t

i
s
# output truncated here

each_line

s.each_line { |x| puts x }
Output:

What is
the sound
of silence?

Strings also provide iterator methods called each_byte and each_codepoint.
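
As a quick sketch of how those two differ, the example below uses a string containing one non-ASCII character (the word variable is only for illustration). Called without a block, both methods return an enumerator, so to_a is used here to collect the values.

```ruby
word = "caf\u00e9"  # "café"; the last character takes two bytes in UTF-8

bytes = word.each_byte.to_a            # One integer per byte
codepoints = word.each_codepoint.to_a  # One integer per character

p bytes       # => [99, 97, 102, 195, 169]
p codepoints  # => [99, 97, 102, 233]
```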

We can also iterate through the letters of the alphabet using a range object.


("A".."E").each { |x| puts x }
Output:

A
B
C
D
E

Array Iterators


The iterator methods covered in this section are defined within the Array class. The Enumerable module, which is explained further below, provides dozens of other iterator methods which can also be used on arrays.


a = [ "zazen", "kinhin", "koan" ]

a.each { |x| puts x }
Output:

zazen
kinhin
koan

The each_index method is like each, but it yields each item's index to the block instead of its value.

a.each_index { |x| puts x } 
Output:

0
1
2

Hash Iterators


The following iterators are defined within the Hash class. The Enumerable module (discussed below), offers many more iterator methods which can also be used on hashes.

Let's create a test hash to use on the following examples:


h = { "meditation": "zazen", "time": 40, "posture": "kekkafuza" }

Iterate through the hash keys with each_key:


h.each_key { |key| puts key }
Output:

meditation
time
posture

Iterate through the hash values with each_value:


h.each_value { |value| puts value }
Output:

zazen
40
kekkafuza

Iterate through the hash keys and values:


h.each { |key,value| puts "#{key}: #{value}" }
Output:

meditation: zazen
time: 40
posture: kekkafuza

The each_pair method is an alias to the each method. Let's confirm:


Hash.instance_method(:each) == Hash.instance_method(:each_pair)  # Output: => true
 

The Enumerator class


Most iterator methods rely on the each method under the hood. Hence, it is useful to learn how it works.

When the each method is called and no block is provided, it returns an enumerator object, which is an instance of the Enumerator class.


e =  [ 1, 2, 3, 4, 5 ].each  # Output: => #<Enumerator: [1, 2, 3, 4, 5]:each>

Let's look at the methods provided by the Enumerator class:


Enumerator.instance_methods(false)
 => [:size, :each, :next, :rewind, :with_index, :with_object, :next_values, :peek_values, :peek, :feed]
 

Notice that the false flag was passed to instance_methods so only the methods implemented within the Enumerator class are shown (inherited methods are omitted).

Let's test some of the methods listed above:


e.size  # Return the number of items in the enumerator
 => 5
e.next  # Return the next item and move the internal position forward
 => 1
e.next
 => 2
e.peek  # Return the next item without moving the internal position
 => 3
e.next
 => 3
e.rewind  # Set the internal position to the first item
e.next
 => 1
 

As seen in the above example, enumerator objects provide an easy way to iterate through a collection, such as an array or hash.

Enumerators provide internal and external iteration. Internal means an iterator method will "drive" the iteration. An example is how the each or map methods automatically yield each item in a collection to a block of code. External iteration is when we control the iteration, as in the above example where we called next to get the next value in the collection and so on.
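
A minimal sketch of external iteration over a whole collection: when next is called after the last item, it raises StopIteration, and Kernel#loop rescues that exception and ends quietly. The collected variable is only for illustration.

```ruby
e = [ "zazen", "kinhin", "koan" ].each  # No block given, so an Enumerator is returned

collected = []
loop do
  collected << e.next  # next raises StopIteration once the items run out
end

p collected  # => ["zazen", "kinhin", "koan"]
```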

The Enumerable Module


In Ruby, almost everything is an object; a couple of relevant exceptions are methods and blocks. An object is an instance of a class. Hence, all arrays are instances of the Array class, hashes are instances of the Hash class, and so on.

Collection classes such as Array, Hash, and Range provide the each method (explained above), which yields the values in the collection, one by one. There is a built-in module called Enumerable, which is used as a mixin by collection classes. It provides multiple methods for working with collections. Every time we create an array or a hash, the methods provided by the Enumerable module are available. These additional methods rely on the each iterator implemented in the corresponding collection class.

If we create a custom class and include a custom method called each, we can use the Enumerable mixin to add extra collection-related functionality. However, this is outside the scope of this post.
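
For the curious, here is a very small sketch of the idea. The Retreat class and its contents are hypothetical, made up only for illustration; the only requirement Enumerable places on the class is an each method that yields the items one by one.

```ruby
# Hypothetical class, used only for illustration
class Retreat
  include Enumerable

  def initialize(*practices)
    @practices = practices
  end

  # Enumerable builds all of its methods on top of each
  def each
    return enum_for(:each) unless block_given?
    @practices.each { |practice| yield practice }
  end
end

retreat = Retreat.new("zazen", "kinhin", "koan")

upcased = retreat.map { |x| x.upcase }      # => ["ZAZEN", "KINHIN", "KOAN"]
long = retreat.select { |x| x.length > 4 }  # => ["zazen", "kinhin"]
found = retreat.include?("koan")            # => true
```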

We can easily confirm that the most popular collection classes use the Enumerable mixin:


Array.included_modules # Output: => [Enumerable, Kernel]
Hash.included_modules # Output: => [Enumerable, Kernel]
Range.included_modules # Output: => [Enumerable, Kernel]

The full list of methods provided by Enumerable is as follows:


Enumerable.instance_methods(false).sort
 => [:all?, :any?, :chunk, :chunk_while, :collect, :collect_concat, :count, :cycle, :detect, :drop, :drop_while,
 :each_cons, :each_entry, :each_slice, :each_with_index, :each_with_object, :entries, :find, :find_all, 
 :find_index, :first, :flat_map, :grep, :grep_v, :group_by, :include?, :inject, :lazy, :map, :max, :max_by, 
 :member?, :min, :min_by, :minmax, :minmax_by, :none?, :one?, :partition, :reduce, :reject, :reverse_each, 
 :select, :slice_after, :slice_before, :slice_when, :sort, :sort_by, :take, :take_while, :to_a, :to_h, :zip]

This post covers the most commonly used methods from the above list.

Class implementations of methods provided by the Enumerable module


As seen above, collection classes include the methods provided by the Enumerable module. However, some classes provide their own implementations of some of these methods. When a module is included in a class as a mixin, any methods defined in the class will not be overwritten by methods of the same name defined in the module. Looking into this helps us understand why some Enumerable methods behave differently when called on different objects (which are instances of different classes).

An example is the sort method, provided by the Enumerable module. The Array class uses its own implementation. In contrast, the Hash and Range classes use the implementation from Enumerable.


Array.instance_method(:sort)  # Output: => #<UnboundMethod: Array#sort>
Hash.instance_method(:sort)  # Output: => #<UnboundMethod: Hash(Enumerable)#sort>
Range.instance_method(:sort)  # Output: => #<UnboundMethod: Range(Enumerable)#sort>
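
The same lookup rule can be seen with a small class of our own. This is only a sketch: the Countdown class is hypothetical, and owner is used to ask each method where it was defined.

```ruby
# Hypothetical class, used only for illustration
class Countdown
  include Enumerable

  def each
    yield 3
    yield 2
    yield 1
  end

  # Defined in the class itself, so it takes precedence over Enumerable#first
  def first
    "liftoff"
  end
end

first_owner = Countdown.instance_method(:first).owner  # => Countdown
sort_owner = Countdown.instance_method(:sort).owner    # => Enumerable

first_value = Countdown.new.first  # => "liftoff"
sorted = Countdown.new.sort        # => [1, 2, 3]
```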

Methods provided by the Enumerable module


We will use arrays as examples because of their simplicity, but the methods below can also be used on hashes and other collection objects.

Loop through indexes and values


The each_with_index method yields each item's index number and its corresponding value.


a = [ "zazen", "kinhin", "koan" ]

a.each_with_index { | value, index | puts "#{index} #{value}" } 
Output:

0 zazen
1 kinhin
2 koan

Loop through items and modify them


What if we need to modify the values of collection items? The each method seen above cannot do that, but the map method can. That is one of the most important and widely used iterator methods in Ruby. It passes each collection item to the block and returns a new collection containing the values returned by the block.


a.map { |x| x.upcase }  # Output: => ["ZAZEN", "KINHIN", "KOAN"]

We can also pass a method reference (as a symbol) to the map method, instead of a block. The result is the same as if we passed a block and, within the block, applied that method to each item in the collection. The above example can be rewritten like this:


[ "zazen", "kinhin" ].map &:upcase  # Output: => ["ZAZEN", "KINHIN"]

A longer explanation is available at the "Ampersand and object (&:method)" section of this post about methods.

The map! method is the same as map, except it will alter the collection items in-place instead of returning a new collection. In other words, it passes each collection item to the block and replaces its original value with the value returned by the block.
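
A short sketch of the difference, with illustrative variable names:

```ruby
a = [ "zazen", "kinhin", "koan" ]

b = a.map { |x| x.upcase }  # Returns a new array; a is left untouched
a_after_map = a.dup         # Snapshot of a at this point: still lowercase

a.map! { |x| x.upcase }     # Replaces the items of a in place

p b            # => ["ZAZEN", "KINHIN", "KOAN"]
p a_after_map  # => ["zazen", "kinhin", "koan"]
p a            # => ["ZAZEN", "KINHIN", "KOAN"]
```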

The collect and collect! methods are aliases to map and map!, so they can be used interchangeably. Let's verify:


Array.instance_method(:map) == Array.instance_method(:collect)  # Output: => true
Array.instance_method(:map!) == Array.instance_method(:collect!)  # Output: => true

Test if all items meet specific criteria


all?


[ 2, 4, 6 ].all? { |x| x.even? }  # Output: => true

If a block containing criteria is not provided, checks if all collection values are truthy. Remember, in Ruby only nil and false (boolean) are "falsy". This is useful to test a collection for nil values.


[ 1, 2, nil, 5 ].all? # Output: => false

Search collection (find items that meet specific criteria)


any?


[ "foo", "baar" ].any? { |x| x.length > 3 } # Output: => true

If a block containing criteria is not provided, checks if there are any truthy values in the collection.


[ nil, false, "foo" ].any? # Output: => true

none?

Opposite of any?. Returns true if the block returns false for all elements in the collection. If a block is not provided, returns true if all values in the collection are falsey (false or nil).
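
A few quick examples (the variable names are only for illustration):

```ruby
no_evens = [ 1, 3, 5 ].none? { |x| x.even? }  # => true, the block is false for every item
all_falsy = [ nil, false ].none?              # => true, every value is falsy
some_truthy = [ nil, "foo" ].none?            # => false, "foo" is truthy
```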

include?

Returns true if the collection includes the value provided as an argument.


["zazen", "shamata", "tonglen" ].include?("tonglen")  # Output: => true

The member? method is an alias to the include? method, at least in the Range class:


Range.instance_method(:include?) == Range.instance_method(:member?)  # Output: => true
 

Count items matching criteria


count


a = [ "foo", "bar", "baz", "foo" ]
a.count("foo") # Output: => 2

a = [ 2, 5, 6, 8, 12 ]
a.count {|i| i.even?} # Output: => 4

Count occurrences of collection items


What if we want to count the occurrences of each item in a collection? In other words, how do we generate a list of unique values and count the occurrences of each value in the collection? There are several ways to do this; twelve of them are discussed and tested in this very informative post @ Carol's 10 Cents blog. I have chosen a couple of approaches to include here, based on simplicity and performance.


arr = [ "foo", "bar", "baz", "foo", "foo", "baz" ] # Set up test array

Solution 1 (best performance):


Hash[arr.group_by(&:itself).map {|key, value| [key, value.size] }]
# Output: => {"foo"=>3, "bar"=>1, "baz"=>2}

The itself method was introduced in Ruby 2.2; it returns the object it was called on. It is useful mostly for chaining methods.

Solution 2 (easy to understand, slight loss in performance):


count = Hash.new 0; arr.each { |arr_value| count[arr_value ] += 1 }; count
# Output: => {"foo"=>3, "bar"=>1, "baz"=>2}

By default, when we try to access a nonexistent key in a hash, it returns nil. By providing 0 as an argument to Hash.new, it returns 0 instead.


h = Hash.new; h["foo"]  # Output: => nil
h = Hash.new 0; h["foo"]  # Output: => 0

In the above example, every hash key corresponds to the value of an array item. When we encounter a value for the first time (in the array), we try to find an item with the corresponding key in the hash. However, it does not exist yet, so the hash returns 0. Then, the new hash key is created and its default value (0) is incremented by 1. Every time an array value with a matching hash key is found, the value of the corresponding hash item is incremented.

Select (filter) multiple collection items


Non-destructive selection:

These methods return a new collection containing only the selected items. None of them will modify the original collection.

select

Returns a new collection containing the items that meet the criteria defined within the block. In other words, the items for which the block returned true.


arr = [1, 2, 3, 4, 5, 6, 1, 2, 3, 8]
arr.select { |a| a > 3 }  # Output: => [4, 5, 6, 8]

The difference between the find_all and the select methods is that find_all will always return an array, regardless of the type of object it was called on. That can be demonstrated by calling both methods on a hash:


h = { a: 1, b: 2, c: 3 }

h.find_all { true } # => [[:a, 1], [:b, 2], [:c, 3]]
h.select { true } # => {:a=>1, :b=>2, :c=>3}

reject

Inverse of select. It returns the items for which the block returned false.


arr = [1, 2, 3, 4, 5, 6, 1, 2, 3, 8]
arr.reject { |a| a < 3 } # Output: => [3, 4, 5, 6, 3, 8]

grep

Uses the === operator to return a new collection containing the items whose values match the given expression. The === operator is explained in this post about operators.


a = [ "foo", 3, "bar", 7, "baz", 10, "qux" ] 

a.grep(/ba/) # Output: => ["bar", "baz"]
# is equivalent to:
a.select { |x| /ba/ === x }

Using the === operator under the hood makes grep powerful and flexible.


a = [ "foo", 3, "bar", 7, "baz", 10, "qux" ] 
a.grep(1..8)  # Output: => [3, 7]
a.grep(Integer) # Output: => [3, 7, 10]
a.grep(String) # Output: => ["foo", "bar", "baz", "qux"]

The grep method may also take a block. It yields each matching item to the block, and a new array containing the block's return values is returned. That is useful for applying operations only to the items that match the pattern.


a.grep(/ba/) { |x| x.upcase } # Output:  => ["BAR", "BAZ"]
# is equivalent to
a.select { |x| /ba/ === x }.map { |x| x.upcase }

grep_v

Reverse grep. Returns a new array containing the items that do not match the pattern.


[ "foo", "bar", "baz", "qux" ].grep_v(/ba/) # Output: => ["foo", "qux"]

drop_while and take_while

The drop_while and take_while methods are similar to reject and select, except they stop evaluating as soon as they reach the first item for which the block returns false (or nil). The take_while method returns the items before that point; drop_while returns that item and everything after it.

There are cases when these methods offer a big performance increase compared to select and reject. For instance, suppose we have a sorted array of 100,000 temperature readings ranging from -50 to +50 and we want to take or drop all the items below a certain threshold. In that case, take_while and drop_while will perform much better, as they stop evaluating the block once the specified threshold (temperature) is reached.


arr = [1, 2, 3, 4, 5, 6, 1, 2, 3, 8]
arr.take_while { |a| a < 4 }  # Output: => [1, 2, 3]

Notice that once it reached the first number that is not < 4 (the 4 at index 3), it stopped looking. It didn't evaluate the following items, so the values 1, 2, 3 in the second half of the array were not returned.


arr.drop_while { |a| a < 3 }  # Output: => [3, 4, 5, 6, 1, 2, 3, 8]

Notice how the numbers 1 and 2 in the second half of the array were not dropped, even though they are less than 3.

Destructive selection

These methods will modify the collection in-place. Use with caution.

The select! and reject! methods are similar to select and reject. The only difference is they alter the original array instead of returning a new one.

The delete_if method is similar to reject! and keep_if is similar to select!.
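
The sketch below shows these methods on an array. It also shows one small difference in return values: when nothing is removed, select! and reject! return nil, while keep_if and delete_if always return the array. The variable names are only for illustration.

```ruby
arr = [1, 2, 3, 4, 5, 6]

arr.select! { |a| a > 2 }  # arr is now [3, 4, 5, 6]
arr.reject! { |a| a > 5 }  # arr is now [3, 4, 5]

# No item matches, so nothing is removed in either call
reject_result = arr.reject! { |a| a > 100 }    # => nil
delete_result = arr.delete_if { |a| a > 100 }  # => [3, 4, 5]

p arr  # => [3, 4, 5]
```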

Find (detect) item in collection


find

Returns the first value for which the expression within the block is true.


[ 1, 3, 6, 8, 10 ].find { |n| n > 5 } # Output: => 6

The detect method is an alias to find.


Array.instance_method(:detect) == Array.instance_method(:find)  # Output: => true

find_index

Same as find, except it returns the index of the item instead of its value.


[ "foo", "foo", "bar", "baz" ].find_index { |x| x.include? "ba" } # Output: => 2

Return the first or last (n) collection items


first


a = [ "foo", "bar", "baz" ]
a.first   # Output: => "foo"

The first method can also take an argument and return the first n items:


a.first(2) # Output: => ["foo", "bar"]

take

Yields the same result as the first method when used with an argument. The differences between them are: a) unlike first, take requires an argument; b) Enumerator::Lazy provides a lazy version of the take method, but not of the first method, so first is always eager. Lazy enumerators are explained further below.
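
A brief sketch of both points, with illustrative variable names. On a lazy enumerator, take returns another lazy enumerator and evaluates nothing yet, whereas first returns an array right away.

```ruby
a = [ "foo", "bar", "baz" ]
taken = a.take(2)       # => ["foo", "bar"]
first_two = a.first(2)  # => ["foo", "bar"]

lazy = 1.upto(Float::INFINITY).lazy
lazy_taken = lazy.take(3)  # Still an Enumerator::Lazy, nothing evaluated yet
lazy_first = lazy.first(3) # => [1, 2, 3]

p lazy_taken.to_a  # => [1, 2, 3]
```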

last (implemented in the Array and Range classes)

Enumerable does not provide a method to return the last item of a collection. The Array and Range classes provide a method called last which will do just that.


[ "foo", "bar", "baz" ].last   # Output: => "baz"
(1..8).last    # Output: => 8

The Hash class does not provide such a method. However, we can convert the hash keys, values, or both into an array and use the last method from the Array class:


h = { "a": 1, "b": 2, "c": 3 }
h.values.last  # Output: => 3
h.keys.last  # Output: => :c
h.to_a.last  # Output: => [:c, 3]

Reduce/fold a collection (e.g., sum all items)


reduce and inject

Applies a binary operation (such as addition or division) to each collection item and stores the new value in the accumulator variable (memo). In other words, it reduces the collection to a single item. Hence, the name reduce. In math and some other programming languages, this operation is known as fold.

In the following example, n is the array item and sum is the accumulator.


[ 2, 3, 5 ].reduce { |sum, n| sum + n }  # Output: => 10

In Ruby, numeric classes such as Integer and Float provide methods like + (add), - (subtract), * (multiply) and / (divide). Instead of a block, the reduce method can take a reference to one of these methods (as a symbol) as an argument. The given method is applied to all items in the collection and the accumulated result is returned.


[ 2, 3, 5 ].reduce(:+)  # Output: => 10
[ 2, 3, 5 ].reduce(:*)  # Output: => 30

The inject method is an alias to reduce. Let's verify:


Array.instance_method(:reduce) == Array.instance_method(:inject)  # Output: => true
 

Lazy evaluation


Ruby has a feature called lazy evaluation. Providing a thorough explanation would probably take an entire post. Briefly, it is an efficient way to get an arbitrary number of values from a very large or infinite collection.

Iterator methods are eager by default. That means they process all collection items before returning anything. In the following example, we attempt to multiply the first ten items of an infinite sequence by 2. It does not work, because map tries to multiply all (infinite) items first and only then return the first ten. The result is an infinite loop.


1.upto(Float::INFINITY).map { |x| x * 2 }.take(10).to_a

Now let's include the lazy method and try again.


1.upto(Float::INFINITY).lazy.map { |x| x * 2 }.take(10).to_a
# Output: => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

By introducing the lazy method, we have created a lazy enumerator, which is an instance of the Enumerator::Lazy class, introduced in Ruby 2.0. Lazy enumerators only evaluate (process) the number of items required to generate the desired output.

Lazy enumerators implement lazy versions of many Enumerable methods, as seen below.


Enumerator::Lazy.instance_methods(false).sort
 => [:chunk, :collect, :collect_concat, :drop, :drop_while, :enum_for, :find_all, :flat_map, :force, :grep,
 :grep_v, :lazy, :map, :reject, :select, :slice_after, :slice_before, :slice_when, :take, :take_while, 
 :to_enum, :zip]

It may be easier to grasp the above example by splitting it into two steps:


l = 1.upto(Float::INFINITY).lazy # Output: => #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>
l.map { |x| x * 2 }.take(10).to_a # Output: => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

Find the item with the maximum or minimum value


The Enumerable mixin provides methods for sorting collection items and finding those with the highest or lowest values. All of them use the Comparable mixin and the <=> (spaceship) operator under the hood, which are both covered in this post about operators.

max

Return the greatest (maximum) value in a collection.


(1..10).max # Output: => 10

Return the three greatest values in descending order.


(1..10).max(3) # Output: => [10, 9, 8]

max_by

Return the greatest value according to specific criteria defined within a block. In the following example, the longest string.


[ "zen", "zazen", "liberation" ].max_by { |x| x.length} # Output: => "liberation"

min

The min method is the inverse of max. It returns the smallest (minimum) value in a collection.

min_by

The min_by method works the same way as the max_by method, except it returns the minimum value in a collection according to specific criteria.
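
For completeness, here are the min counterparts of the max examples above (the variable names are only for illustration):

```ruby
smallest = (1..10).min           # => 1
three_smallest = (1..10).min(3)  # => [1, 2, 3]

shortest = [ "zen", "zazen", "liberation" ].min_by { |x| x.length }  # => "zen"
```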

Iterate backwards


reverse_each

We already discussed the each method; reverse_each is the same, except it will iterate from the last item to the first.


[1,2,3].reverse_each { |x| print x }  # Output: 321

Sort items


Most sorting operations use the spaceship (<=>) operator, explained in this post about operators.

sort

Sorts collection items. Numbers are sorted in ascending order and strings in ascending alphabetical order. Strings beginning with uppercase characters (e.g., Foo or FOO) always come before those beginning with lowercase characters. Example:


[ "foo", "Foo", "Bar", "bar", "qux", "Qux" ].sort
# Output: => ["Bar", "Foo", "Qux", "bar", "foo", "qux"]

Note that it's not possible to sort a collection containing both numbers and strings unless the numbers are stored as strings.


["foo", 2, "bar"].sort
# Output: ArgumentError: comparison of Fixnum with String failed

sort_by

Sort by specific criteria. The block returns a sort key for each item of the collection. The keys can be any values that can be compared with each other, such as numbers or strings. The items will be sorted according to those keys. In the following example, the items are sorted by length.


[ "liberation", "zen", "zazen" ].sort_by { |word| word.length }
# Output: => ["zen", "zazen", "liberation"]

Convert collection into an array


The to_a method converts any collection into an array.


{ meditation: "zazen", time: 40 }.to_a  # Output: => [[:meditation, "zazen"], [:time, 40]]
(1..6).to_a  # Output: => [1, 2, 3, 4, 5, 6]

The entries method is an alias to to_a, at least in the Range class.


Range.instance_method(:to_a) == Range.instance_method(:entries)  # Output: => true
 

Iterate (loop) through two arrays simultaneously


The zip method is one way to iterate over two arrays at once.


a = [ "zazen", "kinhin", "koan" ]
b = [ "sit", "walk", "contemplate" ]

a.zip(b).each { | x, y | puts "#{x} - #{y}" }
Output:

zazen - sit
kinhin - walk
koan - contemplate

Other loops


The for, while and until loops presented below are the usual way of looping through collections in many programming languages. However, in Ruby they are frowned upon and not used frequently. Most rubyists prefer using the iterators explained above, such as each and map, to loop through collections.

The for loop


For loops call the each method of the collection under the hood, which passes the value of each item to the loop, which in turn assigns the value to the loop variable.

The loop variable and any other variables defined within the loop will remain defined after it ends.


for i in 0..5
   print i
end

# Output: 012345
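
The following sketch illustrates the point about variables remaining defined after a for loop, and contrasts it with each, where the block parameter and variables first assigned inside the block are local to the block. All variable names are only for illustration.

```ruby
for i in 0..2
  last_seen = i
end

# Both variables are still available after the loop ends
leaked = [i, last_seen]  # => [2, 2]

[ 0, 1, 2 ].each { |j| inner = j }

# Neither the block parameter nor the block's local variable exists out here
j_defined = defined?(j)          # => nil
inner_defined = defined?(inner)  # => nil
```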

The while loop


The while loop keeps running while a condition is true (until it becomes false).


n = 0
while n < 10 do  # Run while n is less than 10
    print n
    n += 1  # Increment n by 1 at each iteration of the loop
end

# Output: 0123456789

While loops can also be used as modifiers. Modifiers allow us to append a conditional or loop statement (e.g., if, unless, while, until) onto the end of another statement, which will then be executed conditionally or as a loop.


n = 0
print n += 1 while n < 10

# Output: 12345678910

Notice that the output of the two examples above is different. In the first example, n is displayed by print; then it's incremented by the n += 1 statement. In the second example, even though the while modifier is positioned after the print n += 1 statement, it evaluates the condition (n < 10) before executing the print statement. However, the value of n is incremented before being displayed by print; that's why it starts at 1 instead of 0. It ends at 10 instead of 9 because, when the while condition is evaluated in the last iteration of the loop, the value of n is 9 but, before it is displayed by the print method, it is incremented to 10 by the n += 1 statement.

The until loop


The until loop is the inverse of the while loop. It runs until a condition becomes true (while it's false):


n = 0
until n > 10 do  # Run until n is greater than 10
    print n
    n += 1  # Increment n by 1 at each iteration of the loop
end

# Output: 012345678910

Until loops can also be used as modifiers:


n = 0
print n += 1 until n > 10

# Output: 1234567891011

Both while and until loops will run until the condition for terminating is met. That is a problem when the condition is never met and the loop runs indefinitely. That is called an infinite loop, and it can make the entire system unresponsive. Hence, while and until loops should be used with caution.

14 comments:

  1. Great post. Thank you for your effort!

    1. Thank you for commenting. I'm glad you liked it.

  2. you forgot very useful method each_with_object

    1. I really did forget, thank you for pointing that out. I will include it.

  3. Really comprehensive and useful for novices like me, thank you!

    1. I'm glad to know it was useful! Thank you for reading.

  4. Now i got clear idea about loops, iterators and enumerable ...... you are the master.... Thank you

    1. Thank you for your feedback. I'm glad you liked it :)

  5. it is very useful. thanks for sharing with us

    1. Thank you for reading and providing feedback.

  6. This goes into the bookmarks, for sure. You should probably name it something like "Iterators cheat-sheet", because there will be a lot of people looking for that. Thank you for this post. The ruby docs don't do justice to the power of iterators.

    1. Thanks for your feedback. It's nice to know the post was useful to you :)