Often, Less is Better!

When I was learning Ruby, I was searching for things that could be improved. I was looking for ways I could make things better. One of the areas I was involved in was testing. This should come as no surprise. If you are writing Ruby code, you need to be writing tests. The exact order in which this is done is not my point. Just that tests must be written and run frequently. Here is a typical test run using the minitest facility:

Standard Test Output

To my mind, this was not communicating that true progress of the tests. Consider: What version of minitest was in use? What files were being processed? How many tests were in each of those files? Most of all, why does this output look so boring?

Yes; boring! So being bored I thought: Surely I can do better! I came up with a gem I call minitest_visible to spruce things up. Here’s what its output looks like:

Enhanced Test Output

Much better! Clearer! Nice progress! and so entertaining? Well at least I thought it was a big improvement. It would seem that not many people shared my view. I can see their points. Notice that the bottom line is the same in both cases:

21 runs, 92 assertions, 0 failures, 0 errors, 0 skips

This result line is what counts. These are tests, not video games. They should test the code without a lot of bells and whistles and let you get on with the real work.

I was also told that my little gem did not fit in with the minitest ecosystem. I asked what was meant by this but I never got a reply. I suppose somebody was channeling their inner Linus Torvalds of rudeness that day. I’ll never know. It also does not matter.

I have finally come to realize that the critics were right. The best answer is to keep things as simple and lean as possible. No being fancy; no showing off. To that end I have worked to remove the minitest_visible augmentation from almost all of my work.

Affected gems are: composite_rng, counted_cache, fibonacci_rng, flex_array, format_engine, full_clone, full_dup, fully_freeze, in_array, insouciant, lexical_analyzer, make_gem, mini_erb, mini_readline, mini_term, mysh, parse_queue, pause_output, safe_clone, and safe_dup. The exception is the fOOrth gem experiment that has so many test files that it would be hard to sort out what was happening without a little help. At least for now.

Now these gems can be used by others without the hassle of having to install a non-standard testing extension.

Is less more? I’m not sure. Is less better? Very often, yes it is.

Your truly

Peter Camilleri (aka Squidly Jones)

Updating Dependencies

There are things you don’t want to see when you review your code. Here’s one:

Potential Security Vulnerability Warning Alert

What’s worse, this message was hard to track down. There was no way (that I found) to have GitHub tell me about all the code that was affected. I’d have to check each one manually until I had found and fixed them all.

I had several things going for me. The alerts were all caused by one thing; the discovery of a bug in the “rake” gem. Here’s one version of the bug report I found:

It was discovered that Rake incorrectly handled certain files. An attacker could use this issue to possibly execute arbitrary commands.

USN-4295-1: Rake vulnerability

Now, in theory, one could do an in depth analysis of this bug and conclude that it has no impact on your code. I do not recommend this approach. I would not have confidence that I could see all the devious ways this fault could be exploited. Further, I know I cannot possibly predict every move by the criminals. That is why I will now focus on ensuring that my code specifies a version of rake that does not have this defect. That is we need to specify versions of rake with version greater than or equal to 12.3.3.

I write Ruby Gems and for those gems, dependencies are specified in the <gem_name>.gemspec file. Let’s see some rake dependency entries and see how they stack up to our security requirement:

spec.add_development_dependency “rake”, “~> 10.0”

This was the dependency entry that generated my alert messages. To translate into plain English, it says that any version of rake is OK, so long as it was version 10.something. The last such release was 10.5.0 which clearly has the bug and thus generates a warning. This entry clearly must be replaced.

Now many of my Ruby Gems had the following entry that did not result in a warning, but still leaves something to be desired:

spec.add_development_dependency “rake”, “~> 12.0”

This entry says to accept any version of rake so long as it was version 12.something. The last such release was 12.3.3 which is clear and free of the bug! So, yes, this is much better, but it suffers from the fact that it limits rake to version 12. Version 13 is already out and forcing the use of an older version may not be a good idea. I chose to replace these entries as well.

So, we now examine the simplest and most permissive specification:

spec.add_development_dependency ‘rake’

This translates to anythings goes, but use the very latest if it’s not already installed. This is very simple, it generated no warnings, and it can get the latest. It will also settle for much older versions, including those with the bug. This too must be updated.

This is the entry recommended for use by the alert:

spec.add_development_dependency “rake”, “>= 12.3.3”

This translates to “Anything from 12.3.3 or later”. This ensures that we will not use older versions of (buggy) code, and that we won’t exclude newer versions of code either.

Now there may be legitimate reasons to stick to older code. I know there are working web sites using ancient versions of rails running on equally old versions of ruby. Even so, there are (too many to list here) real risks to running obsolete code. If at all possible, keep up-to-date. It may require some effort, but it is worth it. Test your code with newer versions before pushing them out into the wild. If problems/issues arise, fix them and update your dependencies. This usually can be accomplished in a modest amount of time. If it can’t then you need to lock in the obsolete code dependencies and take a serious look at re-engineering your application.

In the end, everything needs maintenance. The reason we all avoid gems that have gone many years with no updates is that we know that nobody is tending house. I was caught by surprise and it is hard to anticipate what the next disruption will be, but by defensively coding, the risk can be reduced.

Finally let me take a moment to list which of my gems needed to be updated due to this issue. If you have a current, safe version of rake installed, you need do nothing even if you are using these gems: composite_rng, connect_n_game, counted_cache, fibonacci_rng, format_engine, format_output, full_clone, full_dup, fully_freeze, in_array. insouciant, lexical_analyzer, make_gem, mini_erb, mini_readline, mini_term, mysh, parse_queue, pause_output, ruby_sscanf, safe_clone, safe_dup, test65, and vls. In addition, fOOrth, games_lessons, and rctp were updated but the changes are currently part of unreleased code.

Yours Truly

Peter Camilleri (aka Squidly Jones)

Hidden Traps: String Mutations

In the world of Sci-Fi movies and video games, mutants are not always a bad thing. In fact, sometimes, the mutants are the hero or the key to the hero’s success. On the other hand, no matter how good their intentions, trouble always seems to follow them.

In the world of programming, data can be mutated as well and it attracts trouble there too. Data is mutated when its internal state is modified. The alternative is data that is immutable. This means that the internal value cannot be modified, it can only be replaced with a new value.

Let’s see an example of immutable data in action first:

a=42
b=a
a+= 4
puts a   #Displays 46
puts b   #Displays 42

This is the safe, sane case. Changes to the variable a do not affect the variable b. Now, in Ruby, strings are mutable. Let’s see how this plays out.

a=”hello”
b=a
a << ” world”
puts a    #Displays: hello world
puts b    #Also displays: hello world

In this case, the string data in a and b was mutated. The change to a was expected. The change to the seemingly unrelated variable b could be a nasty surprise and a source of bugs. I can say that this very issue is without doubt the number one source of trouble in my own code.

The sensible question to ask is: Why are string mutated at all? Why not always protect the programmer from this issue? The answer is performance. Creating new instance of strings all the time is wasteful of resources and places a heavy load on the memory management system. Let’s see how this issue is handled by two leading languages: C# and Ruby.

C#

In C#, the mutation of strings issue is handled by giving the programmer choice. The default, goto class for strings, namely the String class, is fully safe and immutable. Thus for simple programming tasks and less experienced programmers, the obvious choice also is the easiest to get right. Maybe not super-fast, but the expected results.

When (and where) performance issues do arise, the StringBuffer class exists that can be more efficient because is allows data mutation. In fact, it embraces it with special mutating methods, not part of the String class. Since a conscious choice must be made to allow and use mutation, hopefully, the programmer will focus there attention on those areas, thus avoiding nasty surprises.

Ruby

In contrast, Ruby (currently) allows string mutation by default. This means that any string at any time could be mutated. This causes a lot of problems. There are some possible answers:

  • When assigning a string, clone it. A statement like: a = b.clone breaks the connection between a and b by making a full copy of the string. It works, but this can be slow.
  • When creating a string, freeze it. This looks like: b = “Hello”.freeze and now any mutation will raise an error that reveals the bug at the point it first occurs. This is not slow, but requires thorough testing to avoid exceptions in the field. You also need to remember to use the freeze in the first place.
  • In upcoming versions of Ruby (version 3 I believe), strings will be frozen by default. You will now need special syntax to create mutable strings. This will look (probably) something like b = String.new(“Hello”). This suffers from the fact that it is likely to break a lot of older code.

These two languages show two divergent approaches to the issue of string mutation. Now I really like Ruby, and C# is a pretty decent language too. On this issue though, C# looks like the clear winner here. Safe by default, gentle on programmers, efficient where needed, and above all, not breaking older code. To me it’s a slam-dunk!

What are your thoughts? Does another language do an even better job? I’d love to hear from you, so chime in with a comment, idea, or objection!

Yours Truly

Peter Camilleri (aka Squidly Jones)

Article Update: April 15, 2016

I am pleased to announce the release of the fOOrth 0.6.0 language system. The major change in this version is to split the String class into an immutable String class and a mutable StringBuffer class. Even though fOOrth is based on Ruby, the underlying architecture is more than flexible enough to accommodate this change.

Hidden Traps: Floating Point Math.

As a life long programmer, I’ve programmed a lot of calculations over the years. Yet, I was recently astounded to hear expressed the opinion that “Computer computations are error free and flawless”. The reasons that this is not (and never will be) the case are fairly deep and fundamental.

Mathematics routinely deals with matters of the infinite like the infinite digits in the value of the famous irrational number π (called pi). Engineers and computers on the others hand live in a world of the finite: finite memory, finite processing power, and finite time. The IEEE standards for computer arithmetic are marvels of ingenuity. They are also a compromise between the needs of speed, storage, and accuracy.

Let’s see a real example and test the associative property of addition. This crucial, fundamental rule of math says that:

(a+b)+c = a+(b+c)

Now lets try to program this. We’ll use Ruby for brevity, but the results would be the same for virtually any programming language that uses the computer’s floating point hardware.

a = 1.0e20
b = -1.0e20
c = 1.0

puts “(a+b)+c = #{(a+b)+c}”
puts “a+(b+c) = #{a+(b+c)}”

The output of this little snippet of code is:

(a+b)+c = 1.0
a+(b+c) = 0.0

The two results should be the same but they’re not. In one case, the use of a finite amount of memory to represent the numbers has caused an important “bit” of data to fall off the “end” and be lost. Addition in computers is not always associative.

Are there ways around this? Sure, you can improve things. In Ruby I can switch from Floats to Rational data, and this problem mostly goes away. The new program will consume more memory and take longer to run. Eventually, if the required precision becomes excessive, the program will slow to a crawl and/or run out of memory. Processing power and memory remain finite no matter how sophisticated the tools we use.

The moral: Know your math, yes, but realize that computers can’t do math, they can only approximate it! So know the limitations of your tools too.

Yours Truly

Peter Camilleri (aka Squidly Jones)