Friday, August 27, 2010

Why Scala's "Option" and Haskell's "Maybe" types will save you from null

Cedric Beust makes a bunch of claims in his post on Why Scala's "Option" and Haskell's "Maybe" types won't save you from null, commenters say he doesn't get it and he says nobody has argued with his main points. So I thought I'd do a careful investigation to see what, if anything, is being missed.

First, right off the top here: Scala has true blue Java-like null; any reference may be null. Its presence muddies the water quite a bit. But since Beust's article explicitly talks about Haskell he's clearly not talking about that aspect of Scala because Haskell doesn't have null. I'll get back to Scala's null at the end but for now pretend it doesn't exist.

Second, just to set the record straight: "Option" has roots in programming languages as far back as ML. Both Haskell's "Maybe" and Scala's "Option" (and F#'s "Option" and others) trace their ancestry to it.

Now, to the post.

First Objection

in practice, I find that hash tables allowing null values are rare to the point where this limitation has never bothered me in fifteen years of Java.

Really? Let me give one concrete example. You're writing a cache system (in the form of an associative map) in front of a persistent storage with nullable values. Question: how do you differentiate between "I haven't cached this value" from "I've cached the value and it was null in the database"? The answer is that you can't use "null" to mean both things, it's ambiguous. That may not "bother" Beust, but I find it irritating that you end up having to use two different mechanisms to say this in Java.

With Maybe/Option you just say that the map holds optional values. A fetch result of None/Nothing means you haven't put that key in your map. A result of Some(None)/Just Nothing means you've got the key but it maps to nothing, and a result of Some(value)/Just value means that the key is in the map and that there's a "real" value there in storage.

Save Me!

Back to Beust

The claim that Option eliminates NullPointerException is more outrageous and completely bogus. Here is how you avoid a null pointer exception with the Option class (from the blog):

val result = map.get( "Hello" )
 
result match {
  case None => print "No key with that name!"
  case Some(x) => print "Found value" + x
}

See what's going on here? You avoid a NullPointerException by... testing against null, except that it's called None. What have we gained, exactly?

Here, is, IMHO the heart of what people are arguing with Beust about. He thinks this code is no different from the kind of "if (result == null)" code you write in Java. Sure, identical. Except for one, huge, glaring, major, stab you in the face difference: in Java the type system says Map<T>#get will return a T. It doesn't say it MIGHT return a T. That's left to the documentation where it's out of reach for the compiler. Here in Java.

Map<String, Integer> myStuff = new HashMap<String, Integer()>;
myStuff.put("hello", 42);
int result = myStuff.get("goodbye") * 2; // runtime NPE

Java NPEs when you try to use the result of the fetch. On the other hand, when you use a Scala map you get an Option[T] while Haskell's says it returns a Maybe t. The type system says you can't use the underlying value directly. Here's me trying to pull something from a map and manipulate it directly in both Haskell

mystuff = fromList[("hello", 42)]
result = (lookup "goodbye" myStuff) * 2 -- compile time error

And Scala

val myStuff = Map("hello" -> 42)
val result = (myStuff get "goodbye") * 2 // compile time error

In both cases I get a static error.

Now...there are a lot of (IMHO crappy) debates on static vs dynamic typing. But one thing must be very clear: the two situations are at least different. This isn't the same as a Java NPE (or the equivalents in Ruby/Python/whatever). Option/Maybe tell the static type system that you may or may not have what you're looking for.

Blowups Can Happen

Onward.

The worst part about this example is that it forces me to deal with the null case right here. Sometimes, that's what I want to do but what if such an error is a programming error (e.g. an assertion error) that should simply never happen? In this case, I just want to assume that I will never receive null and I don't want to waste time testing for this case: my application should simply blow up if I get a null at this point.

There is a cavernous hole in Haskell Maybe and Scala Option. Beust's post does not talk about it at all, though. The hole is that Haskell and Scala are Turing complete, and that means that the type system cannot prevent you from throwing optionality out if that's what you really want to do. In Haskell

loopForever :: a
loopForever = loopForever -- really, forever

stripOption :: Maybe t -> t
stripOption Just x = x
stripOption _ = loopForever

-- or better yet
stripOption = fromMaybe loopForever

And Scala

final def loopForever[A] : A = loopForever // and ever, amen

def stripOption[T](o : Option[T]) : T = o match {
  case Some(x) => x
  case None => loopForever
}

// or, better still

def stripOption[T](o : Option[T]) : T = o getOrElse loopForever

In their respective languages language the following compile just fine but will cause non-termination at runtime.

result = (stripOption (lookup "goodbye" myStuff)) * 2
val result = stripOption(myStuff get "goodbye") * 2

Of course, if you really want to write that in real code you would replace "loopForever" with throwing an exception. That's exactly what Scala's Option#get and Haskell's Data.Option.fromJust do.

result = (fromJust (lookup "goodbye" myStuff)) * 2
val result = (myStuff get "goodbye").get * 2

Why did I write a silly loop? To make a point: Turing completeness punches a hole in the type system. "Throw" style operators exploit essentially the same hole. Every programmer should know this in the bottom of their heart.

Whether using the loopForever version or the exception version, stripOption/get/fromJust is different from null dereferencing. It requires an explicit step on the part of the programmer. The programmer says "I know what I'm doing here, let me continue, blow up if I'm wrong." But, IMHO, it's easy enough to say that it kills his objection of "forcing" him to deal with null instead of just blowing up. Yes, there's a hoop to jump through, but it's tiny. KABOOM!

Nullable Types

Next point:

When it comes to alleviating the problems caused by null pointer exceptions, the only approach I've seen recently that demonstrates a technical improvement is Fantom (Groovy also supports a similar approach), which attacks the problem from two different angles, static and runtime:

Static: In Fantom, the fact that a variable can be null or not is captured by a question mark appended to its type:

Str   // never stores null
Str?  // might store null

This allows the compiler to reason about the nullability of your code.

One technical quibble. Since Groovy doesn't have static types this static bit isn't relevant to Groovy.

Anyway, the equivalents in Scala and Haskell are String vs Option[String]/Maybe String. More characters, but the same idea. Except that Fantom and Nice and other languages with nullable types don't tend to support saying ??String whereas Option[Option[String]] and Maybe (Maybe String) are totally reasonable. See my caching example above.

Safe Invoke

Runtime: The second aspect is solved by Fantom's "safe invoke" operator, ?.. This operator allows you to dereference a null pointer without receiving a null pointer exception:

// hard way
Str? email := null
if (userList != null)
{
  user := userList.findUser("bob")
  if (user != null) email = user.email
}
 
// easy way
email := userList?.findUser("bob")?.email

Note how this second example is semantically equivalent to the first one but with a lot of needless boiler plate removed.

And in Scala that might look like

userList flatMap (_ findUser "bob") flatMap (_.email)

while haskell would use

userList >>= findUser "bob" >>= email

There are some more keystrokes indeed, but here's the thing...

Better

So, can someone explain to me how Option addresses the null pointer problem better than Fantom's approach

Option/Maybe/flatMap/>>= are all ordinary library code. No magic. The types are algebraic data types, an idea that is massively useful. The methods belong to a class of types called Monad. If Option/Maybe don't work for you as-is you can build your own versions. With nullable types and ?. operator you're stuck with what the language provides. If they don't work for your particular case then you can't create your own variants. Quick, in Fantom extend the ?. operator so that works for exception-like semantics such that failure carries a reason instead of just computing null. In Haskell and Scala that already exists and it's ordinary library code in a type called Either.

Conclusion

In practice, outside of learning, I've never ever gotten the equivelent of an NPE by abusing Haskell's fromJust or Scala's Option#get, not even in testing. On the other hand NPE's are a fact of my life when dealing with nulls in Java et al. And while it's true that testing has revealed the vast majority of potential NPEs (or equivalent) in my life, it certainly hasn't dug up all of them. I've seen plenty of production systems brought to their knees by NPEs.

Exceptions plus a deliberate weakening of the type system allow developers to create "oh, just blow up" code when that's the right answer. But the developer has to explicitly say so.

Fantom/Nice style nullable types plus a "safe invoke" operator are a nice improvement over plain old null. But they are not composable (no ??T) and they are special purpose magic aimed at a very narrow domain instead of general purpose magic such as algebraic data types and higher order functions that are broadly applicable to huge problem domains.

So there you have it, Cedric, what are we missing in your post?

PostScript on Scala and Null

Which brings me back to Scala's null. For interoperability reasons Scala has a full blown Java-like null. But the experience that all the Scala developers I know have is that NPE is mostly only a problem when dealing with existing Java libraries because Scala programmers and libraries avoid it. Plus, Scala's "Option(foo)" promotes foo to either Some(foo) or None depending on whether it's null or not that at least enables the map/flatMap style of "safe invoke." It's far less than perfect, but about the best that can be done when dealing directly with Java.

For Haskell, null is a non-issue.

Friday, August 13, 2010

Sometime in 1977

Kernighan:

"I know, let's open the book with a program that displays 'hello, world.'"

Ritchie:

"Great idea! I'll start the patent search."

Monday, August 2, 2010

On Removing Java Checked Exceptions By Means of Perversion

/*

What do you do in Java when you need to throw a checked exception in a context where it's not allowed, for instance when implementing a third party interface with no declared throws clause? If you're smart and its allowed you just write the code in another language. Otherwise you probably just wrap the bastard checked exception in an unchecked exception and be on your way.

But the wrapping solution a) requires an extra object allocation and b) unwrapping the original exception is a bit messy.

No, no, those justifications for what I'm about to do are horse shit rationalizations. The fact is I just want to pervert Java. I make it my job to pervert languages for their own good. Or at least, my own amusement.

Instead of "throw" meet "chuck" which turns checked exceptions into unchecked chucked exceptions. This post is an executable Java program.

Edit: This is an idea that can be traced back to Java Puzzlers by Bloch and Gafter and perhaps further. I've just refined it a bit with a bottom return and a way to catch the exception.

*/

import java.io.IOException;

public class Unchecked {

/*

The key to today's lesson in programming aberration is the fact that Java allows you to cast to a generic type parameter but can't actually check the cast at runtime due to type erasure. The following method does an unchecked cast from a Throwable to some generically specified subtype of Throwable and then immediately throws it. Neither the static nor dynamic type system ever have a chance to detect any deviant behavior such as casting to a "wrong" type. The code also lets the caller specify any return type. A previous article explains why that's useful.

*/

   @SuppressWarnings("unchecked")
   private static <T extends Throwable, A> 
     A pervertException(Throwable x) throws T {
      throw (T) x;
   }

/*

Then chuck puts some lipstick on that kinky little pig by telling it to act like it's throwing a RuntimeException. Et voilĂ , no need to declare the exception in a "throws" clause.

*/

   public static <A> A chuck(Throwable t) {
      return Unchecked.
        <RuntimeException, A>pervertException(t);
   }

/*

Some sample code shows how to use chuck.

*/

   public static int testChuck() {

/*

We can't throw an IOException because it's checked

*/

      // throw new IOException("checked, oh noes");

/*

But we can chuck an IOException

*/

      return chuck(new IOException("unchecked, hellsyeah"));

/*

And, just like with a throw the next line won't compile because it's unreachable.

*/

      // System.out.println("dead code");
   }

/*

If you never want to catch the exception or you want to handle it with a top level "catch Exception" then that's all there is to it. But if you want to catch and handle the IOException specifically then there's one tiny flaw in the system. Sadly, the following won't compile because IOException isn't statically known to be thrown by testChuck() and Java wouldn't want you to catch something that can't be thrown now would it?

*/


//   public static int wontCompile() {
//      try {
//         testChuck();
//      } catch (IOException e) {
//         System.out.println("Why did you leave me with" + 
//                            " the sad clown of life?");
//      }
//   }

/*

The fix is to add a way to tell the compiler what exceptions we might chuck, kinda like the "throws" keyword but this one chucks. Totally different. The chucks method threatens to throw a T but in a sudden fit of shame it does nothing - per a tip from a commenter it takes a class argument to avoid Java's hideous generic method invocation syntax.

*/

   public static <T extends Throwable> 
     void chucks(Class<T> clazz) throws T {}

/*

And here we go, we can catch our chucked exception.

*/

   public static void main(String[] args) {
      try {
         chucks(IOException.class);

         testChuck();         
      } catch (IOException e) {
         System.out.println("Caught chucked exception " + 
                             e + ".");
      }
   }
}

/*

Running that should display "Caught chucked exception java.io.IOException: unchecked, hellsyeah."

My depraved perversion work is done. Now I may rest. I leave you with this imponderable: how many unchecked checked exceptions will Unchecked.chuck() chuck?

*/