Friday, June 17, 2011

Sleep Sort! With Actors! For No Reason!

The world's stupidest, racy sort algorithm ported to Scala actors for no good reason.

import scala.actors._
import Actor._

def sleepSort(xs : List[Int]) : List[Int] = {
   def sorter(aggregator : Actor, n : Int) = actor {
      reactWithin(n * 1000) {
        case TIMEOUT => {
           aggregator ! n
           exit
        }
      }
   }

   xs map (sorter(self, _)) foreach (_.start)

   val inputLength = xs.length

   def aggregate(result : List[Int], resultLength : Int) : List[Int] = 
     if (resultLength == inputLength) {
        result.reverse 
     } else receive {
        case n : Int => 
          aggregate(n :: result, resultLength + 1)
     }

   aggregate(List[Int](), 0)
}

Example usage:

scala> sleepSort(List(3,1,2))
res0: List[Int] = List(1, 2, 3)

Caveats: Can take a shockingly long time. Doesn't handle negative numbers. Doesn't handle actor failures. Races are possible. If you try to use this in real code then you will be fired and possibly shot.

Monday, May 9, 2011

Why Eager Languages Don't Have Products and Lazy Languages Don't Have Sums

In The Point of Laziness, Bob Harper mentioned a core duality between lazy and eager languages[1]. Eager languages have sum types but not product types and lazy languages have product types but no sum types. I've seen a few questions about what he means. I thought I'd try to explain. It all has to do with non-termination, our old friend "bottom" which I'll write ⊥. In this entire article I'm focusing on languages that don't allow any side effects other than ⊥ - that way I don't have to deal with ordering print statements or whatever. Also, I'm using an MLish notation that hopefully isn't too confusing.

Update: This comment by Dan Doel does a better job of explaining the issues than my whole article.

Product Types

A product type is a pair. The pair type (A,B) is the product of A and B, and is often written as A * B or A × B. A 3-tuple[2] (A,B,C) can be seen as another way of saying ((A,B),C), etc. so any n-Tuple is just a pretty way to write nested pairs. The simplest way to define exactly what a pair means is through projection functions "first" and "second" which pull out the first or second element. The equations we want are

first (a,b) == a
second (a,b) == b

Pretty straight forward right? Well, those equations hold just fine in a lazy language even when a or b is ⊥. But in an eager language, first (a,⊥) == ⊥. Same deal with second. Oops. Hence lazy languages have product types in the sense that they completely meet these equations where eager languages don't.

That has a practical consequence. A compiler for an eager language can't necessarily simplify first(a, someComplicatedExpression) unless it can prove that someComplicatedExpression is never ⊥, and in general it can't do that. A lazy language compiler would always be free to throw away someComplicatedExpression.

Sum Types

A sum type is an alternative between two choices. A | B is the sum of the two things, A and B, and is frequently written A + B. Choices among three elements, like A | B | C can be considered as just a pretty form of (A | B) | C - a nesting of binary choices. The simplest sum type is good old Boolean, the choice between true or false. Here's one equation we might expect to hold for Booleans and something that examines booleans, the if expression[3]

((if condition then x else y), z) == if condition then (x,z) else (y,z)

Ignore the possibility of x, y, and z being ⊥ and focus on "condition". In an eager language, condition must be evaluated in either of the two forms above and either way you get ⊥ if condition is ⊥. But in a lazy language, the left side will compute (⊥, z) where he right side will compute ⊥. Pulling the second element from those two results gives differing results, proving they are unequal.

Again, this has consequences. An eager language can simplify the right side form to the left side form if it wants to. A lazy language can't.

Footnotes

  1. Strictly speaking I should say strict and non-strict instead of eager and lazy, but I'll go with Dr. Harper's wording because the difference isn't that important to the point.
  2. Or a record (aka a struct). A tuple can be seen as a record where the field labels are numbers or a record can be seen as a tuple with some convenient labels for the positions.
  3. If you are more familiar with C like languages that treat "if/else" as statements instead of expressions then pretend I'm using a ternary operator "condition ? x : y" instead of "if condition then x else y"

Wednesday, January 19, 2011

The Read Eval Print Lie

It's a common misconception that a language must be "dynamic" or "interpreted" to have a REPL, a Read Evaluate Print Loop (also sometimes confusingly called an interpreter), or to have an "eval" function. That's not true at all.

Haskell, OCaml, F#, and Scala are all statically typed languages with very sophisticated static type systems, yet all have REPLs. All are usually compiled although interpreters exist for Haskell. Typed Racket (a variant of Scheme) is statically typed but has a REPL and I think it's always compiled.

Nor does a language have to be interpreted to have a REPL. Groovy is a dynamic language in both the dynamically typed sense and the "mutate my program structure at runtime" sense. It's always compiled, yet it has a REPL. JRuby (Ruby for the JVM) is similarly always compiled as far as I know, but has a REPL. Clojure and Erlang are always compiled, but both have REPLs.

The misconception about the nature of REPLs is caused by the C family of statically typed languages: C, C++, Java, C#, Objective-C and D. They do not tend to lend themselves to REPLs, but it's not because of the static typing or the compiling. It's because of the statement orientation and verbose boilerplate around every construct. A REPL is possible in those languages, it just wouldn't be very useful.

At heart, a REPL is always possible because "eval" is possible in any Turing complete language. That's CS 101 - see universal Turing machine and the Church-Turing thesis.

Remember that compiling vs interpreting is just an implementation choice as far as semantics are concerned. And stop looking to the C-ish languages as examples of the limits of static typing.