Tuesday, March 26, 2013

King Null the Stubborn

It's mostly pretty easy to eliminate null in a language: just replace it with Maybe/Option types. Unfortunately, it's only mostly easy to get rid of null that way in class based OO languages. There's one corner where null is surprisingly stubborn, one you can't fix without limiting what OO languages allow: initialization. I'll demonstrate using Java, but I'll show that the same or a similar problem can manifest in most mainstream class based OO languages.
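For concreteness, here's a minimal sketch of that replacement in Java, using java.util.Optional (which arrived with Java 8; Guava's Optional and Scala's Option work the same way). The findUser name and its lookup logic are just illustrative:

import java.util.Optional;

class Lookup {
  // Absence is encoded in the return type instead of in null.
  static Optional<String> findUser(String id) {
    return "admin".equals(id)
        ? Optional.of("Administrator")
        : Optional.empty();
  }

  public static void main(String[] args) {
    // Callers are forced to deal with the empty case explicitly.
    System.out.println(findUser("bob").orElse("anonymous"));
  }
}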

Java

class Base {
  public Base() {
    System.out.println(message().length());
  }

  public String message() {
    return "Hello";
  }
}

class Sub extends Base {
  final String msg;

  public Sub() {
    msg = "HI!";
  }

  @Override public String message() {
    return msg;
  }
}

class Test {
  public static void main(String[] args) {
    new Sub();
  }
}

And that crashes with a nice null pointer exception even though null is nowhere to be found.

The null stems from a few simple rules:

  1. fields not initialized at their declaration are initialized to null.
  2. super constructors execute before subclass constructors.
  3. method invocation in a constructor exhibits the same runtime polymorphic method dispatch that you expect from method invocation outside of constructors.

So the Base constructor executes and calls the message implementation on Sub, which returns the msg field. But at that point the field has only been initialized to null, and kaboom.
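Note that the obvious Java fix, initializing the field at its declaration, doesn't help either: instance field initializers only run after the super constructor returns. A quick sketch reusing the Base class from above (Sub2 is my name, not part of the original example):

class Sub2 extends Base {
  String msg = "HI!"; // runs only after Base() has finished

  @Override public String message() {
    return msg; // still null while Base's constructor executes
  }
}

new Sub2() crashes with the same NullPointerException, for the same reason.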

Other OO Languages

Ruby and Python can exhibit a pretty similar problem. Here's some Ruby.

class Base
  def initialize
    puts message().size
  end

  def message
    "Hello"
  end
end

class Sub < Base
  def initialize
    super
    @msg = "HI!"
  end

  def message
    @msg
  end
end

Sub.new

Once again we get a null (well, nil) related exception. The difference is that in Ruby and Python you have to be explicit about calling the superclass constructor and you have more flexibility about the placement of that super call, so the Ruby code can be fixed by writing

class Sub < Base
  def initialize
    @msg = "HI!"
    super
  end

  def message
    @msg
  end
end

But the point is that there's nothing preventing the "bad" version from being written.

Scala and C# work like the Java version. Scala has the extra wrinkle of "abstract vals." C# has the small variation that methods aren't dynamically dispatched unless explicitly declared virtual.

C++

Finally we get to the one mainstream OO language that has a guaranteed-to-work static solution for at least one part of the problem: C++ (1)

#include <iostream>
using namespace std;

class Base {
public:
  Base() {
    cout << message() << endl;
  }

  virtual string message() {
    return "Hello";
  }
};

class Sub : public Base {
private:
  string* msg;

public:
  Sub() {
    this -> msg = new string("HI!");
  }

  virtual string message() {
    return *msg;
  }
};

int main(int argc, char** argv) {
  Sub sub;
}

If you don't feel like compiling and running that then I'll cut to the chase: it prints "Hello". That's because in C++ 'this' is constructed in stages, and during the Base constructor 'this' is only a Base and not yet a Sub. Even a variant which explicitly passes 'this' around will print "Hello":

class Base {
public:
  Base(Base* self) {
    cout << self -> message() << endl;
  }

  virtual string message() {
    return "Hello";
  }
};

class Sub : public Base {
private:
  string* msg;

public:
  Sub() : Base(this) {
    this -> msg = new string("HI!");
  }

  virtual string message() {
    return *msg;
  }
};

C++'s rule works very well to prevent many uninitialized 'this' problems, but the downside is that it prevents some perfectly good code from working polymorphically. The following still gets "Hello" even though "HI!" would cause no problems.

class Sub : public Base {
public:
  virtual string message() {
    return "HI!";
  }
};

Which is a perpetual source of confusion.

The Big Hole

C++'s rule goes far, but it doesn't go far enough. If you're lucky, the following code will seg fault. Formally, what happens is completely undefined. In other, safer languages it would be a null pointer exception.

void dump(Base* base) {
  cout << base -> message() << endl;
}

class Sub : public Base {
private:
  string* msg;

public:
  Sub() {
    dump(this);
    this -> msg = new string("HI!");
  }

  virtual string message() {
    return *msg;
  }
};

Leaking 'this' out of a constructor is a known bad idea, but I don't think any mainstream-ish OO language prevents it.
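Here's the same hole rendered as a hypothetical Java sketch (Registry and Leaky are made-up names), showing the null pointer exception you get in "safer" languages when 'this' escapes:

class Registry {
  static void register(Leaky obj) {
    // Observes the object mid-construction: name() still returns null here.
    System.out.println(obj.name().length()); // NullPointerException
  }
}

class Leaky {
  private final String name;

  Leaky() {
    Registry.register(this); // 'this' escapes before name is assigned
    this.name = "done";
  }

  String name() { return name; }
}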

Nor does C++'s rule prevent a constructor from doing something silly like {string x = *msg; msg = new string("hello"); cout << x;}, which reads msg before it has ever been assigned.

Conclusion

So there you have it, null is mostly preventable using Option/Maybe types. But to get rid of it entirely a class based OO language would need to

  1. limit the polymorphic dispatch on objects that are under construction like C++ does, and
  2. statically eliminate references to fields that aren't yet initialized in a constructor (e.g. {x = y; y = "hello";} needs to be prevented)

It would also have to do one of the following:

  1. prevent 'this' from leaking out of a constructor
  2. or only let a 'this' that leaks from a constructor represent the part of the object that has been fully initialized, e.g. a 'this' leaking from Sub's constructor must be a Base, just as it is during Base construction.
  3. or require that all fields be initialized immediately at declaration site (or use an equivalent mechanism like C++'s initializer lists)(2)
  4. or do expensive whole program analysis to ensure that a leaking this isn't a problem

That's what it would take to make null go away. But it would also prevent perfectly good code from either working as desired or compiling at all.

Footnotes

  1. I'm avoiding initializer lists and needlessly using "new" and pointers on C++ strings to illustrate my point. If that bothers you then pretend I'm using something where pointers are actually useful. I should also be doing copy constructors, assignment operators, virtual destructors and all that other fun C++ stuff, but all that boilerplate would be a distraction from my point here.
  2. If you squint just right, the 'every field initialized at declaration site' rule is exactly how many statically typed functional languages like Haskell and ML deal with 'records' and algebraic data type constructors without requiring a null like construct for uninitialized fields.

Friday, February 15, 2013

The Best-Laid Schemes O' Mice An' Constructors Gang Aft Agley

Last time I talked about how the Scala compiler currently translates a try/catch expression used as an argument to a method or constructor call into a separate method definition, which in turn may require boxing of mutable variables. That's because an exception in the try would blow away the local stack of stuff needed to make the original method/constructor call. So I proposed a plan for the Scala compiler to translate

expr0.foo(expr1, try/catch)

into

val temp0 = expr0
val temp1 = expr1
val temp2 = try/catch
temp0.foo(temp1, temp2)

And to do that recursively as needed.

The Flaw

It was a great plan. A beautiful plan. It had only one flaw. Under some circumstances it would change the behavior of some constructor calls. To understand why, we need to look at some JVM bytecode.

At the bytecode level, the constructor call

val foo = new Foo(expr1, expr2,...)

looks like(1)

new Foo
dup
// compute the result of expr1 and leave on stack
...
// compute the result of expr2 and leave on stack
...
invokespecial Foo.init
store foo

Which may need some explanation. The first instruction, "new Foo", creates a new, uninitialized Foo object but does not invoke its constructor. A reference to the uninitialized object is left on the top of the local stack. Then dup duplicates the top of the stack so now there are two references to the uninitialized object. Then a series of instructions computes the results of the arguments to the constructor, leaving each result at the top of the stack in turn (and pushing down the previous stuff). Finally, Foo's constructor is called which consumes all the constructor args and one reference to the Foo object. After the constructor is done, what's left behind on the stack is one reference to the now initialized Foo object. The store instruction consumes that and stuffs it into the foo variable.

If expr2 were a try/catch and my proposed transformation were to kick in, then that would look like

// compute the result of expr1 and leave on stack
...
store temp1
// compute the result of expr2 and leave on stack
...
store temp2
new Foo
dup
load temp1
load temp2
invokespecial Foo.init
store foo

At first glance that would appear to solve the problem nicely: the order of effects between expr1, expr2, and Foo's constructor is all preserved. Right? Sigh.

You see, the bytecode instruction "new Foo" potentially has a visible side effect. That's because it can cause Foo's static initializers to fire if they haven't already. In the original, inline version, the static initializers would have their effect before expr1 and expr2. In the second version the static initializer might happen after. Here's a bit of Java code that demonstrates the difference.

public class Sample {
  static class Foo1 {
    static { 
      System.out.println("Foo1 static init"); 
    }

    public Foo1(int arg1, int arg2) {
      System.out.println("Foo1 constructor");
    }
  }

  static class Foo2 {
    static {
      System.out.println("Foo2 static init");
    }

    public Foo2(int arg1, int arg2) {
      System.out.println("Foo2 constructor");
    }
  }

  static int expr1() {
    System.out.println("expr1");
    return 1;
  }

  static int expr2() {
    System.out.println("expr2");
    return 2;
  }

  public static void main(String[] args) {
    System.out.println("static init comes first.");
    Foo1 foo1 = new Foo1(expr1(), expr2());

    System.out.println();
    System.out.println("static init comes later.");
    int temp1 = expr1();
    int temp2 = expr2();
    Foo2 foo2 = new Foo2(temp1, temp2);
  }
}

I wrote that in Java because Scala doesn't have static initializers (the equivalent is done via constructors on declared objects). Which might make you think "then who cares." But Scala has to interoperate with Java. So we need a plan.

Plan 1 (which won't work)

It's tempting to try to stuff the uninitialized object into a variable before dealing with the args.

new Foo
store temp0
// compute the result of expr1 and leave on stack
...
store temp1
// compute the result of expr2 and leave on stack
...
store temp2
load temp0
dup
load temp1
load temp2
invokespecial Foo.init
store foo

That looks beautiful but, sadly, it is expressly forbidden by the JVM spec to have a ref to an uninitialized object in a local variable when a catch block is executed. The bytecode verifier will reject that approach outright.(2)

Plan 2 (which is annoying, quite probably slow, and may not work everywhere)

So we need to force class initialization early. One way to do that is Class.forName. So now we have this

ldc classOf[Foo]
invokevirtual Class.getName
iconst_1 // push `true` onto the stack
ldc classOf[Foo] // need the Class again to get its loader
invokevirtual Class.getClassLoader
invokestatic Class.forName // Class.forName(name, initialize, loader)
pop // throw away the class object that results from Class.forName
// compute the result of expr1 and leave on stack
...
store temp1
// compute the result of expr2 and leave on stack
...
store temp2
new Foo
dup
load temp1
load temp2
invokespecial Foo.init
store foo

That's a lot of code coming out of one measly constructor call! To some extent that can be mitigated by creating a static helper method in the runtime library and calling that instead. But that doesn't entirely mitigate a performance concern. How fast is Class.forName? The answer will take some benchmarking but I'll bet right now it's a whole lot slower than code without it.
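Such a helper might look something like the following sketch. To be clear, forceInit and the RuntimeHelpers class are my inventions for illustration, not anything in the actual Scala runtime library:

public final class RuntimeHelpers {
  // Forces cls's static initializers to run, mimicking the class
  // initialization side effect of the `new` bytecode instruction.
  public static void forceInit(Class<?> cls) {
    try {
      Class.forName(cls.getName(), /* initialize = */ true, cls.getClassLoader());
    } catch (ClassNotFoundException e) {
      // Can't happen: the name came from a live Class object.
      throw new AssertionError(e);
    }
  }
}

The generated bytecode would then shrink to a single invokestatic RuntimeHelpers.forceInit before the argument evaluation.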

Even ignoring performance concerns, Class.forName has a potentially fatal flaw: security managers might very well prevent it. So library A happens to trigger this transformation, and company B can't use the library because of a policy decision about restricting Class.forName.

Plan 2b (which might help a bit with Plan 2's slowness but definitely won't work everywhere)

A solution related to Plan 2 is to use sun.misc.Unsafe.ensureClassInitialized. It's a much simpler interface than Class.forName and is likely closer to the underlying mechanism that does class initialization in the "new" instruction. The problem is that it's not portable to all JVMs and it's even more likely to be hidden away by a security manager.
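For what it's worth, here's a sketch of what that might look like. There's no supported way to get an Unsafe instance, so this grabs the theUnsafe field reflectively, which is exactly the sort of thing a security manager tends to veto:

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public final class UnsafeInit {
  public static void forceInit(Class<?> cls) throws ReflectiveOperationException {
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true); // SecurityException if a security manager objects
    Unsafe unsafe = (Unsafe) f.get(null);
    unsafe.ensureClassInitialized(cls); // runs static initializers if needed
  }
}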

Plan 3 (which is so crazy it just might work)

A realistic possibility, which is a bit crazy, is to create an uninitialized object and then throw it away. That forces static initialization without calling the constructor.

new Foo // ensure initialization
pop // throw that object away, we don't need it
// compute the result of expr1 and leave on stack
...
store temp1
// compute the result of expr2 and leave on stack
...
store temp2
new Foo
dup
load temp1
load temp2
invokespecial Foo.init
store foo

Which looks distinctly WEIRD, but it's valid bytecode and gets the job done. The annoying factor with this one is that it can only be expressed in bytecode, so either the whole transformation has to be done at a very low level or we would have to add a new node type to the Scala AST used solely to express "force static initialization of thus-and-such class" and then let the code gen phase translate that into a new/pop pair.

There's also a bit of concern that it's TOO weird. It's valid by the spec, but it's nothing javac would ever spit out so it's entirely possible that some JVM implementation somewhere would be very confused by it.

Plan 4 (which is scary, but has some merit)

The final possibility is to ignore the problem. Scalac could just reorder expr1 and expr2 to come before the new and leave it at that. It's not something that most Scala programmers would ever even notice. The very few it does affect can manually do whatever translation they want to get their static initializers to fire at the right time. Perhaps the -Xlint flag could issue a warning about static initializer order when the transformation takes place.

But then again, the reason for the reordering will be pretty non-obvious, so the few it does affect will likely struggle with it for a while and then decide Scala is just broken without ever seeing what's behind -Xlint.

What Now?

Hopefully this has been an illuminating romp into some of the innards of compiling for the JVM. The immediate plan is that Scala will continue to turn problematic try/catch expressions into method definitions, which will occasionally mean a bit of extra boxing. In the grand scheme of things that perf hit is probably small enough and rare enough to keep this whole experiment at a low-ish priority. Still, when I have some time I'd like to go see what other JVM languages with try/catch expressions are doing here. They might have figured something out that I haven't.

Footnotes

  1. I'm using javap as a guide for how to disassemble bytecode. But I'm keeping variable names and class names and ignoring details about variable slots and the constant pool that aren't important for this discussion.
  2. Why is it not valid? The simple answer is "because the spec says so." Specifically it says "There must never be an uninitialized class instance in a local variable in code protected by an exception handler." But the underlying reason for that restriction isn't entirely clear to me. JVM bytecode is statically typed; the JVM's type system is just very peculiar by the standards of mainstream statically typed languages. The type checking is (a part of) the bytecode verification process. In the verifier, every variable and stack slot gets one static type over a static region of the bytecode. In different regions those types are allowed to be different, but when flow through the two regions converges all the types must be consistent. Now, the verifier thinks of an uninitialized Foo as a distinct type from an initialized Foo. Calls to a constructor (init) are the magic that turns an uninitialized and thus unsafe-to-use Foo in one region into one that is initialized and thus safe to use in another region. Which is all good. Except, for some reason, the uninitialized-to-initialized transition has been special-cased for exception handlers rather than treated as just another type-state change to a variable that must converge consistently. Perhaps there was concern over the runtime cost of a more complicated verification rule. Or perhaps there was some soundness hole they didn't think they could plug. I dunno. Whatever the cause, I have to go to war with the army I have, not the army I wish I had.

Friday, January 25, 2013

Scala Try/Catch Lifting Proposal

TL;DR

Scalac currently deals with a try/catch on a potentially non-empty stack by lifting it into a def. But that can lead to more boxing of vars captured in the lifted defs, and it requires a second pass to deal with try/catches that get put into the boxing constructor. I propose we use local vals to lift try/catch expressions instead of defs. That will prevent the extra boxing and, perhaps later, allow us to inline more code.

Background

Scala makes try/catch an expression. But the JVM is actively hostile to using try/catch as an expression, because the instant a throw happens the local stack frame is erased. So if we naively expand try/catch, then an expression like "foo(bar(), try/catch)" leads to different stack heights before foo is finally applied. The "main" branch has both "this" and the bar() result on the stack; the catch branch does not. Boom, bytecode verify error. In fact, that has been a source of several bugs as we've missed some corner case or other involving try/catch. In particular, nested try/catch can also break us in things like foo(bar(), {val x = 42; blurg match {case whatever => if(something) x else try/catch}}), where the try/catch is nested in an if/else in a match in a block.

Current Situation

Currently UnCurry looks for try/catch in a spot where it could cause this problem and lifts such try/catch expressions into defs. In other words

foo(bar(), try a catch {case b => c})

becomes

def liftedTry1 = try a catch {case b => c}
foo(bar(), liftedTry1)

And that's fine. But there's a nasty wrinkle. If the try/catch refers to a local var then suddenly that var is a captured var that needs boxing. That's a boxing that a user would probably not predict without a very subtle understanding of Scala internals.
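The same forced boxing shows up whenever a local mutable variable is captured: the lifted code can't reach the enclosing method's stack slot, so the variable has to move into a heap cell that both pieces of code share. A rough Java sketch of the idea (this IntRef is a stand-in for Scala's runtime ref classes, not their actual definition):

class IntRef {
  int value;
  IntRef(int value) { this.value = value; }
}

class Capture {
  static int example() {
    final IntRef x = new IntRef(0); // was: var x = 0
    Runnable lifted = new Runnable() {
      public void run() { x.value += 1; } // the captured var now lives on the heap
    };
    lifted.run();
    return x.value; // 1
  }
}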

Further, the boxing, which happens in LambdaLift, may well lead yet again to try/catch on a non-empty stack.

var x = try a catch {case b => c}
foo(bar(), try x catch{...})

After UnCurry

var x = try a catch {case b => c}
def liftedTry1 = try x catch{...}
foo(bar(), liftedTry1)

After captured var boxing in LambdaLift

var x = new IntRef(try a catch {case b => c})
def liftedTry1 = try x.value catch{...}
foo(bar(), liftedTry1)

Having the try/catch in the constructor call means it's going to end up on a non-empty stack. To deal with that LambdaLift does a post-transform step to deal with try/catch yet again.

var x = try new IntRef(a) catch {case b => new IntRef(c)}
def liftedTry1 = try x.value catch{...}
foo(bar(), liftedTry1)

And again that post-transform needs to be smart about try/catch nested inside things.

Proposal Outline

I propose that we get rid of the try lifting logic in both UnCurry and LambdaLift and defer it to a later phase. That phase would look for try/catch problems and, when found, transform into local vals. An example:

val r = quux().foo(bar(), try a catch {case b => c}, baz())

So, on finding a try/catch inside an expression on a non-empty stack, we transform to local vals like so:

val r = {
  val temp0 = quux()
  val temp1 = bar()
  val temp2 = try a catch {case b => c}
  temp0.foo(temp1, temp2, baz())
}

It's important that quux() and bar() get evaluated and stuffed into vals in order to preserve order of evaluation. (By-name translation will have long since been done, so it need not be considered here.)

We also need to deal recursively with nested try/catch

val r = quux().foo(bar(), qwerb().flurb(try a catch {case b => c}), baz())

step1:

val r = {
  val temp0 = quux()
  val temp1 = bar()
  val temp2 = qwerb().flurb(try a catch {case b => c})
  temp0.foo(temp1, temp2, baz())
}

That leaves a try/catch expression on a non-empty stack, so
step2:

val r = {
  val temp0 = quux()
  val temp1 = bar()
  val temp2 = {
    val temp4 = qwerb()
    val temp5 = try a catch {case b => c}
    temp4.flurb(temp5)
  }
  temp0.foo(temp1, temp2, baz())
}

Details still need to be worked out. The existing logic in UnCurry seems to be a good starting point for finding problematic try/catch, though.

Inlining

The main body of the proposal stands alone and I think it's worth while no matter what. But it may allow some improvements in inlining

Because we want to inline code that may already be compiled, we defer inlining until we're working with i-code. As part of its analysis the inliner examines the code it wants to inline to see if it has any exception handlers. If it does then it refuses to inline the code into a spot that has a non-empty stack. In other words, it's a third place we deal with problematic try/catch, but this time by refusing to inline.

I propose that after try/catch lifting is implemented via local vals, we can make inlining decisions earlier in the compiler, perhaps right after UnCurry. Under this scheme, if we decide we want to inline, then we would carry around a node representing the to-be-inlined code. Then, when we get to the try/catch lifting phase, it would conservatively treat that node like any other node that has a try/catch, lifting to a local val if a potential try/catch would be problematic. Then the inlining phase would do the actual inlining.

Besides allowing code with exception handlers to be inlined, this second part of the proposal should mean the compiler has to do less work overall. Currently we do a lot of work analyzing and transforming lambdas that may eventually be transformed away by the inliner.

I have to admit this second part of the proposal has no meat on it. It's not clear it's even workable. But it is at least some argument supporting lifting try/catch with local vals.