Tuesday, March 15, 2011

Scala vs. a Java Anti-Pattern

In the last few months I've been playing with Scala, a new(ish) programming language that runs on the JVM. The problem with learning a fun new language is that you start wanting to use its features even when you're still working in a different language (in this case, Java). This past week I came across a case that illustrated this so well that I wanted to write it down and share it.

The Anti-pattern

It's common to need to return more than one item: a name and a date, an ID and a status, or a result and an error code. Languages like Java don't make this easy, because a return statement can hand back only a single value.

This often gets solved by returning some type that approximates a tuple. For example:

// ...
TuplePair<Integer,ErrorType> pair = foo.computeSomeValue();
if (checkErrorConditionOK(pair.getSecond())) return pair.getFirst();
else throw new AppropriatelyNamedExceptionTypeException(pair.getSecond().getErrorMessage());
// ...
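The snippet assumes a TuplePair class that the post never shows. Here is a minimal sketch of what such a hand-rolled class usually looks like. TuplePair and the demo class are hypothetical (the JDK has no TuplePair), and the example values are invented:

```java
// Demo entry point declared first so `java TuplePairDemo.java` can run it.
class TuplePairDemo {
    public static void main(String[] args) {
        // The type parameters say *what* is stored, not *which* slot means what.
        TuplePair<Integer, String> pair = new TuplePair<Integer, String>(7, "OK");
        System.out.println(pair.getFirst() + " / " + pair.getSecond()); // prints "7 / OK"
    }
}

// Hypothetical generic pair: written once, then reused everywhere.
final class TuplePair<A, B> {
    private final A first;
    private final B second;

    TuplePair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    A getFirst() { return first; }

    B getSecond() { return second; }
}
```

The accessor names getFirst() and getSecond() are all a caller ever sees, which is the root of the readability problem described next.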

This is pretty ugly. But foo has returned a couple of values to us. And with Java generics, we even get some compile-time type checking: pair.getFirst() is an Integer, and pair.getSecond() is an ErrorType. But that's a pretty ideal case with a nice, descriptive type parameter. What if our parameters are of the same type?

// ...
TuplePair<Integer,Integer> pair = foo.computeSomeValue();
if (checkErrorConditionOK(pair.getSecond())) return pair.getFirst();
else throw new AppropriatelyNamedExceptionTypeException("Got error code: " + pair.getSecond());
// ...

From this code, I have no way of knowing that the second item in the tuple is an error code. I can infer it from the way the code is written. But maybe whoever wrote this code got the parameters mixed up. If I'm reading through this code (debugging or doing a thorough code review), I now have to open up foo's implementation to see what actually gets returned in that tuple.

In this case, our TuplePair is barely better than an array.
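To see why same-typed slots are risky, consider two hypothetical methods (names invented for illustration) that are both meant to return the value first and the error code second. One swaps its arguments, yet javac accepts both, because both slots are Integer:

```java
class SwappedPairDemo {
    // Intended contract: first = computed value, second = error code.
    static TuplePair<Integer, Integer> computeCorrectly() {
        return new TuplePair<Integer, Integer>(100, 0);
    }

    // Bug: the arguments are swapped, but both are Integer, so it still compiles.
    static TuplePair<Integer, Integer> computeWithSwappedArgs() {
        return new TuplePair<Integer, Integer>(0, 100);
    }

    public static void main(String[] args) {
        System.out.println("value = " + computeCorrectly().getFirst());       // value = 100
        System.out.println("value = " + computeWithSwappedArgs().getFirst()); // value = 0 (silently wrong)
    }
}

// Bare-bones hypothetical pair, repeated here so the example is self-contained.
final class TuplePair<A, B> {
    private final A first;
    private final B second;

    TuplePair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    A getFirst() { return first; }

    B getSecond() { return second; }
}
```

Had the slots been Integer and ErrorType, as in the first snippet, javac would have rejected the swapped call; that is exactly the checking we lose when both types match.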

Why the Anti-pattern Persists

It's easy to understand why I keep seeing this pattern in Java, though. There's a lot of boring boilerplate code that has to be written to work around this problem. To make the code more meaningful and readable, and to allow for easier refactoring in the future, you should assign a name to each thing you want to return. (No, you don't want to do that with a Map. That's an anti-pattern for another blog post.) The best way to do this is with a new class.

In Java, writing a new class is pretty heavyweight. If you want to use proper encapsulation, your new class might look like this:

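public class ComputedValue
{
    // Fields are private and final: callers can read them but never change them.
    private final int value;
    private final int errorCode;

    public ComputedValue(int value, int errorCode)
    {
        this.value = value;
        this.errorCode = errorCode;
    }

    // Read-only accessors, each with an explicit int return type.
    public int getValue() { return value; }
    public int getErrorCode() { return errorCode; }
}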

Because ComputedValue is public, it has to be either a static inner class of some existing class or go in an entirely new file (ComputedValue.java), which is the more common route. But at least now our code is a bit more readable:

// ...
ComputedValue computed = foo.computeSomeValue();
if (checkErrorConditionOK(computed.getErrorCode())) return computed.getValue();
else throw new AppropriatelyNamedExceptionTypeException("Got error code: " + computed.getErrorCode());
// ...
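As mentioned above, ComputedValue doesn't need its own file if it is a static inner (strictly speaking, static nested) class of an existing class. A sketch, assuming a hypothetical Foo class standing in for the post's foo, whose computeSomeValue() just returns fixed example numbers:

```java
// Hypothetical stand-in for the post's `foo`. ComputedValue lives inside it,
// so no separate ComputedValue.java file is needed.
class Foo {
    // Static nested class: usable from other code as Foo.ComputedValue.
    public static final class ComputedValue {
        private final int value;
        private final int errorCode;

        public ComputedValue(int value, int errorCode) {
            this.value = value;
            this.errorCode = errorCode;
        }

        public int getValue() { return value; }

        public int getErrorCode() { return errorCode; }
    }

    // Invented example data: value 0, error code 42.
    public ComputedValue computeSomeValue() {
        return new ComputedValue(0, 42);
    }

    public static void main(String[] args) {
        Foo.ComputedValue computed = new Foo().computeSomeValue();
        System.out.println("value=" + computed.getValue() + ", errorCode=" + computed.getErrorCode());
    }
}
```

This saves a file, but not the boilerplate: the class body is just as long as the top-level version.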

What Scala Offers

Given the above option, it’s not surprising that people keep creating the TuplePair class. Frankly, it’s a pain in the ass to write new Java classes for every case like this. The thinking is “Write TuplePair once and re-use it so we don’t have to do it again,” even though it leads to useless names like getFirst() and getSecond().

Scala helps solve this class of problems by making classes like this dead simple to create:

class ComputedValue(val value: Int, val errorCode: Int)

This single line gives you everything that the above Java implementation did: a type with “value” and “errorCode” properties and a constructor for easily creating new instances. And although you didn’t write any boilerplate code to provide encapsulation, it’s there: the Scala compiler generates a private field and a public accessor method for each val parameter.
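For comparison, here is a hand-written Java approximation of the shape the Scala compiler emits for that one line: a private final field plus a public accessor, named after the property, for each val parameter. It is a sketch, not actual compiler output (running javap -p on the compiled Scala class shows the real members), and the demo class is hypothetical:

```java
// Demo entry point declared first so `java GeneratedShapeDemo.java` can run it.
class GeneratedShapeDemo {
    public static void main(String[] args) {
        ComputedValue computed = new ComputedValue(0, 42);
        // Scala source reads these as computed.value and computed.errorCode.
        System.out.println(computed.value() + " " + computed.errorCode()); // prints "0 42"
    }
}

// Approximation (not real compiler output) of what scalac generates for:
//   class ComputedValue(val value: Int, val errorCode: Int)
class ComputedValue {
    // Each `val` parameter becomes a private final field...
    private final int value;
    private final int errorCode;

    public ComputedValue(int value, int errorCode) {
        this.value = value;
        this.errorCode = errorCode;
    }

    // ...plus a public accessor named after the property (no "get" prefix).
    public int value() { return value; }

    public int errorCode() { return errorCode; }
}
```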

Classes in Scala are public by default, and they don’t have to be declared in separate files or as static inner classes. That makes the barrier to entry to writing better code a single line in a file you already have open. The Scala version of the above snippet becomes:

// ...
val computed = foo.computeSomeValue()
if (checkErrorConditionOK(computed.errorCode)) return computed.value
else throw new AppropriatelyNamedExceptionTypeException("Got error code: " + computed.errorCode)
// ...

But Scala offers one more nice feature, even in this simple case. So far, we’ve only looked at code that consumes a ComputedValue instance. What about the code that creates it?

// Java:
public ComputedValue computeValue()
{
return new ComputedValue(0, 42);
}

Even with our new ComputedValue class, this code still has the same problem as our TuplePair! Someone reading this code can’t tell which is the value and which is the error code. To be sure, we would have to open up ComputedValue and look at its constructor parameters. Even then, we can’t be sure the developer knew the proper parameter order when he wrote that line of code, because the constructor call alone doesn’t reveal his intention. I use the following pattern to make this clearer in Java:

public ComputedValue computeValue()
{
final int value = 0;
final int errorCode = 42;
return new ComputedValue(value, errorCode);
}

The above at least makes my intentions clear. But to check whether my argument order matches the argument order expected by the constructor, you’ll have to look at ComputedValue’s implementation. How does Scala improve on this?

def computeValue() = new ComputedValue(value = 0, errorCode = 42)

Again, a single line of Scala replaces several lines of Java. The code here is not only more concise, but more explicit. The return type of computeValue() is inferred from the expression on the right-hand side. And by using named arguments, we make our intention clear not only to the reader, but also to the compiler. Scala will check at compile time that ComputedValue’s constructor takes “value” and “errorCode” parameters, and will even match our parameters up correctly if we pass them in a different order.
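For contrast, the usual way to get labelled constructor arguments in Java is a builder. A minimal sketch with a hypothetical nested Builder class (the names are invented for illustration, not taken from any library): it labels each argument at the call site and lets you set them in any order, but it costs a whole extra class, and unlike Scala's named arguments, forgetting a setter is not a compile-time error; the field just keeps its default of 0.

```java
// Demo entry point declared first so `java BuilderDemo.java` can run it.
class BuilderDemo {
    public static void main(String[] args) {
        // Each argument is labelled at the call site, in any order.
        ComputedValue computed = new ComputedValue.Builder()
                .errorCode(42)
                .value(0)
                .build();
        System.out.println(computed.getValue() + " " + computed.getErrorCode()); // prints "0 42"
    }
}

final class ComputedValue {
    private final int value;
    private final int errorCode;

    // Private: instances are created only through the Builder.
    private ComputedValue(int value, int errorCode) {
        this.value = value;
        this.errorCode = errorCode;
    }

    public int getValue() { return value; }

    public int getErrorCode() { return errorCode; }

    // Hypothetical builder: method names stand in for Scala's named arguments.
    static final class Builder {
        private int value;     // stays 0 if value(...) is never called
        private int errorCode; // stays 0 if errorCode(...) is never called

        Builder value(int value) { this.value = value; return this; }

        Builder errorCode(int errorCode) { this.errorCode = errorCode; return this; }

        ComputedValue build() { return new ComputedValue(value, errorCode); }
    }
}
```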

These are some of the most basic features that Scala offers, and yet they’re the ones that come up in day-to-day work over and over again. By getting tedious boilerplate out of the way, Scala lets coders write better code without worrying about making more work for themselves.

15 comments:

  1. Make it a case class, then you can also drop the `new`s and `val`s... You get equality and hashCode for free too :)

  2. Nice post!

    Using Scala you can even go further, though: Do you know Either from the Scala library?

    It is basically your ComputedValue, but parameterized and hence can be used with any types for errors and values.

    It is also an algebraic type, i.e. can only be a Left (for error) or a Right (for correct value). These are case classes which can be instantiated without the new keyword.

  3. Astonishing how experiences can differ. I've never seen this Anti-Pattern in 15 years of Java development. Glad I have been lucky.

    Best
    Stephan
    --
    http://codemonkeyism.com

  4. @heiko - `Either` *might* not be suitable as a drop-in replacement here. As written, the errorCode in the example could have multiple possible values representing different types of success, relevant if you then need to retain the exact success reason for some other purpose.

    Otherwise, totally agree with you, 100%

  5. @Stephan - The real anti-pattern is using exceptions to represent non-exceptional return values. That sort of thing happens much less with the availability of lightweight tuples and ADTs, making multiple-return values easy.

  6. What would be the alternative for getting two values out of a method in Java?

    a) to use a Map (considered already an antipattern)
    b) passing some "Container type" as parameter to be filled with the second value?
    (decouples the items of the concept "result value")

    The use of the generic pair has its drawbacks, but I think that, used with care, it is
    not that strict an antipattern if:
    - the pair is not passed on beyond the point where the method returning it is called

    - the method's name and the name of the variable the result is assigned to
    reflect the different, appropriate meanings of the items in the pair
    (e.g. final TuplePair columnValueAndWidth =
    getColumnValueAndWidthOf(row, index))

    - the values are extracted immediately into separate typed variables with their
    appropriate meanings
    (as done by Scala via pattern matching e.g.
    val (value, width) = getColumnValueAndWidthOf(row, index)
    or Java:
    final int columnValue = columnValueAndWidth._1;
    final int columnWidth = columnValueAndWidth._2;
    )

    If this is taken into account, the "danger" of losing sight of the
    meaning of the items in the pair is minimized,
    while the coherence of "the result" is maintained.

  7. @kev.lee.wright: Yeah, I know about case classes. The post was already getting a bit long, though, so I didn't go into them. They do seem perfectly suited to this case.

    @Heiko: I wasn't familiar with Either. Thanks for the info. But as kev mentioned, error codes and results aren't always mutually exclusive. The most recent example I can think of is HTTP error codes. Even though HTTP returned an error, the page still has a body! I need to see both! (I've actually used an HTTP library that throws an exception on HTTP errors which made this way more difficult to do than it should have been.)

    @Stephan: How do you generally return multiple values?

    @Lutz: If you're immediately going to assign _1 and _2 to variables with better names, why not just return a data type with better names?

  8. Yeah, I can confirm that in our (big) Java project, we have "generic" classes like TwoObjects, TwoInts, TwoStrings, and other TwoOurCommonClassName variants...

    I have probably read too fast, but I fail to see why you don't mention returning a good old Tuple in your Scala code.
    Often we just need a couple of returned values from some local function/method and don't need them to have explicit names.

  9. @philho: I know about tuples, but a Scala tuple would have the same problem as TuplePair above: the values in the tuple don't have meaningful names. If I'm just going to unpack the tuple on the receiving end, why not instead use a meaningful return type so that the names are consistent throughout the code?

    Mostly, I wanted to demonstrate that if you actually want to give your values meaningful names, doing so in Scala is a lot easier than in Java.

  10. @kev.lee.wright and blog author
    I’m with heiko on this one. Either fits nicely here. The blog author states that looking at this line:
    TuplePair pair = foo.computeSomeValue();
    we cannot easily determine which represents failure and which success. But why is that? Because a tuple is just
    a container with elements 1 and 2. A tuple offers access to the elements but doesn’t help distinguish between success and failure.

    So let's look at the solution:
    def computeValue() = new ComputedValue(value = 0, errorCode = 42)

    Here the names, not the type, are used to distinguish the two desired states.

    I think the question here is what we want to know about computeValue(), which leads me to two things:
    1. Was the computation a success or a failure
    2. Which value is encapsulated as result of the method

    The blog post missed some important parts, IMO; it shows only half of the interesting bits. So we can easily and concisely define
    a container class in Scala using named and default parameters, fine. But how do you implement ‘checkErrorConditionOK’?
    like this?
    def checkErrorConditionOK(c:ComputedValue) = c.value > 0
    or
    def checkErrorConditionOK(c:ComputedValue) = c.errorCode == 0

    Using pattern matching and Either, you can throw away your ifs and do it like this:

    computeValue() match {
      case Right(s) => // do something good
      case Left(e) => // do something bad
    }

    You could easily use a container class for each case where such an opportunity arises, but personally I would not do it.
    Yes, you have to read up about Either and how it works, but it is already included in the standard library, so why not use it?
    As a general rule, the type signature of a method:
    def computeValue():ComputedValue
    shows much less about what the method is actually doing than:
    def computeValue():Either[Int,Int]
    In the end it might be a matter of taste, but to me the second signature is more informative.

    Here are some more examples:
    http://stackoverflow.com/questions/1193333/using-either-to-process-failures-in-scala-code

    PS: Scala is deep; have fun exploring :)
    regards andreas

  11. From Cody@Lutz: If you're immediately going to assign _1 and _2 to variables with better names,
    why not just return a data type with better names?

    Well, I thought about it, and I agree with you in the case of
    return types of public methods.

    For private methods the alternative would be to write private static final
    classes for each type combination. The class name would be the same
    (except for the capitalized first letter) as the variable holding the "tuple", like
    ColumnAndWidth columnAndWidth = getColumnAndWidth(row,index).

    I agree that one couldn't make too many errors writing such
    tuple-value-classes, but when I am working within the internals of
    a class I accept the small semantic gap, and it's just too convenient
    to write new Pair(column,width) rather than new ColumnAndWidth(column, width).

    I also found that in the majority of cases I just needed to
    return *pairs*, and that I had no adequate name to give
    these pairs other than class And.

    For tuple with more than 2 items I made my own "customized tuple value class"
    as you suggested, with meaningful names, even when within a
    class implementation.

    Because this didn't happen too often, reuse wasn't an issue anyway.

  12. Sorry, I meant:
    I also found that in the majority of cases I just needed to return *pairs* (rather than 3 or more objects). For a custom ComputedPairValue class, though, the most appropriate name in those cases would be "Item1AndItem2". IMHO reusing a generic Pair class outweighs the small advantage of meaningful names for a small part of the code in the internals of a class.

  13. I won't be as categorical as Stephan, but I agree that it's certainly not a very common problem, so calling it an anti-pattern is a bit of an exaggeration.

    As for named parameters, they are pretty easy to emulate in Java:

    new ComputedValue().value(0).errorCode(42);

    I do agree that Scala's compact declaration of classes and support of properties is a clear win over Java.

  14. Unless I'm mistaken, the point of the post was about multiple return values, but the choice of illustrating it with a value and an error code has distracted some commenters into discussing exceptions, return codes, and Either instead.

    You will notice that the Google Collections do *not* contain a Pair type. We agonized for a very long time about this decision but we decided that a Pair type is often enough a bad choice that we shouldn't encourage it. It's usually worth creating a specific type to express your intent better, especially when the creation of such a type is made lightweight by a language construct (e.g. a Scala trait).

    As for return codes and exceptions: I really can't think of an example where a return code is a superior approach to an exception. HTTP is not a good example of this, since the value is a *result* code, not an error code. 200 and 404 are equally valid; you just need to handle them differently. 404 is not exceptional, and therefore should not be treated as an exception.

  15. It's really helpful information for Java development.

    As per the general concept, it means "programmers who are skilled in their development segments like PHP, Java, .NET, etc."
