Function overloading by return type?

Question

Contrary to what others are saying, overloading by return type is possible and is done by some modern languages. The usual objection is that in code like

int func();
string func();
int main() { func(); }

you can’t tell which func() is being called. This can be resolved in a few ways:

1. Have a predictable method to determine which function is called in such a situation.
2. Whenever such a situation occurs, it’s a compile-time error. However, have a syntax that allows the programmer to disambiguate, e.g. int main() { (string)func(); }.
3. Don’t have side effects. If you don’t have side effects and you never use the return value of a function, then the compiler can avoid ever calling the function in the first place.
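
The second option (a compile-time error plus a disambiguation syntax) can be approximated in today's C++ with a function template whose only template parameter is the return type. This is a minimal sketch, not return-type overloading proper: the name func is borrowed from the example above, and the returned values are placeholders chosen purely for illustration.

```cpp
#include <string>

// The primary template is declared but never defined, so only the
// specialisations below can actually be called.
template <typename T>
T func();

// Version selected when the caller asks for an int.
template <>
int func<int>() { return 2; }

// Version selected when the caller asks for a std::string.
template <>
std::string func<std::string>() { return "1"; }
```

Writing func<int>() or func<std::string>() picks a version explicitly. A bare func() does not compile, because T cannot be deduced from a call with no arguments.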

Two of the languages I regularly (ab)use overload by return type: Perl and Haskell. Let me describe what they do.

In Perl, there is a fundamental distinction between scalar and list context (and others, but we’ll pretend there are two). Every built-in function in Perl can do different things depending on the context in which it is called. For example, the join operator forces list context (on the thing being joined) while the scalar operator forces scalar context, so compare:

print join " ", localtime(); # printed "58 11 2 14 0 109 3 13 0" for me right now
print scalar localtime(); # printed "Wed Jan 14 02:12:44 2009" for me right now.

Every operator in Perl does something in scalar context and something in list context, and they may be different, as illustrated. (This isn’t just for random operators like localtime. If you use an array @a in list context, it returns the array, while in scalar context, it returns the number of elements. So for example print @a prints out the elements, while print 0+@a prints the size.) Furthermore, every operator can force a context, e.g. addition + forces scalar context. Every entry in man perlfunc documents this. For example, here is part of the entry for glob EXPR:

“In list context, returns a (possibly empty) list of filename expansions on the value of EXPR such as the standard Unix shell /bin/csh would do. In scalar context, glob iterates through such filename expansions, returning undef when the list is exhausted.”

Now, what’s the relation between list and scalar context? Well, man perlfunc says

“Remember the following important rule: There is no rule that relates the behavior of an expression in list context to its behavior in scalar context, or vice versa. It might do two totally different things. Each operator and function decides which sort of value it would be most appropriate to return in scalar context. Some operators return the length of the list that would have been returned in list context. Some operators return the first value in the list. Some operators return the last value in the list. Some operators return a count of successful operations. In general, they do what you want, unless you want consistency.”

So it’s not simply a matter of having a single function and applying a simple conversion at the end. In fact, I chose the localtime example for that reason.

It’s not just the built-ins that have this behavior. Any user can define such a function using wantarray, which allows you to distinguish between list, scalar, and void context. So, for example, you can decide to do nothing if you’re being called in void context.

Now, you may complain that this isn’t true overloading by return value because you only have one function, which is told the context it’s called in and then acts on that information. However, this is clearly equivalent (and analogous to how Perl doesn’t allow usual overloading literally, but a function can just examine its arguments). Moreover, it nicely resolves the ambiguous situation mentioned at the beginning of this response. Perl doesn’t complain that it doesn’t know which method to call; it just calls it. All it has to do is figure out what context the function was called in, which is always possible:

sub func {
    if( not defined wantarray ) {
        print "void\n";
    } elsif( wantarray ) {
        print "list\n";
    } else {
        print "scalar\n";
    }
}

func(); # prints "void"
() = func(); # prints "list"
0+func(); # prints "scalar"

(Note: I may sometimes say Perl operator when I mean function. This is not crucial to this discussion.)
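
C++ has no counterpart to wantarray, but the idea of one entry point that reacts to how its result is used can be roughly imitated with conversion operators. The sketch below is an analogy under stated assumptions rather than a translation of the Perl: context_probe and probe_all_contexts are hypothetical names, int stands in for scalar context, std::vector<int> for list context, and a temporary that is never converted for void context.

```cpp
#include <string>
#include <vector>

// Hypothetical probe: records in the given log how the temporary was consumed.
class context_probe {
public:
    explicit context_probe(std::string& log) : log_(log) {}

    // Result used as an int: treated here as "scalar context".
    operator int() const { used_ = true; log_ += "scalar\n"; return 0; }

    // Result used as a vector: treated here as "list context".
    operator std::vector<int>() const { used_ = true; log_ += "list\n"; return {}; }

    // Never converted before destruction: treated here as "void context".
    ~context_probe() { if (!used_) log_ += "void\n"; }

private:
    std::string& log_;
    mutable bool used_ = false;
};

// Uses the probe in the three ways and returns what it recorded.
std::string probe_all_contexts() {
    std::string log;
    context_probe{log};                       // result discarded
    std::vector<int> v = context_probe{log};  // result used as a list
    int n = context_probe{log};               // result used as a scalar
    (void)v;
    (void)n;
    return log;
}
```

Unlike wantarray, the probe only learns about the "void" case when the temporary is destroyed, so this is closer to detecting use after the fact than to being told the context up front.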

Haskell takes the other approach, namely to not have side effects. It also has a strong type system, and so you can write code like the following:

main = do n <- readLn
          print (sqrt n) -- note that this is aligned below the n, if you care to run this

This code reads a floating point number from standard input, and prints its square root. But what is surprising about this? Well, the type of readLn is readLn :: Read a => IO a. What this means is that for any type that can be Read (formally, every type that is an instance of the Read type class), readLn can read it. How did Haskell know that I wanted to read a floating point number? Well, the type of sqrt is sqrt :: Floating a => a -> a, which essentially means that sqrt can only accept floating point numbers as inputs, and so Haskell inferred what I wanted.

What happens when Haskell can’t infer what I want? Well, there are a few possibilities. If I don’t use the return value at all, Haskell simply won’t call the function in the first place. However, if I do use the return value, then Haskell will complain that it can’t infer the type:

main = do n <- readLn
          print n
-- this program results in a compile-time error ("Unresolved top-level overloading" in Hugs; GHC reports an ambiguous type variable)

I can resolve the ambiguity by specifying the type I want:

main = do n <- readLn
          print (n::Int)
-- this compiles (and does what I want)
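
For readers more at home in C++, the way the use of readLn's result selects its type can be loosely imitated with a templated conversion operator. This is only an analogy under stated assumptions: parsed_text, parse, and square_root_of are hypothetical names, the input is a string rather than standard input, and parse errors are ignored for brevity.

```cpp
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical proxy: holds the raw text until a target type is known.
struct parsed_text {
    std::string text;

    // Converts to whatever type the surrounding initialisation asks for.
    template <typename T>
    operator T() const {
        std::istringstream in(text);
        T value{};
        in >> value;  // no error handling in this sketch
        return value;
    }
};

// Hypothetical helper playing the role of readLn, applied to a string.
parsed_text parse(const std::string& text) { return parsed_text{text}; }

// Mirrors the Haskell example: the declared type double picks the parser.
double square_root_of(const std::string& text) {
    double n = parse(text);
    return std::sqrt(n);
}
```

With no target type, as in auto x = parse("42"), nothing is parsed and x is just the proxy. static_cast<int>(parse("42")) names the type explicitly, which is loosely comparable to the n::Int annotation.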

Anyway, what this whole discussion means is that overloading by return value is possible and is done, which answers part of your question.

The other part of your question is why more languages don’t do it. I’ll let others answer that. However, a few comments: the principal reason is probably that the opportunity for confusion is truly greater here than in overloading by argument type. You can also look at rationales from individual languages:

Ada: “It might appear that the simplest overload resolution rule is to use everything – all information from as wide a context as possible – to resolve the overloaded reference. This rule may be simple, but it is not helpful. It requires the human reader to scan arbitrarily large pieces of text, and to make arbitrarily complex inferences (such as (g) above). We believe that a better rule is one that makes explicit the task a human reader or a compiler must perform, and that makes this task as natural for the human reader as possible.”

C++ (subsection 7.4.1 of Bjarne Stroustrup’s “The C++ Programming Language”): “Return types are not considered in overload resolution. The reason is to keep resolution for an individual operator or function call context-independent. Consider:

float sqrt(float);
double sqrt(double);

void f(double da, float fla)
{
    float fl = sqrt(da);  // call sqrt(double)
    double d = sqrt(da);  // call sqrt(double)
    fl = sqrt(fla);       // call sqrt(float)
    d = sqrt(fla);        // call sqrt(float)
}

If the return type were taken into account, it would no longer be possible to look at a call of sqrt() in isolation and determine which function was called.” (Note, for comparison, that in Haskell there are no implicit conversions.)
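
The context independence described in the quotation can be observed directly. The following is a small sketch with hypothetical names (my_sqrt, last_called, and the two helper functions are not from the book) in which each overload records its own signature, so the effect of the target type can be checked.

```cpp
#include <cmath>
#include <string>

// Hypothetical stand-ins for the two sqrt overloads in the quotation.
// Each records its own signature so the choice can be observed.
static std::string last_called;

float my_sqrt(float x)   { last_called = "my_sqrt(float)";  return std::sqrt(x); }
double my_sqrt(double x) { last_called = "my_sqrt(double)"; return std::sqrt(x); }

// Assigns the result to a float although the argument is a double.
std::string double_arg_float_target() {
    double da = 4.0;
    float fl = my_sqrt(da);  // the argument type decides: my_sqrt(double)
    (void)fl;
    return last_called;
}

// Assigns the result to a double although the argument is a float.
std::string float_arg_double_target() {
    float fla = 4.0f;
    double d = my_sqrt(fla);  // the argument type decides: my_sqrt(float)
    (void)d;
    return last_called;
}
```

In both helpers the variable receiving the result has no influence on which overload runs; the result is simply converted afterwards.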

Java (Java Language Specification 9.4.1): “One of the inherited methods must be return-type-substitutable for every other inherited method, or else a compile-time error occurs.” (Yes, I know this doesn’t give a rationale. I’m sure the rationale is given by Gosling in “The Java Programming Language”. Maybe someone has a copy? I bet it’s the “principle of least surprise” in essence.) However, a fun fact about Java: the JVM allows overloading by return value! This is used, for example, in Scala, and can also be accessed directly from Java by playing around with internals.

PS. As a final note, it is actually possible to overload by return value in C++ with a trick. Witness:

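#include <iostream>
#include <string>
using namespace std;

struct func {
    operator string() { return "1"; } // used when a string is wanted
    operator int() { return 2; }      // used when an int is wanted
};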

int main( ) {
    int x    = func(); // calls int version
    string y = func(); // calls string version
    double d = func(); // calls int version
    cout << func() << endl; // calls int version
    func(); // calls neither
}
