January 2018

S M T W T F S
  123456
78910111213
14151617181920
21222324252627
28293031   

Style Credit

Expand Cut Tags

No cut tags
Sunday, August 19th, 2007 12:09 am
Here's something that occurred to me the other day: consider duck-typed languages like Python. The idea of duck typing is that if something walks like a duck, and quacks like a duck, then it might as well be a duck. Or rather, if something walks and quacks in a sufficiently ducklike manner, it doesn't really matter for your purposes whether it actually is a duck or not. Less metaphorically, we don't specify that the arguments to our functions can only accept (say) Integers, we only specify that they have to have an add() method. In fact, duck typing as commonly understood goes further: we don't explicitly specify a necessary interface at all, we specify it implicitly by the methods we call on our arguments.

An example is probably in order. Consider the Python code
def foo(bar, baz) :
    bar.spoffle()
    bar.buffy(baz.willow())
    baz.xander(bar.angel())
From that code, we can deduce that the first argument to foo must have methods called spoffle, buffy and angel, and the second argument must support methods called willow and xander. Now, here comes the clever bit. That's all that's necessary for foo to work. Supposing some clever hacker comes along later and invents a new datatype which it would make sense to foo, all they need to do to allow your code to work on theirs is to implement those methods in a sensible way. This can be done without any knowledge or forethought on your part. Maybe you honestly believe that fooing is something that can only be done to ScoobyGang objects, but in fact the algorithm is much more general, and will work with any TeenSuperHeroTeam. It doesn't matter: your code will still work. To achieve the same effect in Java or Haskell, you'd have to have defined a Fooable interface explicitly, and users of your library would have to start their code with a raft of
instance Fooable TheThem where ...
instance Fooable AquaTeenHungerForce where ...
instance Fooable PowerRangers where ...
type stuff. This is in fact how most of my Haskell code ends up looking: I find typing it almost physically painful. Until the RSI starts to kick in, at which point it becomes actually physically painful. And that's in the good case, where the author of the library I'm trying to use has thought about the kind of generalisation I want to make and defined the relevant interfaces.

OK, so far so standard. Now, most duck-typed languages are dynamic, which means that we only try to determine if bar has a spoffle method at runtime, and die with an error message when it doesn't (possibly after trying some error recovery, e.g. by calling an AUTOLOAD method or similar). But it occurred to me that in simple cases (which is to say, the majority), we could work out most of this stuff statically. For each function definition, see what methods are called on its arguments. Recurse into functions called from that function. Now, each time a function is called, try to work out what methods the arguments will support, and see if that includes the interface required by the function. Issue an error if not. Thus we get the code reuse benefits of duck typing, and the correctness benefits of static checking. If the static checking is slowing down your development cycle too much, drop back to fully dynamic checking, and only run the static checks on your nightly builds or something.

This also cleared up something else I'd been vaguely wondering about. In his splendid Drunken Blog Rant Rich Programmer Food, Steve Yegge says
Another problem is that they believe any type "error", no matter how insignificant it might be to the operation of your personal program at this particular moment, should be treated as a news item worthy of the Wall Street Journal front page. Everyone should throw down their ploughshares and stop working until it's fixed. The concept of a type "warning" never enters the discussion.
I'd wondered at the time what a type warning would mean. When is a type error mild enough to only warrant a warning? Here's one idea: a type warning should be issued when it cannot be proved that an argument to a function implements the interface that function needs; a type error should be issued when it can be proved that it doesn't.

This all seemed very interesting, and struck me as potentially a fun and reasonably easy hacking project, at least to get something workable going. But if it's occurred to me, it has probably occurred to someone else, so I asked the Mythical Perfect Haskell Programmer if he was aware of any work that had been done on static duck-typed languages. "Oh yes," he said, "O'Caml's one." Buhuh? Really? Well, that's O'Caml moved a couple of rungs up my "cool languages to learn" stack...
Sunday, August 19th, 2007 01:07 am (UTC)
In Haskell, you're right, you do this by putting every primitive function in its own typeclass.

And you're also right that generally you can figure that stuff out statically.

But in Haskell (or rather GHC), sadly, you generally don't get that static analysis, and instead you end up with functions passing dictionaries/virtual-function-table-pointers around all the time.
Sunday, August 19th, 2007 08:34 am (UTC)
Interesting - I wasn't really thinking about the code generation end of things, just the correctness-checking end. I find it amusing, actually, that while the kind of optimizations that one could do are often cited as benefits of the Static Way, very few compilers seem to actually do them :-)

[Though dynamic/duck-typed languages could also be a lot faster (http://www.avibryant.com/2006/09/ruby-and-strong.html), apparently.]
Thursday, January 17th, 2008 11:42 pm (UTC)
Seen any of the profiling bits of PyPy? That kind of runtime profiling and optimization seems to be entirely what they're after...
Monday, August 20th, 2007 08:01 am (UTC)
In Haskell, you're right, you do this by putting every primitive function in its own typeclass.

I think it's a bit harder than that: you'd need a typeclass for every function's interface and every intersection of those interfaces that occurs. So if you had
foo buffy = spoffle buffy (winnow buffy)
bar angel = spodify angel (winnow angel)
then you'd need typeclasses as follows:
class Winnow where
    winnow :: a -> Winnowed a

class Fooable where
    spoffle :: Winnow a => a->(Winnowed a)->(Foo a)

class Barable where
    spodify :: Winnow a => a->(Winnowed a)->(Bar a)
Monday, August 20th, 2007 09:45 pm (UTC)
Assuming "spoffle", "spodify", and "winnow" are all primitive functions (that is, they need to be created for all combinations of types that can be passed as parameters):

class Winnow a b | a -> b where
    winnow :: a -> b

class Spoffle a b c | a b -> c where
    spoffle :: a -> b -> c

class Spodify a b c | a b -> c where
    spodify :: a -> b -> c

foo :: (Spoffle whedon winnowt ans, Winnow whedon winnowt) => 
       whedon -> ans
foo buffy = spoffle buffy (winnow buffy)

bar angel = spodify angel (winnow angel)
-- GHCi infers type of bar as
-- bar :: (Spodify a b c, Winnow a b) => a -> c

If instead, winnow was required to return an object that had a particular interface:

-- class Winnowed a where
--   winnowInterface1 :: a -> String
--   winnowInterface2 :: (Num b) => a -> [b]
-- or, more generally:

class WinnowInterface1 a where
  winnowInterface1 :: a -> String
class WinnowInterface2 a where
  winnowInterface2 :: (Num b) => a -> [b]
class (WinnowInterface1 a, WinnowInterface2 a) => Winnowed a

-- instance declaration needs -fallow-undecidable-instances
-- alternatively, you can manually instantiate for all types that
-- support the interface
instance (WinnowInterface1 a, WinnowInterface2 a) => Winnowed a

class Winnowed b => Winnow a b | a -> b where
  winnow :: a -> b

class Spoffle a c | a -> c where
  spoffle :: Winnowed b => a -> b -> c

class Spodify a c | a -> c where
  spodify :: Winnowed b => a -> b -> c

foo :: (Winnow whedon t, Spoffle whedon ans) => whedon -> ans
foo buffy = spoffle buffy (winnow buffy)

bar angel = spodify angel (winnow angel)
-- GHCi infers type of bar as
-- bar :: (Winnow a b, Spodify a c) => a -> c

Monday, August 20th, 2007 09:58 pm (UTC)
The point I was trying to make, if it wasn't clear, was that foo & bar don't need their own typeclasses since they are wholly derived from primitive functions.

And for more fun:
s x y z = x z $ y z
bar = s spodify winnow

typechecks just fine to the correct type, which surprised me!

Monday, August 20th, 2007 11:32 pm (UTC)
Ah, I see. Clever!
(Anonymous)
Sunday, August 19th, 2007 11:23 am (UTC)
Have you ever played with the edit-time type checking offered by modern IDEs such as Eclipse and NetBeans? It's one of the reasons why I've stuck with Java as my preferred language for such a long time...
Sunday, August 19th, 2007 11:38 am (UTC)
sigh @ cookie expiry
Monday, August 20th, 2007 08:03 am (UTC)
No, but I hear great things about them. There's a fair bit of envy in the scripting world over the quality of the development tools available to Java types. I've only used things like Visual Studio's code completion, which I remember as being very handy.
Sunday, August 19th, 2007 12:49 pm (UTC)
VB.NET has some options for type warnings vs type errors, as described here:
http://msdn2.microsoft.com/en-us/library/3y20cc1z(VS.80).aspx

For instance, if I say something like:

Dim x as Integer = 0
MessageBox.Show(x)

then I'm doing implicit type conversion from Integer to String, so that can be a warning or an error depending on your preferences. (I use "Option Strict On", so it's an error for me.)

More generally, I think the key advantage of interfaces is for separate libraries, where you may not know the specific one until runtime. For instance, I've written n-tier applications which have a choice of data layers, so there's just one point in the code where I choose which DLL to load (e.g. SQL Server vs Oracle) and all the rest of my code refers to the interfaces rather than the specific classes. That means that the type checking is done at compile time.

There are also issues if I'm working on a library and I want to change some behaviour, e.g. by no longer supporting a particular method. With interfaces, I can say "I no longer implement IScoobyGang but I do implement IPowerRangers", then make whatever changes I like as long as I stick to those interface definitions. The calling code should check whether a given object implements a particular interface at runtime, and if not then it can handle that appropriately, rather than assuming that my code has a particular method and then crashing mid-function.
Wednesday, August 22nd, 2007 11:43 am (UTC)
Not sure where to start with this. The whole point of duck typing is to enable the kind of code reuse you describe, without having to think about virtualising either the library or the calling code explicitly. With duck typing, you don't have to say "I no longer implement IScoobyGang but I do implement IPowerRangers" - remove the method, and it's done.

I understand the rationale behind explicit interfaces (you make a good point when you say "The calling code should check whether a given object implements a particular interface at runtime, and if not then it can handle that appropriately, rather than assuming that my code has a particular method and then crashing mid-function" - this can be done in dynamic languages, but it's a bit more of a faff), but for the kind of code I write, they're overkill. And I have a very low tolerance for writing unnecessary lines of code.
Tuesday, August 21st, 2007 12:59 pm (UTC)
Pychecker and associated tools do a limited form of type checking, but they only take it so far, because the python type system, when you come down to it, is Turing-complete.

Consider the following interactive session:

>>> class scary(object):
... 	def __getattr__(self, name):
... 		try:
... 			return getattr(self, name)
... 		except:
... 			setattr(self, name, self._genmethod(name))
... 			return getattr(self, name)
... 	def _genmethod(self, name):
... 		def _amethod(*args, **kw):
... 			sargs = [repr(arg) for arg in args]
... 			sargs += ["%s=%s" % (key, repr(val)) for (key, val) in kw.items()]
... 			print "called %s(%s)" % (name, ", ".join(sargs))
... 		return _amethod
...
>>> sc = scary()
>>> [method for method in dir(sc) if not method.startswith('_')]
[]
>>> sc.foo('a', 'b')
called foo('a', 'b')
>>> sc.foo('a', 'b', 'c')
called foo('a', 'b', 'c')
>>> [method for method in dir(sc) if not method.startswith('_')]
['foo']
>>> sc.bar('d')
called bar('d')
>>> [method for method in dir(sc) if not method.startswith('_')]
['bar', 'foo']
>>> sc.baz(harry="potter", ron="weasley", hermione="granger")
called baz(hermione='granger', ron='weasley', harry='potter')
>>> [method for method in dir(sc) if not method.startswith('_')]
['bar', 'baz', 'foo']
>>> 


The method for method bit removes all the attributes on the class that are built-in (start with __) or internal (start with _) by convention. Also, never use a straight except: in real code.

How do you type check that without actually running it?

You can't. So why bother?

The whole point is to ignore the type of the object. Your test are supposed to catch the common errors, and likely as not, the uncommon cases are being done by developers getting thing wrong or testing your code anyway. And they should test their code to catch these common errors.

Given that it's impossible to prove a python program's type correctness, we prefer to not bother, and get on with stuff that actually matters. Yes, having type warnings from pychecker et al can sometimes help, but normally they are more of a hindrance, and the sort of type errors that actually worry you won't be picked up by such tools.
Wednesday, August 22nd, 2007 11:39 am (UTC)
PyChecker sounds like the kind of thing I had in mind, and I was really only thinking of this as an academic/hobby exercise - I'm a confirmed drinker of the dynamic Kool-Aid (http://pozorvlak.livejournal.com/tag/type+systems) myself :-)

And as for
The whole point is to ignore the type of the object. Your test are supposed to catch the common errors, and likely as not, the uncommon cases are being done by developers getting thing wrong or testing your code anyway. And they should test their code to catch these common errors.

Given that it's impossible to prove a python program's type correctness, we prefer to not bother, and get on with stuff that actually matters. Yes, having type warnings from pychecker et al can sometimes help, but normally they are more of a hindrance, and the sort of type errors that actually worry you won't be picked up by such tools.
I can only say: "well said!". The trouble is that this is such a horribly polarised debate: those who believe that compilation is just a weak form and frustrating of testing on one hand (you, me, Steve Yegge, etc) and those who believe that testing is just a weak and labour-intensive form of compilation on the other (the Haskell, Java etc communities - and have you ever tried to argue with Haskell users? The dead, staring eyes, the moans of "monadsss... monadssss..." *shudder*. Almost as bad as Mac users :-) )
Tuesday, August 21st, 2007 05:37 pm (UTC)
An example that would be more to the point would be an object that will only have an attribute added to it when going through a certain code path whose execution you can't guarantee. That really is impossible to typecheck at compile-time.

If you want to see some truly scary python type mangling, I reccommend a peek at some of the Django ORM.
Wednesday, August 22nd, 2007 11:32 am (UTC)
Cool, I'll take a look. Actually, if it's an ORM, I believe Perl's Class::DBI (and probably Ruby on Rails' ORM layer) does something pretty similar.
(Anonymous)
Thursday, January 17th, 2008 04:46 pm (UTC)
Sound to me more like Class::Contract, which allows you to write runtime assertions à la Eiffel.
Thursday, January 17th, 2008 11:39 pm (UTC)
Seconded -- I've glanced through the Django ORM (when I was thinking of writing my own) and that's some pretty involved type mangling all right!
Thursday, August 23rd, 2007 11:26 pm (UTC)
I believe SML, Scala and JavaFX do type inference.

Personally I like to see type information where I can read it, not in the compilers head.
Friday, August 24th, 2007 12:02 am (UTC)
Well, a bunch of languages do type inference at compile-time, but that's not quite what I was talking about - Haskell does type inference, but it uses nominal subtyping (interfaces must be declared explicitly, but the compiler's smart enough to work out which interfaces a function's arguments must implement), as opposed to structural subtyping, where interfaces are formed ad-hoc as described above. I'm not entirely clear on the difference between structural subtyping and duck typing, beyond one name being used by academics and the other by hackers.

I'm a bit confused by type inference too - Haskell's implementors have had to leave out or limit a whole raft of useful features because it would make type inference Too Hard, and yet preferred Haskell style is to give explicit type declarations for all top-level functions (at least). If it makes desirable things impossible, and you're not actually meant to use it, what's it good for besides impressing newbies? I asked [livejournal.com profile] totherme about this, and he said something about type annotations being machine-checkable documentation, but I'm not entirely convinced.

Personally, I like to ignore the way my data is represented, and focus on what it means :-)

Anyway, welcome. I take it you're here via [livejournal.com profile] firefliesinjune?
Friday, August 24th, 2007 01:58 am (UTC)
I guess this is a distinction between types that statically declare themselves to be a Duck, and types that are just an alias for a collection of member signatures.

So to take an example in Java (java.nio.channels specifically): In the former I could pass a Selector to a method requiring a Channel. In the latter, the compiler would detect that I've screwed up (or rather, that I'm being malicious), because although Selector has all the methods necessary to be a Channel, it certainly isn't a Channel. [Channel just has isOpen and close methods.]

Polymorphism and encapsulation sufficiently hide how data is represented as far as I am concerned.

And yes, I was just nosing about having seen a comment of yours on firefliesinjune.
Thursday, May 15th, 2008 02:47 pm (UTC)
Yes, OCaml is one. So is C++!

In particular, the C++ template system is duck-typed. Suppose I write a template with a parameter T, and in the body of the template, it has a method that adds two T's with +, and another method that calls the method void quack(bool). C++ doesn't check this at all until you apply the template, and then it will check that the actual parameter you give for T supports those operations.

OCaml is duck-typed in the sense that it supports structural subtyping for both objects and open variants (tagged sums). The idea, called row types, is actually due to Mitch Wand (http://www.citeulike.org/user/tov/article/1628251) and Didier Remy (http://www.citeulike.org/user/tov/article/600741). These original type systems supported both record subtyping, in the sense that you can pass a record with more fields to a function that knows about only some of them, and record update, in the sense that you can polymorphically append a new field to a record, preserving all the old fields. OCaml sadly did away with the latter. OCaml objects have duck-subtyping, but you can't extend an OCaml object with new methods.

Matthias Blume et al. implemented MLPolyR (http://www.citeulike.org/user/tov/article/1652932), which supports extensible, structurally-subtyped records and sums. I found it really impressive.
Thursday, May 15th, 2008 03:44 pm (UTC)
Cool, thanks! And thanks for your helpful comments on Reddit.