Sunday, March 30th, 2008 08:40 pm
For the past couple of years (longer, actually), I've been thinking a lot about this business of static and dynamic typing in programming languages. Here's my current position on the matter.

Static and dynamic typing encourage and reward quite remarkably different approaches to the creation of software: everything from the nitty-gritty, (sit and think?)-edit-(compile?)-run-(test?)-(debug?)-(repeat?) cycle to the design (at all levels) of the software to be created, taking in every aspect of the tools and process used to support this design and implementation. Also, the choice of static or dynamic typing is best understood as part of a larger attitude towards development. Unfortunately, the use of a dynamic mindset with a static language not only means that you can't take advantage of the tools available (or even see why they'd be useful), it actively hinders success. The same is true for approaching a dynamic language with a static mindset.

I would therefore like to propose Pozorvlak's Conjectures:
  1. If you find that a modern dynamic type system causes more problems than it solves, you're probably doing it wrong.
  2. If you find that a modern static type system causes more problems than it solves, you're probably doing it wrong.
For instance, I find Haskell's type system causes me many more problems than it solves. Discussions with more expert Haskell users suggest that I am indeed doing it wrong. This guy, on the other hand, finds that "the dynamic languages stick out like sore thumbs because of the whole class of dynamic run time errors that show up that don't even appear in the statically typed languages". If you find this, I claim that it's because your code is not polymorphic enough¹. You can get away with having code which only accepts data of very restricted types when the compiler does all the checks for you statically. When it doesn't, you're better off writing code that behaves well for a wider class of inputs, and dynamic languages give you much more facility to do this. Once you learn to code like this, your problems largely go away. As a bonus, your code becomes more flexible, and whole classes of techniques become possible that would have been literally unthinkable before. Similar claims, it appears, can be made for static languages: at the higher levels of mastery, one can use a static type system as a theorem-prover to establish some quite nontrivial properties of your program automatically and across the entire codebase. This allows for some excitingly different approaches to program development.
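To make the "polymorphic enough" point concrete, here's a minimal Python sketch (the classes and function are invented for illustration): the function demands nothing of its inputs beyond the behaviour it actually uses, so it works unchanged across types that were never designed together.

```python
# Illustrative sketch, not from any real codebase: a duck-typed function
# that accepts any iterable of objects with a `name` attribute, rather
# than one concrete class.

def names(items):
    """Collect the `name` attribute from each item, whatever its type."""
    return [item.name for item in items]

class User:
    def __init__(self, name):
        self.name = name

class Server:
    def __init__(self, name):
        self.name = name

# Lists, tuples, generators, and mixed element types all work unchanged.
print(names([User("alice"), Server("web-1")]))  # → ['alice', 'web-1']
```

In a language with a less expressive type system you'd be tempted to restrict this to one concrete element type; here the wider contract costs nothing.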

A corollary is that if you ever find yourself saying that your port of Feature X to Language Y is better than the original Feature X solely because it's (statically|dynamically) typed and the original Feature X was the other one, you should probably save your breath. It will probably also be worth your while to go back and determine what advantages the opposite choice of typing regimen gave to Feature X's users.

¹ A less interesting, but probably valid conjecture is that you're also not testing enough, or at least testing the wrong things. But this can't be the only answer. Dynamic programmers, in general, are not idiots; they are usually also Lazy, in the good sense. They're smart enough to work out that writing the equivalent isa_ok() test every time they would have written a type declaration in Java or whatever is no time-saver at all. Hence, they must need less type information overall for their code to be correct.
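The footnote's point can be sketched in a few lines of Python (log_length is a made-up example): a hand-written type check per call site is just the static declaration recreated at greater cost, which is exactly why dynamic programmers don't bother.

```python
# Hypothetical illustration: the Java declaration `int logLength(String msg)`
# rewritten as a manual runtime check. Writing one of these per function
# is no time-saver over a type declaration.

def log_length(msg):
    assert isinstance(msg, str), "msg must be a str"
    return len(msg)

print(log_length("hello"))  # → 5
```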
Sunday, March 30th, 2008 08:35 pm (UTC)
Agreed.
Sunday, March 30th, 2008 09:32 pm (UTC)
Excellent :-)
Sunday, March 30th, 2008 10:22 pm (UTC)
Hmmm. Sounds extremely sensible, though I can't really say much from my own experiences as my statically-typed language knowledge is limited to C and C++ (plus, if you're generous enough to consider it a programming language, T-SQL).
Sunday, March 30th, 2008 11:34 pm (UTC)
Hey, it's Turing-complete, that works for me :-)

You could make a reasonable case that the C and C++ type systems do indeed cause more problems than they solve - C's for allowing, nay, forcing you to do too many unsafe things on the bare metal, and C++'s for its sheer complexity and allowing you to do unsafe/bare-metal things. For "modern static type system" you should probably read "ML-style or better", but really there's a cunning circularity in the definition of "modern", which acts as a get-out-of-jail card for the Conjectures :-)
Monday, March 31st, 2008 10:24 am (UTC)
All sounds pretty sensible to me :)

Further to your corollary, I'd note that making something (statically|dynamically) typed is a huge advantage if everything else in your system is going to be (statically|dynamically) typed. This is automatable going in the static->dynamic direction (just insert a bunch of runtime checks at the interfaces), but not necessarily the other way around (possible if the dynamic code happens to have been written in a pretty typey way to begin with - but if it was, then the author should probably have been using a static system in the first place. This is the subject of ongoing research by Smart People).
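The "insert a bunch of runtime checks at the interfaces" step might look something like this in Python (runtime_checked is a made-up decorator, not a real library): the static-style annotations become dynamic checks at the call boundary.

```python
# Sketch of mechanically turning static-style annotations into runtime
# checks at an interface. `runtime_checked` is invented for illustration.
import functools
import inspect

def runtime_checked(fn):
    sig = inspect.signature(fn)
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = fn.__annotations__.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@runtime_checked
def area(width: int, height: int):
    return width * height

print(area(3, 4))  # → 12
# area("3", 4) would raise TypeError at the interface, not deep inside.
```

Going the other way - inferring static types from unannotated dynamic code - is the hard direction the comment alludes to.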

So, some obvious questions include "Is one style better for any clearly definable problem domain?" and the related "Why do people favour one style over the other?"

I think the first question is much harder to answer, and the second one potentially more interesting :)
Monday, March 31st, 2008 11:03 am (UTC)
Corollary: that's a really good point. Yes, library authors should be entitled to put as a bullet point "Library X' is a good citizen of $type_discipline", but that doesn't automatically make Library X' better than the original Library X, which was presumably a good citizen of its own language and type discipline :-)

I agree that it would be possible to automate static->dynamic library conversion, but you're unlikely to end up with a very natural-feeling library like that. I'm thinking about the Ruby DOM libraries there - weren't they later supplanted by more native-feeling interfaces?

I suspect a lot of the answer to your second question is "experience and training", but no doubt a component is "temperament" - and that's the interesting bit.
Monday, March 31st, 2008 11:47 am (UTC)
Related to the library porting issue, it's interesting to compare Lisp macros with Template Haskell. Lisp macros can be explained in a sentence: "they output a sexpr which is spliced into your code and then executed", whereas TH introduces a complex API for describing the structure of Haskell code. Macros in TH are more complex than their Lisp equivalents, but can be typechecked; they can only be run at compile-time, but provide stronger guarantees. Lispers get simplicity and power, and Haskellers get static checks and guarantees: everybody feels like they've got the better half of the bargain :-)
(Anonymous)
Tuesday, April 1st, 2008 09:21 am (UTC)
In this case, you're right, but TH does syntactic transformations just like lisp macros, and there's no guarantee that TH's transformations are type-preserving. If there were, then you'd _really_ be in static land. But the reason that lisp macros are simple is because lisp's syntax is simple; the reason TH is complicated is because Haskell's syntax is (comparatively) complicated. If you gave lisp a polymorphic lambda calculus type system, I'm pretty sure macros would carry over almost unscathed. That is, if there weren't a whole culture of lisp idioms that are not well-typed.

I certainly agree, though, that there is no answer to whether static or dynamic typing is better. Or no, I should revise that statement. I agree that currently there is no answer. But I believe in the future static could be better; never dynamic. If someone invents a type system which can support almost all dynamic idioms, still catch errors, and perform inference, then I will leave all my dynamics behind. From my experience with studying type systems, this is highly unlikely :-)

-Luke (http://luqui.org/blog)
Tuesday, April 1st, 2008 10:25 am (UTC)
Well, TH does all its transformation at compile-time, and the output is fed to the type-checker then and there: if your macro produces ill-typed code, you learn about it quickly. Whereas in Lisp, as I understand it, macros can be invoked at runtime, causing type errors while the program's actually running.

I remember reading somewhere that some well-known theorem, possibly Rice's Theorem (http://en.wikipedia.org/wiki/Rice's_Theorem), implies that
  • it's impossible to write a static type system that catches all bugs;
  • the stronger you make your type system, the more valid code will be rejected.
Hence the perfect static type system you want is actually impossible. It also seems to be the case that inference imposes a large cost on a type system: if you want your compiler to infer types, you're greatly limited in the features your type system can offer.
(Anonymous)
Wednesday, April 2nd, 2008 11:58 pm (UTC)
I think you have those points backwards!

1. Rejecting all buggy programs, for some class of bugs, at the cost of also rejecting some valid programs is precisely what most (published) type systems do. "Soft type systems" are the ones that let you run any valid program you want, and only catch some bugs. A type system should
  • Reject all buggy programs
  • Accept all bug-free programs (an untyped program is "accepted" if there is any way of assigning types to get a well-typed program)
  • Have decidable type inference (starting from a completely untyped program)
The full theorem says you can have at most two of the three. It's actually possible to get two. Say you start with the untyped lambda calculus, and want to avoid expressions that don't terminate.
  • The untyped lambda calculus already trivially accepts all terminating expressions and has decidable type inference (but also accepts all diverging expressions).
  • The simply typed lambda calculus rejects all non-terminating expressions and has decidable type inference (but rejects plenty of good programs too).
  • System F accepts all terminating expressions and rejects all nonterminating expressions (but inference is undecidable).
2. "Stronger" doesn't always mean the same thing, but lots of the work on already-sound type systems is about accepting more valid programs while still completely excluding some class of bugs. If you are just worried about keeping the wrong primitive types away from case statements then simple types, Hindley-Milner and MLF are all safe, but the stronger ones accept more programs.

There's a "full-employment" theorem here, saying you can improve a type system on one of the axes above without making it worse on the other two. Trivially, just hardcode some program mishandled by the existing system - maybe there's a more interesting result? (You can quantify decidability by looking at how much type information you have to add for your algorithm to infer the rest).

A perfect type system (scratch the static!) is impossible, but that means any type system can be improved.
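A concrete taste of the trade-off above, sketched in untyped style in Python: self-application is untypable in the simply typed lambda calculus (x would need a type a satisfying a = a -> b), yet it appears in perfectly terminating programs - exactly the kind of good program a weaker system rejects.

```python
# Self-application: rejected by the simply typed lambda calculus, fine
# dynamically, and typable in System F (as forall a. a -> a applied to itself).
self_apply = lambda f: f(f)

# A terminating use: the identity function applied to itself is still identity.
identity = lambda x: x
print(self_apply(identity)("done"))  # → done

# The diverging term Omega = (λx. x x)(λx. x x) is what termination-checking
# systems rule out; we only name it as a thunk, since running it would
# recurse forever.
omega = lambda: self_apply(self_apply)
```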

-Brandon (brandon_m_moore on Yahoo)
Monday, March 31st, 2008 11:08 am (UTC)
Found you via a second-order reddit link.

This has to be one of the best notes on S/D I've seen, basically because it transcends the two-ignorant-camps default answer.

My background is C#, but I'm playing with python and ruby and lisp for small stuff. My problem is that I can't see how to scale dynamic languages to big stuff. You're making me think I'm probably going to need a seriously different approach. Any advice on how to learn large-scale dynamic development?

Here's the kind of problem I'm worried about. I have a function I use throughout my codebase. Let's say I'm using python, and I've added a logMessage(msg) function. Now let's say I need to add an extra parameter (say, severity). How do I go about adding that parameter to every call? Static typing makes it easy. Is it possible to do in a dynamic language?
Monday, March 31st, 2008 11:33 am (UTC)
OK, the first thing to note is that dynamic languages mostly deal with the problem of large codebases by keeping them small in the first place. So yes, the kind of thing you mention is a bit of a problem. Some partial solutions that spring to mind:
  1. Use of a refactoring browser, like Python's Bicycle Repair Man (http://bicyclerepair.sourceforge.net/) (disclaimer: I've never used it).
  2. grep. Problematic if you have many different functions called logMessage, but why would you want to do that? :-) In this case, I think you'd be OK.
  3. Making the severity parameter optional, and giving it a sensible default.
  4. Depending on your logging needs, you might be able to isolate your calls to logMessage in some sort of aspect/hook/advice/thingy.
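Option 3 might look like this in Python (logMessage is the hypothetical function from the question, and the format is invented): existing call sites keep working unchanged, while new code can opt in to the extra parameter.

```python
# Sketch of giving the new `severity` parameter a sensible default, so no
# existing call to logMessage needs to change.

def logMessage(msg, severity="info"):
    line = f"[{severity}] {msg}"
    print(line)
    return line

logMessage("server started")                # old call site, unchanged
logMessage("disk full", severity="error")   # new call site opts in
```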


OTOH: you're unit testing, right? And your tests have 100% statement coverage, as measured by eg coverage (http://nedbatchelder.com/code/modules/coverage.html)? Then any broken calls will be picked up :-)
Monday, March 31st, 2008 12:40 pm (UTC)
Hi. Thanks for answering! I'm going to look at Bicycle Repair Man -- looks interesting. I've been writing python extensions to a text editor (http://www.sublimetext.com) and BRM may be very useful for that.

I'm really enjoying python. It seems very well designed and I can knock stuff out quickly. Thing is, I feel like I'm only doing small things, and that the benefits of static typing only really kick in for big projects. The article you linked to on artima.com makes explicit what I'm worried about ("The initial productivity gain of working with a dynamic language can decline as a project's codebase grows, and as refactoring becomes increasingly a chore.")

Unit tests; yes, but not 100% coverage. I've found 100% coverage to be too much. Getting to 100% can involve test code more complex than the situation you are testing, at which point the test becomes most suspicious, and you have to test it...

The problem I've chosen is one that favours static typing, I know, and I think a perfectly reasonable answer is 'this particular task takes more time. However, solving problem X, difficult for static languages, is easy for dynamic languages, and that'll solve more problems over the long term.'

Anyway, thanks for an interesting discussion.
Monday, March 31st, 2008 01:30 pm (UTC)
Glad you're enjoying Python :-)

It's possible that static typing is more useful for larger codebases: it's very hard to tell, since the debate's so polarised, and because of the major complicating factor that different languages take different amounts of space to express the same program, in a way that's a nonlinear function of program size. On the other hand, people have successfully written large projects in dynamic languages (Steve Yegge was talking about his 10,000-line Emacs extension (http://steve-yegge.blogspot.com/2008/03/js2-mode-new-javascript-mode-for-emacs.html) today, and mentioned other extensions three times the size - and that's an extension to Emacs!).

I think the hard part of the adjustment is probably realising that there's an adjustment to be made - after that, it's just conscious practice.
Monday, March 31st, 2008 01:49 pm (UTC)
You're right, it's tricky to judge, and too often there's a bigendian/littleendian feel to arguments. ;)
Monday, March 31st, 2008 01:31 pm (UTC)
As for coverage: are we talking about statement coverage, path coverage or state coverage? The last two are often pretty hard in a stateful language, but getting statement coverage close to 100% shouldn't be that hard...
Monday, March 31st, 2008 02:01 pm (UTC)
Even statement coverage can be tricky if you're trying for real just-test-this-unit unit tests. At least, in static languages it can.

For example, say you want to test a class which parses text fields in a database. You add exception handlers for problems while connecting to and reading from the db. To get a proper unit test, you need to create a mock db object with Connect() and Read() methods, which throws all the different flavours of exception you'd want to handle. Writing that mock object may well be much more complicated than the object you're trying to test, and more error prone.
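In Python the mock being described might be sketched like this (Connect/Read, ReadError and parse_fields are all hypothetical names for the example): even this stripped-down version shows how the test scaffolding starts to rival the code under test.

```python
# Sketch of mocking a database to exercise an error-handling path.
from unittest import mock

class ReadError(Exception):
    pass

def parse_fields(db):
    """Code under test: read a row and split its text field."""
    db.Connect()
    row = db.Read()
    return row.split(",")

# A mock db whose Read() blows up, to drive the exception handler.
db = mock.Mock()
db.Read.side_effect = ReadError("disk on fire")

try:
    parse_fields(db)
except ReadError as e:
    print("handled:", e)  # → handled: disk on fire
```

Multiply that by every flavour of exception, per method, per class, and the cost the commenter describes becomes clear.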

My experience has been that full-fat unit tests can take an awfully long time, and it's not clear to me that a developer always gets the right bang for his buck over the course of the project.
Edited 2008-03-31 02:01 pm (UTC)
Monday, March 31st, 2008 11:41 am (UTC)
As for learning dynamic programming in the large, I can't help all that much, because the only large projects (more than a couple of thousand lines) I've worked on have been in static languages (and I spent much of my time fighting the type system. Alas). I suspect that the best thing to do would be to find a large project implemented in a dynamic language (Emacs springs to mind, but Rails/Django/Catalyst would probably be big enough) and have a look at that.

On the other hand, you can do an awful lot in ten lines of (say) Perl, particularly if you take advantage of library modules, CPAN, etc. So even if large-scale programming is easier in static languages, that doesn't mean dynamic languages aren't worth learning :-)

I came across this post (http://www.artima.com/weblogs/viewpost.jsp?thread=217080) while answering your other question - I'm not sure I agree, but it's an interesting way to think about the problem.
Tuesday, April 1st, 2008 05:09 am (UTC)
I'm currently engaged in a project attempting to get the best of both worlds – implementing Arc (or a significant fraction thereof), a very dynamic language, in Active Oberon, probably one of the most dynamic statically typed languages around: it separates interfaces from implementations quite nicely, allowing for multiple inheritance of interfaces in a basically single-inheritance object system. It also has very nice concurrency constructs - there's a really sweet pipelined version of the Sieve of Eratosthenes in the appendices of the language report (and like any Wirthian language, its power/complexity ratio is very high).
Tuesday, April 1st, 2008 10:27 am (UTC)
Sounds cool! I've never used Oberon, but it sounds interesting from your description. I'll add it to the list...
Tuesday, April 1st, 2008 10:29 am (UTC)
Further to "why choose?": it struck me as I was writing this post that Common Lispers have the choice of using either typing regimen, or mixing-and-matching as they see fit, and they've mostly standardised on using dynamic typing by default, and static typing as a performance hack. Of course, things might be different if CL's type engine did inference and generics...
(Anonymous)
Sunday, April 13th, 2008 09:12 pm (UTC)
Just imagine: here I am, reading this thread and wondering whether I should post a comment and talk about Active Oberon, and some fellow has beaten me to it! Well done, I say! I have written a realtime voxel raytracer in this language. soren dot renner at g ma i ldot com