A History of the Groovy Programming Language
2020-07-19
This is a series of posts where I jot down my learnings from computer science papers I’ve read. Previously, Epigrams on Programming.
This paper was published as part of HOPL. I have written about other papers from HOPL as well:
Groovy in a a few paragraphs
Key characteristics of Groovy, good for writing scripts, terse.
Highlights of Groovy (see Listing 2 in paper):
- regex match
=~
like bash (also Perl), and is sugar for Java’sMatcher
. def
is the keyword to name things- destructuring/multi-assignment (now present in many languages, Python, JS etc)
From Fig 1 in paper:
- @Grab annotation, automatically download dependency and add to classpath 1
- operator overloading
- custom transformation script for output to GroovyConsole
Arguably, every program is trying to solve a problem in a particular domain; i.e. every program is a DSL. We aren’t advocating writing such output and compiler customization scripts for every program. But perhaps thinking carefully about our nouns and verbs, and pondering whether to grow our language just a little to meet the problem at hand, might perhaps be a useful technique in striving for clear maintainable code. - 76:8
Declarative way to define immutable classes using AST transforms.
In 2004, Groovy was submitted as a Java Specification Request (JSR). Developers behind Groovy worked to prepare for this, trying to finish the deliverables required for JSR approval. This included Groovy Language Specification (GLS), a complete compatibility test suite.
Operator overloading
Does not support arbitrary custom operators - concerns that the language would be too Perl or APL-like. Fixed set of operators that can be overridden (C++), simple ASCII-based operators plus a few.
Unicode operators rejected to keep barriers to language low. Unicode characters allowed in method, field, local variable names. (JavaScript)
Haskell allows for custom operators, did it lead to proliferation of ASCII art operators that makes code hard to read and use? I’ve heard about how Lens exposes some pretty tricky to learn operators, but am not familiar with it at all.
Q: What modern languages support Unicode operators? If you were creating a new language today, is it a good idea to support this? Capability to misuse does not mean it will be misused. Maybe we can trust the community to exercise good judgement and keep readability in mind.
Operator overloading syntax works a bit different from C++, each symbol maps to a named method, +
is a.plus
. C++ uses the operator+
member function.
Compound operators cannot be separately overridden, if +
is overridden, +=
will automatically use it.
Command chains
Chain method calls without brackets and dots, a b c d
is parsed as a(b).c(d)
, useful for expressions with an even number of terms, with special conventions for comma separated or named arguments. Last method/argument pair can be a single property access term, to allow for odd number of terms.
This makes for a more natural language style, and simple DSL creation, e.g.:
move forward at 3.km/h -> move(forward).at(3.getKm().div(h))
This is the one of the biggest cause of ambiguities in the language grammar, and adds complexity to the parser.
Command chains make calls more English-like, and readable in that sense. But if we take the meaning of “readable” to be “understand what the machine does”, it might not be. I would have to run a tiny parser in my head to figure out if I am calling the functions in the way I expect.
Perhaps this is unavoidable as you try to make a language close to English, since English is very ambiguous. Perhaps limiting the programming language to a strict subset of English (adjective-subject-verb-object) can be a nice middle ground. Or it could drop into the uncanny valley of languages.
Builders
Intercept calls to missing methods on the builder class, then use the name of the missing method to build XML structures.
builder.root {
firstLevel
}
root
and firstLevel
are not methods defined on builder
, instead the calls are intercepted, and nodes with those names are emitted instead.
Contrast this to:
builder.node("root") {
node("firstLevel")
}
Where node
is a method defined on builder
, and the names of the XML nodes are retrieved from the arguments, rather than intercepted via the call to missing methods.
Elvis operator
I’ve seen this operator around, and finally know which langauge pioneered its use: x ?:y
is shorthand for x ? x : y
.
Bracket-free named parameter calls
copy from: srcDir, to: destDir
is copy(from: srcDir, to:destDir)
, note that the comma is required, otherwise it will be a syntax error, parsed as copy(from: srcDir).to:destDir
. This reminds me a lot of labelled arguments (in various languages, OCaml, Swift).
I’m not sure if I like the Groovy style (with whitespace) compared to the parentheses one, which is more common. One thing for sure, these keyword/labelled arguments are extremely useful for making code more readable. In fact, Swift has support for both argument label (used when calling the function) and parameter name (used within the function), which is really useful for writing more readable code
Mixing with Java
Groovy recognised Java classes on the classpath and produced standard class files that could be made available to Java, also via the classpath. Hence Groovy and Java could be used together in hybrid projects but with some limitations.
Similar to Clojure, being able to play nice with Java ecosystem is important.
Method names
In Listing 16, we see a method defined with the name def "maximum of #a and #b is #c"
. Method names can be arbitrary strings.
Do method calls then look like "a method name"()
? Does it work with the brackets-free call, "a method name" arg
?
AST transforms
Groovy’s compiler is a multi-phase pipeline, each phase is made up of numerous internal transformations. AST transforms provides standard hooks for user-written transformations.
Annotations can be added to class properties to trigger a local AST transformation (like @Property
).
This declarative approach is nice, but limited. Annotations can only be placed at certain places. Also the interaction of two transforms that modify AST can lead to unexpected results.
AST transforms requires knowledge of the AST that Groovy uses. With macros, you write Groovy code, and it is transformed into AST and inserted. Macros can also use to pattern match Groovy code, instead of matching by AST structure.
Feature interaction
Suppose a @Trace
annotation logs every method call by injecting a print statement at the start of every method. And a @ToString
annotation that injects a toString()
method into a class. If we add both annotations to a class, should the toString()
method call be traced? It depends on which annotation is applied first.
Order of AST transforms not guaranteed by the compiler, but usually in order of appearance in source.
Some frameworks add a notation of priority to annotations. Transforms can communicate with one another using node metadata as well. Making transforms idempotent, and running to fixpoint might be a way to fix things, but needs to keep in mind divergence and compile speed.
This reminds me of a talk on Julia’s multiple dispatch, which makes it really easy for packages to work well together. One example was a package to track numerical errors, and a package to plot graphs. Combining both of them was seamless due to multiple dispatch, and allowed for plotting graphs with error bars.
Static typing
Groovy started out as a dynamic-only counterpart of Java. Groovy uses liked Groovy’s succinct syntax, but wanted better checking of errors.
Also pressure from languages like Kotlin, even Java itself, and use of Groovy in places where type checking will be valued.
Users preferred not to change how they wrote their code, even with static typing.
Some users were more interested in the possibility of performance improvements, rather than the error checking benefits. Without dynamic behavior, the bytecode can be a direct method call instead of going through Groovy’s runtime.
Gradual typing was discarded because it wasn’t backwards compatible with existing Groovy code. Groovy had the concept of optional typing, which indicated that the type would be checked at runtime, rather than duck typing.
Compiler modes to turn on type checking was also rejected because it was too coarse grained. It also made it non-obvious whether a particular snippet of code was static or dynamic. Furthermore, a compiler mode cannot support mixing static and dynamic code within a class.
The mode could be applied per method via annotations, and on a class as a shorthand for all methods. This will also allow a method level annotation to override the class level one.
When in static typing mode, should it only do type checks, or perform optimizations for better code generation. Bytecode generation was done in a separate part of the compiler, so that would need to change if the latter was chosen.
Eventually, two annotations:
@TypeChecked
to enable type checking only@CompileStatic
for static type check and generate optimized bytecode (call methods directly without using Groovy runtime)
Type inference required so that static typing can be allowed without changing the way users write Groovy.
Flow sensitive typing (or flow typing) to track type changes to a name. A variable can change it’s type, but each use of the variable is type-safe.
More common now, see Go and Deno.↩︎