Block Translators - parsing magic
If you need to parse expressions in Pharo you have the choice between a few parser frameworks. E.g.:
All these parser generators are great if you need to parse a given textual input (a string). In some cases, however, this is complete overkill, especially if you need to (dynamically) translate a Smalltalk expression into something "different".
Translating Smalltalk expressions into "something different" is exactly the use case for the "Block Translators" described in this chapter.
In this chapter we'll develop a simple Translator which is able to translate Smalltalk expressions like (customer joinDate year is: Date today year) into an equivalent SQL-like expression like (YEAR(customers.joinDate) = 2014).
We'll be guided by the debugger, i.e. we'll implement just enough code to address the current issue in the debugger. So it's run, debug, implement, repeat.
The Translator will neither be complete in terms of operations nor neatly refactored as you would expect from production code. But it should show the general idea of how to create Translators which convert a Smalltalk expression into something different.
2. Smalltalk collection messages as SQL Expression
Smalltalk's collection messages like #detect:ifNone: are one of the best features of the class library.
Most SQL/ORM frameworks for Smalltalk include a feature to express SQL expressions as Smalltalk code. So something like Script 2.1 should be translated into something like Script 2.2.
One way would be to hook into the Smalltalk compiler and build the SQL-like expression from the AST. Another would be to ignore the Smalltalk side completely and parse Strings via a Parser into those expressions (again using graphs/ASTs). But in some cases a simpler approach with "Message Recording" objects is more than sufficient.
3. Blocks as Parsers/Translators
Let's start with the expression from above. What happens if we wrap it into a "select Block" (i.e. a block for a #select: message or similar) and call #value: with an arbitrary value? Let's use nil for now.
If we execute Script 3.1, we'll get the error message MessageNotUnderstood: UndefinedObject>>surname (Figure 4.1).
And it's clear why: executing the block binds nil to customer, and the first message sent to customer is #surname. This of course raises an error because nil (the sole instance of UndefinedObject) does not understand #surname.
But what would happen if we used another object (let's call it an SQLTable) instead? An SQLTable would understand #surname and respond with something useful, i.e. an SQLColumn named accordingly in this case.
If we keep resolving messages to "useful" objects, we'll end up with a graph of objects expressing the original expression!
The "hard" parsing work is done by the Smalltalk compiler itself. Our job is only to record any (useful) message sent to our Translator objects and respond with other useful objects to continue the process until everything is parsed. Once we're finished, we can use this graph of objects to create our "translated" language.
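To see the recording idea in isolation, here is a minimal sketch that is not part of the BlockParser project: a hypothetical class MessageRecorder (the package name is made up as well) that overrides #doesNotUnderstand: to remember every selector it receives and then answers itself, so a whole chain of unary sends gets recorded. Methods use the usual Class >> selector notation.

```smalltalk
Object subclass: #MessageRecorder
	instanceVariableNames: 'recordedSelectors'
	classVariableNames: ''
	package: 'BlockParser-Examples'

MessageRecorder >> initialize
	super initialize.
	recordedSelectors := OrderedCollection new

MessageRecorder >> recordedSelectors
	"Answer the selectors received so far, in order"
	^ recordedSelectors

MessageRecorder >> doesNotUnderstand: aMessage
	"Record the selector instead of raising an error and answer self,
	so the next message in the chain is recorded as well"
	recordedSelectors add: aMessage selector.
	^ self

"In a Playground:"
| recorder |
recorder := MessageRecorder new.
[ :customer | customer joinDate year ] value: recorder.
recorder recordedSelectors asArray.	"#(#joinDate #year)"
```

Answering self is the simplest possible choice; the SQL translator in this chapter instead answers different, more specific objects (tables, columns, expressions).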
The following code snippets should be enough to build some working code (Copy&Paste should work). If you want to see the complete code you can find it in the BlockParsers project under http://www.smalltalkhub.com/#!/~UdoSchneider/BlockParser.
4. SQL Translator
4.1. SQL tables
We'll add bits and pieces of code throughout the chapter, always just enough to hit the next debugger. This will give us enough clues about how to proceed:
The first class we need to create is SQLTable, whose instances we'll bind to the block argument.
Make it a subclass of
It also needs to store the table name in an instance variable, so we need to add instance creation methods to set the name of the table (Script 4.1).
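Script 4.1 itself is not reproduced here. As a sketch, assuming the superclass is SQLComponent (the root class section 5 refers to) and the package is called 'BlockParser', the class and its instance creation method might look like this. The accessor is deliberately named #tableName (a hypothetical selector) rather than #name: any selector SQLTable implements itself can no longer be used as a column name inside a block.

```smalltalk
Object subclass: #SQLComponent
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'BlockParser'

SQLComponent subclass: #SQLTable
	instanceVariableNames: 'tableName'
	classVariableNames: ''
	package: 'BlockParser'

SQLTable class >> named: aString
	"Answer a stand-in for a row of the table called aString"
	^ self new setTableName: aString

SQLTable >> setTableName: aString
	tableName := aString

SQLTable >> tableName
	^ tableName

"In a Playground; at this stage this still fails with
MessageNotUnderstood: SQLTable>>surname"
[ :customer | customer surname ] value: (SQLTable named: 'customers')
```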
Try the new class and call the block with it (Script 4.2):
Executing this snippet will result in an error because (again) #surname is not understood (Figure 4.1). If customer in the block is an SQLTable instance (or, to be more specific, a table row), then the semantic meaning of customer surname is to get its `surname` property or, to stick with SQL, to get a column with that name.
4.2. SQL columns
Because columns can participate in relations, we'll create an SQLColumn class as a subclass of SQLTerm (Script 4.3).
We also add methods to set the owning table and the column name.
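A possible shape for Script 4.3, as a sketch only: it assumes SQLTerm is an abstract class directly below SQLComponent, and the selectors #table:name:, #table and #columnName are hypothetical. The column name is kept as the Symbol that will arrive as a message selector.

```smalltalk
SQLComponent subclass: #SQLTerm
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'BlockParser'

SQLTerm subclass: #SQLColumn
	instanceVariableNames: 'table columnName'
	classVariableNames: ''
	package: 'BlockParser'

SQLColumn class >> table: anSQLTable name: aSymbol
	"Answer the column aSymbol of anSQLTable"
	^ self new setTable: anSQLTable name: aSymbol

SQLColumn >> setTable: anSQLTable name: aSymbol
	table := anSQLTable.
	columnName := aSymbol

SQLColumn >> table
	^ table

SQLColumn >> columnName
	^ columnName
```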
We also need to add behaviour to SQLTable so that it returns an SQLColumn instance when it receives an unknown unary message.
To make things easier, we'll intercept each unary message sent to an SQLTable instance and return an SQLColumn instance which knows its originating table and its name. So we'll add that behavior to #doesNotUnderstand: (Script 4.4):
In a "real" implementation you might want to check the selector name: if it's a known column name (you have the schema, don't you?) you'd return the column, otherwise you'd pass the message on to super (Figure 4.2).
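A sketch of the interception in Script 4.4, assuming the SQLColumn constructor #table:name: (a hypothetical selector). Only unary messages become columns; keyword and binary messages still go to super, so they keep failing loudly. Keep in mind that selectors SQLTable inherits from Object (#size, #value, #printString, ...) never reach #doesNotUnderstand:, so columns with such names cannot be accessed this way.

```smalltalk
SQLTable >> doesNotUnderstand: aMessage
	"Answer a column for every unknown unary message.
	A real implementation would also check aMessage selector against the schema."
	^ aMessage arguments isEmpty
		ifTrue: [ SQLColumn table: self name: aMessage selector ]
		ifFalse: [ super doesNotUnderstand: aMessage ]
```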
Running the snippet now yields an "SQLColumn(Object)>>doesNotUnderstand: #is:" error (Figure 4.2).
#is: is an equality check. In general terms, equality is an operation with equality (=) as its operator and two (left and right) terms.
4.3. SQL expressions
Every SQL term (columns included) might be combined with a constant or another term by using an operator. SQLExpression stores the operator (like *, ...) as well as a left and a right term (Script 4.5).
We are sending #asSQLComponent to both terms here.
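Script 4.5 might look roughly like the following sketch. It assumes SQLExpression is a subclass of SQLTerm (the later SQLTerm>>#and: fixes an error on SQLExpression, which suggests this) and that every object understands #asSQLComponent, as described next. The selectors #operator:left:right: and #setOperator:left:right: are hypothetical; the operator is stored as the SQL text to print later.

```smalltalk
SQLTerm subclass: #SQLExpression
	instanceVariableNames: 'operator left right'
	classVariableNames: ''
	package: 'BlockParser'

SQLExpression class >> operator: aString left: leftTerm right: rightTerm
	"Answer the expression leftTerm <aString> rightTerm.
	Both sides are normalized with #asSQLComponent so that plain
	Smalltalk constants get wrapped"
	^ self new
		setOperator: aString
		left: leftTerm asSQLComponent
		right: rightTerm asSQLComponent

SQLExpression >> setOperator: aString left: leftTerm right: rightTerm
	operator := aString.
	left := leftTerm.
	right := rightTerm

SQLExpression >> operator
	^ operator

SQLExpression >> left
	^ left

SQLExpression >> right
	^ right
```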
The left term should always be a subclass of
The right side, however, might also be a constant (originating from Smalltalk code). #asSQLComponent makes it possible to wrap constants in an SQLConstant (sub-)class (Script 4.6).
Now we need to implement #asSQLComponent in some classes which might appear in expressions (Script 4.7):
For now we only implement
In production you might want to use different SQLConstant subclasses for different kinds of constants, like Dates, to deal with the target expression's formatting.
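A sketch of Scripts 4.6 and 4.7, assuming a single SQLConstant class (placed under SQLTerm here, which is our choice) with a hypothetical #value: constructor. SQL components answer themselves and every other object gets wrapped. The method on Object is an extension method, so it belongs to the translator's package rather than to the kernel.

```smalltalk
SQLTerm subclass: #SQLConstant
	instanceVariableNames: 'value'
	classVariableNames: ''
	package: 'BlockParser'

SQLConstant class >> value: anObject
	"Answer a constant wrapping anObject (a String, Number, ...)"
	^ self new setValue: anObject

SQLConstant >> setValue: anObject
	value := anObject

SQLConstant >> value
	^ value

SQLComponent >> asSQLComponent
	"Translator objects are already SQL components"
	^ self

Object >> asSQLComponent
	"Any other object becomes a constant"
	^ SQLConstant value: self
```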
4.4. Equality (#is:)
We implement #is: as a comparison operator in SQLTerm that returns an SQLExpression (Script 4.8).
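A one-method sketch of Script 4.8, assuming the hypothetical SQLExpression constructor #operator:left:right:. Instead of answering a Boolean, the comparison is recorded, with the SQL operator text '=' ready for printing later.

```smalltalk
SQLTerm >> is: anObject
	"Record an equality comparison instead of answering a Boolean"
	^ SQLExpression operator: '=' left: self right: anObject
```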
Why do we use #is: instead of #=? Overriding #= instead of implementing #is: is a double-edged sword, especially in our case, because we'd change the semantics of the message: we wouldn't return a Boolean any longer but something different! Making #= answer a non-Boolean leads to interesting effects down the line ... you have been warned ...
Let's see how far we get now:
We'll get the error MessageNotUnderstood: SQLExpression>>or: (Figure 4.3).
4.5. Boolean Operators
SQLTerms can be combined using Boolean operators. So let's implement SQLTerm>>#and: (Script 4.9).
Our implementation does not use regular blocks as arguments. You can use blocks in your implementation, though. Just be warned that the compiler might inline sends of #and: and #or: if the argument is a literal block!
#not is not a binary expression, i.e. not an operator "between" two terms. It's an operator applied to one term, so it's best expressed as a function!
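A sketch of Script 4.9 together with its #or: counterpart, again assuming the hypothetical #operator:left:right: constructor. The arguments are parenthesized expressions rather than literal blocks, so the compiler treats these as ordinary message sends. #not needs no method here; it is handled as a function (see the next section).

```smalltalk
SQLTerm >> and: anSQLTerm
	"Record a conjunction; anSQLTerm is an expression, not a block"
	^ SQLExpression operator: 'AND' left: self right: anSQLTerm

SQLTerm >> or: anSQLTerm
	"Record a disjunction; anSQLTerm is an expression, not a block"
	^ SQLExpression operator: 'OR' left: self right: anSQLTerm
```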
Running the code snippet now complains about an SQLColumn instance not understanding #year (Figure 4.4).
Semantically, I'd say that something like tableName columnName year is like calling a function: YEAR(tableName.columnName).
4.6. SQL functions
Every unary message sent to an SQLTerm should result in an SQLFunction wrapping it (Script 4.10):
We'll also implement SQLTerm>>#doesNotUnderstand: to return such an SQLFunction for unknown unary messages.
#doesNotUnderstand: is the quick and dirty solution here.
If you have a limited number of functions you can also implement them as methods directly.
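A sketch of Script 4.10 and the #doesNotUnderstand: override, with hypothetical selectors #named:argument:, #functionName and #argument. Because #not is implemented by Boolean but not by Object, a #not sent to an expression also ends up here and becomes a NOT function, matching the treatment of #not in section 4.5.

```smalltalk
SQLTerm subclass: #SQLFunction
	instanceVariableNames: 'functionName argument'
	classVariableNames: ''
	package: 'BlockParser'

SQLFunction class >> named: aSymbol argument: anSQLTerm
	"Answer the function aSymbol applied to anSQLTerm"
	^ self new setName: aSymbol argument: anSQLTerm

SQLFunction >> setName: aSymbol argument: anSQLTerm
	functionName := aSymbol.
	argument := anSQLTerm

SQLFunction >> functionName
	^ functionName

SQLFunction >> argument
	^ argument

SQLTerm >> doesNotUnderstand: aMessage
	"Wrap every unknown unary message (#year, #not, ...) in a function
	applied to the receiver; keyword and binary messages still fail"
	^ aMessage arguments isEmpty
		ifTrue: [ SQLFunction named: aMessage selector argument: self ]
		ifFalse: [ super doesNotUnderstand: aMessage ]
```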
Running the script, we now get the error MessageNotUnderstood: SQLExpression>>gt: (Figure 4.5).
So the next method we need is "greater than". We'll implement it (and similar comparisons) just like SQLTerm>>#is: (Script 4.12):
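Script 4.12 is not reproduced here; following the pattern of #is:, a sketch of #gt: plus a hypothetical #lt: counterpart, again assuming #operator:left:right:. Further comparisons would be added the same way, under whatever selectors you choose.

```smalltalk
SQLTerm >> gt: anObject
	"Record a 'greater than' comparison"
	^ SQLExpression operator: '>' left: self right: anObject

SQLTerm >> lt: anObject
	"Record a 'less than' comparison"
	^ SQLExpression operator: '<' left: self right: anObject
```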
Executing the expressions again raises no error. We made it! The expression parses (Figure 4.6)! Inspecting the result of our snippet shows a nice graph of objects, which we'll use in the next step to create the SQL string.
5. SQL Generator
Now that we have a nice graph of objects (Figure 4.7), let's try to create the SQL string from it:
Implement the following messages in SQLComponent (Script 5.1). #printSqlOn: is a subclass responsibility and should be implemented by all subclasses:
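Script 5.1 presumably pairs #printSqlOn: with a convenient entry point. A sketch, where the name #sqlString is our assumption: it collects everything the receiver (and, by delegation, its children) writes on a stream.

```smalltalk
SQLComponent >> sqlString
	"Answer the SQL text for the receiver and all its children"
	^ String streamContents: [ :stream | self printSqlOn: stream ]

SQLComponent >> printSqlOn: aStream
	"Write the receiver's SQL text on aStream"
	self subclassResponsibility
```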
Now let's try our "implement until next error" approach again using the following Workspace code (Script 5.2).
We'll get the error SubclassResponsibility: SQLExpression had the subclass responsibility to implement #printSqlOn: (Figure 5.1). So Pharo is telling us exactly what to do next.
From now on we'll simply implement #printSqlOn: in all the classes until we finally get the string without an error (Script 5.3). As you can see, we simply output the information either directly or by delegating #printSqlOn: to child nodes.
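A sketch of Script 5.3, assuming the instance variable names tableName, table, columnName, value, functionName, argument, operator, left and right (our choice, not necessarily the original's). Using #print: for constants is a shortcut: Pharo prints a String in single quotes and an Integer as its digits, which happens to fit SQL literals, but Dates and other types would need their own formatting.

```smalltalk
SQLTable >> printSqlOn: aStream
	aStream nextPutAll: tableName

SQLColumn >> printSqlOn: aStream
	"table.column, e.g. customers.joinDate"
	table printSqlOn: aStream.
	aStream
		nextPut: $.;
		nextPutAll: columnName

SQLConstant >> printSqlOn: aStream
	"Strings print quoted, numbers as digits"
	aStream print: value

SQLFunction >> printSqlOn: aStream
	"NAME(argument), e.g. YEAR(customers.joinDate)"
	aStream
		nextPutAll: functionName asString asUppercase;
		nextPut: $(.
	argument printSqlOn: aStream.
	aStream nextPut: $)

SQLExpression >> printSqlOn: aStream
	"(left operator right), delegating to both child nodes"
	aStream nextPut: $(.
	left printSqlOn: aStream.
	aStream
		space;
		nextPutAll: operator;
		space.
	right printSqlOn: aStream.
	aStream nextPut: $)
```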
Finally our translator works and yields the expected result (Script 5.4).
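Putting the pieces together in a Playground: a sketch assuming the entry point is called #sqlString. The year is written as a literal so the result is reproducible; with Date today year the constant is simply the current year. The second block shows #or: and #not (as a function) working together.

```smalltalk
| customers |
customers := SQLTable named: 'customers'.

([ :customer | customer joinDate year is: 2014 ] value: customers) sqlString.
	"(YEAR(customers.joinDate) = 2014)"

([ :customer |
	(customer surname is: 'Schneider')
		or: (customer joinDate year gt: 2010) not ] value: customers) sqlString.
	"((customers.surname = 'Schneider') OR NOT((YEAR(customers.joinDate) > 2010)))"
```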
Hopefully this chapter was able to show you (in an understandable way?) how to use "Block Parsers/Translators" to parse Smalltalk expressions and translate them into something "different". This example is neither comprehensive nor production ready. In a production setup you'd have to think a lot more about different subclasses e.g. for constants, functions ... even if it's "just" for printing constants correctly. But the skeleton should be the same.
7.1. Method names
Overriding some methods (especially #=) is a pretty bad idea. Sure, customer name = 'Schneider' is easier to read and write than customer name is: 'Schneider', but implementing #= with different semantics is a sure recipe for disaster!
You should also be careful with "Boolean-ish" methods like #and: and #or:. These methods are sometimes inlined by the compiler, and you'll get warnings about one of the operands being a non-Boolean.
7.2. Order of expressions
The whole approach is based on the idea of intercepting messages sent to an object (to be able to respond with "another" intercepting object). So make sure that in each and every expression the objects you put into the block (or derivatives thereof) are always the receiving objects (left side in operations). Everything else will fail.
Two expressions might be semantically identical/equal in Smalltalk yet yield different results when used with Block Parsers (Script 7.1).
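A sketch of the effect (not the original Script 7.1), assuming the translator classes from this chapter. In the first block the table stand-in is the receiver, so the comparison is recorded. In the second, a String is the receiver: String's own #= runs and the column is merely its argument, so no SQLExpression is built.

```smalltalk
| customers |
customers := SQLTable named: 'customers'.

"Translator object as receiver: recorded as an SQLExpression"
[ :customer | customer surname is: 'Schneider' ] value: customers.

"String as receiver: String>>#= handles the message itself,
nothing is recorded"
[ :customer | 'Schneider' = customer surname ] value: customers.
```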
7.3. Expressions only! ... mostly ...
This approach works fine if you want to translate an expression, even a compound one.
Expressions (e.g. for filtering) are traditionally used for Collection messages like
Expressions with temporary variables (Script 7.2) do work. Expressions with multiple statements (Script 7.3) do not!
Only the expression of the second statement, customer surname is: 'Mueller', is returned and can be translated.
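A sketch of both cases with the classes from this chapter (not the original Scripts 7.2 and 7.3), assuming the #sqlString entry point. A block answers the value of its last statement, which is why the first comparison in the second block is built and then silently dropped.

```smalltalk
| customers |
customers := SQLTable named: 'customers'.

"A temporary variable just holds an intermediate translator object"
([ :customer | | year |
	year := customer joinDate year.
	year is: 2014 ] value: customers) sqlString.
	"(YEAR(customers.joinDate) = 2014)"

"Two statements: only the last one is answered by the block"
([ :customer |
	customer surname is: 'Schneider'.
	customer surname is: 'Mueller' ] value: customers) sqlString.
	"(customers.surname = 'Mueller')"
```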
You can of course use a builder in the background and record "new" expressions, i.e. whenever the initial object passed in receives a message. But that's not completely safe, especially if you didn't refactor all temp variables. If you stick to expressions in blocks (although it also works fine for expressions in methods!), you're less likely to hit that limitation.
7.4. Prior Art
The method presented in this chapter is nothing "new". At least two frameworks are (or were) using a similar approach to create SQL query strings from Smalltalk blocks.