"Block Translators - parsing magic" by Udo Schneider on at UdoSchneider/6bqsrq7ut85ddotzjy2xqrtth under #Smalltalk, #Pharo, #Parser   2 thanks

Block Translators - parsing magic

1. Introduction

If you need to parse expressions in Pharo, you can choose between several parser frameworks.

All these parser generators are great if you need to parse a given textual input (a string). In some cases, however, this is complete overkill, especially if you need to (dynamically) translate a Smalltalk expression into something "different".

Translating Smalltalk expressions into "something different" is exactly the use case for the "Block Translators" described in this chapter. We'll develop a simple Translator which is able to translate Smalltalk expressions like (customer joinDate year is: Date today year) into an equivalent SQL-like expression like (YEAR(customers.joinDate) = 2014). We'll be guided by the debugger, i.e. we'll implement just enough code to address the current issue in the debugger. So it's run, debug, implement, repeat.

The Translator will neither be complete in terms of operations nor neatly refactored like you would expect for production code. But it should be able to show the general idea how to create Translators which convert a Smalltalk expression into something different.

2. Smalltalk collection messages as SQL Expression

Smalltalk's collection messages like #do:, #select: or #detect:ifNone: are one of the best features of the class library. Most SQL/ORM frameworks for Smalltalk include a feature to express SQL expressions as Smalltalk code. So something like Script 2.1 should be translated into something like Script 2.2.

(
	(customer surname is: 'Schneider') or: (customer surname is: 'Mueller')
) and: (
	(customer bonusPoints gt: customer joinDate year) or: (customer joinDate year is: Date today year)
)
2.1. Sample SQL query in Smalltalk

(((customers.surname = 'Schneider') | (customers.surname = 'Mueller')) & ((customers.bonusPoints > YEAR(customers.joinDate)) | (YEAR(customers.joinDate) = 2014)))
2.2. Sample SQL query string

One way would be to hook into the Smalltalk compiler and build the SQL-like expression from the AST. Another would be to ignore the Smalltalk side completely and parse Strings via a Parser into those expressions (again using graphs/ASTs). But in some cases a simpler approach with "Message Recording" objects is more than sufficient.

3. Blocks as Parsers/Translators

Let's start with the previous expression from above. What happens if we wrap it into a "select Block" (i.e. for a #select: message or similar) and call #value: with an arbitrary value? Let's try nil for now.

| selectBlock expression |
selectBlock := [ :customer | 
((customer surname is: 'Schneider') or: (customer surname is: 'Mueller'))
	and: ((customer bonusPoints gt: customer joinDate year) or: (customer joinDate year is: Date today year)) ].
expression := selectBlock value: nil.	"Inspect-it"
3.1. Workspace with nil

If we execute Script 3.1 we get the error MessageNotUnderstood: UndefinedObject>>surname (Figure 4.1). The reason is clear: executing the block binds nil to customer, and the first message sent to customer is #surname. This raises an error because nil (an UndefinedObject) does not understand #surname.

But what would happen if we used another object (let's call it an SQLTable) instead? This SQLTable would understand #surname and respond with something useful - in this case an SQLColumn named accordingly. If we keep answering "useful" objects we'll end up with a graph of objects expressing the original expression!

The "hard" parsing work is done by the Smalltalk compiler itself. Our job is only to record any (useful) message sent to our Translator objects and respond with other useful objects to continue the process until everything is parsed. Once we're finished we can use this graph of objects to create our "translated" language.
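The recording idea can be tried in isolation before building the SQL classes. The following is only a minimal sketch: DemoRecorder and its category are hypothetical names that are not part of the chapter's code, and the class definition uses the same notation as the scripts below (newer Pharo versions may expect package: instead of category:).

```smalltalk
"Hypothetical class for illustration only: it records every message it does not understand."
Object subclass: #DemoRecorder
	instanceVariableNames: 'selectors'
	classVariableNames: ''
	category: 'BlockParser-Demo'

DemoRecorder>>#selectors
	"Lazily create the collection of recorded selectors."
	^ selectors ifNil: [ selectors := OrderedCollection new ]

DemoRecorder>>#doesNotUnderstand: aMessage
	"Record the selector and answer self, so the next message in the chain is recorded too."
	self selectors add: aMessage selector.
	^ self

"Workspace"
| recorder |
recorder := DemoRecorder new.
[ :customer | customer joinDate year ] value: recorder.
recorder selectors	"Inspect-it: the recorded selectors #joinDate and #year, in that order"
```

The block itself is ordinary compiled Smalltalk. All the recorder does is observe which messages arrive, which is the mechanism the SQL classes in the next section build on.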

The following code snippets should be enough to build some working code (Copy&Paste should work). If you want to see the complete code you can find it in the BlockParsers project under http://www.smalltalkhub.com/#!/~UdoSchneider/BlockParser.

4. SQL Translator

4.1. SQL tables

We'll add bits and pieces of code as the chapter progresses, always just enough to hit the next debugger. Each debugger will give us enough clues about how to proceed:

The first class we need to create is SQLTable to bind to the customer Variable. Make it a subclass of SQLComponent. It also needs to store the table name in an instance variable. So we need to add instance creation methods to set the name of the table (Script 4.1).

Object subclass: #SQLComponent
	instanceVariableNames: ''
	classVariableNames: ''
	category: 'BlockParser-SQL'

SQLComponent subclass: #SQLTable
	instanceVariableNames: 'name'
	classVariableNames: ''
	category: 'BlockParser-SQL'

SQLTable class>>#named: aString
	^ self new
		setName: aString;
		yourself

SQLTable>>#setName: aString
	name := aString
4.1. Definition of SQLTable

Try the new class and call the block with it (Script 4.2):

| selectBlock table |
selectBlock := [ :customer | 
((customer surname is: 'Schneider') or: (customer surname is: 'Mueller'))
	and: ((customer bonusPoints gt: customer joinDate year) or: (customer joinDate year is: Date today year)) ].
table := SQLTable named: 'customers'.
selectBlock value: table.	"Inspect-it"
4.2. Workspace with a SQLTable

4.1. Walkback: MessageNotUnderstood: SQLTable>>surname

Executing this snippet results in an error because (again) #surname is not understood (Figure 4.1). If customer in the block is an SQLTable instance (or, more specifically, a table row), then the semantic meaning of customer surname is to get its `surname` property or, to stick with SQL, to get a column with that name.

4.2. SQL columns

Because columns can participate in relations we'll create an SQLColumn class as subclass of SQLTerm (Script 4.3). We also add methods to set the owning table and name:

SQLComponent subclass: #SQLTerm
	instanceVariableNames: ''
	classVariableNames: ''
	category: 'BlockParser-SQL'

SQLTerm subclass: #SQLColumn
	instanceVariableNames: 'table name'
	classVariableNames: ''
	category: 'BlockParser-SQL'

SQLColumn class>>#table: aSQLTable name: aString
	^ self new
		setTable: aSQLTable name: aString;
		yourself

SQLColumn>>#setTable: aSQLTable name: aString
	table := aSQLTable.
	name := aString
4.3. Definition of SQLColumn

We also need to add behaviour to SQLTable so that it returns an SQLColumn instance when it receives an unknown unary message. To keep things simple we'll intercept each unary message that an SQLTable instance does not understand and return an SQLColumn instance which knows its originating table and its name. So we'll add that behaviour to #doesNotUnderstand: (Script 4.4):

SQLTable>>#doesNotUnderstand: aMessage
	| selector |
	selector := aMessage selector.
	selector isUnary
		ifTrue: [ ^ SQLColumn table: self name: selector asString ].
	^ super doesNotUnderstand: aMessage
4.4. Definition of SQLTable>>#doesNotUnderstand:

In a "real" implementation you might want to check the selector name. If it's a known column name (you do have the schema, don't you?) you'd return the column. Otherwise you'd forward #doesNotUnderstand: to super (Figure 4.2).
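Such a schema check could look roughly like the sketch below. It assumes SQLTable gets an additional instance variable columnNames holding the known column names as Symbols. That variable and the setter #setColumnNames: are hypothetical and not part of the chapter's code.

```smalltalk
"Assumption: SQLTable has an extra instance variable columnNames (hypothetical)."
SQLTable>>#setColumnNames: aCollection
	columnNames := aCollection

SQLTable>>#doesNotUnderstand: aMessage
	| selector |
	selector := aMessage selector.
	"Only answer a column if the selector is unary and the schema knows the name."
	(selector isUnary and: [ (columnNames ifNil: [ #() ]) includes: selector ])
		ifTrue: [ ^ SQLColumn table: self name: selector asString ].
	^ super doesNotUnderstand: aMessage

"Workspace"
| table |
table := (SQLTable named: 'customers')
	setColumnNames: #(#surname #bonusPoints #joinDate);
	yourself.
table surname.	"answers an SQLColumn"
table shoeSize	"MessageNotUnderstood, because #shoeSize is not a known column"
```

With this variant a misspelled column name fails at translation time instead of ending up in the generated string.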

Running the snippet now yields an "SQLColumn(Object)>>doesNotUnderstand: #is:" error (Figure 4.2). #is: is an equality check. In a generalized way equality is an operation with equality (=) as operator and two (left, right) terms.

4.2. Walkback: MessageNotUnderstood: SQLColumn>>is:

4.3. SQL expressions

Every SQL term (columns included) might be combined with a constant or another term by using an operator. An SQLExpression stores the operator (like =, <, >, +, -, *, ...; held in the instance variable operand), a left term and a right term (Script 4.5).

SQLTerm subclass: #SQLExpression
	instanceVariableNames: 'left operand right'
	classVariableNames: ''
	category: 'BlockParser-SQL'

SQLExpression class>>#left: leftTerm operand: aSymbol right: rightTerm
	^ self new
		setLeft: leftTerm operand: aSymbol right: rightTerm;
		yourself

SQLExpression>>#setLeft: leftTerm operand: aSymbol right: rightTerm
	left := leftTerm asSQLComponent.
	operand := aSymbol.
	right := rightTerm asSQLComponent
4.5. Definition of SQLExpression

We are sending #asSQLComponent to both terms here. The left term should always be an instance of an SQLComponent subclass already. The right side, however, might also be a constant (originating from Smalltalk code). Sending #asSQLComponent makes it possible to wrap constants in an instance of SQLConstant or one of its subclasses (Script 4.6).

SQLTerm subclass: #SQLConstant
	instanceVariableNames: 'value'
	classVariableNames: ''
	category: 'BlockParser-SQL'

SQLConstant class>>#value: aValue
	^ self new
		setValue: aValue;
		yourself

SQLConstant>>#setValue: aValue
	value := aValue
4.6. Definition of SQLConstant

Now we need to implement #asSQLComponent in some classes which might appear in expressions (Script 4.7):

SQLComponent>>#asSQLComponent
	^ self

Object>>#asSQLComponent
	^ SQLConstant value: self
4.7. Definition of #asSQLComponent

For now we only implement #asSQLComponent in Object and SQLComponent. In production you might want to use different SQLConstant subclasses for different kinds of constants like Strings, Numbers and Dates to deal with the formatting rules of the target language.

4.4. Equality (#is:)

We'll implement #is: as a comparison operator in SQLTerm to return an SQLExpression (Script 4.8).

SQLTerm>>#is: anObject
	^ SQLExpression left: self operand: #= right: anObject
4.8. Definition of SQLTerm>>#is:

Why do we use #is: instead of #=? Overriding #= instead of implementing #is: is a double-edged sword, especially in our case, because we'd change the semantics of the message. We wouldn't return a Boolean any longer - we'd return something different! Overriding #= to answer a non-Boolean leads to interesting effects down the line ... you have been warned ...

Let's see how far we get now: We'll get an Error message MessageNotUnderstood: SQLExpression>>or: (Figure 4.3).

4.3. Walkback: MessageNotUnderstood: SQLExpression>>or:

4.5. Boolean Operators

SQLTerms can be combined using Boolean Operators. So let's implement SQLTerm>>#or: and SQLTerm>>#and: (Script 4.9).

SQLTerm>>#or: anObject
	^ SQLExpression left: self operand: #| right: anObject

SQLTerm>>#and: anObject
	^ SQLExpression left: self operand: #& right: anObject
4.9. Definition of Boolean operators in SQLTerm

Our implementation does not use regular blocks as arguments. You can use blocks in your implementation though. Just be warned that the compiler/VM might inline sends of #and:, #or: if the argument is a block!

Logical #not is not an expression in this sense - it is not an operator "between" two terms. It's an operator applied to a single term. So it's best expressed as a function!

Running the code snippet complains about an SQLColumn instance not understanding #year (Figure 4.4). Semantically I'd say that something like tableName columnName year is like calling a function: YEAR(tableName.columnName).

4.4. Walkback: MessageNotUnderstood: SQLColumn>>year

4.6. SQL functions

Every unary message sent to an SQLTerm should result in an SQLFunction wrapping it (Script 4.10):

SQLTerm subclass: #SQLFunction
	instanceVariableNames: 'name term'
	classVariableNames: ''
	category: 'BlockParser-SQL'

SQLFunction class>>#name: aString term: anSQLTerm
	^ self new
		setName: aString term: anSQLTerm;
		yourself

SQLFunction>>#setName: aString term: anSQLTerm
	name := aString.
	term := anSQLTerm

We'll also implement SQLTerm>>#doesNotUnderstand: to return SQLFunctions.

SQLTerm>>#doesNotUnderstand: aMessage
	| selector |
	selector := aMessage selector.
	selector isUnary
		ifTrue: [ ^ SQLFunction name: selector asString asUppercase term: self ].
	^ super doesNotUnderstand: aMessage
4.11. Definition of SQLTerm>>#doesNotUnderstand:

#doesNotUnderstand: is the quick and dirty solution here. If you have a limited number of functions you can also implement them as methods directly.
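Implementing functions as methods directly could look like the following sketch. The choice of #year and #not is only an assumption for illustration, and a real translator would pick the functions its target dialect supports.

```smalltalk
"Sketch: explicit function methods instead of relying on #doesNotUnderstand:."
SQLTerm>>#year
	^ SQLFunction name: 'YEAR' term: self

SQLTerm>>#not
	"Logical negation, expressed as a function wrapping the receiver."
	^ SQLFunction name: 'NOT' term: self

"Workspace"
| table |
table := SQLTable named: 'customers'.
[ :customer | (customer surname is: 'Schneider') not ] value: table	"Inspect-it: an SQLFunction named 'NOT'"
```

Explicit methods show up in the class browser and in senders/implementors searches, which makes the set of supported functions easier to see than with a catch-all #doesNotUnderstand:.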

Running the script we now get an Error message MessageNotUnderstood: SQLExpression>>gt: (Figure 4.5). So the next method we need is greater than.

4.5. Walkback: MessageNotUnderstood: SQLExpression>>gt:

4.7. Comparisons

We'll implement these similarly to SQLTerm>>#is: (Script 4.12):

SQLTerm>>#gt: anObject
	^ SQLExpression left: self operand: #> right: anObject

SQLTerm>>#gte: anObject
	^ SQLExpression left: self operand: #>= right: anObject

SQLTerm>>#lt: anObject
	^ SQLExpression left: self operand: #< right: anObject

SQLTerm>>#lte: anObject
	^ SQLExpression left: self operand: #<= right: anObject
4.12. Definition of Comparisons in SQLTerm

Executing the expressions again raises no Error. We made it! The expression parses (Figure 4.6)! Inspecting the result of our snippet in the inspector shows a nice graph of objects which we'll use in the next step to create the SQL String.
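To make the shape of that graph concrete, the sketch below builds a small expression twice: once by evaluating a block, and once by assembling the same objects by hand. It assumes the classes defined so far are loaded and uses only their instance creation methods.

```smalltalk
"Assumes the chapter's SQLTable, SQLColumn, SQLExpression and SQLConstant are loaded."
| table viaBlock byHand |
table := SQLTable named: 'customers'.

"Built by letting the block send messages to the table ..."
viaBlock := [ :customer | customer surname is: 'Schneider' ] value: table.

"... and the same structure assembled explicitly.
The String is wrapped in an SQLConstant via #asSQLComponent."
byHand := SQLExpression
	left: (SQLColumn table: table name: 'surname')
	operand: #=
	right: 'Schneider'.

viaBlock inspect.
byHand inspect
```

Both inspectors should show an SQLExpression whose left side is an SQLColumn and whose right side is an SQLConstant. The block is simply a more readable way of writing the second form.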

4.6. Parsed SQLExpression in Inspector

4.7. Graph of SQLExpression

5. SQL Generator

Now that we have a nice graph (Figure 4.7) of objects let's try to create the SQL string from it: Implement the messages #sqlString and #printSqlOn: in SQLComponent (Script 5.1). #printSqlOn: is a subclass responsibility and should be implemented by all subclasses:

SQLComponent>>#sqlString
	^ String streamContents: [ :stream | self printSqlOn: stream ]

SQLComponent>>#printSqlOn: aStream
	^ self subclassResponsibility
5.1. Definition of SQLComponent>>#sqlString and SQLComponent>>#printSqlOn:

Now let's try our "implement until next error" approach again using the next Workspace code (Script 5.2).

| selectBlock table expression |
selectBlock := [ :customer | 
((customer surname is: 'Schneider') or: (customer surname is: 'Mueller'))
	and: ((customer bonusPoints gt: customer joinDate year) or: (customer joinDate year is: Date today year)) ].
table := SQLTable named: 'customers'.
expression := selectBlock value: table.	"Inspect-it"
expression sqlString	"Inspect-It"
5.2. Workspace to create SQL String

We'll get an error SubclassResponsibility: SQLExpression had the subclass responsibility to implement #printSqlOn: (Figure 5.1):

5.1. SubclassResponsibility: SQLExpression had the subclass responsibility to implement #printSqlOn:

So Pharo is telling us exactly what to do next. From now on we'll simply implement #printSqlOn: in all the classes until we finally get the string without error (Script 5.3). As you can see we simply output the information either directly or by delegating #printSqlOn: to child nodes.

SQLExpression>>#printSqlOn: aStream
	aStream nextPut: $(.
	left printSqlOn: aStream.
	aStream
		space;
		nextPutAll: operand;
		space.
	right printSqlOn: aStream.
	aStream nextPut: $)

SQLColumn>>#printSqlOn: aStream
	table printSqlOn: aStream.
	aStream
		nextPut: $.;
		nextPutAll: name

SQLTable>>#printSqlOn: aStream
	aStream nextPutAll: name

SQLConstant>>#printSqlOn: aStream
	aStream print: value

SQLFunction>>#printSqlOn: aStream
	aStream
		nextPutAll: name;
		nextPut: $(.
	term printSqlOn: aStream.
	aStream nextPut: $)
5.3. Definition of #printSqlOn:

Finally our translator works and yields the expected result (Script 5.4).

(((customers.surname = 'Schneider') | (customers.surname = 'Mueller')) & ((customers.bonusPoints > YEAR(customers.joinDate)) | (YEAR(customers.joinDate) = 2014)))
5.4. Final SQL String

6. Summary

Hopefully this chapter was able to show you (in an understandable way?) how to use "Block Parsers/Translators" to parse Smalltalk expressions and translate them into something "different". This example is neither comprehensive nor production ready. In a production setup you'd have to think a lot more about different subclasses e.g. for constants, functions ... even if it's "just" for printing constants correctly. But the skeleton should be the same.
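As an example of such a constant subclass, the sketch below prints Dates as quoted ISO 8601 literals. SQLDateConstant is a hypothetical class that is not part of the chapter's code, and the literal format is an assumption about the target SQL dialect.

```smalltalk
"Hypothetical subclass for illustration; the quoted ISO format is an assumption."
SQLConstant subclass: #SQLDateConstant
	instanceVariableNames: ''
	classVariableNames: ''
	category: 'BlockParser-SQL'

SQLDateConstant>>#printSqlOn: aStream
	"Print the wrapped Date as a quoted yyyy-mm-dd literal."
	aStream
		nextPut: $';
		nextPutAll: value yyyymmdd;
		nextPut: $'

Date>>#asSQLComponent
	"Dates wrap themselves in the specialised constant instead of a plain SQLConstant."
	^ SQLDateConstant value: self

"Workspace"
| table expression |
table := SQLTable named: 'customers'.
expression := [ :customer | customer joinDate is: (Date year: 2014 month: 3 day: 1) ] value: table.
expression sqlString	"Inspect-it"
```

Because SQLExpression sends #asSQLComponent to its right-hand side, adding the Date extension is enough; none of the existing translator code needs to change.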

7. Limitations/Notes

7.1. Method names

Overriding some methods (esp. #=) is a pretty bad idea. Of course customer name = 'Schneider' is easier to read and write than customer name is: 'Schneider'. But overriding #= with different semantics is a sure recipe for disaster!

You should also be careful with "Boolean-ish" methods like #and:, #or:, #ifTrue:. These methods are sometimes inlined by the compiler and you'll get warnings about one of the operands being a non-Boolean.

7.2. Order of expressions

The whole approach is based on the idea of intercepting messages sent to an object (to be able to respond with "another" intercepting object). So make sure that in each and every expression the objects you put into the block (or derivatives thereof) are always the receiving objects (left side in operations). Everything else will fail.

Two expressions might be semantically identical/equal in Smalltalk yet yield different results when used with Block Parsers (Script 7.1).

table := SQLTable named: 'customers'.
"Both blocks are semantically equal in Smalltalk ..."
block1 := [ :customer | customer age > 23 ].
block2 := [ :customer | 23 < customer age ].

"But not when used to parse!"
String streamContents: [ :stream | (block1 value: table) printSqlOn: stream ]. "'(customers.age > 23)'"
String streamContents: [ :stream | (block2 value: table) printSqlOn: stream ]. "Error: SQLColumn(Object)>>doesNotUnderstand: #adaptToNumber:andSend:"
7.1. Order of Expressions: Different semantics Smalltalk/Parsed Expression

7.3. Expressions only! ... mostly ...

This approach works fine if you want to translate an expression - even a compound one. Expressions (e.g. for filtering) are traditionally used with collection messages like #select:. Expressions with temporary variables (Script 7.2) do work.

Expressions with multiple statements (Script 7.3) do not! Only the last expression, surname is: 'Mueller', is returned and can be translated; the result of surname is: 'Schneider' is discarded. You can of course use a builder in the background and record "new" expressions - i.e. whenever the initial object passed in receives a message. But that's not completely safe - especially if you didn't refactor all temp variables.

But if you stick to expressions in Blocks (although it also works fine for expressions in methods!) it's more likely to not hit that limitation.

| selectBlock table expression |
selectBlock := [ :customer | 
	| surname joinDate |
	surname := customer surname.
	joinDate := customer joinDate.
	((surname is: 'Schneider') or: (surname is: 'Mueller'))
		and: ((customer bonusPoints gt: joinDate year) or: (joinDate year is: Date today year)) ].
table := SQLTable named: 'customers'.
expression := selectBlock value: table.	"Inspect-it"
expression sqlString	"Inspect-It"
7.2. Temporary variables in expressions

| selectBlock table expression |
selectBlock := [ :customer | 
	| surname |
	surname := customer surname.
	surname is: 'Schneider'.
	surname is: 'Mueller' ].
table := SQLTable named: 'customers'.
expression := selectBlock value: table.	"Inspect-it"
expression sqlString	"Inspect-It" "(customers.surname = 'Mueller')"
7.3. Multiple statements in expressions

7.4. Prior Art

The method presented in this chapter is nothing "new". At least two frameworks are/were using a similar approach to create SQL query strings from Smalltalk blocks.
