Building an OData URI Parser with Scala Parser Combinators

Objective

The Open Data Protocol (OData) lets clients access a data model through REST-based data services addressed by Uniform Resource Identifiers (URIs). In this post, we present the result of a recent experiment: building an abstraction over OData URIs that turns them into an abstract syntax tree (AST).

Note that this experiment is at an early stage, so the implementation does not yet support the complete feature set of the OData URI specification published at odata.org. That specification lays out a set of recommendations for constructing URIs that identify the data and metadata exposed by OData services.

To give an example of an OData URI, consider the following:

http://odata.org/service.svc/Products?$filter=Price ge 10

In essence, it is a service request for all Product entities that satisfy the following predicate: Price greater than or equal to 10.

Motivation

The primary motivation for building such an abstraction is to promote separation of concerns and, consequently, to let the underlying layers of the OData service implementation process a query expression tree, rather than a raw URI string, and yield the result set more efficiently.

Approach

To implement this parser, we use parser combinators. A parser combinator is in essence a higher-order function that accepts parsers as input, composes them, applies transformations to their results, and yields a more complex parser. Because it rests on function composition, this approach lets us build a complex parser incrementally from small ones.
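To make the idea concrete, here is a minimal hand-rolled sketch that does not use any library. It is not how scala.util.parsing is implemented; the names Parser, literal, digits, seq, or, map and topOrSkip are our own, chosen only for illustration.

```scala
// A hand-rolled sketch: a parser is a function from the input to an
// optional pair of (result, remaining input).
object MiniCombinators {
  type Parser[A] = String => Option[(A, String)]

  // Primitive parser: succeeds when the input starts with `s`.
  def literal(s: String): Parser[String] =
    in => if (in.startsWith(s)) Some((s, in.drop(s.length))) else None

  // Primitive parser: consumes one or more leading digits.
  val digits: Parser[String] = in => {
    val d = in.takeWhile(_.isDigit)
    if (d.nonEmpty) Some((d, in.drop(d.length))) else None
  }

  // Sequencing combinator: run `p`, then run `q` on what is left.
  def seq[A, B](p: Parser[A], q: Parser[B]): Parser[(A, B)] =
    in => p(in).flatMap { case (a, rest) =>
      q(rest).map { case (b, rest2) => ((a, b), rest2) }
    }

  // Choice combinator: try `p`, and fall back to `q` if it fails.
  def or[A](p: Parser[A], q: Parser[A]): Parser[A] =
    in => p(in).orElse(q(in))

  // Transformation combinator: pass the result of `p` through `f`.
  def map[A, B](p: Parser[A])(f: A => B): Parser[B] =
    in => p(in).map { case (a, rest) => (f(a), rest) }

  // A larger parser composed only from the pieces above:
  // it accepts "$top=<n>" or "$skip=<n>".
  val topOrSkip: Parser[(String, Int)] =
    map(seq(or(literal("$top="), literal("$skip=")), digits)) {
      case (key, n) => (key.stripPrefix("$").stripSuffix("="), n.toInt)
    }
}
```

Every combinator takes parsers and returns a parser, so topOrSkip is built without writing any new parsing logic. The library operators used in the rest of this post (~, | and ^^) play the roles of seq, or and map here.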

Scala provides such a library in scala.util.parsing (bundled with the standard distribution up to Scala 2.10 and published as the separate scala-parser-combinators module since Scala 2.11). In this implementation, we use JavaTokenParsers together with PackratParsers.

import scala.util.parsing.combinator.{JavaTokenParsers, PackratParsers}

class ODataUriParser extends JavaTokenParsers with PackratParsers {
  //...
}

Implementation

An OData URI has three fundamental parts, namely the service root URI, the resource path, and the query options, as illustrated below and described in the documentation at [1].

http://host:port/path/SampleService.svc/Categories(1)/Products?$top=2&$orderby=Name
\______________________________________/\____________________/ \__________________/
                   |                              |                     |
           service root URL                 resource path         query options

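Applied to the OData URI mentioned previously, the three parts are as follows: the service root is http://odata.org/service.svc, the resource path is /Products, and the query options are $filter=Price ge 10.

[Figure: odata-uri-parts]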

Hence, we can build combinators for the three parts in a bottom-up manner and then compose them into the complete parser, as listed below.

lazy val oDataQuery: PackratParser[SourceNode] = {
  serviceURL ~ resourcePath ~ opt(queryOperationDef) ^^ {
    case uri ~ path ~ None => ODataQuery(uri, path, QueryOperations(Seq.empty))
    case uri ~ path ~ Some(exp) => ODataQuery(uri, path, QueryOperations(exp))
  }
}

To get started with an example, let's consider the following URI:

http://odata.io/odata.svc/Schema(231)/Customer?$top=2&$filter=concat(City, Country) eq 'Berlin, Germany'

For this URI, we expect the following expression tree, based on a pre-defined model:

ODataQuery(
  URL("http://odata.io/odata.svc"),
  ResourcePath("Schema",Number("231"),ResourcePath("Customer",EmptyExp(),EmptyExp())),
  QueryOperations(
    List(Top(Number("2")),
      Filter(
        EqualToExp(
          CallExp(
            Property("concat")
            , List(Property("City"), Property("Country"))
          )
          , StringLiteral("'Berlin, Germany'"))))))
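The model itself is not listed in this post. As a rough sketch, assuming only the shapes that the expected tree and the parser snippets suggest, it could be declared as a small family of case classes. The field names, the ODataModelSketch wrapper and the resourceNames helper are our own guesses for illustration, and the actual project may define them differently.

```scala
// Hypothetical sketch of the AST model; only the constructor names and
// arities are taken from the expected tree, everything else is assumed.
object ODataModelSketch {
  sealed trait SourceNode
  sealed trait Expression extends SourceNode

  final case class URL(value: String) extends Expression
  final case class Property(name: String) extends Expression
  final case class Number(value: String) extends Expression
  final case class StringLiteral(value: String) extends Expression
  final case class EmptyExp() extends Expression
  final case class CallExp(callee: Expression, args: Seq[Expression]) extends Expression
  final case class EqualToExp(left: Expression, right: Expression) extends Expression
  final case class Top(count: Expression) extends Expression
  final case class Filter(predicate: Expression) extends Expression
  // A resource, its optional key predicate, and the rest of the path.
  final case class ResourcePath(name: String, key: Expression, next: Expression) extends Expression

  final case class QueryOperations(operations: Seq[Expression]) extends SourceNode
  final case class ODataQuery(serviceRoot: Expression, path: Expression,
                              options: QueryOperations) extends SourceNode

  // The resource path of the expected tree for the example URI.
  val examplePath: Expression =
    ResourcePath("Schema", Number("231"), ResourcePath("Customer", EmptyExp(), EmptyExp()))

  // Walks the nested ResourcePath nodes and collects the resource names in order.
  def resourceNames(path: Expression): List[String] = path match {
    case ResourcePath(name, _, next) => name :: resourceNames(next)
    case _ => Nil
  }
}
```

Because the tree is plain data, a downstream layer can traverse it with pattern matching, as resourceNames does for the nested resource path, without ever looking at the original URI string.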

Building parser combinators for the service root and the resource path is considerably simpler than building one for the query options (the third part). Let's build them first.

We rely on the convention (see the OData specification) that an OData service root always ends with .svc. The following snippet parses, for instance, http://odata.io/odata.svc into URL("http://odata.io/odata.svc").

lazy val serviceURL: PackratParser[Expression] =
  // Greedy match up to the last ".svc"; the dot is escaped so that it only matches a literal "."
  """^.*\.svc""".r ^^ {
    case s => URL(s)
  }
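To see what the service root pattern does on its own, the following sketch applies a pattern with an escaped dot through the standard scala.util.matching.Regex API, outside of any parser. The ServiceRootSketch object and its split helper are hypothetical names used only for this illustration.

```scala
object ServiceRootSketch {
  // Escaped dot: matches a literal ".svc", greedily up to its last occurrence.
  private val serviceRoot = """^.*\.svc""".r

  // Splits a URI into (service root, remainder); None if no prefix ends in ".svc".
  def split(uri: String): Option[(String, String)] =
    serviceRoot.findPrefixOf(uri).map(root => (root, uri.drop(root.length)))
}
```

With this pattern, http://odata.io/odata.svc/Schema(231)/Customer splits into the service root http://odata.io/odata.svc and the remainder /Schema(231)/Customer. An unescaped dot would also accept a prefix such as http://host/xsvc, because "." then matches any character.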

Next, we define the resource path parser, which parses, for instance, /Schema(231)/Customer into ResourcePath("Schema",Number("231"),ResourcePath("Customer",EmptyExp(),EmptyExp())). A compound resource path chains multiple resources, each with an optional key predicate in parentheses.

lazy val resourcePath: PackratParser[Expression] = (
  "/" ~> idn ~ ("(" ~> predicate <~ ")") ~ opt(resourcePath) ^^ {
    case Property(name) ~ keyPredicate ~ None => ResourcePath(name, keyPredicate, EmptyExp())
    case Property(name) ~ keyPredicate ~ Some(expr) => ResourcePath(name, keyPredicate, expr)
  }
  | "/" ~> idn ~ opt(resourcePath) ^^ {
    case Property(e) ~ None => ResourcePath(e, EmptyExp(), EmptyExp())
    case Property(e) ~ Some(expr) => ResourcePath(e, EmptyExp(), expr)
  }
)

We have now reached the crux of the problem: building a parser that can handle the query operators defined in the OData specification. To solve it, we take a bottom-up approach in conjunction with a top-down realization.

First we define a basic parser that can parse arithmetic expressions as follows.

lazy val expression: PackratParser[Expression] =
  expression ~ ("add" ~> termExpression) ^^ {case l ~ r => PlusExp(l, r)} |
  expression ~ ("sub" ~> termExpression) ^^ {case l ~ r => MinusExp(l, r)} |
  termExpression
lazy val termExpression: PackratParser[Expression] =
  termExpression ~ ("mul" ~> factor) ^^ {case a ~ b => MultiplyExp(a, b)} |
  termExpression ~ ("div" ~> factor) ^^ {case a ~ b => DivideExp(a, b)} |
  termExpression ~ ("mod" ~> factor) ^^ {case a ~ b => ModExp(a, b)} |
  factor
lazy val factor: PackratParser[Expression] =
  ("not" ~> factor) ^^ NotExp |
  factor ~ ("(" ~> expressionList <~ ")") ^^ {case id ~ param => CallExp(id, param)} |
  number |
  boolean |
  string |
  idn |
  "(" ~> predicate <~ ")"
lazy val expressionList: PackratParser[Seq[Expression]] = repsep(predicate, ",")
lazy val propertyList: PackratParser[Seq[Property]] = repsep(idn, ",")
lazy val idn: PackratParser[Property] = ident ^^ Property
lazy val number: PackratParser[Number] = floatingPointNumber ^^ Number
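// Single-quoted OData string (the class excludes ' so that the literal ends at the first closing quote), or a double-quoted stringLiteral
lazy val string: PackratParser[StringLiteral] = ("\'" + """([^'\p{Cntrl}\\]|\\[\\/bfnrt]|\\u[a-fA-F0-9]{4})*""" + "\'").r ^^ StringLiteral | stringLiteral ^^ StringLiteral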
lazy val boolean: PackratParser[Expression] = "true" ^^^ TrueExpr() | "false" ^^^ FalseExpr()

Next, we incrementally add support for relational operators and, on top of them, the logical operators and, or, and similar operations.

lazy val predicate: PackratParser[Expression] =
  predicate ~ ("and" ~> relExpression) ^^ {case l ~ r => AndExp(l, r)} |
  predicate ~ ("or" ~> relExpression) ^^ {case l ~ r => OrExp(l, r)} |
  relExpression
lazy val relExpression: PackratParser[Expression] =
  relExpression ~ ("gt" ~> expression) ^^ {case l ~ r => GreaterThanExp(l, r)} |
  relExpression ~ ("lt" ~> expression) ^^ {case l ~ r => LessThanExp(l, r)} |
  relExpression ~ ("eq" ~> expression) ^^ {case l ~ r => EqualToExp(l, r)} |
  relExpression ~ ("ne" ~> expression) ^^ {case l ~ r => NotEqualToExp(l, r)} |
  relExpression ~ ("ge" ~> expression) ^^ {case l ~ r => GreaterOrEqualToExp(l, r)} |
  relExpression ~ ("le" ~> expression) ^^ {case l ~ r => LessOrEqualToExp(l, r)} |
  expression
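The layering of these rules is what encodes operator precedence: a rule defined in terms of a lower rule binds more loosely than that lower rule. The following trimmed-down sketch shows this with one operator per level. It assumes the scala-parser-combinators module is on the classpath; PrecedenceSketch, parseFilter and the small Expr model are hypothetical names introduced only for this illustration, not part of the project.

```scala
import scala.util.parsing.combinator.{JavaTokenParsers, PackratParsers}

object PrecedenceSketch extends JavaTokenParsers with PackratParsers {
  // A deliberately tiny model, separate from the one used in this post.
  sealed trait Expr
  final case class Num(value: String) extends Expr
  final case class Prop(name: String) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Ge(l: Expr, r: Expr) extends Expr
  final case class And(l: Expr, r: Expr) extends Expr

  // Loosest level: logical "and" over relational expressions.
  lazy val predicate: PackratParser[Expr] =
    predicate ~ ("and" ~> rel) ^^ { case l ~ r => And(l, r) } | rel

  // Middle level: "ge" over arithmetic expressions.
  lazy val rel: PackratParser[Expr] =
    rel ~ ("ge" ~> sum) ^^ { case l ~ r => Ge(l, r) } | sum

  // Tightest level: "add" over atoms. The rule is left-recursive,
  // which PackratParser supports.
  lazy val sum: PackratParser[Expr] =
    sum ~ ("add" ~> atom) ^^ { case l ~ r => Add(l, r) } | atom

  lazy val atom: PackratParser[Expr] =
    floatingPointNumber ^^ { s => Num(s): Expr } | ident ^^ { s => Prop(s): Expr }

  // parseAll requires the whole input to be consumed.
  def parseFilter(input: String): Option[Expr] =
    parseAll(predicate, input) match {
      case Success(ast, _) => Some(ast)
      case _ => None
    }
}
```

For the input Price add 1 ge 10 and Stock ge 1, the add is grouped first, then each ge, and finally the and, so the root of the resulting tree is an And node with a Ge node on either side.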

The two code listings above form the basis for supporting query operations such as $filter and $select, as shown below.

lazy val queryOperationDef: PackratParser[Seq[Expression]] =
  "?" ~> repsep(filter | select | top | skip | orderBy, "&")
lazy val filter: PackratParser[Expression] =
  "$filter" ~> "=" ~> predicate ^^ Filter
lazy val top: PackratParser[Expression] =
  "$top" ~> "=" ~> number ^^ Top
lazy val skip: PackratParser[Expression] =
  "$skip" ~> "=" ~> number ^^ Skip
lazy val orderBy: PackratParser[Expression] = (
  "$orderby" ~> "=" ~> propertyList <~ "asc" ^^ OrderByAsc
  | "$orderby" ~> "=" ~> propertyList <~ "desc" ^^ OrderByDesc
  | "$orderby" ~> "=" ~> propertyList ^^ OrderByAsc
)
lazy val select: PackratParser[Expression] =
  "$select" ~> "=" ~> propertyList ^^ Select

This allows us to parse a URI into an expression tree, as the following test shows.

test("Parse /Customers?$top=2&$filter=concat(City, Country) eq 'Berlin, Germany'"){
  val uri = "http://odata.io/odata.svc/Schema(231)/Customer?$top=2&$filter=concat(City, Country) eq 'Berlin, Germany'"
  val actual = p.parseThis(mainParser,uri).get
  println(uri + "=>" + actual)
  val expectedAst=
    ODataQuery(
      URL("http://odata.io/odata.svc"),
      ResourcePath("Schema",Number("231"),ResourcePath("Customer",EmptyExp(),EmptyExp())),
      QueryOperations(
        List(Top(Number("2")),
          Filter(
            EqualToExp(
              CallExp(
                Property("concat")
                , List(Property("City"), Property("Country"))
              )
              , StringLiteral("'Berlin, Germany'"))))))
  assert(actual == expectedAst)
}

Or, as follows:

test("Parse /Products?$select=Name,Price"){
  val uri = "http://services.odata.org/OData.svc/Products?$select=Name,Price"
  val actual = p.parseThis(mainParser,uri).get
  val expectedAst =
    ODataQuery(
      URL("http://services.odata.org/OData.svc"),
      ResourcePath("Products",EmptyExp(),EmptyExp()),
      QueryOperations(List(Select(List(Property("Name"), Property("Price"))))))
  assert(actual == expectedAst)
}

Conclusion

The complete source of this project is available in the GitHub repository. Please feel free to browse it, and post any questions you have.

See More:

  1. OData URI Specification
  2. External DSLs made easy with Scala Parser Combinators
  3. DSLs in Action