Persistate

Grammar Notation in Syntax Sections

Statement syntax is presented in this documentation as a formal grammar using a dialect of Extended Backus-Naur Form (EBNF).  It is the same grammar that the Persistate application uses to parse its definition files.

The grammar consists of a set of productions of the form

name = element list ;

The name is that of the production being defined, and the element list is one or more elements separated by white space.  The elements define what the production stands for in the grammar.  Each element can be one of the following:

A terminal.  This is a single symbol within double quotes, and it matches the same symbol in the source text.  The definition of a symbol is quite complex, but in general it is a string of characters with no white space.  In this example, "classes", "have" and "functions" are terminals.

ClassCategory = CategoryName "classes" "have" "functions" FunctionNames ;

A token.  This is a term within angle brackets, and each token matches a different range of items in the definition file.  There is a fixed set of tokens, as follows:
o <symbol>.  This token matches any single symbol in the definition file.
o <string>.  This token matches any text within double quotes in the definition file.
o <integer>.  This token matches an integer literal in the definition file.
o <floating>.  This token matches a floating point literal in the definition file.
o <datetime>.  This token matches a datetime literal in the definition file.

This example shows the use of a number of tokens.

RightOperand = ( <string> | <integer> | <floating> | <datetime> 
               | RightOperandValue | Parameter ) ;

A production.  This is the name of another production, and matches any section of the definition file which matches that production.  In this example, LeftOperand, Comparator and RightOperand are productions.

SimpleCondition = LeftOperand Comparator RightOperand ;

An option.  This takes the format [ element list ] where element list is one or more elements separated by white space.  This matches optional items in the definition file.  In other words, it matches if the definition file contains items matching every element in the list, and it also matches if those items are absent altogether.  In this example, a constraint can optionally start with the terminal symbol "not".

Constraint = [ "not" ] ( SimpleCondition | "(" CompoundCondition ")" ) ;

A choice.  This takes the format ( element list | element list ... ) where element list is one or more elements separated by white space, and ... means that the | element list part can be repeated any number of times.  Each of the element lists constitutes a choice, and this matches a set of items in the definition file which matches any one of the choices.  In this example, the PanePosition production will match any of the terminal symbols "top", "bottom", etc.

PanePosition = ( "top" | "bottom" | "central" | "left" | "right" | "tabbed" ) ;

A repeat.  This takes the format { element list } where element list is one or more elements separated by white space.  This matches a set of items in the definition file which match the element list repeated zero or more times.  Note the zero: in the following example, WorkspaceName matches a symbol followed by zero or more symbols, in other words one or more symbols.

WorkspaceName = <symbol> { <symbol> } ;
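
To make the element kinds above concrete, the following Python sketch matches a list of white-space-separated items against productions held as nested tuples.  It is a naive backtracking matcher written purely for illustration, not the parser that Persistate uses.  The tuple encoding, the function names, the crude token checks and the Docked production are all assumptions made for this example.

```python
import re

def match_token(kind, item):
    """Crude stand-ins for three of the token kinds (assumption, simplified)."""
    if kind == "symbol":
        return not item.startswith('"')
    if kind == "string":
        return len(item) >= 2 and item[0] == item[-1] == '"'
    if kind == "integer":
        return re.fullmatch(r"[+-]?\d+", item) is not None
    return False

def match_seq(grammar, elements, items, pos):
    """Yield every position reachable after matching `elements` from `pos`."""
    if not elements:
        yield pos
        return
    for mid in match_one(grammar, elements[0], items, pos):
        yield from match_seq(grammar, elements[1:], items, mid)

def match_one(grammar, element, items, pos):
    """Yield every position reachable after matching one element from `pos`."""
    kind, arg = element
    if kind == "term":          # terminal: the item must equal the quoted symbol
        if pos < len(items) and items[pos] == arg:
            yield pos + 1
    elif kind == "tok":         # token: the item must belong to the token's range
        if pos < len(items) and match_token(arg, items[pos]):
            yield pos + 1
    elif kind == "prod":        # production: match that production's element list
        yield from match_seq(grammar, grammar[arg], items, pos)
    elif kind == "opt":         # option: either the whole list or nothing
        yield from match_seq(grammar, arg, items, pos)
        yield pos
    elif kind == "choice":      # choice: any one of the element lists
        for alternative in arg:
            yield from match_seq(grammar, alternative, items, pos)
    elif kind == "rep":         # repeat: zero or more occurrences of the list
        yield pos
        for mid in match_seq(grammar, arg, items, pos):
            if mid > pos:       # guard against looping on an empty match
                yield from match_one(grammar, element, items, mid)

def matches(grammar, name, text):
    """True if the whole of `text` matches the production called `name`."""
    items = text.split()
    return any(end == len(items)
               for end in match_seq(grammar, grammar[name], items, 0))

POSITIONS = ("top", "bottom", "central", "left", "right", "tabbed")
GRAMMAR = {
    # PanePosition = ( "top" | "bottom" | ... ) ;
    "PanePosition": [("choice", [[("term", p)] for p in POSITIONS])],
    # WorkspaceName = <symbol> { <symbol> } ;
    "WorkspaceName": [("tok", "symbol"), ("rep", [("tok", "symbol")])],
    # Docked = [ "not" ] PanePosition ;   (hypothetical, for illustration)
    "Docked": [("opt", [("term", "not")]), ("prod", "PanePosition")],
}

print(matches(GRAMMAR, "WorkspaceName", "My Work Space"))
print(matches(GRAMMAR, "Docked", "not top"))
```

Because the repeat accepts zero occurrences, WorkspaceName accepts a single symbol, while an empty input is rejected by the leading <symbol>.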

Differences from "standard" EBNF

This section is included for those wishing to know more about the exact details of the dialect of EBNF used.  This is for more than curiosity's sake, as the Persistate API contains a namespace called Persistate.Language which provides an LALR syntax parser that uses grammars expressed in this dialect to parse arbitrary languages.  This parser is also used to parse the EBNF grammar definitions themselves, which means that the EBNF must itself have a grammar.  Here it is.

Start = Syntax ;
Syntax = ProductionRule { ProductionRule } ;
ProductionRule = Production "=" Elements ";" ;
Elements = Element { Element } ;
Element = ( Token | Terminal | Production | Option | Choice | Repeat ) ;
// not EBNF:  for strict EBNF should be ?symbol? etc 
Token = ( "<symbol>" | "<string>" | "<integer>" | "<floating>" | "<datetime>" ) ;
Terminal = <string> ;
Production = <symbol> ;
Option = "[" Elements "]" ;
// not EBNF: always include parentheses in choice elements 
Choice = "(" Elements { "|" Elements } ")" ;   
Repeat = "{" Elements "}" ;

The above grammar effectively describes itself.  The differences between standard EBNF and the Persistate dialect described by the above grammar are as follows:

Exclusions are not supported.
Numbered repeats are not supported.
Comments using (*...*) sections are not supported, though comment lines starting with // are.
Comma is not used for concatenation - elements simply follow each other, separated by white space.
Parentheses are used only to group choices, and must be used thus.
The tokens <symbol>, <string> etc. are non-standard - the standard syntax for such elements would be ?symbol?, ?string? etc.  See below for the exact matching criteria for these tokens.
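
As a rough illustration of the dialect described by the grammar and the list of differences above, here is a small recursive-descent reader written in Python.  It is a sketch under stated assumptions: Persistate itself uses an LALR parser, and the lexer here (in particular treating <symbol> and friends as single lexemes and restricting production names to ASCII), the tuple output format and all function names are invented for this example.

```python
import re

# Lexemes of the dialect. Treating <name> as one lexeme is an assumption.
LEXEME = re.compile(r'''
    //[^\n]*                  # comment line (skipped by tokenize)
  | "(?:[^"]|"")*"            # terminal in double quotes
  | <[a-z]+>                  # token such as <symbol>
  | [A-Za-z_][A-Za-z0-9_]*    # production name
  | [=;|()\[\]{}]             # punctuation
''', re.VERBOSE)
TOKENS = {"symbol", "string", "integer", "floating", "datetime"}
CLOSERS = (None, ";", "|", ")", "]", "}")

def tokenize(text):
    lexemes, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = LEXEME.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r}")
        if not m.group().startswith("//"):
            lexemes.append(m.group())
        pos = m.end()
    return lexemes

class Parser:
    def __init__(self, lexemes):
        self.lexemes, self.pos = lexemes, 0

    def peek(self):
        return self.lexemes[self.pos] if self.pos < len(self.lexemes) else None

    def take(self):
        lexeme = self.peek()
        self.pos += 1
        return lexeme

    def expect(self, lexeme):
        if self.take() != lexeme:
            raise SyntaxError(f"expected {lexeme!r}")

    def syntax(self):               # Syntax = ProductionRule { ProductionRule } ;
        rules = {}
        while self.peek() is not None:
            name = self.take()
            if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
                raise SyntaxError(f"expected a production name, found {name!r}")
            self.expect("=")
            rules[name] = self.elements()
            self.expect(";")
        if not rules:
            raise SyntaxError("at least one production is required")
        return rules

    def elements(self):             # Elements = Element { Element } ;
        items = [self.element()]
        while self.peek() not in CLOSERS:
            items.append(self.element())
        return items

    def bracketed(self, kind, closer):
        inner = self.elements()
        self.expect(closer)
        return (kind, inner)

    def element(self):
        lexeme = self.take()
        if lexeme == "[":
            return self.bracketed("option", "]")
        if lexeme == "{":
            return self.bracketed("repeat", "}")
        if lexeme == "(":           # Choice = "(" Elements { "|" Elements } ")" ;
            choices = [self.elements()]
            while self.peek() == "|":
                self.take()
                choices.append(self.elements())
            self.expect(")")
            return ("choice", choices)
        if lexeme is not None and lexeme.startswith('"'):
            return ("terminal", lexeme[1:-1])
        if lexeme is not None and lexeme[1:-1] in TOKENS and lexeme[0] == "<":
            return ("token", lexeme[1:-1])
        if lexeme is not None and re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", lexeme):
            return ("production", lexeme)
        raise SyntaxError(f"unexpected {lexeme!r}")

def parse(text):
    return Parser(tokenize(text)).syntax()

rules = parse('''
// comment lines starting with // are skipped
Constraint = [ "not" ] ( SimpleCondition | "(" CompoundCondition ")" ) ;
WorkspaceName = <symbol> { <symbol> } ;
''')
print(rules["WorkspaceName"])
```

In this sketch a choice written without parentheses, such as A = B | C ;, is rejected, and so is a comma between elements, mirroring the restrictions listed above.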

Token matching criteria

This section is included for completeness, since the token matching rules are part of what distinguishes the Persistate EBNF dialect from the standard.  However, these details will really only concern developers using the API in the Persistate.Language namespace.

The <symbol> token is matched by default in source code by a sequence of characters which matches the following .NET regular expression.

(?:\.|,|;|%|=|!=|>=|<=|>|<|\(|\)|\{|\}|\[|\]|\|)|(?:\p{L}|\p{Nl}|_)(?:\p{L}|\p{Nl}|\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc}|\p{Cf})*

This is a bit of a handful.  It essentially matches either one of a fixed set of operator and punctuation symbols, or an identifier: a first character which is a letter, a letter number or an underscore, followed by any number of letters, letter numbers, decimal digits, non-spacing marks, spacing combining marks, connector punctuation or format characters.  This expression is the one used for symbols in the Persistate definition file language.  You can set your own regular expression for symbol matching using the SyntaxParser.SymbolPattern property.
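
The two halves of the symbol expression can be spelled out step by step.  Python's built-in re module does not support \p{...} classes, so this sketch uses unicodedata.category instead of a regular expression.  The function name match_symbol is an assumption for this example, and Python and .NET may ship different versions of the Unicode character database, so edge cases could differ.

```python
import unicodedata

# First alternative of the expression: operators and punctuation, tried in
# the same order as they appear in the regular expression.
OPERATORS = [".", ",", ";", "%", "=", "!=", ">=", "<=", ">", "<",
             "(", ")", "{", "}", "[", "]", "|"]

# \p{L} covers the five letter categories; \p{Nl} is "letter number".
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# Later characters may also be Mn, Mc, Nd, Pc or Cf.
PART = START | {"Mn", "Mc", "Nd", "Pc", "Cf"}

def match_symbol(text, pos=0):
    """Return the symbol starting at `pos`, or None if there is none."""
    for op in OPERATORS:
        if text.startswith(op, pos):
            return op
    if pos >= len(text):
        return None
    first = text[pos]
    if first != "_" and unicodedata.category(first) not in START:
        return None
    end = pos + 1
    while end < len(text) and unicodedata.category(text[end]) in PART:
        end += 1
    return text[pos:end]

print(match_symbol(">= 5"))
print(match_symbol("_name1 rest"))
print(match_symbol("1abc"))
```

Note that a digit cannot start a symbol, and that "!" on its own is not a symbol because only "!=" appears in the operator list.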

The <string> token is matched in source code by a sequence of characters which matches the following .NET regular expression.

"(?:[^"]|"")*"

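The string expression above uses only constructs that behave the same way in Python's re module, so it can be tried out directly.  The sketch below shows which inputs match.  The helper string_value, and its reading of a doubled quote as one literal quote character, are assumptions: the expression only shows that a pair of double quotes is allowed inside a string.

```python
import re

STRING = re.compile(r'"(?:[^"]|"")*"')

def string_value(literal):
    """Strip the outer quotes and collapse doubled quotes (assumed meaning)."""
    if STRING.fullmatch(literal) is None:
        raise ValueError(f"not a <string> literal: {literal!r}")
    return literal[1:-1].replace('""', '"')

print(string_value('"say ""hi"""'))
print(STRING.fullmatch('"unterminated'))
print(STRING.match('"a" "b"').group())
```

Since [^"] also matches line breaks, a literal matched by this expression may span several lines.
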
The <integer> token is matched in source code by a sequence of characters which matches the following .NET regular expression.

[\+-]?\d+

The <floating> token is matched in source code by a sequence of characters which matches the following .NET regular expression.

[\+-]?\d+\.\d+(?:[Ee][+\-]?\d+)?
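
The integer and floating expressions can be compared side by side with Python's re module, which handles these particular patterns in the same way.  The classify helper is hypothetical, and trying the floating pattern before the integer one is an assumption of this sketch rather than a documented rule of the Persistate parser.

```python
import re

INTEGER = re.compile(r'[\+-]?\d+')
FLOATING = re.compile(r'[\+-]?\d+\.\d+(?:[Ee][+\-]?\d+)?')

def classify(text):
    """Return which numeric token, if any, matches the whole of `text`."""
    if FLOATING.fullmatch(text):
        return "floating"
    if INTEGER.fullmatch(text):
        return "integer"
    return None

for sample in ["42", "-3.14", "6.02E+23", "1e5", ".5", "5."]:
    print(sample, classify(sample))
```

The floating pattern requires digits on both sides of the decimal point, so "1e5", ".5" and "5." match neither token as a whole.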

The <datetime> token is matched in source code by a sequence of characters enclosed in single quotes which is parsed successfully by the .NET DateTime.TryParse method for the current culture.