view anagram/guisupport/helpdata.src @ 24:a4899cdfc2d6 default tip
Obfuscate the regexps to strip off the IBM compiler's copyright banners.
I don't want bots scanning github to think they're real copyright
notices because that could cause real problems.
author | David A. Holland |
---|---|
date | Mon, 13 Jun 2022 00:40:23 -0400 (2022-06-13) |
parents | 13d2b8934445 |
## Accept Action

The accept action is one of the four actions of a traditional parsing engine. The accept action is performed when the parser has succeeded in identifying the goal, or grammar token, for the grammar. When the parser executes the accept action, it sets the exit_flag field in the parser control block to AG_SUCCESS_CODE and returns to the calling program. The accept action is thus the last action of the parsing engine and occurs only once for each successful execution of the parser.

If the grammar token has a non-void value, you may obtain its value by calling the parser value function whose name is given by <parser name>_value, that is, by appending "_value" to the parser name.

## Parser Value Function, Return Value

The value assigned to the grammar token in your parser may be retrieved by calling the parser value function after the parser has finished. The name of this function is given by <parser name>_value. The return type of the function is the type assigned to the grammar token. If you have set the reentrant parser switch, the parser value function takes a pointer to the parser control block as its sole argument. Otherwise, it takes no arguments.

The value function is not defined if the grammar token has type "void".

## AG_PLACEMENT_DELETE_REQUIRED

When the wrapper option is specified, the wrapper template class that AnaGram defines uses a "placement new" operator to construct the wrapper object on the parser value stack. The MSVC++ 6.0 compiler requires, in this situation, that a corresponding "placement delete" operator be defined. Other C++ compilers, notably MSVC++ 5.0, generate an error message if they encounter the definition of a "placement delete" operator. Accordingly, AG_PLACEMENT_DELETE_REQUIRED is used to determine whether a "placement delete" operator should be defined. AG_PLACEMENT_DELETE_REQUIRED is defined to be 1 if you are using MSVC++ 6.0 or greater, 0 otherwise.
You can override the automatic definition of AG_PLACEMENT_DELETE_REQUIRED by defining it in the C prologue section of your grammar. Set it to a non-zero value to force the "placement delete" definition, zero to skip the definition.

## ag_tcv

ag_tcv is an array AnaGram includes in your parser. Your parser uses ag_tcv to translate external codes to the internal token numbers that AnaGram uses. It uses the actual input code to index the ag_tcv array to fetch a token number. The token number is then used to identify the input token.

## Allow macros

"Allow macros" is a configuration switch which defaults to on. When it is set, i.e., on, reduction procedures will be implemented as macros if they are sufficiently simple. This makes your parser somewhat more compact but makes it somewhat more difficult to debug. It's a good idea to turn this switch off for debugging.

## Analyze Grammar

The Analyze Grammar command will scan and analyze your syntax file, and create a number of tables summarizing your grammar. Analyze Grammar does not create any output files. To create a parser, use the Build Parser command.

You would probably use Analyze Grammar, rather than Build Parser, during initial development of your grammar. You can use File Trace and Grammar Trace as soon as you have analyzed your grammar. It is not necessary to build a parser first.

## Attribute Statement

Attribute statements are used in configuration sections of your syntax file to specify certain properties for tokens, character sets, or other units of your grammar. The attribute statements available are: disregard, distinguish keywords, enum, extend pcb, hidden, left, lexeme, nonassoc, rename macro, reserve keywords, right, sticky, subgrammar, wrapper.

## Auto init

Auto init is a configuration switch which defaults to on. It controls the initialization of any parser that is not event driven.
When it is set to on, your parser is automatically initialized every time it is called. This is the situation you will normally use. On occasion, however, it is desirable to call a parser several times without reinitializing it. In this case, you may set the auto init parameter to off and then call the initializer yourself whenever it is appropriate.

## Auto resynch

"Auto resynch" is a configuration switch which defaults to off. You may use it to specify automatic resynchronization as an error recovery mechanism.

Setting the "auto resynch" switch causes AnaGram to include an automatic resynchronization procedure in your parser. The resynchronization procedure will be invoked when your parser encounters a syntax error and will skip over input until it finds input characters or tokens consistent with its state at the time of the error.

An alternate technique, error token resynchronization, uses an error token which you include in your grammar.

## Automatic Resynchronization

Automatic resynchronization is one of several error recovery options available as part of parsers built by AnaGram. You enable automatic resynchronization by setting the auto resynch configuration switch.

If your parser includes automatic resynchronization it will incorporate a heuristic procedure which will skip over input tokens until it finds a token which makes sense with respect to one or another of the productions active at the time of the syntax error.

The purpose of the resynchronization procedure is to provide a simple way for your parser to proceed in the event of syntax errors so that it can find more than one syntax error on a given pass. The resynchronization procedure uses a heuristic based on your own syntax. AnaGram itself uses this technique to resynchronize after syntax errors in its input.

A disadvantage to using this resynchronization technique is that the resynchronization procedure turns off all reduction procedures.
Because of the error, a number of reduction procedures, which normally would be executed, will be skipped. The parameters for any reduction procedures that might be called later would be suspect and could cause serious problems. It seems more prudent simply to shut them down.

If you use the automatic resynchronization procedure, you must also specify an eof token so that the synchronizer doesn't inadvertently skip over the end of file.

An alternative technique for resynchronization is called error token resynchronization.

## Auxiliary Trace

An Auxiliary Trace is a pre-built grammar trace which you may select from the Auxiliary Windows popup menu for most windows which display parser state information. The Auxiliary Trace provides a path to the state specified in the highlighted line of the primary window.

When obtained from the Parser Stack pane of the File Trace or Grammar Trace, the Auxiliary Trace is simply a copy of the current status of these traces so you can explore your alternatives while still retaining the status of the original trace for reference.

## Auxiliary Windows

From most AnaGram windows you can pop up an Auxiliary Windows menu by clicking the right mouse button or by pressing Shift F10. Auxiliary Windows may have Auxiliary Windows of their own.

Windows with a cursor bar (highlighted line): The windows available in the Auxiliary Windows menu depend on the grammar elements identified by the cursor bar in the parent window. If the cursor bar identifies a parser state, there will be windows that describe the state. If the cursor bar identifies a grammar rule, there will be windows that describe the rule. If the cursor bar identifies a token, there will be windows that describe the token. In the case of a marked rule, token windows will describe the marked token, if any. In some cases, specialized pre-built grammar traces such as the Conflict Trace or Auxiliary Trace are on the menu.
Help windows: For Help windows, the Auxiliary Windows menu will show all the available links to other Help topics from this window. Using Help is always available.

## Backtrack

If your parser does not continue after encountering a syntax error, you can speed it up and make it a little smaller by turning off the backtrack configuration switch.

If backtrack is on, AnaGram configures your parser so that in case of syntax error it can undo any default reductions it might have made as a consequence of the erroneous input. The purpose of such an undo function is to identify the proper error frame and to maximize the probability of being able to recover gracefully.

## Empty Recursion

This warning message tells you that the recursive step of the specified recursive rule can be completely matched by zero length tokens, i.e., by nothing at all. The result is potentially an infinite loop in the generated parser. The specified rule is an expansion rule of the specified token.

Because of the possibility of encountering an infinite loop while parsing, AnaGram turns off its keyword anomaly analysis if empty recursion is found. The File Trace function is also disabled for the same reason.

The circular definition of a token has the same effect as an empty recursion, in that no additional input is required to match the recursive rule.

## Keyword Anomaly analysis aborted: empty recursion

The keyword anomaly analysis has been turned off, since the presence of recursive rules with empty recursion can cause infinite loops in the analysis.

## Keyword Anomaly analysis aborted: circular definition

The keyword anomaly analysis has been turned off, since the presence of a circular definition can cause infinite loops in the analysis.

## File Trace disabled: empty recursion

Because of the presence of recursive rules with empty recursion in this grammar and the infinite loops that can ensue, the File Trace function has been disabled.
## File Trace disabled: circular definition

Because of the presence of a circular definition in this grammar and the infinite loops that can ensue, the File Trace function has been disabled.

## Both Error Token Resynch and Auto Resynch Specified

This warning message indicates that your grammar defines an error token and also requests automatic resynchronization. AnaGram will ignore the request for automatic resynchronization and will provide error token resynchronization.

If you named a token "error" but do not wish error token resynchronization, you can either rename "error", or, in a configuration section, you may explicitly specify the error token to be something you don't otherwise use in your grammar:

    [ error token = not used ]

## Bottom Margin

"Bottom margin" is an obsolete configuration parameter.

## Bright Background

"Bright background" is a configuration switch which was used in the DOS version of AnaGram. It is no longer used, but is still recognized for the sake of upward compatibility with old configuration files.

## Build Parser

You use the Build Parser command to create a parser based on your grammar. The parser is a C file consisting of the embedded C (which may include C++) code in your syntax file, your reduction procedures, a number of tables derived from your grammar specification, and a parsing engine customized to your requirements.

If you only wish to investigate your grammar and do not wish to create output files, use the Analyze Grammar command.

## Build <file name>

This item on the Action Menu is available when you have analyzed a grammar but you have not yet built it. It builds the grammar without reloading the syntax file from the disk.

## Cannot Make Wrapper for Default Token Type

This warning message occurs when AnaGram finds a token type that has been previously defined as the default token type listed in a wrapper statement.
If a wrapper is needed for a particular type, you must specify the data type explicitly for each relevant token. As a result, a wrapper class has not been created for the specified token type.

## Token with Wrapper cannot be Default Token Type

This warning message indicates that an attempt has been made to specify a class that has previously been listed in a wrapper statement as the default token type. If a wrapper is needed for a particular type, you must specify the data type explicitly for each relevant token. As a result, the default token type has not been set.

## Case Sensitive

"Case sensitive" is a configuration switch which defaults to on. When it is on, it instructs AnaGram to build a parser for which all input is case sensitive. When it is off, AnaGram builds a parser which ignores case for all input.

If the iso latin 1 configuration switch is turned off, case conversion will be limited to characters in the normal ASCII range. When it is on, case conversion will be done for all ISO Latin 1 characters. If you have other requirements for case conversion, you may provide your own definition in your embedded C for the CONVERT_CASE macro, which is invoked to perform case conversion on input characters.

Note that the value of an input token is unaffected by the case sensitive switch. When case sensitive is off, 'a' and 'A' will be treated as the same input token by the parser, but the token values will nevertheless be different.

## C Prologue

If you include a block of embedded C code at the very beginning of your syntax file, it is called the "C prologue". It will be copied to your parser file before any of the code generated by AnaGram. You can use the C prologue to ensure that copyright notices, #include directives, or type definitions, for example, occur at the very beginning of your parser file. If you specify a C or C++ type of your own definition, you must provide a definition in the C prologue.
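As a rough illustration of what a C prologue might hold, the sketch below shows only the C text itself: a placeholder notice, two #include directives, and a user-defined type with a small helper. The names name_value and make_name are hypothetical and are not part of AnaGram; how the block is delimited inside the syntax file is covered under embedded C and is not shown here.

```c
/* Hypothetical contents of a C prologue. Everything here would be copied to
   the top of the parser file, ahead of the code AnaGram generates. */

/* Placeholder copyright notice: replace with your own. */

#include <stddef.h>   /* size_t */
#include <string.h>   /* strlen */

/* A user-defined token type must be defined before generated code uses it.
   name_value is an invented example type. */
typedef struct {
    const char *text;   /* start of the matched text */
    size_t length;      /* number of characters in the matched text */
} name_value;

/* Invented helper that a reduction procedure could call. */
static name_value make_name(const char *text) {
    name_value v;
    v.text = text;
    v.length = strlen(text);
    return v;
}
```

Because the type definition precedes all generated code, a grammar that names name_value as a token type would find it already declared when the parser file is compiled.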
## CHANGE_REDUCTION

CHANGE_REDUCTION(t) is a macro which AnaGram defines in your parser file if your parser uses semantically determined productions. In your reduction procedure, when you need to change the reduction token you can easily do so by calling CHANGE_REDUCTION with the name of the desired token as the argument. If the token name has embedded spaces, replace the embedded spaces with underline characters.

## Character Constant

You may represent single characters in your grammar by using character constants. The rules for character constants are the same as in C. The escape sequences are as follows:

    \a      alert (bell) character
    \b      backspace
    \f      formfeed
    \n      newline
    \r      carriage return
    \t      horizontal tab
    \v      vertical tab
    \\      backslash
    \?      question mark
    \'      single quote
    \"      double quote
    \ooo    octal number
    \xhh    hexadecimal number

AnaGram treats a single character as a character set which contains only the specified character. Therefore you can use a character constant in a set expression.

## Character Map

The Character Map table shows you the mapping of input characters to token numbers. The ag_tcv table in your parser is based on the information in this table. The fields in this table are: character code; display character, if any (what Windows displays for this code); partition set number; token number; token representation.

The display character will be what Windows displays for the character code in the Data Tables font you have chosen.

## Character Range

A "character range" is a simple way to specify a character set. There are two ways to represent a character range in an AnaGram syntax file.

The first way is like a character constant: 'a-z'.

The second way allows somewhat greater freedom:

    'a'..'z'
    'a'..255
    ^Z..037
    -1..0xff

Here you use two arbitrary character representations separated by two dots.

If the two characters are out of order, AnaGram will reverse the order, but will give you a warning.
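The handling of out-of-order bounds described above can be sketched in C. This is only an illustrative model, not AnaGram's own code; char_range, make_range, and range_contains are invented names, and the reversed flag stands in for the warning AnaGram would issue.

```c
#include <stdbool.h>

/* Hypothetical model of a character range with inclusive bounds. */
typedef struct {
    int low;        /* smallest character code in the range */
    int high;       /* largest character code in the range */
    bool reversed;  /* true if the bounds were given out of order */
} char_range;

/* Build a range from two character codes, swapping them if necessary. */
static char_range make_range(int first, int last) {
    char_range r;
    r.reversed = first > last;
    r.low = r.reversed ? last : first;
    r.high = r.reversed ? first : last;
    return r;
}

/* Test whether a character code belongs to the range. */
static bool range_contains(char_range r, int code) {
    return code >= r.low && code <= r.high;
}
```

With this model, make_range('z', 'a') yields the same bounds as make_range('a', 'z') but records that a warning would be due, and numeric bounds such as -1 and 0xff work the same way as character constants.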
More complex character sets may be specified by using union, difference, intersection, or complement operators.

## Character Representation

In an AnaGram syntax file you may represent a character literally with a character constant or numerically using decimal, octal or hexadecimal representations following the conventions for C. Thus 'A', 65, 0101, and 0x41 all represent the same character. Control characters can be represented using the '^' character and either an upper or lower case letter. Thus ^j and ^J are acceptable representations of the ASCII newline code.

The rules for character constants are identical to those in C, and the same escape sequences are recognized.

## Character Set

In AnaGram grammars you can conveniently specify whole sets of characters at a time. This avoids needless repetition and complexity. Sets of characters may be defined in an AnaGram syntax file in any of a number of ways. A single character is taken to represent a character set consisting of a single element. (See character representation.) You can also specify a set consisting of a range of characters (see character range) and perform the familiar set operations, union, intersection, difference and complement. All the sets you define in your syntax file are summarized in the Character Sets window.

The union of two character sets, represented by a '+', contains all characters that are in one or another of the two sets. Thus, 'A-Z' + 'a-z' represents the set of all upper and lower case letters.

The intersection of two character sets, represented by a '&', contains all characters that are in both sets. Thus, suppose you have the definitions

    letter = 'A-Z' + 'a-z'
    hex digit = '0-9' + 'A-F' + 'a-f'

Then (letter & hex digit) contains precisely upper and lower case a to f.

The difference of two character sets, represented by a '-', contains all characters that are in the first set but not in the second set.
Thus, using the same definitions as above, (letter - hex digit) contains precisely upper and lower case g to z.

The complement of a character set, represented by a preceding '~', represents all characters in the character universe which are not in the given set.

Suppose you have defined a set, eof, which consists of the characters which represent end of file. Then, in your grammar where you wish to accept an arbitrary character, what you really want is anything but an end of file character. You can define it thus:

    anything = ~eof

## Character Sets

This window lists all of the distinct character sets which you defined, implicitly or explicitly, in your grammar. Each line in the table describes one such set. The description takes the form of the internal set number and the defining expression.

The Auxiliary Windows menu will allow you to see the Partition Sets which cover the character set, and the Set Elements which it comprises, as well as the Token Usage.

## Character Universe, Universe

The character universe, or set of all expected input characters to your parser, is defined as all characters in the range given by a particular lower bound and a particular upper bound, as described below.

The character universe is used for two things in AnaGram. The first use is for calculating the complement of a character set. The second use is in the input processing of your parser. Input characters will be used to index a token conversion table to convert character codes to token numbers. The length of this table will be given by the size of the character universe. If you have set the test range configuration switch, your parser will verify that the input character is within the range of the conversion table. Otherwise, the character code will not be checked for validity. In this case, an out-of-range character will lead to undefined behavior.

If you have not used any characters with negative codes in your grammar, the lower bound is zero.
Otherwise, it is the most negative such character. If the highest character code you have used is less than or equal to 255, the upper bound will be 255. If you have used a character code greater than 255, the upper bound will be the largest such code which appears in your syntax file.

## Characteristic Rule

Each parser state is characterized by a particular set of grammar rules, and for each such rule, a marked token which is the next token expected. The combination of a grammar rule and its marked token is often called a marked rule. A marked rule which characterizes a state is called a "characteristic rule".

In the course of doing grammar analysis, AnaGram determines the characteristic rules for each parser state. After analyzing your grammar, you may inspect the State Definition Table to see the characteristic rules for any state in your parser.

## Characteristic Token

Every state in a parser, except state 0, can be characterized by the one, unique token which causes a jump to that state. That token is called the characteristic token of the state, because to get to that parser state you must have just seen precisely that token in the input. Note that several states could have the same characteristic token.

When you have a list of states, such as is given by the parser state stack, it is equivalent to a list of characteristic tokens. This list of tokens is the list of tokens that have been recognized so far by the parser.

## Circular Definition

If the expansion rules for a token contain a grammar rule that consists only of the token itself, the definition of the token is circular. A circular definition is an extreme case of empty recursion. As in cases of empty recursion, the generated parser may contain infinite loops. When such a condition is detected, therefore, keyword anomaly analysis and the File Trace option are disabled.
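Returning to the bounds rule given under Character Universe, the calculation and the range-checked table lookup can be sketched in C as follows. This is a hedged model only: universe, universe_bounds, table_length, and lookup_token are invented names, the -1 return value is an arbitrary way to signal an out-of-range code, and indexing the table by code minus the lower bound is an assumption about layout rather than a description of the generated parser.

```c
#include <stddef.h>

/* Hypothetical representation of the character universe. */
typedef struct {
    int lower;  /* zero, or the most negative code used in the grammar */
    int upper;  /* 255, or the largest code used if it exceeds 255 */
} universe;

/* Apply the bounds rule to the character codes used in a grammar. */
static universe universe_bounds(const int *codes, size_t count) {
    universe u = {0, 255};
    for (size_t i = 0; i < count; i++) {
        if (codes[i] < u.lower) u.lower = codes[i];
        if (codes[i] > u.upper) u.upper = codes[i];
    }
    return u;
}

/* Number of entries the token conversion table needs. */
static size_t table_length(universe u) {
    return (size_t)(u.upper - u.lower) + 1;
}

/* Range-checked lookup, modelling a parser built with test range set.
   Returns the token number, or -1 for a code outside the universe. */
static int lookup_token(const int *table, universe u, int code) {
    if (code < u.lower || code > u.upper) return -1;
    return table[code - u.lower];
}
```

A grammar that uses only ordinary character constants gets the universe 0..255 and a 256-entry table; using -1 or a code above 255 anywhere in the grammar widens the universe and lengthens the table accordingly.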
## column

"column" is an integer field in your parser control block used for keeping track of the column number of the current character in your input. Line and column numbers are tracked only if the lines and columns configuration switch has been set.

## Command Line

If you provide the name of a syntax file on the command line when you start AnaGram, it will open the file and run either Analyze Grammar or Build Parser depending on the setting of the Autobuild switch.

## Command Line Version, agcl.exe

The command line version of AnaGram, agcl.exe, can be used in make files. It takes the name of a single syntax file on the command line. Error and warning messages are written to stdout.

Normally you would only use the command line version once you have finished developing your parser and are integrating it with the rest of your program.

The command line version of AnaGram is not included with trial copies.

## Comment

You may incorporate comments in your syntax file using either of two conventions. The first is the normal C convention for comments which begin with "/*" and end with "*/". Such comments may be of arbitrary length. By setting or resetting the nest comments switch, you may control whether they may be nested or not. The second convention for comments is the C++ comment convention. In this case the comment begins with "//" and ends with a newline.

When writing a grammar, you may wish to allow a user to comment his input freely without your having to explicitly allow for comments in your grammar. You may accomplish this by using the disregard statement.

## Compile Command

"Compile command" is a configuration parameter which takes a string value. This parameter was used in the DOS version of AnaGram, but is ignored in the Windows version.

## Complement

In set theory, the complement of a set, S, is the set of all elements of the universe which are not members of the set S.
In AnaGram, the complement operator for character sets is given by '~' and has higher precedence than difference, intersection, or union.

In AnaGram, the most useful complement is that of the end of file character set. For ordinary ASCII files it is often convenient to read the entire file into memory, append a zero byte to the end, and define the end of file set thus:

    eof = 0 + ^Z

Then, ~eof represents all legitimate input characters. You can then use set differences to specify certain useful sets without tedious enumeration. For example, a comment that is to be terminated by the end of line then consists of characters from the set

    comment char = ~'\n' & ~eof

This set could also be written

    comment char = ~('\n' + eof)

## Completed Rule

A "completed rule" is a characteristic rule which has no marked token. In other words, it has been completely matched and will be reduced by the next input.

If there is more than one completed rule in a state, the decision as to which to reduce is made based on the next input token. If there is only one completed rule in a state, it will be reduced by default unless the default reductions switch has been reset, i.e., turned off.

## Configuration File

If it can find them, AnaGram reads two configuration files to set up configuration parameters. At program initialization, it will first attempt to read a configuration file in the directory that contains the AnaGram executable file you are running. Then it will read a configuration file in your working directory. Both files should have the name "AnaGram.cfg" if they exist. Neither is necessary.

If a parameter is specified in both files, the specification in the file from the working directory takes precedence. The effect of this two stage process is to allow you to set your standard preferences in the principal directory, with specific overrides in your working directories.
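The two-stage precedence among configuration files can be pictured with a small C sketch. It is a model of the rule only, not of how AnaGram stores its parameters: resolve_parameter is an invented function, and a NULL argument stands for a file that is absent or does not mention the parameter.

```c
#include <stddef.h>

/* Hypothetical model of the precedence rule: start from the built-in
   default, let the file in the program directory override it, then let the
   file in the working directory override that. */
static const char *resolve_parameter(const char *built_in_default,
                                     const char *program_dir_value,
                                     const char *working_dir_value) {
    const char *value = built_in_default;
    if (program_dir_value != NULL) value = program_dir_value;
    if (working_dir_value != NULL) value = working_dir_value;
    return value;
}
```

Applying the overrides in reading order is the simplest way to get "last specification wins", which matches the order in which the two files are read.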
The values for configuration parameters in syntax files override those read from configuration files.

AnaGram does not save configuration parameters in the Windows registry, nor does it provide any mechanism for setting or changing the values of configuration parameters within AnaGram itself.

## Configuration Parameter

Configuration parameters may be specified either in configuration files or in your syntax file. In your syntax files, configuration parameters are specified, one per line, in a configuration section. AnaGram ignores case when identifying a configuration parameter, so that "ALLOW MACROS", "Allow Macros", and "allow macros" are all equivalent forms.

There may be any number of configuration sections in a syntax file. Any parameter may be specified any number of times. Since AnaGram maintains only one value in storage for these parameters, whenever it refers to one it will see the most recently specified value.

Every configuration parameter has a default value which has been chosen to correspond to a standard if it exists, customary usage if such can be determined, or otherwise to the most likely usage.

Before executing an Analyze Grammar or Build Parser command, AnaGram resets configuration parameters to their initial values, as determined by the built in defaults and the configuration files read at program initialization.

The Configuration Parameters Window shows the current settings of all of the configuration parameters. When this window is active you may press F1 or click with the help cursor to pop up a help window describing the parameter under the cursor bar.

There are several varieties of configuration parameters. Some simply set or reset a condition. These need simply be stated to set the condition or negated with the tilde (~) to reset the condition. Thus [ nest comments ] causes AnaGram to allow nested comments, and [ ~nest comments ] causes AnaGram to disallow nested comments.
If you prefer you may explicitly specify a switch value as on or off: [ nest comments = on]

A second kind of configuration parameter takes a value which is the name of a token. Thus [ grammar token = c grammar] specifies that the token, c grammar, is the grammar token which is to be analyzed.

A third variety of configuration parameter takes a value which is a C data type. Thus [ default token type = unsigned char *] signifies that the semantic value of a token, unless otherwise specified, is a pointer to an unsigned char.

A fourth variety of configuration parameter takes a string value to set some ASCII string used by AnaGram. Thus [ header file name = "widget.h" ] signifies that the header file created by AnaGram should be called "widget.h".

In string-valued parameters used to specify the names of output files or the name of your parser, you may use the '#' character to indicate the name of your syntax file. When the string is actually used, AnaGram will substitute the syntax file name for the '#'.

In string-valued parameters used to specify the names of functions or variables that AnaGram generates, you may use '$' to specify the name of your parser. When the string is actually used, AnaGram will substitute the name of your parser for the '$'.

In the "enum constant name" configuration parameter you may use '%' to specify where a token name is to be substituted.

The final variety of configuration parameter takes a numeric value. The value may be decimal, octal or hexadecimal, following the C conventions, and may have an optional sign. Thus [parser stack size = 50] tells AnaGram to allocate space for at least fifty stack entries when it creates your parser.

## Configuration Parameters Window

The Configuration Parameters window lists the configuration parameters AnaGram accepts with their current values, as set by the configuration files it has read and by the most recent syntax file it has analyzed.
Configuration parameters cannot be changed from within AnaGram.

## Configuration Section

A configuration section is one of the main divisions of your syntax file. It begins with a left square bracket on a fresh line. It then contains definitions of configuration parameters, configuration switch settings and attribute statements. These specifications must each start on a new line. The configuration section is closed with a right bracket. Any further component of your syntax file, other than a comment, must start on a fresh line. There can be any number of configuration sections in a syntax file.

## Configuration Switch

A configuration switch is a configuration parameter which can take on only the two values true and false, or on and off. You set a configuration switch, or turn it on, by simply naming it in your configuration file or in a configuration section of your syntax file. You turn it off, or "reset" it, by use of the tilde: "~nest comments", for example, resets, or turns off, the nest comments switch.

If you prefer, you may assign the value "on" to set the switch, or "off" to reset it. For example:

    nest comments = on

## Conflict

"Conflicts" arise during the grammar analysis when AnaGram cannot determine how to treat a given input token. There are two sorts of conflicts: shift-reduce conflicts and reduce-reduce conflicts. Conflicts may arise either because the grammar is inherently ambiguous, or simply because the grammar analyzer cannot look far enough ahead to resolve the conflict. In the latter case, it is often possible to rewrite the grammar in such a way as to eliminate the conflict. In particular, null productions are a common source of conflicts.

When AnaGram analyzes your grammar, it lists all unresolved conflicts in the Conflicts window. A number of Auxiliary Windows available from the Conflicts window provide help in identifying the source of the conflict.

There are a number of ways to deal with conflicts.
If you understand the conflict well, you may simply choose to ignore it. When AnaGram encounters a shift-reduce conflict while building parse tables it resolves it by choosing the shift action. When AnaGram encounters a reduce-reduce conflict while building parse tables, it resolves it by selecting the grammar rule which occurred first in the grammar.

A second way to deal with conflicts is to set operator precedence parameters. If you set these parameters, AnaGram will use them preferentially to resolve conflicts. Any conflicts so resolved will be listed in the Resolved Conflicts window.

A third way to resolve a conflict is to declare some tokens as sticky. This is particularly useful for productions whose sole purpose is to skip over uninteresting input.

A fourth way to resolve conflicts is to declare a token to be a subgrammar. When you do this, AnaGram does not look beyond the definition of the subgrammar token itself for reducing tokens. This is not a particularly selective way to resolve conflicts and should be used only when the subgrammar token is naturally defined only by internal criteria. The tokens identified by lexical scanners are prime examples of this genre.

The fifth way to deal with conflicts is to rewrite the grammar to eliminate them. Many people prefer this approach since it yields the highest level of confidence in the resulting program.

Please refer to the AnaGram User's Guide for more information about dealing with conflicts.

## Conflicts

If there are conflicts in your grammar which are not resolved by precedence rules, they will be listed in the Conflicts window. The Conflicts window will also be listed in the Browse Menu. Conflicts which have been resolved by precedence rules are listed in the Resolved Conflicts window.

The Conflicts window lists the conflicts, or ambiguities, which AnaGram found in your grammar.
The table identifies the �parser states� in which it found conflicts, the �conflict token�s for which it had more than one option, and the �marked rules� for each such option. If one of the rules for a particular conflict has a �marked token�, the conflict is a �shift-reduce conflict�. The marked token is the token to be shifted. If none of the rules has a marked token the conflict is a �reduce-reduce conflict�. AnaGram provides a number of �Auxiliary Windows� to help you find and fix the source of the conflict. The �Conflict Trace� window is a pre-built �Grammar Trace� window which shows you one of perhaps many ways to encounter the conflict. The �Reduction Trace� window shows the result of reducing a particular ambiguous rule. In addition, the �Rule Derivation� and �Token Derivation� windows show you why the conflict token is a �reducing token�. They are particularly useful for shift-reduce conflicts. The �Expansion Chain� window is helpful for understanding reduce-reduce conflicts. Other Auxiliary Windows which are often useful are the �State Definition� window, the �Reduction States� window, and the �Problem States� window. Please refer to the AnaGram User's Guide for more information on how to deal with conflicts. ## Conflicts Resolved by Precedence Rules This �warning� message indicates that AnaGram has resolved conflicts in your grammar by using �precedence rules�: guidelines you supplied either by explicit �precedence declarations�, by using a �sticky� statement or �distinguish lexemes� statement, or implicitly by using a �disregard� statement. These conflicts are listed in the �Resolved Conflicts� window, and are not listed in the �Conflicts� window. ## Conflict Token In any given �conflict�, there is a �token� for which an unambiguous �parser action� cannot be determined. This token is called the "conflict token". 
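The default resolution policy described under Conflict (a shift is preferred in a shift-reduce conflict; in a reduce-reduce conflict the rule that occurs first in the grammar wins) can be sketched in C roughly as follows. This is only an illustration: the types conflict_option and option_kind and the function resolve_conflict are hypothetical names, not part of AnaGram, and precedence rules, sticky tokens and subgrammars are ignored.

```c
#include <stddef.h>

/* Hypothetical model of one option in a conflict: either a shift, or a
   reduction by the rule that appears at position rule_number in the grammar. */
typedef enum { OPT_SHIFT, OPT_REDUCE } option_kind;

typedef struct {
    option_kind kind;
    int rule_number;   /* order of appearance in the grammar (reductions only) */
} conflict_option;

/* Return the index of the option selected by the default policy:
   take a shift if one is available, otherwise take the reduction whose
   rule occurs earliest in the grammar. Returns -1 if there are no options. */
int resolve_conflict(const conflict_option *opts, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (opts[i].kind == OPT_SHIFT)
            return (int)i;                       /* shift-reduce: shift wins */
        if (best < 0 || opts[i].rule_number < opts[best].rule_number)
            best = (int)i;                       /* reduce-reduce: first rule wins */
    }
    return best;
}
```

An option with kind OPT_SHIFT corresponds to a rule with a marked token in the Conflicts window; if no option has one, the conflict is a reduce-reduce conflict.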
## Conflict Trace The Conflict Trace is a ready-made �Grammar Trace� which shows you one of perhaps many ways to get to the state which has the �conflict� selected by the cursor bar. The Conflict Trace window is an option in the �Auxiliary Windows� menu for the �Conflicts� window and the �Resolved Conflicts� window. ## Const Data The const data �configuration switch� controls the use of CONST qualifiers in generated code. If the switch is set, all fixed data arrays in the �parser file� will be qualified as CONST, unless the �old style� switch is set. The default setting is ON. Other configuration switches which control declaration qualifiers in the parser file are �near functions� and �far tables�. ## CONTEXT "CONTEXT" is a macro which AnaGram defines for you if you have defined a �context type�. It provides access to the top value of the �context stack�. Your �GET_CONTEXT� macro may store the current context by assigning a value to CONTEXT. Suppose your parser uses �pointer input�, and you wish to know the value of the �pointer� for every production. You could define GET_CONTEXT thus: #define GET_CONTEXT CONTEXT = PCB.pointer In �reduction procedure�s, you may use the CONTEXT macro to find the context for the rule you are reducing, that is to say, the value the context variables had when the first token in the rule was encountered. ## Context Stack It is often convenient, when writing �reduction procedure�s, to know the actual context of the �grammar rule� your procedure is reducing. To do this you need to know the values that certain variables, such as stack pointers, or input pointers, in your program had at various stages as your parser matched the rule. You can accomplish this by maintaining a context stack. If you wish, AnaGram will keep track, on a stack, of any context variables you wish. To do so, define a structure which can hold all the values you need to stack. Use the �context type� �configuration parameter� to tell AnaGram how to declare the stack. 
Then define the �GET_CONTEXT� macro to gather the appropriate values and store them on the stack. The �CONTEXT� macro evaluates to the proper location into which the GET_CONTEXT macro should store the context value. AnaGram will invoke the GET_CONTEXT macro whenever necessary to make sure the right values are stacked. In a reduction procedure, you can then use the macro �RULE_CONTEXT� to find the value of the context structure as of the beginning of each token in the rule you are reducing. If your parser is �event driven�, store the context of the input token in PCB.input_context. The default version of GET_CONTEXT will stack the context as appropriate. If your parser should encounter an error, you may use �ERROR_CONTEXT� to determine the values of the context variables at the beginning of the aborted grammar rule. ## context type "Context type" is a �configuration parameter� whose value is a C type name, possibly as defined by a typedef statement. By default, "context type" is undefined. If you define it, AnaGram will set up a �context stack� in your �parser control block� so you can track the context of �production�s. Each time your parser pushes values onto the state stack and value stack it will invoke the �GET_CONTEXT� macro to store the current context on the context stack. The macro �CONTEXT� names the current stack location. In your GET_CONTEXT macro you can use it as the destination for the current context. In a �reduction procedure�, CONTEXT names the context as of the beginning of the production. Two other macros are available to inspect the values of the context stack. In a reduction procedure, you may use �RULE_CONTEXT�[k] to determine the value of the context variable as it was as of the (k+1)th token in the rule. In particular, RULE_CONTEXT[0] is the value the context variable had when the first token in the rule was seen. 
If you enable the �error frame� �configuration switch�, you may use �ERROR_CONTEXT� to determine the context of the production your parser was trying to identify at the time of the error. ## CONVERT_CASE CONVERT_CASE is a user definable macro which AnaGram invokes to convert the case of input characters when the �case sensitive� switch has been turned off. If you do not define the macro yourself, AnaGram will provide a macro which will convert case correctly for characters in the ASCII character range and also for �ISO latin 1� characters if the corresponding �configuration switch� is on. ## Coverage File Name If you have set the �rule coverage� �configuration switch� to include coverage analysis in your parser, AnaGram uses the value of the coverage file name �configuration parameter� to find the results of your testing. The value of the parameter is a string. The default value is "#.nrc", where '#' represents the name of your syntax file. ## cs cs is a field in a �parser control block� which contains your �context stack�. cs will be defined only if you have defined the �configuration parameter� �context type�. ## Current Grammar The Current Grammar is the �grammar� you presently have loaded. Its name is displayed on the title bar of each AnaGram window. A status field at the right center of the �Control Panel� indicates the state of processing that has been carried out on the grammar. "Loaded" means that the �syntax file� has been read into memory, but that syntax errors have been found. "Parsed" means that AnaGram has tried to analyze the grammar, but got into some kind of difficulty and did not complete the job. The explanation should be apparent from the messages in the �Warnings� window. "Analyzed" means that a �grammar analysis� has been completed, but no �output files� have been written. "Built" means that an analysis has been completed and output files have been written. ## Data Type The �tokens� in your �parser� usually have �semantic values�. 
The data types for these values will be determined by the �default input type� and �default token type� �configuration parameter�s unless you explicitly provide �token declarations� in your grammar. You may also define the data type for any �nonterminal� token by preceding the token name with an ordinary C cast when you write a production. For example: (int) integer -> '0-9':d =d-'0'; -> integer:n, '0-9':d =10*n + d - '0'; The data type may be any simple C or C++ data type, with arbitrary indirection and qualification. You may also use any type you have defined by means of typedef, struct or class definitions. Template classes may also be used. If you specify a type of your own definition, you must provide a definition in the �C prologue� at the beginning of your �syntax file�. A token may have the type "void" if its value has no interest for the parser. Since your parser will not stack a value for a void token, your parser may run somewhat faster when tokens are declared as void. ## Declare pcb "Declare pcb" is a �configuration switch� that defaults to on. If this switch is set when you invoke the �Build Parser� command, AnaGram will automatically declare a �parser control block� for you, at the beginning of your parser file. If you have used data types that you define yourself, the typedef statements need to precede the parser control block declaration. In this case, you should turn "declare pcb" off and declare it yourself. For more information, see the AnaGram User's Guide. ## Default Input Type The default input type is a �configuration parameter� which determines the �data type� for the �semantic value�s of �terminal tokens� if they are not explicitly declared. Normally, you would explicitly declare terminal tokens only when you have set the �input values� �configuration switch�. If you do not set the default input type, it will default to "int". 
The default data type for the values of �nonterminal tokens� is given by the �default token type� configuration parameter. ## Default Reduction "Default reductions" is a �configuration switch� which defaults to on. A "default reduction" is a �parser action� which may be used in your parser in any state which has precisely one �completed rule�. If a given �parser state� has, among its �characteristic rules�, exactly one completed rule, it is usually faster to reduce it on any input than to check specifically for correct input before reducing it. The only time this default reduction causes trouble is in the event of a �syntax error�. In this situation you may get an erroneous reduction. Normally when you are parsing a file, this is inconsequential because you are not going to continue semantic action in the presence of error. But, if you are using your parser to handle real-time interactive input, you have to be able to continue semantic processing after notifying your user that he has entered erroneous input. In this case you would want default reductions to have been turned off so that �production�s are reduced only when there is correct input. ## Default reduction value If a �grammar rule� does not have a �reduction procedure� the �semantic value� of the first token in the rule will be taken as the semantic value of the token on the left hand side. If these tokens do not have the same �data type� a �warning� will be given. ## Default Token Type "Default token type" is a �configuration parameter� which determines the �data type� for the �semantic value� of a �nonterminal token� if no other type is explicitly specified. It defaults to void. Therefore, if any �reduction procedure� returns a value, you must either explicitly set the type of the �reduction token� or you must set default token type to an appropriate value. The default token type cannot have a �wrapper� class defined. 
The default data type for the value of a �terminal token� is given by the �default input type� configuration parameter. ## Definition, Definition Statement AnaGram syntax files may contain definition statements which assign new names to �character sets�, �virtual productions�, �keyword strings�, �immediate actions�, or �tokens�. Definitions have the form name = <character set> name = <virtual production> name = <keyword string> name = <immediate action> name = <token name> For example, letter = 'a-z' + 'A-Z' statement list = statement?... include = "include" The symbols thus defined may be used anywhere the expression on the right hand side might be used. Such definitions, in and of themselves, do not define tokens. Tokens are defined only by their usage in productions. ## DELETE_WRAPPERS If your parser uses �wrapper�s and exits with an error condition, there may be objects remaining on the �parser value stack�. The DELETE_WRAPPERS macro can be used to delete any remaining objects on the stack. If you have enabled �auto resynch�, DELETE_WRAPPERS will be invoked automatically. ## Diagnose Errors "Diagnose errors" is a �configuration switch� which defaults to on. When this switch is on, AnaGram includes a function, ag_diagnose(), in your parser which provides simple syntax error diagnoses. When your parser encounters a syntax error, this function will be called immediately prior to the invocation of the �SYNTAX_ERROR� macro. A pointer to the message will be stored in the �error_message� field of the �parser control block�. If you wish to implement your own �error diagnosis�, you should turn this switch off, and include a call to your own diagnostic procedure in your SYNTAX_ERROR macro. ag_diagnose() provides three possible error messages, governed by three macros: �MISSING_FORMAT�, �UNEXPECTED_FORMAT�, and �UNNAMED_TOKEN�.
You may override the definitions of these macros with your own definitions if you wish to provide diagnostics in another language. If you have set the �error frame� switch, ag_diagnose() will also set the �error_frame_token� field. The "error_frame_token" is the non-terminal token which the parser was trying to complete when the error was encountered. When the "diagnose errors" switch is set, AnaGram also includes a �token names� table in the parser which contains the ASCII names of the tokens in the grammar, including entries for character constants and keywords. Use the �token names only� switch to limit the table to explicitly named tokens only. ## MISSING_FORMAT MISSING_FORMAT is a macro that is used by the error diagnostic function created by the �diagnose errors� switch. If you do not define it in your parser, AnaGram will define it thus: #define MISSING_FORMAT "Missing %s" This format is used when the diagnostic function can identify a unique terminal or nonterminal token that would satisfy the syntactic rules and is named in the �token names� table. ## UNEXPECTED_FORMAT UNEXPECTED_FORMAT is a macro that is used by the error diagnostic function created by the �diagnose errors� switch. If you do not define it in your parser, AnaGram will define it thus: #define UNEXPECTED_FORMAT "Unexpected %s" This format is used when the diagnostic function cannot identify a named, unique terminal or nonterminal token that would satisfy the syntactic rules and finds an incorrect token, the name of which can be found in the �token names� table. ## UNNAMED_TOKEN UNNAMED_TOKEN is a macro that is used by the error diagnostic function created by the �diagnose errors� switch. If you do not define it in your parser, AnaGram will define it thus: #define UNNAMED_TOKEN "input" This macro is used as the argument for the �UNEXPECTED_FORMAT� macro when the actual, erroneous input cannot be identified.
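To make the relationship among the three macros concrete, here is a minimal sketch of how a diagnostic routine might choose among them. It is not the code AnaGram generates for ag_diagnose(); the function format_diagnostic and its parameters are hypothetical, and the way the expected and actual token names are found is left out entirely. Only the three macro definitions are taken from the entries above.

```c
#include <stdio.h>
#include <stddef.h>

/* Default definitions as given above; a parser may define its own first. */
#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

/* Hypothetical helper, not part of AnaGram.
   expected_name: name of the unique token that would satisfy the grammar,
                  or NULL if no such named token can be identified.
   actual_name:   name of the offending token, or NULL if it has no entry
                  in the token names table.
   The message is written into buf and buf is returned. */
const char *format_diagnostic(char *buf, size_t size,
                              const char *expected_name,
                              const char *actual_name)
{
    if (expected_name != NULL)
        snprintf(buf, size, MISSING_FORMAT, expected_name);
    else
        snprintf(buf, size, UNEXPECTED_FORMAT,
                 actual_name != NULL ? actual_name : UNNAMED_TOKEN);
    return buf;
}
```

Redefining the three macros before this code, for instance with format strings in another language, changes the messages without touching the selection logic.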
## Difference In set theory, the difference of two sets, A and B, is defined to be the set of all elements of A that are not elements of B. In an AnaGram �syntax file�, you represent the difference of two �character sets� by using the '-' operator. Thus the difference of A and B is A - B. The difference operator is �left associative�. ## Disregard The purpose of the "disregard" statement is to skip over uninteresting �white space� and comments in your input file. It allows you to specify a token that should be passed over in the input to your parser. The statement takes the form: disregard ws where "ws" is a token name or character set. Disregard statements, like other �attribute statement�s, may be placed in any �configuration section�. You may have more than one disregard statement in your �grammar�. If you do, AnaGram will create a shell production. For example, suppose you write: [ disregard alpha disregard beta ] AnaGram will proceed as though you had written: gamma -> alpha | beta [ disregard gamma ] It frequently happens that you wish your �parser� to disregard blanks or comments, except that �white space� within names, numbers, strings, and other elementary constructs is subject to special rules and thus should not be disregarded blindly. In this case, you can use the "�lexeme�" statement to declare these constructs off limits for the disregard statement. Within these constructs, the disregard statement will be inoperative and the admissibility of white space is determined solely by the productions which define these constructs. Outside those productions which define lexemes, you should not generally use a token which is supposed to be disregarded. If you do, your grammar will have �conflict�s, since the token could satisfy both the explicit usage, as well as the implicit rules set up by the disregard statement. Such conflicts, however, are resolved automatically in favor of your explicit use of the token. 
The conflicts will appear in the �Resolved Conflicts� window. If you have "open ended" lexemes in your grammar such as variable names or numeric constants, your grammar will detect a conflict if one of these lexemes may follow another such lexeme immediately. To deal with these conflicts, you should turn on the "�Distinguish Lexemes�" configuration switch. It will cause white space to be required as a separator between the lexemes. In order to implement the "disregard" statement AnaGram will redefine some tokens in your grammar. For example, '+' may be redefined to consist of a simple plus sign followed by optional white space: '+' -> '+'%, white space?... The �percent sign� is used to indicate the original, simple plus without the optional white space attached. You will probably notice the percent sign appearing in some windows and traces. ## distinguish keywords "distinguish keywords" is an �attribute statement� which you may include in a �configuration section�. It is used to tell AnaGram how to distinguish �keyword�s from similar sequences of characters in your input stream. For example, you may want your parser to recognize "int" as a keyword when it appears in the following context: int x; but not when it appears in the middle of such words as "integral" and "intolerant". The operand of "distinguish keywords" is a list of character set �expression�s separated by commas and enclosed in braces ({ }). Once AnaGram has read your entire syntax file, it evaluates all of these character sets and tests each keyword string against the character sets in the order in which they were encountered in the program. If all the characters which constitute a particular keyword are members of the specified set, the keyword logic is set up so that it will recognize the keyword only if the immediately following character is not in the set. In the example above, [distinguish keywords {'a-z'} ] will do the trick. The "�sticky�" statement also affects the recognition of keywords.
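The recognition rule set up by "distinguish keywords" can be sketched in C for the single set 'a-z' used in the example. This is a hand-written illustration under that assumption, not the keyword logic AnaGram generates; the functions in_lower_set and keyword_matches are hypothetical.

```c
#include <string.h>
#include <stddef.h>

/* Membership test for the character set 'a-z' from the example. */
static int in_lower_set(int c)
{
    return c >= 'a' && c <= 'z';
}

/* Hypothetical sketch: return 1 if keyword is recognized at the start of
   input, assuming [distinguish keywords {'a-z'}] is in effect. */
int keyword_matches(const char *input, const char *keyword)
{
    size_t len = strlen(keyword);
    size_t i;
    int all_in_set = 1;

    if (strncmp(input, keyword, len) != 0)
        return 0;                       /* the text is not the keyword at all */

    for (i = 0; i < len; i++)
        if (!in_lower_set((unsigned char)keyword[i]))
            all_in_set = 0;

    /* The extra check applies only to keywords made up entirely of set
       members: such a keyword is rejected when the next character is
       also in the set, as in "integral". */
    if (all_in_set && in_lower_set((unsigned char)input[len]))
        return 0;
    return 1;
}
```

With this rule, "int" is recognized in "int x;" but not at the start of "integral", while a keyword such as "<=" is unaffected because its characters are not in the set.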
## Distinguish Lexemes The "distinguish lexemes" �configuration switch� is used in conjunction with the "�disregard�" statement and the "�lexeme�" statement to resolve the �shift-reduce conflict�s which often crop up when suppressing white space. The difficulty with suppressing white space is that you wish it to be optional in cases like "x+y", where it is not necessary in order to parse correctly, but you want to require it in situations such as "mytype x", where it is necessary to separate otherwise indistinguishable constructs. If the white space were optional, it would be necessary to allow for "mytypex", but it would be impossible to determine if this were to be interpreted as "mytype x", "mytyp ex", or any of the many other possibilities. The distinguish lexemes switch causes AnaGram to make the white space optional where doing so causes no ambiguity and makes it mandatory where to make it optional would lead to ambiguity. In the example given above, "mytypex" would be treated as a single name, and another name would have to follow separating white space. The default value for distinguish lexemes is OFF. It is anticipated that this will be changed to ON in future releases of AnaGram. ## Duplicate Production This �warning� message appears when a �production� appears twice in your �grammar�. You will have a number of �reduce-reduce conflict�s as a consequence. Eliminate the duplicate, and the conflicts it caused will go away. ## Edit Command "Edit command" is a �configuration parameter� which accepts a string value. It is no longer used and is retained only for file compatibility with the DOS version of AnaGram. ## Embedded C You may encapsulate pieces of C or C++ code in your �syntax file� more or less arbitrarily. Such pieces of code will simply be copied to the �parser file� in the order in which they are encountered. Each such piece of code must be enclosed with braces ({}).
The left brace must be on a new line, and nothing except comments may follow the right brace. AnaGram does not inspect the interior of such a piece of C code except to identify character constants, strings, comments and blocks surrounded with braces so that it does not identify the end of the embedded C prematurely. Note that AnaGram will use the status of the �nest comments� �configuration switch� in effect at the beginning of the embedded C. AnaGram, of course, can be confused by unterminated strings, unbalanced brackets, and unterminated comments. The most likely outcome, in such a situation, is that AnaGram will encounter an end of file looking for the end of the embedded C. Should this happen, AnaGram will identify the beginning of the piece of embedded C which caused the problem. If your syntax file begins with a block of embedded C, called the "�C prologue�", it will be copied to the very beginning of the parser file, preceding all of AnaGram's output. You may use such an initial block of embedded C to guarantee that program title comments, copyright notices and important definitions are at the very beginning of your parser file. The code you include as embedded C, of course, has to coexist with the code AnaGram generates. In order to keep the potential for name conflicts to a minimum, all variables and functions which AnaGram defines begin with the letters "ag_". You should avoid variable names which begin with these letters. If AnaGram finds no embedded C in a syntax file, and you ask it to build a parser, it will automatically generate a main program that calls your parser. If you don't want it to do this, you may turn off the �main program� �configuration switch�. ## Empty Keyword String This �warning� appears when you have a keyword string that contains no characters whatsoever. �Keyword strings� must contain at least one character. If you wish a null match, use a �null production� instead. 
## Enable Mouse "Enable mouse" is a �configuration switch� that defaults to on. It is not used in the Windows version of AnaGram and has been retained only for file compatibility with the DOS version. ## Enum Constant Name The "enum constant name" �configuration parameter� allows you to select the name AnaGram will use for the set of enumeration constants it defines in the �parser header� file for your �parser�. The value of "enum constant name" should be a string containing the '%' character. AnaGram will substitute each token name in turn into this template as it creates the list of enumeration constants. If it finds a '$' character it will substitute the name of your parser. The default value of "enum constant name" is "$_%_token". ## Enumeration Constants In your �parser header� file, AnaGram includes a typedef enum statement which provides enumeration constants corresponding to all the named constants in your grammar. The names of the enumeration constants themselves are defined by the �enum constant name� �configuration parameter�. These constants are useful when dealing with �semantically determined productions�. ## Enum Within a �configuration section�, you may use an "enum" statement to define numeric values for any number of tokens just as you define enumeration constants in C. The syntax is effectively the same as the enum statement in C: [ enum { first = 60, second, third, fourth = 'a', fifth, } ] is exactly equivalent to first = 60 second = 61 third = 62 fourth = 'a' fifth = 'b' ## eof "eof" is a quasi reserved word in AnaGram, used to specify an end of file token. You may use another token as an end of file delimiter by setting the �Eof Token� �configuration parameter�. eof is not required unless you use �automatic resynchronization� in your �parser�. If you have not defined eof or specified an Eof Token parameter, �File Trace� may show a syntax error when it encounters the end of a test file. 
There are various ascii values that are commonly used to represent an end of file. The end of a string in memory is commonly 0, DOS uses ^Z, Unix uses ^D, and Unix style stream I/O uses -1. It is often convenient then to define eof = -1 + 0 + ^D + ^Z ## Eof Token "Eof token" is a �configuration parameter� which accepts a token name as a value. There is no default value. AnaGram does not need a specification for the eof token unless you are using its �automatic resynchronization� facility. If you use the �automatic resynchronization� capability of AnaGram, you must specify explicitly an end of file token. You can do this either by defining a �terminal token� in your �grammar� called eof or by using the "eof token" parameter to identify some other terminal token to be used as the end of file marker. You would do this only if you must use the name "�eof�" for some other purpose. Note that "eof" is case sensitive. Neither Eof nor EOF will qualify as end of file tokens unless you explicitly specify them using the eof token parameter. ## Eof Token Not Defined This �warning� appears if you have requested either �error token resynchronization� or �automatic resynchronization� and you have not defined an �eof token�. The resynchronization procedure will not work correctly at end of file. ## Error Action The error action is one of the four �parser action�s of a traditional �parsing engine�. The error action is performed when the parser has encountered an input token which is not admissible in the current state. The further behavior of a traditional parser is undefined. ## Error Defining "Error defining TXXX: <token representation>" is a �warning� message which appears if errors are encountered while attempting to evaluate the �character set� for the specified �token�. This warning is always generated in addition to more detailed warnings that are made when the actual errors are encountered. ## Error frame "Error frame" is a �configuration switch� which defaults to off. 
You use this switch to specify the �error diagnosis� capabilities of your parser. If this switch is set and the �diagnose errors� switch is set, i.e., on, your parser will include a function which will determine the "context" of any �syntax error�, that is, the token the parser was trying to complete. To determine the context of an error, your parser will scan backwards through the �parser state stack�, examining �characteristic rules� until it finds a state which can accept a unique �nonterminal� reduction token that you have not marked as �hidden�. It will then set PCB.�error_frame_ssx� to the �parser stack index� for that level. ## ERROR_CONTEXT ERROR_CONTEXT is a macro AnaGram defines for you. If your parser encounters a �syntax error�, you have enabled the �error frame� �configuration switch�, and you have defined a �context type�, ERROR_CONTEXT will enable you to access the �context� as of when the parser encountered the beginning of the �error_frame_token�. ## Error Diagnosis "Error diagnosis" and �error recovery� are the two aspects of �error handling�. If in the �embedded C� portion of your syntax file you define a macro called �SYNTAX_ERROR�, it will be invoked by the parser when a �syntax error� is encountered. If you have set the �diagnose errors� �configuration switch�, the �error_message� field of the �parser control block� will contain a pointer to a string containing a diagnostic message. The diagnostic is of the form "Missing <token name>" or "Unexpected <token name>". If you do not define SYNTAX_ERROR it will be automatically defined so that a message will be written to stderr. If the �lines and columns� switch has been set you will have the current line number and column number available for your diagnostic message. If you have set the �error frame� switch as well as the diagnose errors switch, the variable PCB.�error_frame_token� will identify the �nonterminal token� the parser was trying to recognize when the error was encountered. 
Of course, if your parser is controlling direct keyboard input, a diagnosis might be unnecessary. In this case you might define SYNTAX_ERROR so that it simply beeps at the user and let it go at that. ## Error Handling Rarely is a parser built to read an arbitrary input file. The normal situation is that the parser is built to read files that conform to the rules specified in a grammar, rules that describe a class of input files rather than all possible input files. If the input file does not conform to the grammar, the parser will detect a �syntax error�. There are two aspects to error handling in your parser: �error diagnosis� and �error recovery�. Error diagnosis consists in informing your user that something unexpected has happened. Error recovery consists in either aborting the parse, or getting it started again in some reasonable manner. AnaGram provides several options for both error diagnosis and error recovery. When a syntax error is encountered, first your error diagnosis option is executed and then your error recovery option is executed. ## error_message error_message is a field in a �parser control block� to which your �error handling� procedures may refer. If you have set the �diagnose errors� �configuration switch�, on encountering a �syntax error� your �parser� will create a string containing an appropriate diagnostic message and store a pointer to it into PCB.error_message. ## Error Trace "Error Trace" is both a �configuration switch� and the name of an option in the �Action Menu�. If the switch is on, AnaGram adds code to your parser to capture state information to a file in case of a �syntax error�. The Error Trace option can then read this information and prepare a pre-built �Grammar Trace� showing you the state of the parser at the time of the error. The name of the file is determined by the macro �AG_TRACE_FILE_NAME�. AnaGram will provide a default definition for the macro consisting of the name of your �syntax file� plus the extension ".etr". 
You may override this definition by defining AG_TRACE_FILE_NAME in your �embedded C�. If error trace is enabled, AnaGram will also enable the Error Trace option on the �Action Menu�. If you select Error Trace AnaGram will initialize a �Grammar Trace� window from the error trace file you select. The parser stack of the trace will be as it was when the error occurred. The last line of the parser stack pane will show the �lookahead token� that caused the syntax error. You may then use the Grammar Trace to explore the nature of the syntax error your parser encountered. AnaGram will warn you if the error trace file is older than the syntax file, since under those conditions, the error trace file might be invalid. ## AG_TRACE_FILE_NAME AG_TRACE_FILE_NAME is a C macro used to determine the name of the file your parser will write when it encounters a �syntax error� if you have enabled the �error trace� �configuration switch�. You may define AG_TRACE_FILE_NAME in your �embedded C�. AnaGram provides a default definition given by the name of your �syntax file� with the extension ".etr". ## Error Recovery Error recovery is the process of continuing after a �syntax error�. AnaGram offers several options. These are controlled by �configuration parameter�s and by your grammar. If you do not specify any error recovery, your parser will simply return to the calling program when it encounters a syntax error. �PCB�.�exit_flag� will be set to two, to indicate termination on syntax error. If you wish your parser to simply ignore the erroneous token and continue, set PCB.exit_flag to zero in your �SYNTAX_ERROR� macro. You might use this option if your parser is dealing directly with keyboard input. You may wish to use YACC type error handling. To do this, simply incorporate a token called "error" in your grammar, or specify some other token as an �error token�. 
On syntax error, your parser will back up to the most recent state where "error" was acceptable input, treat the bad input as an instance of error, and then skip all input until it finds an acceptable input token. At that point it will proceed as though nothing had happened. AnaGram also provides an �automatic resynchronization� option, which uses a complex heuristic to compare input tokens against all stacked states in order to find the best state from which to continue. ## Error Token Resynchronization One of your options for �error recovery� after a �syntax error� is a technique similar to that provided in YACC. You include a terminal token called "error" in your grammar. (Or, use the �error token� configuration parameter to specify some other token to serve this purpose.) When the parser encounters an error in the input, after invoking the �SYNTAX_ERROR� macro, it backs up the �parser state stack� to the most recent state in which "error" was an acceptable input. It then shifts to the new state as though it had seen an actual "error" token. At this point, it skips over any character in the input which is not an acceptable input character for this state. Once it does find an acceptable input character, it continues processing as though nothing had happened. ## error_frame_ssx error_frame_ssx is a field in a �parser control block� to which your �error handling� routines may refer. When your �SYNTAX_ERROR� macro is called, if you have set both the �diagnose errors� and �error frame� configuration switches, error_frame_ssx will contain the value of the �parser stack index� at the beginning of the �error_frame_token�. For example, if in a syntax file, you fail to close a comment, AnaGram will encounter an illegal end of file in the comment. In this situation, error_frame_token is the token for a comment, and error_frame_ssx gives the parser stack depth at the beginning of the comment. 
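The error token resynchronization steps described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the code AnaGram generates: the tables error_goto and acceptable, the state numbers, and the function resynchronize are all hypothetical, and a real parser would consult its own parse tables and invoke SYNTAX_ERROR before recovering.

```c
#include <stddef.h>
#include <string.h>

#define N_STATES 4
#define NO_SHIFT (-1)

/* Hypothetical table: the state reached by shifting "error" from each
   state, or NO_SHIFT if "error" is not acceptable input there. */
static const int error_goto[N_STATES] = { 3, NO_SHIFT, NO_SHIFT, NO_SHIFT };

/* Hypothetical table: the input characters acceptable in each state. */
static const char *acceptable[N_STATES] = { "abc", "b", "c", ";" };

/* Recover from a syntax error. The caller must leave room in the stack
   for one more entry. Returns 1 if parsing can resume at input[*pos],
   0 if no stacked state accepts "error" or the input runs out. */
static int resynchronize(int *stack, int *depth,
                         const char *input, size_t *pos)
{
    const char *ok;

    /* Back up to the most recent state in which "error" is acceptable. */
    while (*depth > 0 && error_goto[stack[*depth - 1]] == NO_SHIFT) {
        (*depth)--;
    }
    if (*depth == 0) {
        return 0;
    }

    /* Shift to the new state as though an "error" token had been seen. */
    stack[*depth] = error_goto[stack[*depth - 1]];
    (*depth)++;

    /* Skip every character that is not acceptable in the new state. */
    ok = acceptable[stack[*depth - 1]];
    while (input[*pos] != '\0' && strchr(ok, input[*pos]) == NULL) {
        (*pos)++;
    }
    return input[*pos] != '\0';
}
```

With a stack holding states 0, 1, 2 and the input "xy;z", this sketch backs up to state 0, shifts to state 3, and skips "xy", so the parse would resume at the ';' character.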
## error_frame_token error_frame_token is a field in a �parser control block� to which your �error handling� routines may refer. If you have set both the �diagnose errors� and �error frame� �configuration switch�es, when your �SYNTAX_ERROR� macro is called, it will contain the �token number� of the error_frame_token. ## error, Error Token "Error token" is a �configuration parameter� that takes a token name for a value. It has no default value. If you do not specify it, and your grammar has a terminal token called "error", it will be used as the error token. If you have an error token defined, your parser will presume that you wish to use the �error token resynchronization� method of �error recovery�. ## Escape Backslashes "�Escape backslashes�" is a �configuration switch� that defaults to off. When turned on, the �line numbers� switch will write pathnames with doubled backslashes. The switch is no longer necessary, since AnaGram now uses forward slashes in the pathnames in #line directives rather than backslashes. ## Event Driven It is often convenient to configure your parser to be "event driven". In this situation, instead of calling your parser once to process the entire input, you call an �initializer� to initialize the parser, and then you call the parser once for each input token. Each time you call it, the parser processes the single input token until it can do no more. You can interrogate the �exit_flag� field of the �parser control block� to determine whether the parse is complete or whether the parser encountered an error. Event driven parsers are especially convenient for dealing with terminal input or communications protocols. ## Event Driven Parser Cannot Use Pointer Input This �warning� message appears if you specify pointer input for your �parser� and also specify that it should be event driven. If you are going to use �pointer input�, you should not specify your �parser� as event driven.
Conversely, if you really want an �event driven� parser, you cannot specify pointer input. ## Excessive Recursion This �warning� message appears if an internal stack in AnaGram overflows because of the complexity of an expression in your �grammar�. Simplify your grammar by using �definition� statements to name subexpressions. ## exit_flag exit_flag is a field in the �parser control block�. When your parser returns, PCB.exit_flag contains an exit code describing the outcome of the parse. Mnemonic values for the exit codes are defined in the parser header file AnaGram generates. These mnemonics, their values and their meanings are:
AG_RUNNING_CODE = 0: Parse is not yet complete
AG_SUCCESS_CODE = 1: Parse terminated successfully
AG_SYNTAX_ERROR_CODE = 2: Syntax error encountered
AG_REDUCTION_ERROR_CODE = 3: Bad reduction token encountered
AG_STACK_ERROR_CODE = 4: Parser stack overflowed
AG_SEMANTIC_ERROR_CODE = 5: Semantic error, user defined
An AnaGram parser checks exit_flag on return from every �reduction procedure�. The parser will exit with the flag unchanged if it is non-zero. To halt a parse from a reduction procedure, then, you need only set the exit_flag to AG_SEMANTIC_ERROR_CODE, or any other unused value greater than zero that suits your needs. ## Expansion, Expansion Rule In analyzing a �grammar�, we are often interested in the full range of input that can be expected at a certain point. The expansion of a �token� or state shows us all the expected input. An expansion yields a set of �marked rule�s. The �marked token� in each rule shows us what input to expect. The set of expansion rules of a (�nonterminal�) token shows all the expected input that can occur whenever the token appears in the grammar. The set consists of all the �grammar rule�s produced by the token, plus all the rules produced by the first token of any rule in the set. A �marked token� for an expansion rule of a token is the first element in the rule.
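The definition of the expansion rules of a token amounts to a closure computation, which can be sketched in C as follows. This is a minimal sketch under stated assumptions, not AnaGram's internal algorithm: the struct rule representation, the toy grammar, and the function expansion_rules are hypothetical and exist only for illustration. Since only the left side and the first element of each rule affect the result, that is all the sketch stores.

```c
#include <stddef.h>

/* Hypothetical token numbers for a toy grammar. */
enum { STMT, EXPR, TERM, FACTOR, IDENT, NUMBER, LPAREN };

/* Hypothetical rule record: the token that produces the rule and the
   first element of the rule (its marked token in an expansion). */
struct rule { int lhs; int first; };

static const struct rule rules[] = {
    { STMT,   IDENT  },  /* 0: stmt   -> ident, '=', expr */
    { EXPR,   EXPR   },  /* 1: expr   -> expr, '+', term  */
    { EXPR,   TERM   },  /* 2: expr   -> term             */
    { TERM,   FACTOR },  /* 3: term   -> factor           */
    { FACTOR, NUMBER },  /* 4: factor -> number           */
    { FACTOR, LPAREN },  /* 5: factor -> '(', expr, ')'   */
};
#define N_RULES (sizeof rules / sizeof rules[0])

/* Sets in_set[i] to 1 for each expansion rule of the given token and to
   0 otherwise. Returns the number of expansion rules found. */
static size_t expansion_rules(int token, int in_set[N_RULES])
{
    size_t i, j, count = 0;
    int changed = 1;

    /* Start with all the rules produced by the token itself. */
    for (i = 0; i < N_RULES; i++) {
        in_set[i] = (rules[i].lhs == token);
    }

    /* Add the rules produced by the first token of any rule already in
       the set, repeating until nothing new is added. */
    while (changed) {
        changed = 0;
        for (i = 0; i < N_RULES; i++) {
            if (!in_set[i]) continue;
            for (j = 0; j < N_RULES; j++) {
                if (!in_set[j] && rules[j].lhs == rules[i].first) {
                    in_set[j] = 1;
                    changed = 1;
                }
            }
        }
    }

    for (i = 0; i < N_RULES; i++) {
        count += (size_t)in_set[i];
    }
    return count;
}
```

For this toy grammar, the expansion of expr contains rules 1 through 5, while the expansion of term contains only rules 3, 4 and 5. Rule 0 appears in neither, because stmt is never the first element of a rule in either set.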
The expansion of a state consists of its �characteristic rule�s plus the expansion rules of the marked token in each characteristic rule. ## Expansion Chain You may select an Expansion Chain window from the �Auxiliary Windows� popup menu of most windows that contain �expansion rule�s. The Expansion Chain window is extremely useful for indicating why a particular �grammar rule� is an �expansion rule� in a particular state. To see a chain of productions that produces a desired expansion rule, select the expansion rule with the cursor bar, press the right mouse button for the Auxiliary Windows menu, and select Expansion Chain. The Expansion Chain window will then present a sequence of expansion rules, using the same format as the Expansion Rules window, but subject to the constraint that each rule is produced by the �marked token� in the previous line. The first rule in the window is a �characteristic rule� for the given state. The last rule in the window is the rule selected by the cursor bar in the window from which you chose the Expansion Chain. It should be noted that this expansion is not unique. There may be other derivations. ## Expansion Rules You may select an Expansion Rules window from the �Auxiliary Windows� popup menu of most windows which display �marked rules�. The Expansion Rules window shows the complete set of �expansion rule�s for the �marked token� in the highlighted rule. In other windows, including all trace windows, the Expansion Rules window shows the expansion of the token on the highlighted line. ## F1 Use the F1 key to bring up a context sensitive help window. Because of various peculiarities of the Windows API, there are a few contexts where the F1 key does not work; however, generally the �help cursor� works where F1 does not and vice versa. �Help� windows have hypertext links to related help windows. In a help window, the right mouse button pops up a menu of all the links for the window. 
## extend pcb The "extend pcb" statement is an �attribute statement� that allows you to add declarations of your own to the �parser control block�. With this feature, data needed by �reduction procedure�s can be stored in the pcb rather than in global or static storage. This capability greatly facilitates the construction of �thread safe parsers�. The extend pcb statement may be used in any configuration section. The format is as follows: extend pcb { <C or C++ declaration>... } It may, of course, extend over multiple lines and may contain any C or C++ declarations. AnaGram will append it to the end of the parser control block declaration in the generated parser �header file�. There may be any number of extend pcb statements. The extensions are appended to the pcb in the order in which they occur in the syntax file. The extend pcb statement is compatible with both C and C++ parsers. Note that even if you are deriving your own class from the parser control block, you might want to use the extend pcb to provide virtual function definitions or other declarations appropriate to a base class. ## Far Tables "Far tables" is a �configuration switch� which defaults to off. If it is set, when AnaGram builds a �parser� it will declare the larger tables it builds as FAR. This can be a convenience when using some memory models with 8086 architecture. ## Fatal Syntax Errors This �warning� message occurs when AnaGram cannot complete the �Analyze Grammar� command on your �syntax file� because of errors in your syntax file. ## File Trace You can use the File Trace facility to verify your grammar, even before you have implemented �reduction procedures� or any other code. Thus you can defer writing procedural code until you have the grammar working to your specifications. To run File Trace, select File Trace from the �Action Menu� or click on the File Trace button. Select a test file. 
When the �File Trace Window� appears, double click at any point in the �test file pane�, or click the �Parse File� button to parse the entire file. AnaGram will parse up to the point you have selected according to the rules in your �grammar�. If the test file does not conform to the rules of the grammar, the parse will halt with a �syntax error�. You can then inspect the �Parser Stack pane� and the �Rule Stack pane� to get an idea of the nature of the problem. AnaGram uses different colors to distinguish the portion of the test file that has been parsed from the portion that has not been parsed, so the location of the error should be readily apparent. Since the syntax error often occurs somewhat downstream from the actual error, you may need to back the parse up and approach the error slowly. In the Test File pane, double click at any point prior to the error to back the parse up to that point. You can then click on the �Single Step� button to perform a single parser action. You may also use the cursor keys to control the parse. As long as no error is encountered, the parse is locked to the blinking cursor. If you move the cursor past the syntax error, however, the parse can no longer track the cursor, so the cursor location will differ from the parse location. The cursor and parse locations will also differ after you single click at any point other than the current parse location. When the cursor and the parse location are thus out of synch, the Single Step button is replaced with a �Synch Parse� button. You can click on Synch Parse to get the parse back in synch with the cursor. The File Trace option will be greyed out on the �Action Menu� if your grammar has �empty recursion�, since such a grammar may cause infinite loops in the parser. Because a File Trace is based on character codes, it will also be greyed out on the �Action Menu� if your parser uses �token input� rather than character input.
All parser actions performed by a File Trace update the �trace coverage� counts, enabling you to verify the extent to which your test files exercise your parser. Normally, AnaGram reads test files in "text" mode, discarding carriage return characters. If your parser needs to recognize carriage return characters explicitly, you should turn the "�test file binary�" switch on. ## File Trace Window The �File Trace� window normally consists of three panes: The �Parser Stack pane� The �Test File pane� The �Rule Stack pane� If your grammar uses �semantically determined productions�, the �Reduction Choices pane� will appear when necessary to allow you to select a �reduction token�. The choice that you make will be remembered and reused if you should back up the parse and parse past this point again. The remembered choice is not made automatically when you use �Single Step�. Thus, if you wish to change your choice, position the cursor at the location where the choice must be made and Single Step past the choice. If you �reload� the test file, the choices you have made will be discarded. The active pane has a distinctively colored title panel and cursor bar. You can use the tab key to tab among the panes. The function of other keyboard keys depends on which pane is active. Along the bottom of the File Trace Window is a toolbar with two status boxes: �Parse Location� �Parse Status� and five buttons: �Single Step� �Parse File� �Reset� �Reload� �Help� If the blinking cursor loses synch with the current parse location, the Single Step button is replaced with the �Synch Parse� button. ## Grammar Trace Window The �Grammar Trace� window normally consists of three panes: The �Parser Stack pane� The �Allowable Input pane� The �Rule Stack pane� If your grammar uses �semantically determined productions�, the �Reduction Choices pane� will appear when necessary to allow you to select a �reduction token�. The active pane has a distinctively colored column header and cursor bar.
You can use the tab key to tab among the panes. The function of other keyboard keys depends on which pane is active. Along the bottom of the Grammar Trace Window is a toolbar with a �Parse Status� box, a �text entry� field and four buttons: �Proceed� �Single Step� �Reset� �Help� In the �Parser Stack pane� you can see a representation of the �parser state stack� and �parser state� as they might appear in the course of execution of your �parser�. You can examine the �allowable input� tokens and see the changes to the state and the state stack caused by any input token you choose. The �Rule Stack pane� shows the relationship between the contents of the parser stack and your �grammar�. If your grammar uses �semantically determined productions�, you can select the appropriate �reduction token� from the �Reduction Choices pane�. You can enter text characters directly in the �text entry� field. This means you can run a Grammar Trace like a �File Trace� where the test file is replaced by the characters you type in the text entry field. This is a very convenient way to check out your grammar. ## Test File, Test File Pane In the �File Trace�, the file under test is displayed in the upper right pane. To parse to a specific point, double click at that point. As long as the parse location and the cursor are synchronized, when you use the cursor keys to move the cursor, the parse will track the cursor. If the parse encounters a �syntax error�, it will not be able to go beyond the location of the error. In this situation, moving the cursor right or down will cause the cursor position to differ from the parse location. The parse and cursor positions can also differ if you single click anywhere in the Test File pane. If the parse location and the cursor are thus not synchronized, the �Single Step� button will be replaced with a �Synch Parse� button. Click on the Synch Parse button to get the cursor and the parse back in synch. 
Of course, the parse will still not be able to proceed past a syntax error. In the default color scheme, parsed text is shown on a lighter background than is unparsed text. If your grammar uses �semantically determined production�s, the parse will halt when one is encountered and the �reduction choices pane� will be displayed so you may select the appropriate �reduction token�. At any time you can click on the �Reset button� to reset the parse to the beginning of the test file. If you modify the test file, you can click on the �Reload button� to load the modified file and reset the parse. Normally, AnaGram reads test files in "text" mode, discarding carriage return characters. If your parser needs to recognize carriage return characters explicitly, you should turn the �test file binary� �configuration switch� on. Sample test files are provided with the FFCALC and FC �examples�. ## Parse Location The current location of the �File Trace� parser in the �test file pane�. The format is <line number>:<column number>. ## Parse Status The current state of the �File Trace� or �Grammar Trace� parser. Ready: The parser is ready for input. Running: The parser is processing input. Parse Complete: The parser has reached the end of the input. Click on �reset� or �reload� to restart the parse. Syntax error: A syntax error has been encountered. The parser cannot go any further. Unexpected end of file: The parser has reached the end of the actual input but the grammar still expects more. Select reduction token: The parser encountered a �semantically determined production�. Select a �reduction token� from the �Reduction Choices pane�. Selection error: The reduction token selected from the Reduction Choices pane was not allowable input in the present state. Select another reduction token. ## Parse File Use the Parse File button in the �File Trace� to parse all the way to the end of file. 
The parse will not stop until it encounters a �syntax error�, a �semantically determined production�, or the end of file. ## Reset Use the Reset button in the �File Trace� or �Grammar Trace� to reset the parse to its initial state. This is most convenient when using a �Conflict Trace�, �Error Trace�, or other �Auxiliary Trace� since these traces seldom begin at state 0. ## Reload The Reload button in the �File Trace Window� rereads the test file. This is convenient if you modify the test file while you are testing the �grammar�. ## Lookahead Token In an �LALR-1 parser� the "lookahead token" is the next token to be processed. For each �parser state� there is a list of tokens that may be seen in this state. For each token there is a corresponding �parser action�. The parser scans the list looking for the lookahead token and then performs the corresponding parser action. If the lookahead token cannot be found and there is no �default reduction�, the parser signals a �syntax error�. In File Trace, and in some circumstances in Grammar Trace, the lookahead token can be seen on the last line of the �Parser Stack pane�. ## GET_CONTEXT If you have defined a "�context type�" �configuration parameter�, and wish to maximize the performance of your parser, you should write a GET_CONTEXT macro which stores the context of the input token directly in �CONTEXT�, the current stack location. Otherwise, you can write your �GET_INPUT� macro so that it stores context into �PCB�.�input_context�. The default definition for GET_CONTEXT will then copy PCB.input_context to the �context stack� at the appropriate time. ## GET_INPUT GET_INPUT is a macro which you should define to control �parser input� if your parser is not �event driven� and you are not using �pointer input�. 
If you don't define it, AnaGram will define it by default to read a single character from stdin: #define GET_INPUT (PCB.input_code = getchar()) �PCB�.�input_code� is an integer field in the �parser control block� which is used to hold the current character code. You may also want GET_INPUT to set the values of �input_context� or �input_value�. It may call an input function, or it may execute in-line code when it is invoked. ## iso latin 1 The "iso latin 1" �configuration switch� controls case conversion on input characters when the �case sensitive� switch is set to off. When "iso latin 1" is set, the default �CONVERT_CASE� macro is defined to convert correctly all characters in the latin 1 character set. When the switch is off, only characters in the ASCII range (0-127) are converted. ## Dragon Book The "dragon book" is the classic reference on formal parsing: Compilers: Principles, Techniques, and Tools Aho, Sethi, and Ullman Addison-Wesley, 1986. It is called the "dragon book" because of its colorful cover illustration showing a knight in armour ("data flow analysis") armed with sword ("�LALR parser generator�") and shield ("syntax directed translation") at his PC attacking a bright red dragon ("complexity of compiler design"). ## LALR-1 Parser An LALR-1 parser is a �parser� created from a �grammar� by an �LALR parser generator�. ## LALR Parser Generator LALR(k) (LookAhead Left-to-right Rightmost derivation) parser generators are programs that create parsers algorithmically from formal grammars. The (k) refers to the number of lookahead symbols used to make parsing decisions. Normally, k = 1. LALR parsers are a subset of the class of so-called LR parsers. LALR parsers are generally more compact and less costly to create. These advantages are obtained at a slight sacrifice in generality. 
Although it is possible to contrive an LR grammar which has �conflict�s when analyzed with the LALR algorithm, such situations rarely occur in practice, and can be easily resolved by rewriting a few rules. In the �dragon book�, section 4.7, the authors list the following attractive properties of LR parsing: LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written. The LR parsing method is the most general nonbacktracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce methods. The class of grammars that can be parsed using LR methods is a superset of the class of grammars that can be parsed with predictive parsers. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input. ## Getting Started AnaGram is an �LALR parser generator�. Its input is a �syntax file�, which you prepare with an ordinary programming editor. Its output is a �parser file�, which you can compile with a C or C++ compiler on any platform and link into your program. To compile on Unix platforms, set the �no cr� �configuration switch�. AnaGram has extensive context-sensitive hypertext �help�. In any AnaGram window, press �F1� or select an item with the �Help Cursor�. Further documentation in HTML format, including documentation of examples, is found in the html subdirectory. AnaGram also has a comprehensive hard-copy manual, the AnaGram User's Guide. If you are new to AnaGram, you might begin by reviewing the Help Topics �How AnaGram Works� and �Program Development�, and looking at An Annotated Example and Summary of AnaGram Notation in the HTML documentation. If you are not already familiar with formal parsing techniques, you may want to read Introduction to Syntax Directed Parsing in the HTML documentation.
Note also the Fahrenheit to Celsius conversion examples in the examples/fc directory, which comprise a graded sequence of syntax files illustrating most of the basic principles of �syntax directed parsing� in easy steps. Documentation is in html/fc.html. AnaGram has many features, many of which are not commonly found in parser generators: the �configuration section� �thread safe parsers� C++ support the �disregard� and �lexeme� statements �event driven� parsers �character sets� �virtual productions� �File Trace�, �Grammar Trace� �automatic resynchronization� �error token resynchronization� To familiarize yourself with the many options available for configuring your parsers, select �Configuration Parameters� from the �Browse Menu�. Use �F1� or the �Help Cursor� to pop up explanations of the various parameters. If you don't find the information you need, please visit the AnaGram web page at http://www.parsifalsoft.com for further information and support. ## How AnaGram Works AnaGram contains an �LALR Parser Generator� which creates a table driven �LALR-1 parser� from a �grammar� written in a variant of Backus-Naur Form. AnaGram works in two steps. In the first step, or analysis phase, it reads a �syntax file� and compiles a number of tables describing the grammar. In the second step, or build phase, it writes two output files: a �parser file� written in C or C++ and a �header file�. Syntax files normally have the extension .syn. The rules for writing syntax files are given in the AnaGram User's Guide and in the Summary of AnaGram Notation in the HTML documentation. The header file contains definitions and declarations, including the definition of a �parser control block�. The parser file consists of: The �C prologue�, if any. Definitions and declarations provided by AnaGram. �Reduction procedure�s. a customized �parsing engine�. a �parse function� to be called when input is to be parsed. 
The name of the parser file is controlled by the �parser file name� �configuration parameter�. The name of the parse function itself is controlled by �parser name�. In the default case, the parser file will have the same name as the syntax file, with the extension .c. The name of the parse function is given by the �parser name� parameter. It defaults to the name of the syntax file. ## Examples The EXAMPLES directory of the AnaGram distribution disk contains a number of examples to help you get started. Documentation for the examples, in HTML format, is located in the html directory (start at index.html or examples.html). The traditional Hello, World, in examples/hw, is a good example for getting familiar with the mechanical procedures of building both C and C++ parsers from �syntax file�s. The Fahrenheit/Celsius conversion examples in the examples/fc directory on your AnaGram diskette comprise a graded sequence of syntax files which illustrate most of the basic principles of �syntax directed parsing� in easy steps. In addition, these examples demonstrate many features of AnaGram which are not found in other parser generators: the �configuration section� �character sets� �virtual productions� �error token resynchronization� �File Trace� the �disregard� and �lexeme� statements �event driven� parsers The Four Function Calculator (examples/ffcalc) is used traditionally to demonstrate parser generators. If you are already familiar with �syntax directed parsing� this example will give you a good overview of the basics of AnaGram. An annotated version of this example may be found in AnaGram's HTML documentation. The FFCALC example illustrates the use of �precedence rules� to resolve �conflicts�. Other examples are available to demonstrate additional features of AnaGram. RCALC (examples/rcalc) is a simple four function calculator which accepts roman numeral input. 
It illustrates the following AnaGram features: �pointer input� �SYNTAX_ERROR� macro �context stack� DSL (examples/dsl) is a complete DOS script language, which provides capabilities well in excess of DOS batch files. DSL is a complete working program, used in the past to create AnaGram's install program. Some of the specific features of AnaGram which it illustrates are: �distinguish lexemes� �distinguish keywords� �far tables� MPP is a fully functional macro preprocessor for C or C++. Included with MPP are two C grammars, either of which may be incorporated into MPP. MPP uses several parsers that work together: TS.SYN is the primary token scanner parser that identifies tokens, and handles preprocessor commands. MAS.SYN is used to do macro argument substitution. CT.SYN is used to identify tokens that result from string concatenation during macro argument substitution. EX.SYN is used to evaluate constant expressions in #if preprocessor statements. Among the more powerful features of AnaGram that MPP illustrates are: �semantically determined productions� �event driven� parsers ## Goal, Goal Token, Start Token The �grammar token� is the token which represents the "top level" in your grammar. Some people refer to it as the "goal" or "goal token" and others as the "start token". Whichever it is called, it is the single token which describes the complete input to your parser. The most common way to specify a grammar token is as follows: grammar -> statements?..., eof This production tells AnaGram that the input to your parser consists of a (possibly empty) sequence of statements followed by an end of file token. There are a number of ways of specifying which token in your �syntax file� represents the top level of your grammar. You may simply name it "grammar", or you may tag it with a '$' character when you define it, or you may set the �grammar token� �configuration parameter�. 
If you should inadvertently tag several tokens with the '$' character and/or set the grammar token parameter, it is the last such specification in the file which wins. Some people develop their grammars bottom up, gradually adding new levels of complexity. In the course of development, they may specify a number of tokens as grammar tokens and forget to remove the old specifications. Notice that if you define the token "grammar" anywhere in your syntax and specify the grammar token otherwise, "grammar" will not be the grammar token. This is to keep "grammar" from being a reserved word. If you need to use it in your syntax for something other than the whole grammar, you are free to do so. ## Grammar Traditionally, a "grammar" is a set of �production�s which taken together specify precisely a set of acceptable input streams, in terms of an abstract set of �terminal tokens�. The set of acceptable input streams is often called the "language" defined by the grammar. In AnaGram, the term "grammar" also includes �configuration sections� as well as the �definitions� of �character sets� and �virtual productions� which augment the collection of productions. The term is often used in contrast to the term "�syntax file�" which is used to signify the complete AnaGram source file including reduction procedures and embedded C or the term "�parser�" which refers to AnaGram's output file. A grammar is often called a "syntax", and the rules of the grammar are often called syntactic rules. ## Grammar Analysis The major function of AnaGram is the analysis of context-free grammars written in a particular variant of Backus-Naur Form. The analysis of a grammar proceeds in four stages. In the first, the input grammar is analyzed and a number of tables are built which describe all of the �production�s and components of the �grammar�. In the second stage, AnaGram analyzes all of the character sets defined in the grammar, and where necessary, defines auxiliary tokens and productions. 
In the third stage, AnaGram identifies all of the states of the parser and builds the go-to table for the parser. In the fourth stage, AnaGram identifies �reduction tokens� for each completed �grammar rule� in each state and checks for �conflict�s. Use the �Analyze Grammar� command to cause AnaGram to analyze your grammar. ## Grammar Is Ambiguous This �warning� message appears if your �grammar� contains �conflict�s. AnaGram will resolve �shift-reduce conflicts� by selecting the shift option. It will resolve �reduce-reduce conflicts� by selecting from the conflicting �grammar rule�s the one which appears first in the �syntax file�. ## Grammar Rule A "grammar rule" is the right hand side of a production. It is a sequence of �rule elements�. Each rule element identifies some token, which can be either a �terminal token� or �nonterminal token�. A grammar rule is "matched" by a corresponding sequence of tokens in the input stream to the parser. The rule elements in the grammar rule may be �token name�s, �set expressions�, �character constants�, �immediate action�s, �keyword strings�, or �virtual productions�. A grammar rule may be followed by an optional �reduction procedure�. The �semantic values� of the tokens that comprise the rule may be passed to the reduction procedure by using �parameter assignments�. A grammar rule always makes up the right hand side of a production. The left hand side of the production identifies one or more �nonterminal tokens�, or �reduction tokens�, to which the rule reduces when matched. If there is more than one reduction token, the production is called a �semantically determined production� and there should be a �reduction procedure� to select the correct reduction token at run time. ## Grammar Token The "grammar token" �configuration parameter� may be used to specify the �goal�, or "start" token for the syntax analyzer portion of AnaGram.
Alternatively, you could simply call the token "grammar", or you could append a '$' character to it when you define it. Each grammar must have a grammar token specified before it can be analyzed or before a parser can be built. The grammar token is the single token to which the grammar finally condenses. When this token is identified by the parser, the parse is complete. ## Grammar Trace AnaGram's Grammar Trace facility lets you examine the workings of your �parser� in detail. You can use the Grammar Trace as soon as you have analyzed your �grammar�, even before you have written any �reduction procedure�s or other code. Thus you can defer writing procedural code until you have the grammar working to your specifications. Select the �Grammar Trace Window� from the �Action Menu� or click on the Grammar Trace button. In the �Parser Stack pane� you can see a representation of the �parser state stack� and �parser state� as they might appear in the course of execution of your �parser�. The �Rule Stack pane� shows the relationship between the contents of the parser stack and your �grammar�. If your grammar uses �semantically determined productions�, you can select the appropriate �reduction token� from the �Reduction Choices pane�. At any stage, the �Parser Stack� represents a parse in progress. It shows the sequence of �token�s that have been input so far and the states in which they were seen. When a production is complete and the grammar rule is reduced, the tokens that make up the rule are removed from the stack and replaced by the token on the left side of the production. Initially, the Parser Stack contains only a �lookahead line�. To explore your grammar, choose �token�s one by one from the �Allowable Input� pane. This pane shows the tokens allowable at the current state of the grammar, and the actions that result when the tokens are chosen. You can also enter text characters directly in the �text entry� field. 
This means you can run a Grammar Trace like a �File Trace� where the test file is replaced by the characters you type in the text entry field. This is a very convenient way to check out your grammar. Text entry is, of course, not appropriate for grammars that expect �token input�. In a �File Trace� you can advance the parse no matter which pane is active. In a Grammar Trace there is a question as to whether input is intended to come from the Allowable Input pane or the text entry field. Therefore the parse can only be advanced when one of these two is active to indicate that it is the source of input. Specialized prebuilt Grammar Traces such as the �Conflict Trace� and the �Auxiliary Trace� can be selected from �Auxiliary Windows� popup menus where appropriate. All Grammar Trace activity updates the �trace coverage� counts.

## Text Entry

It is sometimes more convenient to enter text in the text entry box on the �Grammar Trace� toolbar than to select individual tokens from the �Allowable Input pane�. By entering text you can proceed quickly to a troublesome state without having to choose each individual token en route. After entering text, press Enter or click on the Proceed button to parse the text. Click on the single step button to work slowly through the text step by step.

## header file name

The "header file name" parameter names the �parser header� file that AnaGram will generate when it builds your parser. This header file can be used with your parser or with other modules in your program. The header file contains a number of typedef statements and a number of macro definitions which are needed in your parser and may be useful in other modules. If the value of this parameter contains a '#' character, AnaGram will substitute the name of your syntax file for the '#'. The default value of "header file name" is "#.h".

## Help, Using Help

There are three main ways to access AnaGram Online Help: Press F1 for context-sensitive help from most windows and menu items.
Similarly, use the �Help Cursor� from most windows and menu items. From the Help menu, you can bring up �Help Topics� and choose a topic. You can also get fly-over help for the toolbar buttons on the �Control Panel�. File and Grammar Traces have a Help button. AnaGram's Help windows, unlike most others, remain on-screen until you dismiss them. This means you can refer to several topics at once. They have hypertext links to other Help topics. Also, right-clicking the mouse on a Help window or pressing F1 will pop up an Auxiliary Windows menu of all linked topics in the window. "Using Help" is always available from this popup menu. Note that, for the �Warnings�, �Configuration Parameter�s and �Help Topics� windows, F1 will give you help for the item on the highlighted line, whereas the Help Cursor allows you to select any line by clicking on it. AnaGram also has documentation in HTML format, indexed in the index.html file. This documentation covers Getting Started, examples, and some further topics mainly condensed from the User's Guide. Hard copy documentation is in the AnaGram User's Guide, which has the most detail. ## Hidden In a �configuration section� of your grammar you may use an �attribute statement� to declare one or more tokens to be "hidden". Tokens that are "hidden" do not appear in the �token names� table, and thus do not appear in syntax error diagnoses. When your parser attempts to determine the �error frame� of a �syntax error�, it will disregard the tokens that have been declared hidden. The hidden declaration consists simply of the keyword hidden followed by a list of tokens, separated by commas and enclosed in braces ({ }): [ hidden { widget, wombat, foo, bar } ] You would use the "hidden" attribute primarily for tokens whose name would not mean anything to your users. ## Immediate Action Immediate actions are snippets of C code which are to be executed in the middle of a �grammar rule�. Immediate actions are denoted by a '!' 
character followed by either a C expression, terminated by a semicolon; or a block of C code enclosed in braces. For example, in a simple desk calculator one might write the following:

    transaction -> !printf("#");, expression:x =printf("%d\n",x);

Notice that the only apparent difference between an immediate action and a �reduction procedure� is that the immediate action is preceded by '!' instead of '='. Notice that the immediate action must be followed by a comma to separate it from the following �rule element�. Immediate actions may also be used in �definition�s:

    prompt = !printf("#");

The above example, using this definition, would then be:

    transaction -> prompt, expression:x =printf("%d\n",x);

You could accomplish the same result by writing a �null production� and a reduction procedure:

    prompt -> =printf("#");

This is exactly how AnaGram implements immediate actions.

## Implementation Errors

"Implementation errors" are errors your parser detects which are not the immediate result of bad input. When it encounters an implementation error, your parser will call a macro which you can define to deal with the problem in a manner suitable to your needs. If you don't provide these macros, AnaGram will make default definitions. There are two macros corresponding to two implementation errors:

    �PARSER_STACK_OVERFLOW�
    �REDUCTION_TOKEN_ERROR�

## Inappropriate Value

This �warning� message appears when the value assigned to a �configuration parameter� is not appropriate to that parameter. Check the definition of the parameter by opening the �Configuration Parameters Window�, selecting the parameter and pressing F1.

## Initializer

For every �parser� it generates, AnaGram generates an "initializer" function to call the parser. AnaGram names the initializer by prefixing the �parser name� with "init_". If your parser is �event driven�, you must call the initializer before you call the parser.
If your parser is not event driven, AnaGram will normally include a call to the initializer in the parser. If you wish to be able to call your parser more than once without its being re-initialized, you may turn off the �auto init� �configuration switch�. When you do this, you assume responsibility for calling the initializer. If your parser is event driven, you must always call the initializer function. If the �reentrant parser� switch is set, the initializer takes a pointer to the �parser control block� as its sole argument. Otherwise it takes no arguments. The initializer returns no value. All communication is by means of the �parser control block�. ## Input Character The actual unit of �parser input� is usually a single character. Note that you are not limited to eight-bit characters. Your parser will use the input character to index a translation table, �ag_tcv�, to determine the �token number� for that character. The �token number� identifies the actual syntactic token. The character code itself will be the �semantic value� of the token. Note that AnaGram groups together all input characters that are syntactically indistinguishable into a single input token. ## input_code input_code is a field in the �parser control block� which contains the current �input character�, or, if your �GET_INPUT� macro supplies �token number�s directly, the token number. If you write your own �GET_INPUT� macro, you must make sure that you store the input character, or token number, you get into �PCB�.input_code. ## INPUT_CODE(t) If you set both the �pointer input� and the �input values� �configuration parameter�s, you must provide an INPUT_CODE macro for your parser. In this situation, your parser will use the pointer to load the �input_value� field of the �parser control block� and uses the INPUT_CODE macro to extract the appropriate value for the �input_code� field. 
For example, if the input_value is a structure and the appropriate member field is called "id" you would write: #define INPUT_CODE(t) (t).id ## input_context "input_context" is a field which AnaGram adds to the definition of the �parser control block� structure when you define a �context type� �configuration parameter�. If you choose, you can write your GET_INPUT macro so that it stores the context value in �PCB�.input_context. The default definition for �GET_CONTEXT� will then stack the context value at the appropriate time. You can think of PCB.input_context as a sort of temporary "parking place" for the context value. ## Input Scan Aborted This �warning� message appears if AnaGram is unable to finish scanning your �syntax file� because of previous errors. ## input values "Input values" is a �configuration switch� which defaults to off. If your �parser input� includes explicit �token value�s which are not simply the ascii values of corresponding ascii input characters, you must set the "input values" switch to inform AnaGram. Unless your parser is �event driven� or uses �pointer input�, you must also provide your own �GET_INPUT� macro. If your parser uses pointer input, you must provide an �INPUT_CODE(t)� macro. The semantic value of an input token is to be stored in the �input_value� field of the parser control block. ## input_value input_value is a field in the �parser control block� which is used to store the semantic value of the input token. If you write your own �GET_INPUT� macro, and you have set the �input values� �configuration switch�, you should make sure that you store the value of the �input character� or token into �PCB�.input_value. ## Internal Error "AnaGram internal error: ..." is a �warning� message which appears if one of AnaGram's internal consistency tests fails. This message should never appear if AnaGram is working properly. 
Usually AnaGram will abort on encountering an internal error, although under a small set of circumstances it may continue. Should this happen, it would be wise to close AnaGram and restart it. If you do get an internal error, please note the complete message identifying the problem and file a bug report, following the directions posted on the AnaGram web page at http://www.parsifalsoft.com. A copy of the relevant syntax file and a summary of the circumstances surrounding the problem would be greatly appreciated.

## Intersection

In set theory, the intersection of two sets, A and B, is defined to be the set of all elements of A which are also elements of B. In an AnaGram �syntax file�, the intersection of two �character sets� is represented with the '&' operator. The intersection operator has lower �precedence� than the �complement� operator, but higher precedence than the �union� and �difference� operators. The intersection operator is �left associative�.

## Keyboard Support

AnaGram can be controlled entirely from the keyboard. In the Control Panel, you can tab to any button and press Enter to select it. In addition to the conventional Windows keyboard functions, the following keys have been implemented:

    Escape closes any AnaGram window except the Control Panel.
    F8 toggles between an active AnaGram window and the Control Panel.
    F10 accesses the Control Panel menu from any AnaGram window.
    Shift F10 pops up the Auxiliary Windows menu.

## Keyword, Keyword String

Keywords are a very important feature of AnaGram. They provide an easy way to pick up special character sequences in your input, thereby eliminating the need for a lot of tedious �production�s. If AnaGram finds, on the right hand side of one of your �grammar� productions, a string enclosed in double quotes, such as "IF", it automatically creates from the string a "keyword" which is incorporated into your parser. You may have any number of keywords. A keyword is treated as a single terminal token.
Recognition of keywords is governed by the �case sensitive� switch. Your parser will look for a keyword in its input stream wherever you have defined this particular keyword to be legitimate input. It will do whatever lookahead is necessary in order to pick up the entire keyword. If several keywords match the input, such as IF and IFF, it will select the longest match, IFF in this example.

Important points to notice about keywords:

1) Keywords take precedence over ordinary characters in the input stream - thus if the character I and the keyword IF are both legitimate input at some point, IF will be selected, if present, in preference to I.
2) Keywords are not reserved words. Your parser will only look for a keyword when it is in a state where that keyword is legitimate input.
3) Keywords do not participate in character sets and should not appear in definitions of character sets. In particular, they are not considered as belonging to the complement of a character set. Thus a keyword would not be considered legitimate input for the production
       next char -> ~( '/' + '*' )
4) Keywords may appear in virtual productions.
5) Keywords may be named by means of a definition.

AnaGram will list all the keywords in your grammar in the �Keywords� window. In addition, in numerous windows where the cursor bar selects a state, the �Auxiliary Windows� popup menu will list a Keywords option. This window will provide a list of the keywords acceptable in the selected �parser state�. On occasion, a kind of conflict, called a �keyword anomaly� may occur. If so, such conflicts will be listed in the �Keyword Anomalies� window. The "�sticky�" �attribute statement� is useful in dealing with keyword anomalies.

## Keyword Anomalies Found

This �warning� message indicates that AnaGram has found at least one �keyword anomaly� in your �grammar�. Open the �Keyword Anomalies� window to see a list of those that have been found.
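
The longest-match rule described under "Keyword, Keyword String" (IFF is chosen over IF when both match) can be sketched in C as follows. This is only an illustration of the selection rule, not the code AnaGram generates: the function name longest_keyword is hypothetical, and the sketch ignores the "case sensitive" switch and the restriction to keywords that are legitimate in the current parser state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the longest keyword that matches the
 * beginning of `input`, or NULL if no keyword matches at all. */
const char *longest_keyword(const char *input,
                            const char *const *keywords,
                            size_t count)
{
    assert(input != NULL && keywords != NULL);

    const char *best = NULL;
    size_t best_len = 0;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(keywords[i]);
        /* Accept a candidate only if it is longer than the best so far
         * and every one of its characters matches the input. */
        if (len > best_len && strncmp(input, keywords[i], len) == 0) {
            best = keywords[i];
            best_len = len;
        }
    }
    return best;
}
```

Given the keywords IF and IFF, the input "IFF x" selects IFF, the input "IF x" selects IF, and the input "x" selects nothing.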
## Keyword Anomaly In �syntax directed parsing�, it is assumed that input �token�s can be uniquely identified. In the case of �keyword�s, however, there is the possibility that the individual characters making up the keyword, as well as the keyword taken as a whole, could constitute legitimate input under some circumstances. Thus �keywords�, though a powerful and useful tool, are not completely consistent with the assumptions that underlie �syntax directed parsing�. This can occasionally give rise to a type of conflict, diagnosed by AnaGram, called a "keyword anomaly". AnaGram is quite conservative in its diagnoses, so that many keyword anomalies it reports are actually innocuous and can be safely ignored. Basically, a keyword anomaly is a situation where a keyword is recognized, causes a reduction, and the parser arrives in a state where the keyword is not legal input. If the keyword, seen simply as a sequence of characters, might have been legal input in the original state, AnaGram notes the existence of a keyword anomaly. If you have a keyword that causes a keyword anomaly and it is actually a reserved word in your grammar, the anomaly is by definition innocuous. You should use the �reserve keywords� statement to inform AnaGram that the keyword is reserved and the anomaly need not be diagnosed. To help identify and correct any problems associated with keyword anomalies, AnaGram provides the �Keyword Anomalies� window to identify the anomalies, and the �Keyword Anomaly Trace� to help you understand a particular anomaly. ## Keyword Anomaly Trace A Keyword Anomaly Trace is a ready made �grammar trace� window which you may select from the �Auxiliary Windows� menu of the �Keyword Anomalies� window. The anomaly trace provides a path to a state which illustrates the �keyword anomaly�. In this state, the keyword is a reducing token, but after the reduction, it is not allowable input. 
## Keyword Anomalies

The Keyword Anomalies window is available only if your grammar has �keyword� anomalies. Each entry in the Keyword Anomalies window consists of two lines. The first line identifies the �parser state� at which the �keyword anomaly� occurs and the offending keyword. The second line identifies the �grammar rule� which the keyword may erroneously reduce. The �Auxiliary Windows� menu provides three auxiliary windows keyed directly to the anomaly to help you determine the nature of the problem: the �Keyword Anomaly Trace� window, the �Reduction Trace� window, and the �Rule Derivation� window. Three other windows provide supporting information: the �Reduction States� window, the �Rule Context� window and the �State Definition� window.

## Keywords

The Keywords entry in the �Browse Menu� pops up a window which lists all of the keywords defined in your �grammar�. The �token number� is also specified. A Keywords window is also an option in the �Auxiliary Windows� popup menu for any window which distinguishes various states of your parser. The Keywords window will show all of the �keyword�s which will be recognized in the state selected by the cursor bar in the parent window. The �Auxiliary Windows� menu for a Keywords window provides a �Token Usage� option which will allow you to see all the uses of a particular keyword in your grammar.

## left

"left" controls a �precedence declaration�, indicating that all of the listed �rule elements� are to be considered �left associative�.

## Left Associative

A binary operator is said to be left associative if an expression with repeated instances of the operator is to be evaluated from the left. Thus, for example,

    x = a/b/c

is normally taken to mean

    x = (a/b)/c

The division operator is said to be left associative. In �grammar�s with �conflict�s, you may use �precedence declaration�s to specify that an operator should be left associative.
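
As an illustration of what evaluation from the left means for a chain of divisions, here is a minimal C sketch. The function names fold_div_left and fold_div_right are hypothetical and are not part of AnaGram or of any parser it generates; the second function shows the right-associative grouping only for contrast.

```c
#include <assert.h>
#include <stddef.h>

/* Left-associative grouping: ((v[0] / v[1]) / v[2]) / ... */
double fold_div_left(const double *v, size_t n)
{
    assert(v != NULL && n > 0);
    double acc = v[0];
    for (size_t i = 1; i < n; i++) {
        acc = acc / v[i];          /* combine with the next operand on the right */
    }
    return acc;
}

/* Right-associative grouping, for contrast: v[0] / (v[1] / (v[2] / ...)) */
double fold_div_right(const double *v, size_t n)
{
    assert(v != NULL && n > 0);
    double acc = v[n - 1];
    for (size_t i = n - 1; i-- > 0; ) {
        acc = v[i] / acc;          /* combine with the next operand on the left */
    }
    return acc;
}
```

With the operands 100, 10 and 2, the left-grouped result is (100/10)/2 = 5, while the right-grouped result is 100/(10/2) = 20, which is why the grouping has to be fixed by the grammar or by a precedence declaration.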
## Lexeme The "lexeme" �attribute statement� is used to fine-tune the "�disregard�" statement. The lexeme statement takes the form: lexeme { T1, T2,....Tn } where T1,...Tn is a list of �nonterminal� tokens separated by commas. Lexeme statements may be placed in any �configuration section�, and there may be any number of them. When you specify that a �token� is to be disregarded, AnaGram rewrites your �grammar� so that the token will be passed over whenever it occurs at the beginning of a file or following a lexical unit, or "lexeme". If you have no lexeme statement, then the lexemes in your grammar are just the terminal tokens. The lexeme statement allows you to specify that certain nonterminal tokens are also to be treated as lexemes. This means that the disregard token will be skipped following the lexeme, but not between the characters that constitute the lexeme. Lexemes correspond to the tokens that a lexical scanner, if you were using one, would commonly identify and pass to a parser as single tokens. You don't usually wish to disregard �white space� within these tokens. For example, in a grammar for a conventional programming language where blank characters are to be disregarded, you might include: [ lexeme {string, character constant, name, number} ] since blank characters must not be overlooked within strings and constants, and should not be permitted within names or numbers. If your grammar allows for situations where successive lexemes could run together if they were not separated by space, a name followed by a number, for example, you may use the "�distinguish lexemes�" �configuration switch� to force a separation between the tokens. White space may be used explicitly within definitions of lexeme tokens in your grammar if desired, without causing conflicts. 
Thus, if you wish to allow embedded space in variable names, you might write:

    [
      disregard space
      lexeme {variable name}
    ]
    space = ' ' + '\t'
    letter = 'a-z' + 'A-Z'
    digit = '0-9'
    variable name
      -> letter
      -> variable name, letter + digit
      -> variable name, space..., letter + digit

## line

line is a field in your �parser control block� used for keeping track of the line number of the current character in your input. Line and column numbers are tracked only if the �lines and columns� �configuration switch� has been set.

## line length

Line length is an �obsolete configuration parameter�.

## Line Numbers

"Line numbers" is a �configuration switch� which defaults to off. If it is on, the �Build Parser� command will put "#line" directives into the generated C code file so that your compiler diagnostics will refer to lines in the �syntax file� rather than in the generated C code file. For more information on the "#line" directive, see Kernighan and Ritchie, second edition, section A12.6. If the "line numbers" switch is off, AnaGram will put comments into your parser file to help you find reduction procedures and embedded C in your syntax file. Prior to AnaGram 2.01, if your C or C++ compiler required that the backslashes in the pathname in the #line directive be doubled, you would have used AnaGram's �escape backslashes� switch to make this happen. Although you may still use �escape backslashes�, it should no longer be necessary because AnaGram now puts forward slashes into #line pathnames instead of backslashes. If you wish, you may specify the pathname in the #line directives explicitly by using the �Line Numbers Path� configuration parameter. You may also wish to change the "�parser file name�" parameter to provide a full path name for your parser file.

## Line Numbers Path

"Line Numbers Path" is a �configuration parameter� which takes a string value. It defaults to NULL.
When you have set the �Line Numbers� �configuration switch� and Line Numbers Path is not NULL, AnaGram uses it in the #line directive in place of the full path name of your �syntax file�. Note that Line Numbers Path should be the complete pathname for your syntax file. Line Numbers Path is useful when using AnaGram in cross platform development. When parsers are to be compiled and tested on a platform different from that used to run AnaGram, you may use Line Numbers Path to provide a pathname on the platform used for compiling and testing. ## Lines and Columns "Lines and columns" is a �configuration switch� which defaults to on. When set, i.e., on, it causes the �Build Parser� command to incorporate code into your parser which will automatically track the line number and column number of the input token. You would normally set the "lines and columns" switch when you are planning to build a parser which will read an input file and which will need to diagnose �syntax errors� with some precision. Your parser will store the line and column numbers in the �line� and �column� fields respectively in the �parser control block�. If the input to your parser includes tab characters, you should either set the �tab spacing� �configuration parameter� appropriately or provide a �TAB_SPACING� macro for your parser. Your parser will count line and column numbers beginning with one. ## Main Program The "main program" �configuration switch� determines what AnaGram does if you invoke the �Build Parser� command, but have no �embedded C� in your �syntax file�. If the switch is on and you have not specified �pointer input� or an �event driven� parser, AnaGram creates a main program which does nothing but call your �parser�. The "main program" switch defaults to on. 
This feature, along with the default definitions for �GET_INPUT� and �error handling�, makes it possible to write a grammar with no �embedded C� or �reduction procedure�s whatsoever and still get an executable program which will read input from stdin and parse it according to your grammar. ## Marked Rule A "marked rule" is a �grammar rule� together with a marked token that indicates how much of the rule has already been matched. The �marked token� and any tokens following it indicate the input that should be expected if the remainder of the rule is to be matched. When marked rules are displayed in AnaGram windows, the marked token is represented by a difference in the font. The token may be in bold face, underlined, italicized, shown with a different point size, or in a different font altogether. Since AnaGram allows you to change fonts to suit your own preferences, you should be careful that the font you choose for the marked tokens allows them to be readily distinguished from the other tokens in your grammar rules. An underlined font is often suitable. ## Max conflicts The "max conflicts" �configuration parameter� limits the number of �conflict�s AnaGram will record. Sometimes, a simple error editing your syntax file can cause hundreds of conflicts, which you don't need to see in gory detail. The default value of max conflicts is 50. If you have a grammar that is in serious trouble and you want to see more conflicts, you may change max conflicts to suit your needs. ## Missing The �warning� message Missing <element 1> in <element 2> indicates that AnaGram expects to see an instance of syntactic element 1 at the specified location, internal to an instance of syntactic element 2. AnaGram cannot reliably continue parsing its input after an error of this type. Therefore, it limits further analysis of your grammar to scanning for syntax errors. 
## Missing Production

"Missing production, TXXX: <token name>" is a �warning� message which indicates that the specified �token� appears to be defined recursively, but there is no initial �production� to get the recursion started. If you get this warning, check your �grammar� closely.

## Missing Reduction Procedure

"Missing reduction procedure, RXXX" is a �warning� message which appears either when the �grammar rule� indicated specifies a �parameter assignment� but does not have a �reduction procedure� to use it, or when the rule has no reduction procedure but the value of the token on the left hand side is used as an argument for some other reduction procedure and the �default reduction value� does not have the same type as the token on the left hand side. In this latter case, a reduction procedure may be needed to effect correct type conversion. This warning is provided in case the lack of a reduction procedure is an oversight.

## Multiple Definitions

"Multiple definitions for TXXX: <token name>" is a �warning� message which indicates that the specified �token� has been defined both as a �character set� and as a �nonterminal token�. It cannot be both.

## Near Functions

"Near Functions" is a �configuration switch� that defaults to off. It controls the use of the "near" keyword for static functions in your parser. If your parser is to run on an 80x86 processor you might wish to turn it on. Your parser will then be slightly smaller and will run a little faster. If you are going to run your parser on some other processor or use a C or C++ compiler that does not support the "near" keyword, you should make sure "near functions" is set to off.

## Negative Character Code in Pointer Mode

This �warning� message appears if your �grammar� defines negative character codes and uses �pointer input�.
If your grammar uses the default definition for �pointer type� it will be reading unsigned characters so that the parser will never see the negative codes that have been defined. You may correct the problem by providing your own definition of pointer type. ## Nest Comments "Nest comments" is a �configuration switch� which defaults to off. It controls the treatment of �comments� while scanning your �syntax file�. It defaults to off, in accordance with the ANSI standard for C which disallows �nested comments�. Note that AnaGram scans comments in any �embedded C� code as well as in the grammar specification. You may turn this switch on and off as many times as necessary in a single file. ## Nested Comment As delivered, AnaGram treats C style �comments� according to the ANSI standard: They do not nest. For those who prefer nested comments, however, the �nest comments� �configuration switch� allows them to nest. ## Nesting too deep This �warning� message indicates that �set expression�s or �virtual productions� are nested so deeply they have exhausted the available stack space and AnaGram cannot continue its analysis. Use a �definition� statement to name an intermediate level. ## no cr "no cr" is a �configuration switch� which defaults to off. When this switch is set, it will cause the �parser file� and �header file� to be written without carriage returns. This is convenient if you wish to use the generated parser files in a Unix environment. ## No Grammar Token Specified This �warning� message appears if your �grammar� does not specify a �grammar token�. Edit your �syntax file� to specify one. ## No Productions in Syntax File This �warning� message appears if AnaGram did not find any �productions� at all in your �syntax file�. Check to see you have the right file. ## No Such Parameter This �warning� message appears when AnaGram does not recognize the name of a �configuration parameter� you have tried to set in your �syntax file�. 
Check the spelling of the parameter you wish to set in the �Configuration Parameters Window�. ## No Terminal Tokens in Expansion No terminal tokens in expansion of TXXX is a �warning� message indicating that there are no terminal tokens to be found in an expansion of the specified token. Although there are a few circumstances where this could be legitimate, it is more likely that there is a missing rule in the grammar. ## Not a Character Set "Not a character set, TXXX: <token name>" is a �warning� message which indicates that the specified �token� has been used both on the left side of a �production� and in a �character set� expression defining some other token. AnaGram will use an empty set in place of the specified token in evaluating the �character set�. You will get another warning, �Error defining� token, when AnaGram finishes its evaluation of the character set. ## Nothing Reduces "Nothing reduces TXXX -> RYYY" is a �warning� message which indicates that the �grammar� does not specify any input to follow an instance of the indicated �grammar rule�. In all probability, the grammar does not have any explicit end of file, or �eof token�. If the grammar does not have any conflicts with �token� T000, then an explicit end of file indicator is not necessary. Otherwise you should modify your grammar to require an explicit end of file. ## Null Character in String This �warning� message appears when AnaGram finds an explicit null character in a quoted string. If you must allow for a null in a �keyword string� you will have to rewrite your �grammar rule�. For instance, instead of widget -> "abc\0def" write widget -> "abc", 0, "def" ## nonassoc "nonassoc" controls a �precedence declaration�, indicating that all of the listed �rule elements� are to be considered non-associative. ## Nonterminal Token, Nonterminal A nonterminal token is one which is constructed from a series of other tokens as specified by one or more �production�s. 
Nonterminal tokens are to be distinguished from �terminal token�s, which are the basic input units appearing in your input stream. Terminal tokens most often represent single characters or a character belonging to a �character set� such as 'a-z'. ## Null Production A "null production" is one that has no tokens on the right hand side whatsoever. Null �production�s essentially are identified by the first following input token. Null productions are extremely convenient syntactic elements when you wish to make some input optional. For example, suppose that you wish to allow an optional semicolon at some point in your grammar. You could write the following pair of productions: optional semicolon -> | ';' Note that a null production can never follow a '|'. This could also be written on multiple lines thus: optional semicolon -> -> ';' You can always rewrite your grammar to eliminate null productions if you wish, but you usually pay a price in conciseness and clarity. Sometimes, however, it is necessary to do such a rewrite in order to avoid �conflict�s, to which null productions are especially prone. For example suppose you have the following production: foo -> wombat, optional semicolon, widget You can rewrite this as two productions: foo -> wombat, widget -> wombat, ';', widget This rewrite specifies exactly the same input language, but is less prone to conflicts. On the other hand, it may require significantly more table space in your parser. If you have a null production with no �reduction procedure� specified, your parser will automatically assign the value zero to �reduction token�. Null productions can also be generated by �virtual productions�. A token that has a null production is a "�zero length�" token. ## Old Style "Old Style" is a �configuration switch� which defaults to off. It controls the function definitions in the code AnaGram generates. When "old style" is off, it generates ANSI style calling sequences with prototypes as necessary. 
When "old style" is on, it generates old style function definitions. ## Output Files When you use the �Build Parser� command to request output from AnaGram, it creates two files: a �parser file� and a �parser header� file. ## Page Length "Page length" is an �obsolete configuration parameter�. ## Obsolete Configuration Parameter, Obsolete Configuration Switch A number of �configuration parameter�s and �configuration switch�es which were used in the DOS version of AnaGram are no longer used, but are still recognized for the sake of upward compatibility. These parameters include: �bottom margin� �line length� �page length� �top margin� �quick reference� �video mode� ## Parameter "Parameter <name> has type void" is a �warning� message which appears when a �parameter assignment� is attached to a �token� that has been defined to have the void �data type�. ## Parameter Assignment In any �grammar rule�, the �semantic value� of any �rule element� may be passed to a �reduction procedure� by means of a parameter assignment. Simply follow the rule element with a colon and a C variable name. The C variable name can then be used in the reduction procedure to reference the semantic value of the token it is attached to. AnaGram will automatically provide necessary declarations. Here are some examples of rule elements with parameter assignments: '0-9':d integer:n expression:x declaration : declaration_descriptor ## Parameter Not Defined AnaGram does not have a �configuration parameter� with the specified name. Please check the spelling. ## Parameter Takes Integer Value The specified �configuration parameter� takes an integer value only. ## Parameter Takes String Value The specified �configuration parameter� takes a string value only. ## Parse Function To run your parser, you call the parse function. The name of the parse function is given by the �parser name� �configuration parameter� and defaults to the name of your syntax file.
If your parser uses �pointer input�, you should set the �pointer� field of the �parser control block� before calling the parse function. If your parser is �event driven�, you should first call the �initializer�, and then you should call the parse function for each input token. If the �reentrant parser� switch is set, the parse function takes a pointer to the �parser control block� as its sole argument. Otherwise it takes no arguments. The parse function returns no value. All communication is by means of the �parser control block�. To retrieve the value of the �grammar token�, once the parse is complete, use the �parser value function�. ## Parser A parser is a program or, more commonly, a procedure within a program, which scans a sequence of �input characters� or input tokens and accumulates them in an input buffer or stack as determined by a set of �production�s which constitute a �grammar�. When the parser discovers a sequence of tokens as defined by a �grammar rule�, or right hand side of a production, it "reduces" the sequence to a single �reduction token� as defined by the left hand side of the grammar rule. This �nonterminal token� now replaces the tokens which matched the grammar rule and the search for matches continues. If an input token is encountered which will not yield a match for any rule, it is considered a �syntax error� and some kind of �error recovery� may be required to continue. If a match, or �reduce action�, yields the �grammar token�, sometimes called the �goal token� or �start token�, the parser deems its work complete and returns to whatever procedure may have called it. The �Grammar Trace� and �File Trace� functions in AnaGram provide a convenient means for understanding the detailed operation of a syntax directed parser. �Tokens� may have �semantic values�. If the �input values� �configuration switch� is on, your parser will expect semantic values to be provided by the input process along with the token identification code.
If the input values switch is off, your parser will take the ASCII value of the input character, that is, the actual input code, as the value of the character. When the parser reduces a production, it can call a �reduction procedure� or �semantic action� to analyze the values of the constituent tokens. This reduction procedure can then return a value which characterizes the reduced token. ## Parser Control Block A "Parser Control Block" is a structure which contains all of the data necessary to describe the instantaneous state of a parser. The typedef statement which defines the structure is included in the �parser header� file for your parser. AnaGram creates the name of the data type for the structure by appending "_pcb_type" to the �parser name�. You may add your own declarations to the parser control block by using the �extend pcb� statement. If the �declare pcb� �configuration switch� is on, its normal state, AnaGram will declare a parser control block for you at the beginning of your parser file. AnaGram will determine the name of the parser control block by appending "_pcb" to the �parser name�. AnaGram will also define the macro PCB as a short hand notation for use within the parser. All references to the parser control block within the code that AnaGram generates are made using the PCB macro. If you wish to declare your own parser control block, you must include the �parser header� file for your parser before your declaration. Then you declare a control block and define PCB to refer to the control block you have declared. Suppose your grammar is called widget. You would then write the following statements in your �embedded C�: #include "widget.h" widget_pcb_type widget_control_pcb; #define PCB widget_control_pcb Alternatively, you could write the following: #include "widget.h" widget_pcb_type *widget_control_pcb_pointer; #define PCB (*widget_control_pcb_pointer) and then allocate storage for the structure when necessary.
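The pointer form of a user-declared control block might be fleshed out as in the following minimal sketch. The typedef here is only a stand-in for the one the generated widget.h would supply, and the helper names allocate_widget_pcb, release_widget_pcb and reset_exit_flag are hypothetical; they are not part of AnaGram.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the typedef normally supplied by the generated "widget.h".
   Only one field is shown; the real structure has many more. */
typedef struct {
  int exit_flag;
} widget_pcb_type;

/* User-declared control block, reached through a pointer. */
static widget_pcb_type *widget_control_pcb_pointer;
#define PCB (*widget_control_pcb_pointer)

/* Hypothetical helper: allocate zeroed storage for the control block.
   Returns 0 on success, -1 if allocation fails. */
static int allocate_widget_pcb(void) {
  widget_control_pcb_pointer = calloc(1, sizeof *widget_control_pcb_pointer);
  return widget_control_pcb_pointer != NULL ? 0 : -1;
}

/* Hypothetical helper: PCB may only be used once storage exists. */
static void reset_exit_flag(void) {
  assert(widget_control_pcb_pointer != NULL);
  PCB.exit_flag = 0;
}

/* Hypothetical helper: release the storage when the parser is done. */
static void release_widget_pcb(void) {
  free(widget_control_pcb_pointer);
  widget_control_pcb_pointer = NULL;
}
```

Because PCB expands to a dereference of the pointer, any code that uses PCB must run only after the storage has been allocated.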
Some fields of interest in the parser control block are as follows: �input_code� �input_value� �input_context� �pointer� �token_number� �reduction_token� �ssx� �sn� �ss�[�parser stack size�] �vs�[parser stack size]; �cs�[parser stack size]; �line� �column� *�error_message� �error_frame_ssx� �error_frame_token� ## PCB "PCB" is a macro AnaGram defines for use in the code it generates to refer to the �parser control block� for your �parser�. Normally, AnaGram automatically declares storage for a parser control block and defines PCB for you. If you turn off the �declare PCB� switch, you may define PCB yourself. ## PCB_TYPE If you are writing your parser in C++, you may prefer to derive a class from the �parser control block� rather than use the �extend pcb� statement. In this case you may define the PCB_TYPE macro in your syntax file to specify your derived class. For instance, you have defined class MyPcb : public parser_pcb_type {...}; You would then add the following line: #define PCB_TYPE MyPcb If you do not define PCB_TYPE, AnaGram will define it as the type of your parser control block. ## Parser File The "parser file" is the C (or C++) file output by AnaGram when you execute the �Build Parser� command. It contains all of the �embedded C� from your �syntax file�, all of the �reduction procedure�s defined in your �grammar�, syntax tables which represent, in a condensed form, all of the intricacies of your grammar, and a customized �parsing engine�. The name of the parser file is given by the �parser file name� �configuration parameter�. The name of the �parser� itself is given by the �parser name� configuration parameter. If you wish the parser file to be written without carriage returns, suitable for a Unix environment, set the �no cr� configuration switch. ## Parser File Name "Parser file name" is a �configuration parameter� which takes a string value. The default value is "#.c". 
AnaGram uses this parameter to generate the name of the output C file, or �parser file�, created by the �Build Parser� command. The '#' character is used in this string as a wild card to indicate the name of the current �syntax file�. If the first character of the parser file name string is a '.' character, AnaGram will substitute the name of the current working directory for the dot. Thus ".\\#.c" will create the file name as a complete path. This can sometimes be important when using the �line numbers� switch to enable a debugger to find code in your parser file. Note that the parser file name is not the same as the �parser name�. ## Parser Generator A parser generator, such as AnaGram, is a program that converts a �grammar�, a rule-based description of the input to a program, into a conventional, procedural module called a �parser�. The parsers AnaGram generates are simple C modules which can be compiled on almost any platform. AnaGram parsers are also compatible with C++. ## Header File, Parser Header When you use the command �Build Parser� to generate source code for a parser, AnaGram creates two files, a header file and a C source file. Unless different paths are specified in the �parser file name� and �header file name� parameters, both files will be written to the directory that contains the �syntax file�. The header file contains a number of typedef statements, including the definition of the �parser control block�, and a number of macro definitions which may be useful in your parser or in other modules of your program. If you do not alter the �header file name� parameter, the name of the header file will be the same as the name of your �syntax file� and it will have the extension ".h". If you wish the header file to be written without carriage returns, suitable for a Unix environment, set the �no cr� configuration switch. 
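The '#' wild card in the parser file name parameter can be illustrated with a small sketch. The function expand_file_name is hypothetical and is not AnaGram's own code; it assumes that every '#' in the pattern is replaced by the syntax file name, and it does not handle the leading '.' substitution described under Parser File Name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy pattern into out, replacing each '#' with
   base (the syntax file name without extension).
   Returns 0 on success, -1 if the result does not fit in out_size bytes. */
static int expand_file_name(const char *pattern, const char *base,
                            char *out, size_t out_size) {
  size_t used = 0;
  size_t base_len = strlen(base);
  assert(out != NULL && out_size > 0);
  for (; *pattern != '\0'; ++pattern) {
    if (*pattern == '#') {
      /* Leave room for the terminating null. */
      if (used + base_len >= out_size) return -1;
      memcpy(out + used, base, base_len);
      used += base_len;
    } else {
      if (used + 1 >= out_size) return -1;
      out[used++] = *pattern;
    }
  }
  out[used] = '\0';
  return 0;
}
```

With the default pattern "#.c" and a syntax file named widget, this sketch produces "widget.c".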
## Parser Input AnaGram �parser�s may be configured to accept input in any of three different ways: By default, a �parse function� gets its input by invoking the �GET_INPUT� macro each time it is ready for another input token. The default implementation of GET_INPUT reads �input character�s from stdin. For most practical problems, you will want to override this definition of GET_INPUT, storing the current input character in PCB.input_code. Alternatively, you may configure a parser to read input from an array in memory. Set the �pointer input� switch and load the �pointer� field of the parser control block before calling the parse function. The parser will then run, incrementing the pointer, until it finishes or encounters an error. The third alternative is to set the �event driven� switch. The parser will be configured as a callback routine. Begin by calling the �initializer�. Then, for each input character, store the character in the �input_code� field of the parser control block and call the parse function. Each time you call the parse function it will continue until it needs more input. You can check its status by inspecting the �exit_flag� in the parser control block. The input to your parser may be either text characters or �tokens� accumulated by a pre-processor, or �lexical scanner�. The latter case is referred to as �token input�. If you use a lexical scanner, you may find it convenient to configure your parser as event driven. Although lexical scanners are often not necessary when you use AnaGram, if you do need one you can write it in AnaGram. ## Parser Name "Parser Name" is a �configuration parameter� which defaults to "#", where "#" represents the name of your �syntax file�. AnaGram uses this parameter to name your �parse function�. The �initializer� for your parser will have the same name preceded by "init_". Note that "�parser file name�" is not the same configuration parameter as "parser name".
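The event driven calling sequence described under Parser Input might look like the following sketch. Everything here is a toy stand-in for generated code: the parser is assumed to be named widget, so the parse function is widget and the initializer is init_widget; the control block has only the fields needed for the illustration; the TOY_ status codes and the value field are hypothetical and do not correspond to AnaGram's actual exit_flag values. The toy "grammar" accepts a run of decimal digits ended by a null character.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, for illustration only. */
enum { TOY_RUNNING, TOY_SUCCESS, TOY_SYNTAX_ERROR };

/* Toy stand-in for the generated parser control block. */
typedef struct {
  int input_code;  /* next input, stored by the caller */
  int exit_flag;   /* status, inspected by the caller */
  long value;      /* toy-only: number accumulated so far */
} widget_pcb_type;

static widget_pcb_type widget_pcb;
#define PCB widget_pcb

/* Toy initializer: "init_" followed by the parser name. */
static void init_widget(void) {
  PCB.exit_flag = TOY_RUNNING;
  PCB.value = 0;
}

/* Toy parse function: consumes PCB.input_code and returns when it
   needs more input or has finished. */
static void widget(void) {
  int c = PCB.input_code;
  if (PCB.exit_flag != TOY_RUNNING) return;
  if (c >= '0' && c <= '9') PCB.value = PCB.value * 10 + (c - '0');
  else if (c == 0) PCB.exit_flag = TOY_SUCCESS;
  else PCB.exit_flag = TOY_SYNTAX_ERROR;
}

/* Driver: initialize once, then store each character and call the
   parse function until the status changes. */
static int feed(const char *text) {
  assert(text != NULL);
  init_widget();
  for (;; ++text) {
    PCB.input_code = (unsigned char)*text;
    widget();
    if (PCB.exit_flag != TOY_RUNNING) break;
  }
  return PCB.exit_flag;
}
```

The driver, not the parser, owns the input loop, which is what makes this configuration convenient behind a lexical scanner or a message handler.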
## Parser Stack Your �parser� uses a "parser stack" to keep track of the �grammar rules� it is trying to match and its progress in matching them. Normally, there are two separate stacks defined by AnaGram: �PCB�.�ss�, the �parser state stack� which maintains �parser state� numbers, and PCB.�vs�, the �parser value stack� which maintains the �semantic value�s of tokens that have been identified so far. If you wish to maintain a stack tracking other variables you may set the �context type� �configuration parameter�, and AnaGram will define a third stack, PCB.�cs�. All are indexed by the same stack index, PCB.�ssx�. To see how tokens accumulate on the parser stack, run the �Grammar Trace� or the �File Trace�. Normally, when the return value of a �reduction procedure� is stored on the parser value stack, it is stored by simply coercing the stack pointer to the correct type. If the return value is a C++ object, this can cause serious problems. These problems can be avoided by using the �wrapper� statement. ## Parser stack alignment Parser stack alignment is a �configuration parameter� whose value is a C or C++ data type. It defaults to "long". If any tokens have type "double", it will be automatically set to double. Thus, you will normally not need to change this parameter if your parser is to run on a PC or compatible processor. It provides alignment control for processors which restrict address for multibyte data access. The default setting provides for correct operation on 64 bit processors. To control byte alignment of the parser stack, �PCB�.�vs�, AnaGram normally adds a field of the specified data type to the "union" statement which defines the data type for the �parser stack�. This parameter can be used to deal with byte alignment problems when a �parser� is to be run on a processor with byte alignment restrictions. 
For instance, if your �grammar� has �token�s of type "long double" and your processor requires long double variables to be properly aligned, you can include the following statement in a �configuration section� in your grammar or in your �configuration file�: parser stack alignment = long double If the data type specified is "void", no alignment declaration will be made. ## Parser Stack Index, Stack Index The parser stack index, �PCB�.�ssx�, tracks the depth of the �parser state stack�, the �parser value stack�, and the �context stack� if you defined one. The parser stack index is incremented by �shift actions� and reduced by �reduce actions�. ## Parser Stack Overflow Your �parser� uses a �parser stack� to keep track of the �grammar rules� it is trying to match and its progress in matching them. If your grammar has any �recursive rule�s that are not strictly left recursive, then no matter how big you make the parser stack, it will be possible to create a syntactically correct input which will cause the stack to overflow. As a practical matter, however, it is usually possible to set the �parser stack size� to a value large enough so that an overflow is a freak occurrence. Nevertheless, it is necessary to check for overflow, and in the case overflow should occur, your parser has to do something. What it does is invoke the �PARSER_STACK_OVERFLOW� macro. If you don't define it, AnaGram will define it for you, although not necessarily to your taste. ## Recursive rule, Recursion A �grammar rule� is said to be "recursive" if the �token� on the left side of the rule also appears on the right side of the rule, or in an �expansion rule� of any token on the right side of the rule. If the token on the left side is the first token on the right side, the rule is said to be "left recursive". If it is the last token on the right side, the rule is said to be "right recursive". Otherwise, the rule is "center recursive". 
For example: statement list -> statement -> statement list, statement // left recursive fraction part -> digit -> digit, fraction part // right recursive expression -> factor -> expression, '+' + '-', factor factor -> primary -> factor, '*' + '/', primary primary -> number -> name -> '(', expression, ')' // center recursive Note that if all the tokens in the rule other than the recursive token itself are �zero length� tokens, it is possible for the rule to be matched arbitrarily many times without any input whatsoever. In other words, such a rule creates an infinite loop in the parser. AnaGram can detect this condition and issues an �empty recursion� diagnostic if it occurs. ## PARSER_STACK_OVERFLOW PARSER_STACK_OVERFLOW is a user definable macro. If you do not define it, AnaGram will define it so that it will print a message on stderr and abort the �parser� in case of a �parser stack overflow�. ## Parser Stack Size "Parser stack size" is a �configuration parameter� with a default value of 128. It is used to define the sizes of your �parser stacks� in your �parser control block�. When analyzing your grammar, AnaGram will determine the minimum amount of stack space required for the deepest left �recursion�. To this depth it will add one half the value of the parser stack size parameter. It will then set the actual stack size to the larger of this value and the parser stack size parameter. ## Parser State, State Number The essential part of your �parser� is a group of tables which describe in detail what to do for each "state" of the parser. The states of a parser are determined by sets of "�characteristic rules�". The �State Definition Table� shows the characteristic rules for each state of your parser. AnaGram numbers the states of a parser as it identifies them, beginning with zero. In all windows, state numbers are displayed as three digit numbers prefixed with the letter 'S'.
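The sizing rule given under Parser Stack Size can be written out as a short sketch. The function name actual_stack_size is hypothetical, and the use of integer division for "one half" is an assumption about how odd parameter values would be treated.

```c
#include <assert.h>

/* Hypothetical helper restating the rule from the text:
   min_depth        - minimum stack space found by grammar analysis
   stack_size_param - value of the "parser stack size" parameter
   Returns the larger of (min_depth + half the parameter) and the parameter. */
static int actual_stack_size(int min_depth, int stack_size_param) {
  int padded;
  assert(min_depth >= 0 && stack_size_param > 0);
  padded = min_depth + stack_size_param / 2;  /* integer halving assumed */
  return padded > stack_size_param ? padded : stack_size_param;
}
```

With the default parameter of 128, a grammar needing a depth of 10 keeps a stack of 128 entries, while one needing a depth of 100 gets 164.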
## Parser State Stack, State Stack The parser state stack is a stack maintained by your �parser� and which is an integral part of the parsing process. At any point in the parse of your input stream, the parser state stack provides a summary of what has been found so far. The parser state stack is stored in �PCB�.�ss� and is indexed by PCB.�ssx�, the �parser stack index�. ## Parser Value Stack, Value Stack In parallel with the �parser state stack�, your parser maintains a "value stack", �PCB�.�vs�, each entry of which corresponds to the �semantic value� of the token identified at that state. Since the semantic values of different tokens might well have different �data type�s, AnaGram gives you the opportunity, in your �syntax file�, to define the data type for any token. AnaGram then builds a typedef statement creating a data type which is a union of all the types you have defined. AnaGram creates the name for this �data type� by appending "_vs_type" to the �parser name�. AnaGram uses this data type to define the value stack. ## Parser Action In a traditional LR parser, there are only four actions: the �shift action�, the �reduce action�, the �accept action� and the �error action�. AnaGram, in doing its �grammar analysis�, identifies a number of special cases, and creates a number of extra actions which make for faster processing, but which can be represented as combinations of these primitive actions. When a shift action is performed, the current state number is pushed onto the �parser state stack� and the new state number is determined by the current state number and the current input token. Different tokens cause different new states. When a reduce action is performed, the length of the rule being reduced is subtracted from the �parser stack index� and the new state number is read from the top of the parser state stack. The �reduction token� for the rule being reduced is then used as an input token.
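The stack bookkeeping for the shift and reduce actions described above can be sketched as follows. This is a toy model, not the generated parsing engine: the structure toy_pcb and the functions toy_shift and toy_reduce are hypothetical, the field names sn, ssx and ss merely echo the control block fields, the value stack is omitted, and the new state is passed in by the caller instead of being looked up in syntax tables.

```c
#include <assert.h>

#define TOY_STACK_SIZE 128

/* Toy model of the state-related fields of a parser control block. */
typedef struct {
  int sn;                  /* current state number */
  int ssx;                 /* parser stack index */
  int ss[TOY_STACK_SIZE];  /* parser state stack */
} toy_pcb;

/* Shift: push the current state, then move to the new state. */
static void toy_shift(toy_pcb *pcb, int new_state) {
  assert(pcb->ssx < TOY_STACK_SIZE);
  pcb->ss[pcb->ssx++] = pcb->sn;
  pcb->sn = new_state;
}

/* Reduce: drop rule_length entries and resume in the state found at
   the top of the stack. In this toy model a zero-length rule pops
   nothing, so the current state is left unchanged. */
static void toy_reduce(toy_pcb *pcb, int rule_length) {
  assert(rule_length >= 0 && rule_length <= pcb->ssx);
  if (rule_length > 0) {
    pcb->ssx -= rule_length;
    pcb->sn = pcb->ss[pcb->ssx];
  }
}
```

After toy_reduce, the caller would treat the reduction token as input, which in this model amounts to another toy_shift with the state selected by that token.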
## Parsing Engine A parser consists of three basic components: A set of syntax tables, a set of �reduction procedure�s and a parsing engine. The parsing engine is the body of code that interprets the parsing table, invokes input functions, and calls the reduction procedures. The �Build Parser� command configures a parsing engine according to the implicit requirements of the syntax specification and according to the explicit values of the �configuration parameter�s. The parsing engine itself is a simple automaton, characterized by a set of states and a set of inputs. The inputs are the tokens of your grammar. Each state is represented by a list of tokens which are admissible in that state and for each token a �parser action� to perform and a parameter which further defines the action. Each state in the grammar, with the exception of state zero, has a �characteristic token� which must have been recognized in order to jump to that state. Therefore, the �parser state stack�, which is essentially a list of state numbers, can also be thought of as a list of token numbers. This is the list of tokens that have been seen so far in the parse of your input stream. ## Partition If you use �character sets� in your grammar, AnaGram will compute a "partition" of the �character universe�. This partition is a collection of non-overlapping character sets such that every one of the sets you have defined can be written as a �union� of partition sets. Each partition set is assigned a unique �token�. If one of your character sets requires more than one partition set to represent it, AnaGram will create appropriate �production�s and add them to your grammar so your parser can make the necessary distinctions. To see how AnaGram has partitioned the character universe, you may inspect the �Partition Sets� window found in the �Browse Menu�. ## Partition Set Number Each �partition� set is identified by a unique reference number called the partition set number. 
Partition set numbers are displayed in the form Pxxx. Partition sets are numbered starting with zero, so the first set is P000. To see the elements of a given partition set, call up the �Partition Sets� window from the �Browse Menu�, then, after selecting a partition set, call up the �Set Elements� window from the �Auxiliary Windows� popup menu. ## Partition Sets The Partition Sets option in the �Browse Menu� pops up a window which shows the complete �partition� of the �character universe� for your parser. The Partition Sets option in the �Auxiliary Windows� popup menu for the �Character Sets� window lets you see the partition sets which cover the specified character set. Each entry in a Partition Sets window identifies a token number and a �partition set number�. The �Auxiliary Windows� menu provides a �Set Elements� entry which enables you to see precisely which characters belong to the partition set. It also has a Token Usage entry to show you what rules the set is used in. ## PCONTEXT PCONTEXT is an alternate form of the �CONTEXT� macro which takes an explicit argument to specify the �parser control block�. PCONTEXT is defined in the �parser header� file. ## PERROR_CONTEXT PERROR_CONTEXT is an alternative form of the �ERROR_CONTEXT� macro. It differs only in that it takes an argument so you can specify the appropriate �parser control block� explicitly. PERROR_CONTEXT is defined in the �parser header� file. ## pointer "pointer" is a field which will be included in the �parser control block� for your parser if you have set the �pointer input� �configuration switch�. Your main program should set PCB.pointer before it calls your parser. Thereafter, your parser will increment it appropriately. When you are executing a �reduction procedure� or a �SYNTAX_ERROR� macro PCB.pointer will always point to the next input character to be read. ## Pointer input "Pointer input" is a �configuration switch� which you may set to control �parser input�. It defaults to off. 
When you set pointer input, you tell AnaGram that the input to your parser is in memory and can be scanned simply by incrementing a pointer. Before calling your parser you should make sure that �PCB�.�pointer� is properly initialized to point to the first character or token in your input. Use the �configuration parameter� "�pointer type�" to specify the type of the pointer. The default value of "pointer type" is "unsigned char *". There is no particular reason why pointer type should be limited to variants on char. It could define a pointer to int or a structure just as well. If you use pointer input with structures or C++ classes, you should set the �input values� switch and define an �INPUT_CODE�(t) macro. If you are using a 16 bit compiler and your input array is so large that you need "huge" pointers, make sure that "pointer type" is properly defined. ## Pointer Type "Pointer type" is a �configuration parameter� which defaults to "unsigned char *". When you have specified �pointer input�, AnaGram uses the value of pointer type to declare a pointer field in your �parser control block�. ## Precedence, Operator Precedence In expressions of the form a+b*c, the convention is to perform the multiplication before the addition. Multiplication is said to take precedence over addition. In general the rank order in which operations are to be performed if there are no parentheses forcing an order of computation is called the precedence of the operators. If you have an ambiguous �grammar�, that is, a grammar with a number of �conflict�s, you may use �precedence declaration�s to resolve the conflicts and to set operator precedence. ## Precedence Declaration Precedence declarations are �attribute statements� which may be used to resolve �conflict�s in your grammar by assigning precedence and associativity to operators. Precedence declarations must be made inside �configuration sections�.
Each declaration consists of the keyword �left�, �right�, or �nonassoc� followed by a list of �rule elements�. The rule elements in the list must be separated by commas and the entire list must be enclosed in braces ({ }). Each of the rule elements is assigned the same precedence level, which is higher than that assigned in all previous precedence declarations and lower than that in all subsequent declarations. The rule elements are defined to be left, right, or nonassociative, depending on whether the keyword was "left", "right", or "nonassoc". All conflicts which are resolved by precedence declarations are listed in the �Resolved Conflicts� window. ## Precedence Rules AnaGram can resolve certain types of �conflict�s in your grammar by applying precedence rules. There are three classes of rules available: explicit �precedence declarations�, the "�sticky�" statement, and the implicit rule associated with the use of a "�disregard�" token outside a �lexeme�. Whenever AnaGram uses a precedence rule of any kind to resolve a conflict, it produces a �warning� message and lists the conflict in the �Resolved Conflicts� window. ## Previous States The Previous States window can be accessed via the �Auxiliary Windows� popup menu from any window that identifies �parser state�s. It shows the �characteristic rule�s for all of the states which jump to the presently selected state. ## Print File Name "Print file name" is a configuration parameter which is not used in the Windows version of AnaGram. It is retained only for compatibility with pre-existing �configuration file�s. ## Problem States The Problem States window is essentially a trimmed version of the �Reduction States� window. It is available in the �Auxiliary Windows� popup menu for the �Conflicts� and �Resolved Conflicts� windows. The Problem States window has the same format as the Reduction States window, and differs only in that it shows only those reduction states for which the �conflict token� is acceptable input. 
## Production Productions are the mechanism you use to describe how complex input structures are built up out of simpler ones. Each production has a left hand side and a right hand side. The right hand side, or �grammar rule�, is a sequence of �rule elements�, which may represent either �terminal tokens� or �nonterminal tokens�. The left hand side is a list of �reduction tokens�. In most cases there would be only a single reduction token. Productions with more than one �token� on the left hand side are called �semantically determined productions�. The "->" symbol is used to separate the left hand side from the right hand side. If you have several productions with the same left hand side, you can avoid rewriting the left hand side either by using '|' or by using another "->". A �null production�, or empty right hand side, cannot follow a '|'. Productions may be written thus: name -> letter -> name, digit This could also be written name -> letter | name, digit In order to accommodate semantic analysis of the data, you may attach to any grammar rule a �reduction procedure� which will be executed when the rule is identified. Each token may have a �semantic value�. By using �parameter assignment�s, you may provide the reduction procedure with access to the semantic values of tokens that comprise the grammar rule. When it finishes, the reduction procedure may return a value which will be saved on the �parser value stack� as the semantic value of the �reduction token�. ## Productions The �Production�s window is available via the �Auxiliary Windows� popup menu in any window which identifies tokens. If the token identified by the highlighted line is �nonterminal�, the Productions window will show the rules produced by that �token�. ## PRULE_CONTEXT PRULE_CONTEXT is an alternative form of the �RULE_CONTEXT� macro. It differs only in that it takes an argument so you can specify the appropriate �parser control block� explicitly. 
PRULE_CONTEXT is defined in the �parser header� file. ## Quick Reference "Quick reference" is an �obsolete configuration switch�. ## Range Bounds Out of Order This is a �warning� message that appears when you have a �character range� of the form 'z-a'. AnaGram interprets this range as being equal to 'a-z', but provides a warning in case the unusual order was the result of a clerical error. ## Recursive Definition of Char Set This �warning� appears when AnaGram discovers a recursively defined �character set�. Character sets cannot be defined recursively. ## Redefinition "Redefinition of <name>" is a �warning� message which appears when AnaGram discovers a redefinition of a �symbol�. The new �definition� is ignored. ## Redefinition of Grammar Token This �warning� appears when AnaGram encounters a new definition of the �grammar token�. AnaGram discards the old definition. The last definition in the syntax file wins. If you get this warning, check your �syntax file� to make sure you have the grammar token you want. ## Redefinition of token "Redefinition of token, TXXX: <name>" is a �warning� message which occurs when AnaGram encounters a �definition� statement and the specified �grammar token� has already been seen on the left side of a �production�. AnaGram will ignore the definition statement. ## Reduce Action, Reduction The reduce action, or reduction, is one of the four actions of a traditional �parsing engine�. The reduce action is performed when the parser has succeeded in matching all the elements of a �grammar rule�, and the next input token is not erroneous. Reducing the grammar rule amounts to subtracting the length of the rule from the �parser stack index�, identifying the �reduction token�, stacking its �semantic value� and then doing a �shift action� with the reduction token as though it had been input directly. 
## Reduce-Reduce Conflict A grammar has a "reduce-reduce" �conflict� at some state if a single token turns out to be a �reducing token� for more than one �completed rule�. ## Reducing Token In a �parser state� with more than one �completed rule�, your parser must be able to determine which one was actually found. Therefore, during analysis of your grammar, AnaGram examines each completed rule in order to determine all the states the �parser� will branch to once the rule is reduced. These states are called the "reduction states" for the rule. In any window that displays �marked rule�s, these states may be found in the �Reduction States� window listed in the �Auxiliary Windows� popup menu. The acceptable input tokens for those states are the "reducing tokens" for the completed rules in the state under investigation. If there is a single token which is a reducing token for more than one rule, then the grammar is said to have a �reduce-reduce conflict� at that state. If in a particular state there is both a �shift action� and a �reduce action� for the same token the grammar is said to have a �shift-reduce conflict� in that state. Note that a "reducing token" is not the same as a "�reduction token�". ## Reduction Choices "Reduction choices" is a �configuration switch� which defaults to off. If it is set, AnaGram will include in your �parser file� a function which will identify the acceptable choices for �reduction token� in the current state. This function, of course, is useful only if you are using �semantically determined productions�. The prototype of this function is: int $_reduction_choices(int *); where '$' represents the name of your parser. You must provide an integer array whose length is at least as long as the maximum number of reduction choices you might have. The function will fill the array with the token numbers of those which are acceptable in the current state and will return a count of the number of acceptable choices it found. 
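A minimal sketch of calling the reduction choices function follows. It assumes a parser named "calc", so the generated function would be calc_reduction_choices. Because the real function is produced by AnaGram, the body given here is a made-up stand-in so that the sketch is self-contained; the token numbers 17 and 23, MAX_CHOICES, and is_acceptable_reduction are likewise hypothetical.

```c
#define MAX_CHOICES 8   /* assumption: upper bound on reduction choices */

/* Stand-in for the function AnaGram would generate for a parser named
   "calc". The token numbers are invented for illustration only. */
int calc_reduction_choices(int *choices) {
    choices[0] = 17;
    choices[1] = 23;
    return 2;           /* count of acceptable choices written */
}

/* Returns 1 if token is among the reduction tokens acceptable in the
   current state, 0 otherwise. */
int is_acceptable_reduction(int token) {
    int choices[MAX_CHOICES];
    int n = calc_reduction_choices(choices);
    for (int i = 0; i < n; i++) {
        if (choices[i] == token) {
            return 1;
        }
    }
    return 0;
}
```

A reduction procedure for a semantically determined production could make a check of this kind before storing a token number in PCB.reduction_token.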
## reduction_token "reduction_token" is a field in your �parser control block�. If your grammar uses �semantically determined productions�, your �reduction procedure�s need a mechanism to specify which token the rule is to reduce to. �PCB�.reduction_token names the variable which contains the �token number� of the �reduction token�. Prior to calling your reduction procedure, your parser will set this field to the token number of the default �reduction token�, i.e., the leftmost syntactically correct token in the reduction token list for the production being reduced. If the reduction procedure establishes that a different reduction token is appropriate, it should store the appropriate token number in PCB.reduction_token. ## Reduction Procedures The Reduction Procedures window lists the C function prototypes for the �reduction procedure�s in your grammar. When this window is active, the �syntax file� window, if visible, is synchronized with it so you can see the body of the reduction procedure as well as its usage. ## REDUCTION_TOKEN_ERROR REDUCTION_TOKEN_ERROR is a user definable macro which your �parser� invokes when it encounters an inadmissible reduction token. This error should occur only if your parser uses �semantically determined productions� and your �reduction procedure� provides an incorrect �token number�. If you do not define it, AnaGram will define it so that it will print an error message on stderr and abort the parse. ## Reduction Procedure, Semantic Action A "reduction procedure", or "semantic action", is a function you write which your �parser� executes when it has identified the grammar rule to which the reduction procedure is attached in your grammar. When your parser has identified a particular �grammar rule�, that is to say, a particular sequence of �tokens� that you have specified in your grammar, it "reduces" the production to the token at the head of the production, or �reduction token�. 
If you choose, you can specify a "reduction procedure" which your parser will call so that your program can do semantic analysis on the production just identified. Your reduction procedure will be called using, as arguments, the �semantic values� of tokens on the right side of the production. Your reduction procedure may, if you choose, return a value which will become the semantic value of the reduction token. Since many of the tokens in �production�s are there for only syntactic purposes, you may specify, when you write your grammar, the tokens whose values are needed as arguments for your reduction procedure. To attach a reduction procedure to a grammar rule, just write it immediately following the rule. There are two formats for reduction procedures, depending on the size and complexity of the procedure. The first form consists of an equal sign followed by a C expression and a semicolon. When the rule is matched, the expression will be evaluated and its value will be stacked on the �parser value stack� as the value of the reduction token. For example: =-a; =myProcedure(x, q); The second form consists of an equal sign followed by a block of C code enclosed in curly braces. If you wish to return a value for the reduction token you have to use a return statement. For example: ={ if (x > y) return x; return x+2*y; } In both forms of the reduction procedure, �parameter assignment�s may be attached to �rule element�s in order to make their semantic values available to the reduction procedure. When the reduction procedure is executed, local variables will be defined with the names specified in the parameter assignments. The values of these variables will have been set to the value of the corresponding token. If the return value of your reduction procedure is a C++ object, you may wish to specify that AnaGram enclose it in a �wrapper� so that constructor calls and destructor calls are made.
Otherwise the object is pushed onto and popped from the parser value stack simply by coercing the stack pointer to the appropriate type. The reduction procedures in your grammar are summarized in the �Reduction Procedures� window. ## Reduction States The Reduction States window can be accessed via the �Auxiliary Windows� popup menu from any window which displays �parser state� numbers and �marked rule�s. If the highlighted �grammar rule� has no marked token, the Reduction States window will show the states the parse could reach by reducing the rule and processing the resultant �reduction token�. ## Reduction Token A �token� which appears on the left hand side of a �production� is called a reduction token. It is so called because when the �grammar rule� on the right side of the production is matched in the input stream, your �parser� will "reduce" the sequence of tokens which matches the rule by replacing the sequence of tokens with the reduction token. If more than one reduction token is specified, the production is called a �semantically determined production� and your �reduction procedure� should choose the appropriate reduction token. If it does not, your parser will use the first token in the list that is syntactically correct as the default. The �CHANGE_REDUCTION� macro can be used to specify the reduction token. Note that a "reduction token" is not the same as a "�reducing token�". ## Reduction Trace The Reduction Trace window is available from the �Conflicts� window and the �Resolved Conflicts� window. It can be used in conjunction with the �Conflict Trace� to study �conflict�s. The Reduction Trace represents the result of taking the reduce option in the conflict state of the Conflict Trace. ## Reentrant Parser "Reentrant parser" is a �configuration switch� which defaults to off. 
If it is on when AnaGram builds a parser AnaGram will generate code that passes the pointer to the �parser control block� via calling sequences, rather than using static references to the pcb. You can use the reentrant parser switch to help make �thread safe parsers�. The reentrant parser switch is compatible with both C and C++. The reentrant parser switch cannot be used in conjunction with the �old style� switch. When you have enabled the reentrant parser switch, the �parse function�, the �initializer� function, and the �parser value function� will be defined to take a pointer to the parser control block as their sole argument. ## Reload Button The �File Trace� window includes a reload button to allow you to reread your �test file� after you have modified it without having to start a new file trace. After the file has been reread, the file trace is reset. ## rename macro AnaGram uses a number of macros in its generated code. It is possible, therefore, to run into naming collisions with other components of your program. The rename macro �attribute statement� allows you to change the name AnaGram uses for a particular macro to avoid these problems. For example, in the Microsoft Foundation Classes, V4.2, there is a class called "CONTEXT". If you use the �context stack� option in AnaGram, your �parser� will have a macro called �CONTEXT�. To avoid the name collision, add the following attribute statement to any configuration section in your grammar: rename macro CONTEXT AG_CONTEXT Then, simply use "AG_CONTEXT" where you would otherwise have used "CONTEXT". ## reserve keywords "reserve keywords" is an �attribute statement� which can be used to specify a list of �keyword�s that are reserved and cannot be used except as explicitly specified in the grammar. In particular this switch enables AnaGram to avoid issuing meaningless �keyword anomaly� warnings. 
AnaGram does not automatically presume that keywords are also reserved words, since in many grammars there is no need to specify reserved words. Reserve keywords statements must be made inside �configuration sections�. Each statement consists of the keyword "reserve keywords" followed by a list of keyword �tokens�. The tokens must be separated by commas and the list must be enclosed in braces ({ }). Each keyword listed will then be treated as a reserved word. ## Reset Button The Reset button, found on �File Trace� and �Grammar Trace� windows restores the initial configuration of the trace. This is especially convenient for �Conflict Trace� or other �Auxiliary Trace�s. ## Resolved Conflicts AnaGram creates the Resolved Conflicts window only when the grammar it is analyzing has �conflict�s and when those conflicts have been resolved by �precedence declaration�s, by "�sticky�" statements, or in connection with the explicit use of a token specified in a �disregard� statement. The Resolved Conflicts window shows the conflicts that have been resolved, using the same format as that of the �Conflicts� Window. The rule chosen is marked with an asterisk in the leftmost column of the window. ## Resynchronization "Resynchronization" is the process of getting your parser back in step with its input after encountering a �syntax error�. As such, it is one method of �error recovery�. Of course, you would resynchronize only if it is necessary to continue after the error. There are several options available when using AnaGram. You could use the �auto resynch� option, which causes AnaGram to incorporate an automatic resynchronizing procedure into your parser, or you could use the �error token resynchronization� option, which is similar to the technique used by YACC programmers. ## right "right" controls a �precedence declaration�, indicating that all of the listed �rule elements� are to be considered �right associative�. 
## Right Associative A binary operator is said to be right associative if an expression with repeated instances of the operator is to be evaluated from the right. Thus, for example, when '=' is used as an assignment operator x = a = b is normally taken to mean a = b followed by x = a The assignment operator is said to be right associative. In �grammar�s with �conflict�s, you may use �precedence declaration�s to specify that an operator should be right associative. ## Rule Context The Rule Context window can be accessed via the �Auxiliary Windows� menu in any window that displays �grammar rule�s. AnaGram displays all occurrences in the �grammar� of all the �reduction token�s for the rule. ## RULE_CONTEXT RULE_CONTEXT is a macro you may use if you have defined a �context stack�. In any reduction procedure, RULE_CONTEXT will be a pointer to the context value stacked before the first token of the rule being reduced. Since the context stack contains an entry for each token in the rule, you may inspect the context value for each token in the rule by subscripting RULE_CONTEXT. RULE_CONTEXT[k] is the context of the (k-1)th token in the rule. ## Rule Coverage "Rule Coverage" is the name of both a �configuration switch� and a window. The configuration switch defaults to off. If you set it, AnaGram will include code in your �parser� to count the number of times your parser identifies each �grammar rule� in your grammar. To maintain the counts, AnaGram declares, at the beginning of your parser, an integer array, whose name is created by appending "_nrc" to your �parser name�. The array contains one counter for each rule you have defined in your grammar. There are no entries for the auxiliary rules that AnaGram creates to deal with set overlaps or �disregard� statements. In order to identify positively all the rules that the parser reduces, AnaGram has to turn off certain optimization features in your parser. 
Therefore a parser that has rule coverage enabled will run slightly slower than one with the switch off. In addition, AnaGram creates a pair of functions to write the counters to a file and to initialize the counters from a file. The names of these functions are given by appending "_write_counts" and "_read_counts" to the name of your parser. The name of the file is given by the �coverage file name� parameter which defaults to the name of your �syntax file� but with the extension ".nrc". If rule coverage is enabled, AnaGram will also enable the Rule Coverage option on the �Browse Menu�. If you select Rule Coverage, AnaGram will initialize a �Rule Coverage� window from the rule count file you select. AnaGram will warn you if the rule count file is older than the syntax file, since under those conditions, the coverage file might be invalid. ## Rule Derivation, Token Derivation You can use the Rule Derivation and Token Derivation windows to understand the nature of �conflict�s in your grammar. To create these windows, open the �Conflicts� window. Move the cursor bar to a �completed rule�, that is, one which has no marked token. Press the right mouse button to pop up the �Auxiliary Windows� menu. You may then select the Rule Derivation or the Token Derivation. The Rule Derivation window and the Token Derivation window, together, show how a �conflict�, or ambiguity, has arisen in your grammar. Both windows contain a sequence of rules, and both begin with the same rule, the rule which is the root cause of the conflict. Each subsequent line in the rule derivation is an �expansion� of the marked token in the previous rule. The last rule in the derivation window is the rule you selected in the Conflicts window. Thus the rule derivation window shows you how the rule involved in the conflict derives from the root. Each subsequent line in the token derivation window shows an expansion of the marked token in the previous rule. 
The first token of the last rule in the derivation window is the token that causes the conflict. This is the usage that is inconsistent with other usages of this token in the conflict state. The Rule Derivation and Token Derivation windows each have five auxiliary windows. The �Rule Context� window is keyed to the highlighted rule. The other four windows, the �Expansion Rules� window, the �Productions� window, the �Set Elements� window and the �Token Usage� window, are keyed to the marked token. Remember that there is no marked token on the last line of the Rule Derivation window. ## Rule Element A �grammar rule� is a list of "rule elements", separated by commas. Rule elements may be �token name�s, �character sets�, �keyword�s, �immediate action�s, or �virtual productions�. When AnaGram encounters a rule element for which no token presently exists, it creates one. Any rule element may be followed by a �parameter assignment� in order to make the �semantic value� of the rule element available to a �reduction procedure�. ## Rule Number AnaGram assigns a unique rule number to each �grammar rule� that you specify in your grammar. Rules are numbered sequentially as they are encountered in the �syntax file�. AnaGram constructs rule 0 itself. Rule zero has a single �rule element�, the �grammar token�, unless you have a �disregard� statement in your grammar. In this case, there will be two elements. In AnaGram displays, rule numbers are displayed with a prefixed 'R' and a three digit decimal number. ## Rule Stack, Rule Stack Pane The Rule Stack pane appears across the bottom of a �Grammar Trace� or �File Trace� window. It provides an alternate view of the parser stack for the trace, showing, for each state, rules instead of the tokens that you see in the �Parser Stack pane�. Because it is synched with the syntax file window, the Rule Stack makes it easy to see the relationship between the trace and your grammar. 
For each level of the parser stack, the Rule Stack shows the �parser state� number and all the active rules. The active rules at any state consist of all the �expansion rule�s for the state that are consistent with the input at all subsequent states. Except for the last level of the stack, each rule has a �marked token�, which in the default configuration is displayed in bold, italic type. The significance of the marked token is that all tokens in the rule to the left of the marked token have already been matched in the input, and the input in subsequent levels is consistent so far with the marked token. As more input is processed, rules that are inconsistent with the new input are deleted from the display. The last level of the stack shows the current state of the parser and the rules against which the �lookahead token� will be matched. At this level, there may be rules with no marked tokens. These are rules which have been matched exactly in the input. If there is more than one such rule, at the next parser step the parser will use the lookahead token to determine which rule to reduce. In the last level of the stack, marked tokens represent the input the parser expects to see. The Rule Stack pane is synched with the �syntax file� window if it is visible so that the rule highlighted in the Rule Stack can be seen in context in the syntax file. For rules that AnaGram generated automatically (to implement �virtual productions� or the �disregard� statement), the cursor bar will move to the top of the syntax file window. The Rule Stack pane is also synched with the other panes in the trace. As you move the cursor bar in the Rule Stack, the cursor bar in the Parser Stack pane will track the stack level in the Rule Stack. In a File Trace, text will be highlighted in the �Test File� pane corresponding to the selected token in the Parser Stack pane. In a Grammar Trace, the marked token in the highlighted rule will be highlighted in the �Allowable Input pane�. 
Clicking the right mouse button pops up an �Auxiliary Windows� menu to give you more information about the highlighted rule. ## Rule Table The Rule Table lists, in numerical order, all the �grammar rule�s defined in your �grammar�. Each rule is preceded by the �nonterminal� tokens which produce it. If you are not using �semantically determined production�s, then there will be precisely one token line per rule. The Rule Table is synched to your �syntax file� to show the rule in context. ## Semantic Value, Token Value A �token� generally has a "semantic value", or "token value", as well as the �token number� which identifies it syntactically. Each instance of the token in the input stream can have a different value. For example, you might have a token called "variable name". In one instance the variable name might be "widget" and in another, "wombat". Then "widget" and "wombat" would be the semantic values in the two instances. Another token might have numeric semantic values. You can specify the C or C++ �data type� of the token value. The data type of "variable name" could be "char *" where the value is a pointer to a string holding the name. There are separate default types for the values of �terminal� and �nonterminal� tokens. In the usual case of ordinary character input, the value of a terminal token is just the ascii character code. The value of a nonterminal token is determined by the �reduction procedure�s attached to the rules the token produces. If there is no reduction procedure, the value of the token is the value of the first token in the rule. It should be noted that the stack operations have been implemented in such a way that a C++ object that belongs to a class for which the assignment operator has been overridden will encounter serious problems. This shortcoming will be addressed in a future version of AnaGram. Note that there is no problem with using a pointer to any C++ object. 
## Semantically Determined Production A "semantically determined production" is one which has more than one �reduction token� specified on the left side of the �production�. You would write such a production when the reduction tokens are syntactically indistinguishable. The �reduction procedure� may then specify which of the listed reduction tokens the grammar rule is to reduce to based on semantic considerations. If there is no reduction procedure, or the reduction procedure does not specify a reduction token, the parser will use the first syntactically correct one in the list. To simplify changing the reduction token, AnaGram provides a predefined macro, �CHANGE_REDUCTION�. The �semantic value�s of all the reduction tokens for a given semantically determined production must have the same �data type�. �File Trace� and �Grammar Trace� have a �Reduction Choices pane� which appears when a semantically determined production is invoked and you need to choose a reduction token. ## Set Elements The Set Elements window is available via the �Auxiliary Windows� popup menu from windows which specify character sets, partition sets or tokens. It displays the actual characters which make up the set, or which map to the specified token. For each character, the numeric code as well as its display symbol is given. ## Set Expression, Expression A set expression is an algebraic expression used to define a �character set� in terms of individual characters, ranges of characters, or other sets of characters as constructed using �complements�, �unions�, �intersections�, and �differences�. ## Shift Action The shift action is one of the four actions of a traditional �parsing engine�. The shift action is performed when the input token matches one of the acceptable input tokens for the current �parser state�. 
The �semantic value� of the token and the current �state number� are stacked, the �parser stack index� is incremented and the state number is set to a value determined by the previous state and the input token. ## Shift-Reduce Conflict A "shift-reduce" �conflict� occurs if in some �parser state� there exists a �terminal token� that should be shifted, because it is legitimate input for one of the �grammar rule�s of the state, but should also be used to reduce some other rule because it is a �reducing token� for that rule. ## sn sn is a field in a �parser control block� to which your �error handling� routines and your �reduction procedure�s may refer. Its value is the current �state number� of your �parser�. sn is modified every time your parser "shifts" (performs a �shift action� on) a token or reduces (performs a �reduce action� on) a �production�. ## ss ss is a field in a �parser control block� to which your �error handling� and �reduction procedure�s may refer. It is the �state stack� for your �parser�. Before every �shift action�, the current �state number�, �sn�, is stored in PCB.ss[PCB.ssx], where �ssx� is the �parser stack index�. PCB.ssx is then incremented. ## ssx ssx is a field in a �parser control block� to which your �error handling� routines and �reduction procedure�s may refer. It is the �parser stack index� for your �parser�. On every �shift action� it is incremented. On every �reduce action� the length of the �grammar rule� being reduced is subtracted from PCB.ssx. ## State Definition The State Definition window can be accessed via the �Auxiliary Windows� popup menu from any window that specifies states. It displays the �characteristic rules� that define the state. The rules are displayed with a marked token, which is the next token needed in the input if the particular �grammar rule� is to be matched. If the rule is a completed rule, no token will be marked. 
Each line contains the state number, blank if it is the same as the state number of the previous line, the �rule number�, and finally the �marked rule�. The �State Definition Table�, found in the �Browse Menu�, displays the characteristic rules for all states in the �grammar�. ## State Definition Table The State Definition Table lists, for each �parser state�, all of the �characteristic rules� which define that state. The rules are displayed with a �marked token�, which is the next token needed in the input if the particular �grammar rule� is to be matched. If the rule is a completed rule, no token will be marked. Each line contains the state number, blank if it is the same as the state number of the previous line, the �rule number�, and finally the �marked rule�. In the �Auxiliary Windows� menu for many states there is a �State Definition� entry which provides the characteristic rules for the �parser state� identified by the cursor bar. ## State Expansion The State Expansion window may be accessed using the �Auxiliary Windows� menu from any window that identifies a particular �parser state�. It shows the complete set of �expansion rule�s for the state, consisting of the union of the set of �characteristic rule�s and, for each characteristic rule, the set of expansion rules for the marked token. Thus the State Expansion window shows all possible legal input to your parser in the given state. ## Sticky "Sticky" statements are �attribute statement�s and may be used just like a �precedence declaration� to resolve �conflict�s. If a �shift-reduce conflict� occurs in a state where the �characteristic token� is "sticky", the shift action will always be chosen. Sticky statements must be made inside �configuration sections�. Each statement consists of the keyword "sticky" followed by a list of �tokens�. The tokens must be separated by commas and the list must be enclosed in braces ({ }). Each token will then be treated as sticky. 
All conflicts which are resolved by sticky statements are listed in the �Resolved Conflicts� window. ## subgrammar Declaring a nonterminal token to be a "subgrammar" changes the way AnaGram searches for reducing tokens. Normally, if there is a completed rule in a particular state, AnaGram investigates all states to which the parser could jump on reducing the rule. It then considers all terminal tokens that are acceptable input in these states to be reducing tokens for the given rule. If this set of tokens overlaps the set of tokens for which there are shift actions, or the set of tokens which reduce a different rule, there is a �conflict�. Now consider a particular nonterminal token T and all the rules it produces, whether directly or indirectly. What the preceding remarks mean is that in determining the reducing tokens for any of these rules, AnaGram considers not only the definition, but also the usage of T. There are circumstances when it is inappropriate to consider the usage of T. The most common example occurs when building a lexical scanner for a language such as C. In this case, you can write a complete grammar for a C token with no difficulty. But if you try to extend it to a sequence of tokens, you get scores of conflicts. This situation arises because you specify that any C token can follow another, when in actual practice, an identifier, for example, cannot follow another identifier without some intervening space or punctuation. While it is theoretically possible to write a grammar for a sequence of tokens that has no conflicts, it is not usually pretty. The subgrammar declaration resolves this problem by telling AnaGram that when it is looking for reducing tokens for any rule produced directly or indirectly by a subgrammar token, it should disregard the usage of the token and only consider usage internal to the definition of the subgrammar token, as though the subgrammar token were the start token of the grammar. 
The subgrammar declaration is made in a �configuration section� and consists of the keyword "subgrammar" followed by a list of token names separated by commas and enclosed in braces ({ }). For example: subgrammar { name, number} ## Suspicious Production This �warning� message appears when AnaGram finds a �production� of the form x -> x. There is probably a typo somewhere in your �syntax file�. This production causes a �conflict� in your grammar. AnaGram leaves this production in your �grammar�, but if you build a parser, it will never succeed in recognizing this production. ## Switch Takes on/off Values Only The specified parameter is a �configuration switch�. The only values it may be assigned are ON and OFF. ## Symbol In writing your �grammar� you use symbols, or names, to represent most of your �tokens�. You may also use symbols to represent �character set�s, �virtual production�s, �immediate action�s, or �keyword�s. A symbol, or name, must begin with a letter or an underscore. It may then contain any number of these characters as well as digits and embedded white space (including comments). For identification purposes all adjacent white space characters within a symbol name are considered to be a single blank. Upper case and lower case letters are considered to be different. Examples: token name token/*embedded comment*/name All symbols used in your grammar are listed in the �Symbol Table� window found in the �Browse Menu�. ## Symbol Table The Symbol Table lists all the symbols, or names, you used in your grammar. �Symbol�s may be used, of course, to identify �tokens�, �definitions�, �virtual productions�, �immediate action�s, or �keyword�s. Each line in this table identifies a single symbol. The first field is the token number, if any. This is followed by the name. If the name identifies an �expression� or virtual production, it is followed by an equal sign and the expression or virtual production. 
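The identification rule given under Symbol (adjacent white space within a name counts as a single blank) can be sketched as a small normalizing function. This is an illustration only, not AnaGram's own code: canonical_symbol is a hypothetical name, and the sketch assumes that a /* ... */ comment embedded in a name is treated as white space, as the two examples in the Symbol entry suggest.

```c
#include <ctype.h>
#include <string.h>

/* Writes a canonical form of name into out (capacity out_size): every run
   of white space and C-style comments between two parts of the name is
   replaced by one blank; leading and trailing runs are dropped. */
static void canonical_symbol(const char *name, char *out, size_t out_size) {
    size_t n = 0;
    int pending_blank = 0;

    if (out_size == 0) {
        return;
    }
    while (*name != '\0') {
        if (name[0] == '/' && name[1] == '*') {
            /* Skip the whole comment; an unterminated one runs to the end. */
            const char *end = strstr(name + 2, "*/");
            name = end ? end + 2 : name + strlen(name);
            pending_blank = 1;
        } else if (isspace((unsigned char)*name)) {
            name++;
            pending_blank = 1;
        } else {
            if (pending_blank && n > 0 && n + 1 < out_size) {
                out[n++] = ' ';
            }
            pending_blank = 0;
            if (n + 1 < out_size) {
                out[n++] = *name;
            }
            name++;
        }
    }
    out[n] = '\0';
}
```

Under that assumption, both example spellings from the Symbol entry normalize to the same string, "token name".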
## Syntax Analysis Aborted This �warning� message appears if, because of previous errors, AnaGram is unable to complete the �Analyze Grammar� command on your �syntax file�. ## Syntax Directed Parsing Syntax directed parsing, or formal parsing, is an approach to building �parsers� based on formal language theory. Given a suitable description of a language, called a �grammar�, there are algorithms which can be used to create parsers for the language automatically. In this context, the set of all possible inputs to a program may be considered to constitute a language, and the rules for formulating the input to the program constitute the grammar for the language. The parsers built from a grammar have the advantage that they can recognize any input that conforms to the rules, and can reject as erroneous any input that fails to conform. Since the program logic necessary to parse input is often extremely intricate, programs which use formal parsing are usually much more reliable than those built by hand. They are also much easier to maintain, since it is much easier to modify a grammar specification than it is to modify complex program logic. ## Syntax Error When you specify a �grammar�, you specify a set of input character or token sequences which your �parser� will "recognize". Usually it is possible for there to be other sequences of input tokens which deviate from the rules set down by your grammar. Should your parser find such a sequence in its input which is not explicitly allowed for in your grammar, it is said to have found a "syntax error". The general treatment of syntax errors is called �error handling�, of which there are two distinct aspects: �error diagnosis� and �error recovery�. AnaGram allows you to make provision for error handling to fit your needs, but should you not do so, it will provide simple default error handling. 
## Statements AnaGram source files, or �syntax file�s, consist of the following types of statements: �production�s �configuration section�s �embedded C� �definition�s �token declaration�s Statements may be in any order. Each statement must begin on a new line. If a statement cannot be construed as complete, it may continue onto another line. Statements may contain spaces, tabs or comments, but may not contain blank lines. ## Syntax File Input files to AnaGram are called syntax files. The default extension for syntax files is .syn. A syntax file contains a "�grammar�" and supporting C or C++ code. The file consists of several distinct types of statements. These are �token declarations�, �production�s, �definitions�, �embedded C�, and �configuration sections�. There may be as many of each as you need, in whatever order you find convenient. Each such statement begins on a new line. ## SYNTAX_ERROR SYNTAX_ERROR is a macro which your parser will invoke when it encounters a syntax error in its input stream. If you have set the �diagnose errors� �configuration switch�, the static variable �PCB�.�syntax_error� will contain a pointer to a diagnostic message when SYNTAX_ERROR is invoked. If you have also set the �error frame� switch, �PCB�.�error_frame_ssx� and �PCB�.�error_frame_token� will also be set appropriately. ## Tab Spacing "tab spacing" is a �configuration parameter� which controls the expansion of tabs when AnaGram displays your source file or test files in the �File Trace� window. The value of "tab spacing" is also used to set the default value of the �TAB_SPACING� macro in your parser. The default value of "tab spacing" is 8. If you prefer a different value, you should probably include an appropriate statement in your �configuration file�. For example: tab spacing = 2 ## TAB_SPACING If you have enabled the �lines and columns� switch, your parser needs to know tab spacing in order to increment the column count when it encounters a tab character. 
It is set up to use the value given by the TAB_SPACING macro. If you do not define TAB_SPACING in your parser, AnaGram will provide a default definition, setting it to the value of the �tab spacing� �configuration parameter�. ## Terminal, Terminal Token A "terminal token" is a token which does not appear on the left side of a �production�. It represents, therefore, a basic unit of input to your �parser�. If the input to your parser consists of ascii characters, you may define terminal tokens explicitly as ascii characters or as sets of ascii characters. If you have a lexical scanner, or preprocessor, which produces numeric codes, you may define the terminal tokens directly in terms of these numeric codes. ## Test File Binary "Test file binary" is a �configuration switch� which defaults to off. When it is off, and you select the �File Trace� option, AnaGram will read your test files in "text" mode, discarding carriage return characters. When "test file binary" is on, AnaGram will read test files in "binary" mode, preserving carriage return characters. If your parser needs to recognize carriage return characters explicitly, you should turn "test file binary" on. ## Test File Mask "Test file mask" is a string-valued �configuration parameter� which AnaGram uses to set up the file dialog for the �File Trace� command. It defaults to "*.*". If there is a conventional file name format for the input to the �parser� you are developing, you will probably want to set "test file mask" in a �configuration section� in your �syntax file� so it is easier to pick out your test files. ## Test range "Test range" is a �configuration switch� which defaults to on. When it is set, i.e., on, AnaGram will configure your parser so that it checks input characters to verify that they are within the range given by the �character universe� before it indexes the �token conversion� table. 
If range testing is not necessary for your parser, you may turn test range off and get a slight improvement in the performance of your parser. ## Thread Safe Parsers AnaGram 2.01 incorporates several changes designed to make it easier to write thread safe parsers. First, the �parser�s generated by AnaGram 2.01 no longer use static or global variables to store temporary data. All nonconstant data have been moved to the �parser control block�. Second, two new features which make it substantially easier to build thread safe parsers have been added. The �reentrant parser� switch makes the entire parser reentrant, by passing the pointer to the parser control block as an argument on all function calls. The �extend pcb� statement allows you to add your own variable declarations to the �parser control block� so you can avoid references to global or static variables in your �reduction procedure�s. Third, new support has been added for C++ classes, including the �wrapper� statement and the �PCB_TYPE� macro. ## token_number token_number is a field in a �parser control block� to which your �error handling� procedures and �reduction procedure�s may refer. It contains the actual �token number� of the current input token. If you are supplying token numbers directly, it is the result of using the actual input character to index the �token conversion� array, ag_tcv. ## Token Tokens are the units with which your parser works. There are two kinds of tokens: �terminal tokens� and �nonterminal tokens�. These latter are identified by the parser as sequences of tokens. The grouping of tokens into more complex tokens is governed by the �grammar rules�, or �production�s in your grammar. In your grammar, tokens are denoted by �token name�s, �virtual productions�, explicit �character representations�, �keyword�s, �immediate action�s, or �expression�s which yield �character sets�. 
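The lookup described under token_number, where the input character indexes the token conversion array, can be sketched in C as follows. This is an illustration only: demo_tcv stands in for ag_tcv, the token numbers are invented, the character universe is assumed to be 0..255, and returning an error token for out-of-range input is an assumption about what a range test might do, not a description of the generated code.

```c
#include <ctype.h>

/* Illustrative token numbers; AnaGram assigns the real ones. */
enum { TOK_ERROR = 0, TOK_LETTER = 1, TOK_DIGIT = 2, TOK_OTHER = 3 };

#define UNIVERSE_SIZE 256   /* assumed character universe: 0..255 */

/* Stand-in for the ag_tcv table of a generated parser. */
static unsigned char demo_tcv[UNIVERSE_SIZE];

/* Fill the table so that all letters share one token number and all
 * digits share another, i.e. they are syntactically equivalent. */
static void init_demo_tcv(void)
{
    int c;
    for (c = 0; c < UNIVERSE_SIZE; c++) {
        if (isalpha(c))      demo_tcv[c] = TOK_LETTER;
        else if (isdigit(c)) demo_tcv[c] = TOK_DIGIT;
        else                 demo_tcv[c] = TOK_OTHER;
    }
}

/* Range-tested lookup: codes outside the universe never index the table.
 * The input code itself would serve as the token value. */
static int token_number_for(int input_code)
{
    if (input_code < 0 || input_code >= UNIVERSE_SIZE) return TOK_ERROR;
    return demo_tcv[input_code];
}
```

With the range test removed, the function reduces to a single table access, which is the small performance gain that turning "test range" off is meant to buy; it is then up to the caller to guarantee that every input code lies inside the character universe.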
## Token Conversion By using �character set� �expression�s, you may in your �syntax file� define a number of input characters as being syntactically equivalent. When your �parser� gets an input character, it uses the character code to index a table called �ag_tcv�. The value it extracts from this table is the �token number� for the input character. The actual character code of the input character becomes the �token value�. ## Token Declaration A token declaration is simply a �production� with no right hand side. Token declarations can be used to define the �data type�s of tokens. To define the data type of a token, simply put the data type in parentheses preceding the name of the token. You can use a list of tokens joined by commas, if you wish. Thus: (char *) variable name, function name could be used to specify that the �semantic value�s of the tokens "variable name" and "function name" are both character pointers. Of course, token types may be specified as part of any production the token generates, but sometimes, in the interest of clarity, it is advisable to group all declarations together. ## Token Name All �nonterminal tokens� that you define in your �grammar� by means of explicit �production�s must have names by which they may be referenced. Token names are �symbols� which represent the token syntactically in your grammar specification. ## Token Names "Token names" is a �configuration switch� that defaults to off. If it is set, it causes AnaGram to include in the �parser file� a static array of character strings, indexed by token number, which provides ascii representations of token names. The name of this array is given by "<parser name>_token_names", where <parser name> is the name of the parser function as given by the value of the �parser name� parameter. AnaGram also defines a macro, �TOKEN_NAMES�, which evaluates to the name of the array. 
The array contains strings for all grammar tokens which have been explicitly named in the syntax file as well as tokens which represent �keyword�s or single character constants. The array is useful in creating �syntax error� diagnostics. Prior to version 2.01 of AnaGram, the TOKEN_NAMES array contained strings only for explicitly named tokens. If this restriction is required, set the �token names only� switch. Token names are also included if the �diagnose errors� switch is set. ## TOKEN_NAMES "TOKEN_NAMES" is the name of a macro that AnaGram defines to provide access to a static array of character strings indexed by token number, which provides ascii representation of token names. The array is generated if any of the �token names�, �token names only� or �diagnose errors� switches is ON. If �token names only� is set, the array contains non-empty strings only for those tokens which are explicitly named in the syntax file. Otherwise, the array also contains strings for tokens which represent keywords or single character constants. ## token names only "Token names only" is a �configuration switch� that defaults to off. If it is set, it will cause AnaGram to include in the parser file a static array containing the names of the tokens in your grammar. This array will include only those tokens to which you have assigned names explicitly and will not include character constants or keywords. "Token names only" takes precedence over �token names�. ## Token Not Used "Token not used, TXXX: <token name>" is a �warning� message which appears if AnaGram finds an unused �token� in your �grammar�. Often an unused token is the result of an oversight of some kind and indicates a problem in the grammar. ## Token Number AnaGram assigns a unique number, called the "token number", to each token in the grammar, no matter whether it is a �terminal token� or a �nonterminal token�. Your parser does all of its analysis of your input stream using token numbers as its primary material.
You may need to know the values of token numbers that AnaGram has assigned, either so a lexical scanner can output correct token numbers, or so a �reduction procedure� can correctly resolve a �semantically determined production�. To help you, AnaGram defines enumeration constants for each of the named tokens in your grammar. The definition of these constants is in the �parser header� file. ## Token Representation Not all of the �tokens� in your grammar have a �token name�. Some of the tokens may represent �character sets� which you spelled out explicitly, �virtual productions�, �immediate action�s, or �keyword�s. In its analysis tables, AnaGram tries to provide a meaningful representation for tokens whenever it can. Its first choice is to use the name, if it has one. Otherwise it will use the set definition or the definition of the virtual production if one exists. If AnaGram cannot otherwise represent your token, it will resort to using the token number which it normally represents using the letter T followed by a three digit, zero-padded token number. ## Token Table The Token Table lists all the tokens of your grammar. The first field is the token number. It is followed by a flag field which is "zl" if the token is a �nonterminal token� and is �zero length�. If the token is nonterminal and not zero length, the flag field contains "nt". If the token is a �terminal token�, the field is blank. The next field is blank unless the token has been declared �sticky� or has had a �precedence� level assigned. If the token is sticky, this field will contain 's'. If a precedence level has been assigned, this field will contain the letter 'l', 'r', or 'n' to indicate associativity followed by the precedence level. Finally there is the �data type� of the �semantic value� of this token and the �token representation�. ## Token Usage The Token Usage table may be accessed via the �Auxiliary Windows� menu from any window that identifies tokens. 
It shows all the rules in the grammar that use the token. ## Top Margin "Top margin" is an �obsolete configuration parameter�. ## Trace Coverage Trace Coverage is a table which is built whenever you run �Grammar Trace�, one of its pre-built versions, or a �File Trace�. You can access it from the �Browse Menu�. It shows the number of times each rule in your grammar has been reduced. Unless you have set the �Rule Coverage� �configuration switch�, some �null production�s and some rules that consist of only one element will not be counted because of speed optimizations in the parser tables. The Trace Coverage tables are reset to zero when you load a new syntax file or start AnaGram. ## Compound Action Traditionally, �LALR-1 parser�s use only four simple �parser action�s: shift, reduce, accept and error. AnaGram parsers use a number of compound actions in order to reduce the size of parse tables and speed up processing. A single compound action may replace several simple shift or reduce actions. The �Traditional Engine� �configuration switch� may be used to force AnaGram to use only the simple actions. ## Traditional Engine "Traditional engine" is a �configuration switch� that defaults to off. Traditional �LALR-1 parser�s use a �parsing engine� which has only four actions: �shift action� �reduce action� �accept action� �error action� AnaGram, in the interest of faster execution and more compact parse tables, uses a parsing engine with a number of short-cut, or �compound action�s. The "traditional engine" switch tells AnaGram not to use the short-cut actions. You would turn this switch on if you wished to use the �Grammar Trace� or �File Trace� to see how the standard four parser actions work for a particular combination of grammar and input. Note that to see the effects of single parser actions, you must use the �Single Step� button. 
Remember that in the Grammar Trace, when you single step and the token you have selected causes a reduce action, it will appear on the �lookahead line� of the �parser stack pane� and will be preselected in the �allowable input pane� until it is finally shifted into the parser stack. Normally, you should leave the "traditional engine" switch off. Then AnaGram will, whenever possible, compress several parsing actions into one compound action in order to speed execution of the parser. Unfortunately, use of the term "traditional" has sometimes created the impression that there is a conservative aspect to the operation of traditional engine parsers. This is not the case. They have the same effect, but are slower and have much larger tables. ## Type Redefinition "Type Redefinition of TXXX: <token name>" is a �warning� message which appears when AnaGram finds a conflicting �data type� definition for a �token� in your �grammar�. The new definition will override the previous one. If you intend to use different type definitions, you should use extreme caution and check the generated code to verify that your �reduction procedure�s are getting the values you intended. ## Undefined Symbol "Undefined symbol: <name>" is a �warning� message which appears when AnaGram encounters an undefined �symbol� while evaluating a �character set� expression. The following warning in the �Warnings� window identifies the particular �token� AnaGram was trying to evaluate. ## Undefined Token "Undefined token TXXX: <name>" is a �warning� message which appears when the indicated �token� has been used in the �grammar�, but there is no definition of it as a �terminal token� nor does any �production� define it as a �nonterminal token�. ## Unexpected "Unexpected <element 1> in <element 2>" is a �warning� message which you may get when AnaGram analyzes your grammar.
It appears when AnaGram unexpectedly encounters an instance of syntactic element 1 at the specified location in an instance of syntactic element 2. AnaGram cannot reliably continue parsing its input. Therefore, it limits further analysis to scanning for syntax errors. If this error is not the result of a prior error, you should correct your �syntax file�. Remember that this error could result from something missing just as well as from something extraneous. If element 1 is �eof�, it often means that you have an unbalanced brace or comment delimiter in the code following the indicated location. ## Union The union of two sets is the set of all elements that are to be found in one or another of the two sets. In an AnaGram syntax file the union of two �character sets� A and B is represented using the plus sign, as in A + B. The union operator has the same precedence as the �difference� operator: lower than that of �intersection� and �complement�. The union operator is �left associative�. Watch out! In an AnaGram syntax file 65 + 97 represents the character set which consists of the lower case 'a' and upper case 'A'. It does not represent 162, the sum of 65 and 97. ## Video mode "Video mode" is an �obsolete configuration parameter�. ## Virtual Production Virtual productions are a special short hand representation of �grammar rules� which can be used to indicate a choice of inputs. They are an important convenience, especially useful when you are first building a grammar. Here are some examples of virtual productions: name? // optional name name?... // 0 or more instances of name {name | number} // exactly one name or number {name | number}... // one or more instances of name or number [name | number] // optional choice of name or number [name | number]... // zero or more instances of name or number AnaGram rewrites virtual productions, so that when you look at the syntax tables in AnaGram, there will be actual �production�s replacing the virtual productions. 
A virtual production appears as one of the rule elements in a grammar rule, i.e. as one of the members of the list on the right side of a production. The simplest virtual production is the "optional" token. If x is an arbitrary token, x? can be used to indicate an optional x. Related virtual productions are x... and x?... where the three dots indicate repetition. x... represents an arbitrary number of occurrences of x, but at least one. x?... represents zero or more occurrences of x. The remaining virtual productions use curly or square brackets to enclose a sequence of rules. The brackets may be followed variously by nothing, a string of three dots, or a slash, to indicate the choices to be made from the rules. Note that rules may be used, not merely tokens. If r1 through rn are a set of �grammar rules�, then {r1 | r2 | ... | rn} is a virtual production that allows a choice of exactly one of the rules. Similarly, {r1 | r2 | ... | rn}... is a virtual production that allows a choice of one or more of the rules. And, finally, {r1 | r2 | ... | rn}/... is a virtual production that allows a choice of one or more of the rules subject to the side condition that rules must alternate, that is, that no rule can follow itself immediately without the interposition of some other rule. This is a case that is not particularly easy to write by hand, but is quite useful in a number of contexts. If the above virtual productions are written with [] instead of {}, they all become optional. [] is an optional choice, []... is zero or more choices, and []/... is zero or more alternating choices. Null productions are not permitted in virtual productions in those cases where they would cause an intrinsic ambiguity. You may use a �definition� statement to assign a name to a virtual production. 
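The side condition on the alternating forms, {r1 | ... | rn}/... and [r1 | ... | rn]/..., can be stated as a short check in C. The function alternates is a hypothetical illustration, not something AnaGram generates. It assumes each matched rule is identified by an integer index and only tests the condition described above: at least the minimum number of rules, and no rule immediately following itself.

```c
#include <stddef.h>

/* Hypothetical check, not generated by AnaGram. rules[] holds the index
 * of the rule matched at each position. min_count is 1 for the
 * {...}/... form and 0 for the optional [...]/... form.
 * Returns 1 if the sequence is acceptable, 0 otherwise. */
static int alternates(const int *rules, size_t count, size_t min_count)
{
    size_t i;

    if (count < min_count) return 0;
    for (i = 1; i < count; i++) {
        if (rules[i] == rules[i - 1]) return 0;   /* rule follows itself */
    }
    return 1;
}
```

For example, the sequence r1, r2, r1, r3 satisfies the condition, while r1, r2, r2 does not. An empty sequence is acceptable only for the square-bracket form.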
## Void token "Void token, <token name>, used as parameter" is a �warning� message which appears if AnaGram encounters a �data type� definition declaring a �token� to have type void when the token has previously been used in a �parameter assignment� for a �reduction procedure�. Your C or C++ compiler will complain when it tries to compile the call to the reduction procedure. ## vs vs is a field in a �parser control block� to which your �error handling� procedures and �reduction procedure�s may refer. It is the �parser value stack� for your parser. The �semantic values� of the �tokens� identified by the parser are stored in the value stack. The value stack, like the other �parser stacks�, is indexed by �PCB�.�ssx�. When you are executing a reduction procedure, PCB.vs[PCB.ssx] contains the semantic value of the first token in the grammar rule you are reducing, PCB.vs[PCB.ssx+1] contains the second, and so forth. The return value from your reduction procedure will be stored in turn in PCB.vs[PCB.ssx]. vs is defined to be of type $_vt, where "$" represents the name of your parser. AnaGram defines $_vt to be a union of fields of sizes corresponding to all the different data types declared in your syntax for the semantic values of your tokens. In order to avoid restrictions on the use of C++ classes, the fields are defined as character arrays. On some processors which have byte alignment restrictions for multibyte data, you might encounter a bus error. To correct this problem, set the �parser stack alignment� parameter to an appropriate data type. ## Warning If while analyzing your syntax file, AnaGram finds something suspicious, it is likely to issue a warning. The Warnings window will pop up automatically when the analysis has been completed. If the warning is for a �syntax error� in your input file, you will have to fix it, because AnaGram cannot successfully interpret it. 
Otherwise, AnaGram will be able to create a �parser� for you, if you wish, no matter how serious the warnings may be. You can bring up the Help topic associated with a highlighted warning by pressing F1 or by clicking with a �Help Cursor�. If you have syntax errors, AnaGram will synchronize the cursor in the �syntax file� window with the cursor in the Warnings window so that whenever the Warnings window is active, the cursor bar in the syntax file window will identify the location of the error. ## What's New Changes in AnaGram 2.40 Most of the changes in AnaGram 2.40 are under the hood - cleanup of source files, reorganization of the source tree, revision of build and test procedures, and so forth, in preparation for the open source release. All of this will, with luck, be invisible to the end user. Open Source AnaGram is now �open source�. AnaGram itself uses the 4-clause BSD �license�; the �parsing engine�, and thus the output files, are licensed with the less restrictive zlib �license�. Source distributions are available from http://www.parsifalsoft.com. The manual has been re-typeset using LaTeX instead of WordPerfect. The typographic consistency and formatting has been considerably improved; unfortunately, the pagination is now completely different, so page numbers are not portable to the new version. All the logic dealing with registration, trial copies, serial numbers, and so forth has been removed. Unix Support The Unix build of the �command line version� of AnaGram (agcl) is now supported and available to the public. There is at present no GUI for the Unix version. The long-term goal is to migrate the AnaGram GUI away from the closed (and orphaned) IBM Visual Age class library to something else, probably GTK, so as to support both Windows and Unix. Improved Functionality Examples. The examples have been adjusted to the current dialect of C++ and are now compilable again. The legacy "classlib" code some still depend on is being phased out. 
Increased Convenience File names. File names in the AnaGram distribution and source tree are no longer limited to 8+3 characters, and quite a few now have less cryptic names. Additionally, all HTML files are now named ".html", not ".htm". Installed files. The AnaGram.cgb and AnaGram.hlp files found in older releases of AnaGram no longer exist; their contents are compiled into the AnaGram executables instead. Bug Fixes Engine compiler error. The �error_message� field of the PCB has been changed to const char * so current C++ compilers will accept the code generated when �diagnose errors� is turned off. Multiple output header files. Including more than one AnaGram output header file at once used to cause some compilers to issue a warning, because an #ifndef directive was checking the wrong symbol. This has been corrected. Wrappers and error tokens. AnaGram 2.01 generated uncompilable code if you tried to use the �wrapper� feature and error token resynchronization at the same time. This has been corrected. More than 256 keywords. Build 8 of AnaGram 2.01 fixed certain problems with large keyword tables, but in the process introduced another, which is now fixed. For changes in the previous versions of AnaGram, see �What's New in AnaGram 2.01� and �What's New in AnaGram 2.0�. ## What's New in AnaGram 2.01 Changes in AnaGram 2.01 Improved Functionality Improved support for building �thread safe parsers�. All nonconstant parser data previously declared as static variables has been moved to the �parser control block�. When the �reentrant parser� switch is set, all references to the parser control block are passed to functions via calling sequences. The �extend pcb� switch provides a mechanism to add user-defined variables to the parser control block. Improved support for C++ parsers. The �wrapper� statement provides C++ wrapper classes for objects to be stored on the �parser value stack�. 
The �PCB_TYPE� macro allows you to derive a C++ class from the parser control block and to access its members from your �reduction procedures�. Support for the �ISO Latin 1� character set. When using the �case sensitive� switch, case conversion is performed for all ISO-Latin-1 characters, not just those in the ASCII range. Improved support for error diagnostics. It is now possible for users to provide their own text for the error messages created by the �diagnose errors� switch. In addition, the �token names� table option now includes ascii representation of individual characters and keywords instead of only named tokens. The �token names only� switch can be used for compatibility with previous versions of AnaGram. More precise determination of error context. The tables used by the �error frame� option to provide the context of a syntax error have been reworked and now provide a substantially more precise localization of the error. Improved error diagnostics in AnaGram �Missing reduction procedure� diagnostic. In addition to warning that there is a �parameter assignment� without a �reduction procedure�, this diagnostic is now provided if the �default reduction value� does not have the same �data type� as the �reduction token�. �Command line version�. Diagnostics have been reformatted so they can be recognized by the Microsoft Visual C++ IDE. Refined �keyword anomaly� diagnostics. There should now be fewer false alarms. Increased Convenience �File Trace�. If your grammar uses �semantically determined productions�, the File Trace feature will now remember the choices you have made for �reduction token�s, so that you do not have to make the same choices over and over again as you work with an example. File Paths. The file paths in the #line directives created by the �line numbers� switch now use forward slashes instead of backslashes. Changed Defaults �Parser stack alignment�. Now defaults to long instead of int. �Parser stack size�. Now defaults to 128 instead of 32.
Bug Fixes Interaction between context tracking and error token. In previous versions of AnaGram, if the first token in a rule was the �error token�, the value of �CONTEXT� was the value that corresponded to the location of the error. CONTEXT now correctly shows the context at which the aborted rule began. For instance, in the following example, if a syntax error is encountered while parsing the expression, the error rule will skip over remaining characters to the terminating semicolon. When invoked from handleError(), the CONTEXT macro will return the context as it was at the beginning of the expression. expression statement -> expression, ';' -> error, ~(eof + ';')?..., ';' =handleError(); �Distinguish lexemes�. Several minor bugs in the implementation of distinguish lexemes have been corrected. Set partition logic. Corrected problems in the interaction between the set �partition� logic and the implementation of the �disregard� statement. Table size. Fixed a data sizing problem which occurred when one particular parse table had precisely 256 entries. Keyword recognition. Fixed a problem that could cause difficulties with �keyword� recognition when the �case sensitive� switch was turned off. Default conflict resolution. With unresolved �shift-reduce conflict�s, the shift case was not always being selected. This problem has been corrected. Lockup. It was possible to write an erroneous grammar that would cause AnaGram to lock up. This problem has been corrected. Potential bus error. The error diagnostic function created by the �diagnose errors� switch could, under some circumstances, access an uninitialized value on the �parser value stack�. This problem has been corrected. Internal errors. Fixed a number of minor bugs which could cause �internal error�s while running �File Trace�. For changes in the previous version of AnaGram, see �What's New in AnaGram 2.0�.
## What's New in AnaGram 2.0 AnaGram's user interface has been completely revamped to make it more convenient and easier to use. However, the same tried and true AnaGram algorithms are still in place to build your parsers. The rules for syntax files are also unchanged. The �File Trace� and �Grammar Trace� facilities have each had their windows combined into a single unit, and a �Rule Stack� synched with these windows and with your syntax file window has been added. The Rule Stack is particularly convenient for relating the progress of the parse to the �grammar rules� in your �syntax file�. A �text entry� field has also been added to the Grammar Trace. This means you can provide character input to your parser in much the same way you can with a �test file� in File Trace, but with instant control over the input. Some further controls have been added to both File and Grammar Traces. In particular there is a Reset button to reset the trace to its initial state. This is particularly useful for �Conflict Trace�s. AnaGram now has a small �Control Panel� (default position is at the upper right of the screen) from which you can conveniently control operation. A menu bar provides access to the various commands and tables. There are toolbar buttons for Analyze Grammar, Build Parser, File Trace, and so on. The panel also has a data entry field for entering search keys. You can set both colors and fonts in AnaGram windows to suit your own preferences. We suggest you check Help for �Colors� or �Fonts� before making changes to make sure that all information will still be properly displayed. AnaGram's �Help� has been updated to provide hypertext-type links. But you can still keep multiple Help windows on view at once. A popup menu shows all the links in a window. New topics have been added. Also, further documentation topics are provided in HTML format in the html subdirectory. 
A �Help Cursor� on the Control Panel toolbar can be used to get help for most AnaGram windows, buttons and menu items. F1 can also be used. On the �Action Menu� you will find a list of your most recently used syntax files. Just click on the file of your choice to have AnaGram analyze it (or build it if �Autobuild� is on). ## White Space In many grammars it is desirable to pass over blanks, tabs, and similar characters, as well as comments, collectively termed "white space", as though they were not there. The "�disregard�" statement in AnaGram may be optionally used to accomplish this. The "�lexeme�" statement may be used to exercise fine control over the scope of the disregard statement. ## Wrapper The wrapper �attribute statement� provides correct handling of C++ objects returned by �reduction procedure�s. If you specify a wrapper for a C++ object, then, when a reduction procedure returns an instance of the object, a copy of the object will be constructed on the �parser value stack� and the destructor will be called when the object is removed from the stack. Without a wrapper, objects are stored on the value stack simply by coercing the stack pointer to the appropriate type. There is no constructor call when the object is stored nor a destructor call when it is removed from the stack. Classes which use reference counts or otherwise overload the assignment operator should always have wrappers in order to function correctly. Wrapper statements, like other �attribute statements�, must appear in configuration sections. The syntax is simply wrapper { <comma delimited list of data types> } For example: [ wrapper {CString, CFont} ] You cannot specify a wrapper for the �default token type�. If your parser exits with an error condition, there may be objects remaining on the stack. The �DELETE_WRAPPERS� macro may be used to delete these objects. If you have enabled �auto resynch�, DELETE_WRAPPERS will be invoked automatically. 
The «AG_PLACEMENT_DELETE_REQUIRED» macro is used to control definition of a "placement delete" operator in the wrapper class AnaGram defines.

## Zero Length

A zero length «token» is a «reduction token» which can be matched by a void, i.e. by nothing at all. It represents an optional item, or a sequence of optional items, in the input. Since the matching process can involve several levels of reductions, it is most precise to use the following recursive definition: A zero length token is one which either has at least one «null production» or has at least one grammar rule defining it such that all the tokens in the rule are zero length tokens.

Care should be taken when using «zero length» tokens in «recursive rule»s. If all the tokens in the rule other than the recursive token itself are zero length tokens the rule will generate an infinite loop in the generated parser.

The «Token Table» identifies zero length tokens because the use of such tokens sometimes inadvertently causes «conflict»s.

## Control Panel

The AnaGram Control Panel appears at the upper right of your monitor when you start AnaGram. It has a menu bar, command buttons, a button which enables a «help cursor», and a «status indicator». At the lower left you will see a data entry field for entering «search» keys, with neighboring search forward and search backward buttons.

Notice that the «Options Menu» has a "Stay On Top" entry which allows you to specify whether the Control Panel stays on top of other AnaGram windows.

## Status Indicator

The status indicator at the right of the AnaGram Control Panel shows the status of the «current grammar»:

    Ready
    Loaded
    Error
    Parsed
    Analyzed
    Built

"Ready" appears only when no grammar has been selected.

"Loaded" and "Parsed" are normally transitory.

"Error" means at least one syntax error has been detected in your grammar and AnaGram cannot continue. Check the Warnings window to determine the nature of the problem.

"Analyzed" means that a «grammar analysis» has been completed, but no «output files» have been written.

"Built" means that an analysis has been completed and output files have been written.

## Help Cursor

The Help Cursor is accessed via the button with the question mark on AnaGram's «Control Panel». It is convenient for getting help on «Warning»s, browse tables, menu items and so on. If you click on the button you enable the Help Cursor, which you can then drag with the mouse. A further mouse click will provide help for the item underneath the cursor.

Note further that AnaGram also has F1 help which you may find simpler and faster than the Help Cursor.

## Search

AnaGram has a simple search facility to let you search for text strings in AnaGram windows. A data entry field on the «Control Panel» is provided for you to enter text. Left-clicking on the neighboring buttons lets you search either forward or backward for a line in the active window which contains at least one instance of the text.

Note that the search begins at the next line after the highlighted line for forward search; at the line preceding the highlighted line for backward search.

## Search Key

To find a text string in an AnaGram window, enter the string in the Search Key field in the «Control Panel» and press Enter. To find another instance of the string click on the «Find Next» button or press F3. To find a previous instance of the string click on the «Find Previous» button or press F4.

In windows that have a cursor bar, a forward search begins on the line following the cursor and a backward search begins on the line preceding the cursor.

## Find Next

The Find Next key, on the «Control Panel» immediately to the right of the «Search Key» field, locates the next instance of the search key in the most recently active AnaGram window. F3 is the keyboard equivalent.
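The line-oriented search described under Search and Search Key can be sketched in a few lines of C++. This is only an illustration of the stated rule (a forward search starts on the line after the cursor, a backward search on the line before it), not AnaGram's implementation. The function names `findNext` and `findPrevious`, the convention of returning -1 when nothing is found, the case-sensitive comparison, and the absence of wrap-around are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Forward search: start on the line following `current` and return the
// index of the first line containing `key`, or -1 if there is none.
// Pass current == -1 to search from the top (assumed convention).
int findNext(const std::vector<std::string> &lines, int current,
             const std::string &key) {
    assert(current >= -1);
    for (std::size_t i = static_cast<std::size_t>(current + 1);
         i < lines.size(); ++i) {
        if (lines[i].find(key) != std::string::npos) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Backward search: start on the line preceding `current` and return the
// index of the nearest earlier line containing `key`, or -1 if none.
int findPrevious(const std::vector<std::string> &lines, int current,
                 const std::string &key) {
    assert(current >= -1);
    int start = current;
    if (start > static_cast<int>(lines.size())) {
        start = static_cast<int>(lines.size());  // clamp to the end
    }
    for (int i = start - 1; i >= 0; --i) {
        if (lines[static_cast<std::size_t>(i)].find(key) != std::string::npos) {
            return i;
        }
    }
    return -1;
}
```

Because each search starts one line away from the cursor, repeating the same search moves on to the next matching line instead of finding the current one again.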
## Find Previous

The Find Previous key, on the «Control Panel» immediately to the right of the «Find Next» key, searches backwards for the search key in the most recently active AnaGram window. F4 is the keyboard equivalent.

## Fonts, Set Fonts

The Set Fonts dialog allows you to use the fonts of your choice in AnaGram windows. You should make sure that the «marked token»s font is very distinctive so that marked tokens will show up clearly even if they are only 1 or 2 characters long. Sometimes it is helpful to use an underlined font for marked tokens.

A Default button at the bottom of the dialog lets you revert to AnaGram's original fonts if you wish.

## Colors, Set Colors

The Set Colors dialog allows you to change the colors of AnaGram windows.

Notice that in the «File Trace» the «test file pane» requires three different sets of text and background colors. You should make sure that the backgrounds, at least, can be easily distinguished from each other so the trace information can be properly displayed. You also want to take care that an active pane in a File Trace or Grammar Trace can be distinguished from inactive panes.

The Default button at the bottom of the dialog lets you revert to AnaGram's original colors if you wish.

Color changes pertain only to the client areas of AnaGram windows. The remaining parts of your windows will have the customary colors you have chosen for your system.

## Marked Token

Some tables and trace panes display each rule with one token marked to show how far parsing has progressed in the rule. The marked token is the next input expected in the input stream. It is shown in a different font to distinguish it from other tokens in the rule. If no token is marked, the rule is a «completed rule», i.e. it has been completely matched and will be reduced by the next input.

You can set the font for marked tokens by choosing Fonts from the «Options Menu».
You should make sure that the font is very distinctive so that marked tokens will show up clearly even if they are only 1 or 2 characters long. Sometimes it is helpful to use an underlined font for marked tokens.

## Synch Parse

The Synch Parse button replaces the «Single Step» button on the toolbar of the «File Trace window» when, for some reason, the location of the blinking cursor in the «test file pane» differs from the current parse position. This can occur when you single click in the test file pane or when the parse cannot track the cursor because of a «syntax error» or a «semantically determined production». Click the synch parse button to resynch the parse with the cursor.

## Single Step

The Single Step button is one of the control buttons for the «File Trace» and «Grammar Trace». It advances the parse one «parser action» at a time.

In the File Trace, it is replaced with the "«Synch Parse»" button whenever the blinking cursor loses synch with the current parse location.

In the Grammar Trace, the Single Step button takes its input from the Allowable Input pane, the Reduction Choices pane, or the «text entry» field, depending on which is active.

## Proceed

The Proceed button is one of the control buttons for the «Grammar Trace».

If the «Reduction Choices pane» or the «Allowable Input pane» is active, Proceed parses the highlighted token until it is shifted in to the «parser stack».

If the «text entry» field is active, Proceed parses all text in the field.

If a «syntax error» is encountered, the parse stops and all «reduce action»s are undone.

Note that selecting a token in Allowable Input can cause a syntax error under certain circumstances. This can happen only if the following conditions are all true: the indicated operation is a «reduction», the reduction token for the rule being reduced has been used in several different contexts in the grammar and the specified token may follow it in some contexts and not in others.
## Reduction Choices Pane

The «File Trace» and «Grammar Trace» display a Reduction Choices pane when they need to reduce a «semantically determined production». The rule to be reduced is highlighted in the «rule stack pane». If the «syntax file» window is visible, it shows the rule in context in your grammar.

The Reduction Choices pane lists all possible «reduction token»s for the specified rule. The first reduction token that is admissible in the current context is highlighted and it appears as the «lookahead token» in the «parser stack pane». The text that comprises the entire rule is highlighted in the «test file pane».

Select the desired reduction token before continuing with the parse. If you select a token and it does not appear as the lookahead token, it is not syntactically correct in the current context. If you try to proceed with the parse, you will get a «selection error».

## Selection Error

The «Parse Status» field indicates a "selection error" if you choose a «reduction token» from the «Reduction Choices pane» of a «File Trace» or «Grammar Trace» and the selected token is not syntactically correct in the current context.

## Parser Stack Pane

The Parser Stack pane, the upper left pane of the «File Trace» and «Grammar Trace» windows, displays the «parser stack» for the current trace. Each line corresponds to one level in the parser state stack. It shows the stack index, the «parser state» for that level, and the «token» which was seen at that state.

The last line of the stack, the «lookahead line», corresponds to the current state of the parser. Since no input has yet been processed for this state, the token, if any, which appears at this level is a «lookahead token».

If you move the cursor in the Parser Stack pane of a File Trace, the text that makes up the selected token will be highlighted in the «Test File pane». You can back the parse up to any desired stack level by double clicking at the beginning of the token text in the Test File pane.
Similarly, if you move the cursor bar in the Parser Stack pane of a Grammar Trace, the «Allowable Input pane» will change to display the allowable tokens in the selected state. The previously selected token will be highlighted. Then, double click on any token in the Allowable Input pane to back the parse up and choose a token a second time.

The «Rule Stack pane» of the File or Grammar Trace is also synched to the Parser Stack pane. If the «syntax file» window is visible, it will be synched to show the rule currently selected in the rule stack pane. Note that rules that have been automatically generated by the expansion of «virtual productions» cannot be synched, so the top line of the syntax file will be highlighted instead.

In the Grammar Trace, the last line of the Parser Stack may or may not display a «lookahead token», depending on the last «parser action» performed. If input was taken from Allowable Input and the last action was a simple «reduce action», the last input token selected will be displayed as the lookahead input. But if the last action performed shifted the token in, the lookahead field will be empty.

If you right-click on a highlighted line in the Parser Stack pane, you will get a pop-up menu to give you more information. In particular you can get an «Auxiliary Trace» starting at the current point in your File or Grammar Trace, so you can explore various possibilities without losing your position in the old trace.

## Exit

Select this entry from the «Action Menu» to terminate AnaGram.

## Allowable Input, Allowable Input Pane

The upper right pane of the «Grammar Trace» window lists the allowable input tokens for the current state of the «grammar».

The tokens in the Allowable Input pane are listed in two groups: first, the «terminal tokens» allowable in this state, and second, the «nonterminal tokens».
Between these two groups of tokens is inserted a line which is either an option for a «default reduction», or declares that there is no default action.

Double click, press Enter, or click the «Proceed» button to parse the highlighted token. When all parse actions triggered by the highlighted token have been completed, all panes of the trace will be redrawn to show the new state of the parser.

Note that selecting a token in Allowable Input can cause a syntax error under certain circumstances. This can happen only if the following conditions are all true: the indicated operation is a «reduction», the reduction token for the rule being reduced has been used in several different contexts in the grammar and the specified token may follow it in some contexts and not in others.

If you wish to see the results of a single parser action, click on the «single step» button. The parser will perform a single parser action. If the token you selected was not shifted in, it will now be displayed as the «lookahead token» on the last line, the «lookahead line» in the «Parser Stack pane», and will be preselected in the Allowable Input pane. Because AnaGram, by default, uses a number of compound parser actions, this situation does not arise very often unless you have set the «traditional engine» switch or reset the «default reductions» switch. Usually you will want to select the same token to proceed, but it is not necessary.

The Allowable Input pane also displays the «parser action» associated with a specific token. If it is not a «compound action», the action and its result are also shown.

The «parser action» field for a token may be interpreted as follows:

If this token would cause a shift to a new state, the action field is ">>" followed by the new state number.

If the token would cause a «reduction», the action field is "<<" followed by a «rule number» to show the rule reduced.

If the parser action is a compound action, the action field is blank.
If the token would cause the grammar to be accepted, the action field is "Accept".

The «text entry» field at the bottom of the Grammar Trace can be used as a convenient alternative to the Allowable Input pane. It accepts characters rather than tokens. Most non-printing characters such as newline are only available from Allowable Input.

## Copy

The Copy command on the «Windows Menu» copies the currently active table or Help topic to the clipboard.

## Statistical Summary

While your grammar is being analyzed, a Statistical Summary window pops up to show you the progress of the analysis. Unless you have turned off «Show Statistics» on the «Options Menu», this window will remain on-screen for your reference. Among other things, it shows you the number of rules and states in your grammar, and the number of conflicts and warnings, if any.

Note that if your grammar is small and you have Show Statistics turned off, the appearance of this window on your monitor may be exceedingly brief - you may just see a flash.

If the window is turned off or you have closed it, you can get it from the «Browse Menu».

## Stay On Top

The Stay On Top entry in the «Options Menu» allows you to specify whether the «Control Panel» stays on top of other AnaGram windows.

## Show Syntax

If this entry in the «Options Menu» is checked, AnaGram will display the «syntax file» when it has analyzed your «grammar». If this entry is not checked or you have closed the syntax file window, you can select the window from the «Browse Menu».

## Show Statistics

If this entry in the «Options Menu» is checked, AnaGram will leave the «Statistical Summary» on the screen after it has analyzed your «grammar». If this entry is not checked or you have closed the Statistical Summary window, you can select the window from the «Browse Menu».

## About AnaGram

Select this entry from the «Help Menu» to find out the version and serial numbers of your copy of AnaGram, and how to contact Parsifal Software.
## Help Topics

Select Help Topics from the «Help Menu» to get a complete list of AnaGram Help topic titles. You can bring up the window for a highlighted topic by double-clicking with the left mouse button, pressing F1, or using the «Help Cursor».

## Cascade Windows

Select this entry from the «Windows Menu» to cascade your open windows starting at top left of the screen.

## Close Windows

Select this entry from the «Windows Menu» to close all open windows except the «Control Panel». You may also close the active window by pressing the Escape key.

## Hide Windows

Select this entry from the «Windows Menu» to hide all open windows except the «Control Panel». Restore them to the screen with «Restore Windows».

## Restore Windows

Use this command on the «Windows Menu» to restore to the screen any windows you have previously hidden with «Hide Windows».

## Token Input, Preprocessor, Lexical Scanner

AnaGram makes it unnecessary, in most cases, to have a separate preprocessor to provide the «tokens» which are fed to your parser. However in some cases you may want to use a preprocessor, or lexical scanner, to provide input to your parser. The preprocessor may or may not be written in AnaGram. If it sends the parser token numbers, as opposed to character codes, this is referred to as token input, as opposed to character input.

Please refer to the AnaGram User's Guide for information on identifying the tokens to the parser and providing their semantic values, if any.

Since a «File Trace» is based on character codes, it will be greyed out on the «Action Menu» if you have token input. For a «Grammar Trace», entering characters in the «text entry» field is not appropriate and will simply cause a syntax error.

## Lookahead Line

The last line of the «Parser Stack pane», the "lookahead" line, will sometimes show a «lookahead token», and sometimes not.

In a «File Trace», you will always see a lookahead token because it is available from the «test file».
In a «Grammar Trace» you will usually see a lookahead token only when you have used the «Single Step» button or if there is available input in the «text entry» field. In the latter case the token corresponding to the first character of the input will appear on the lookahead line.

If you click Single Step after selecting a token from «Allowable Input» and it causes only a simple «reduce action» (as opposed to a shift or a compound action), then, upon completion of the reduction, the token you selected will appear on the lookahead line and also will be preselected in Allowable Input.

Usually you would select this token for the next parse step. However, if there are other possible inputs in this state, the parse theoretically could have arrived at this state by a different sequence of input tokens. Thus, if you are more interested in the behavior of the parser at this state than in the response of the parser to a particular sequence of inputs, it is perfectly valid to select a different input token, and AnaGram will let you do it.

Note that if you have enabled the «traditional engine» switch or disabled the «default reductions» switch, the probability of finding a token which does a simple reduction is noticeably higher than otherwise.

## Action Menu

The Action menu begins with the «Analyze Grammar» and «Build Parser» commands. If a grammar has already been analyzed, but not yet built, there will also be an extra Build command bearing the name of your syntax file. There are also «Reanalyze» and «Rebuild» commands which are initially greyed out. They become available if you change the current syntax file.

The next section has «File Trace» and «Grammar Trace» commands. If you have enabled the «Error Trace» «configuration switch», this section also shows an Error Trace command.

The menu ends with an «Exit» command and a list of recently used syntax files, if any. Just click on a syntax file name to have AnaGram analyze it, or build it if the «Autobuild» option is on.
## Browse Menu

Initially, the Browse Menu shows only a single entry, «Configuration Parameters», which lets you see the current state of the configuration parameters before any have been set by your syntax file.

Once you have analyzed a grammar, this menu fills up with many tables containing information about your grammar. You can also bring up a window showing your «syntax file» from this menu. If your grammar has generated «syntax error»s or warnings, or contains conflicts, there will be «Warning»s or «Conflict»s entries.

## Options Menu

From this menu you can select a «Fonts» or «Colors» dialog so you can set AnaGram's fonts and colors to suit your own tastes.

You can set «Autobuild» if you want AnaGram to automatically build your «grammar» when you select a «syntax file» from the «Action Menu».

You can also choose whether or not to automatically show the «Statistical Summary» window or your syntax file window when you open a grammar, or make the «Control Panel» stay on top of other AnaGram windows.

## Windows Menu

The Windows menu lets you cascade, close, or hide all AnaGram windows except the «Control Panel», or restore them if they have been hidden. It also has a list of open windows (even if hidden) so you can select the one you want.

The Copy command will copy most windows to the clipboard.

## Help Menu

The Help Menu has the following entries:

«Getting Started» provides a brief description of AnaGram and introductory suggestions.

«Help Topics» brings up a list of all help topics.

«Using Help» tells you how to use AnaGram's help facilities.

«What's New» has information on new features of this version of AnaGram.

«About AnaGram» tells you what version of AnaGram you are using, and also provides contact information for Parsifal Software.

## Autobuild

When Autobuild («Options Menu») is checked, selecting a file from the list of most recently used files on the «Action Menu» invokes the «Build Parser» command. Otherwise, the «Analyze Grammar» command is invoked.
## Reanalyze, Rebuild

Reanalyze and Rebuild commands on the «Action Menu» are initially greyed out.

Reanalyze becomes available if you have a syntax file currently analyzed or built in AnaGram and change it while AnaGram is still running.

Rebuild becomes available if you have a syntax file currently built and change it while AnaGram is still running.

## Percent Sign

The percent sign ( % ) is used to mark certain tokens in your grammar which AnaGram must redefine in order to implement the «disregard» statement. If you have used this statement in your grammar, you will probably notice the percent sign appearing in some windows and traces. The percent sign indicates the original token, without the optional white space attached.

Early versions of AnaGram used the degree sign instead, but this character is not generally available in Windows.

## Program Development

The first step in writing a program is to write a «grammar» in AnaGram notation which describes the input the program expects. The file containing the grammar, called the «syntax file», should have the extension ".syn". You could also make up a few sample input files at this time, but it is not necessary to write «reduction procedure»s at this stage.

Run AnaGram and use the «Analyze Grammar» command to create parse tables. If there are «syntax errors» in the grammar at this point, you will have to correct them before proceeding, but you do not necessarily have to eliminate «conflicts», if there are any, at this time. There are, however, many aids available to help you with conflicts. These aids are described in the AnaGram User's Guide, and somewhat more briefly in the Online Help topics.

Once syntax errors are corrected, you can try out your grammar on the sample input files using the «File Trace» facility. With File Trace, you can see interactively just how your grammar operates on your test files. You can also use «Grammar Trace» to answer "what if" questions concerning input to the grammar.
The Grammar Trace does not use a test file, but rather allows you to make input choices interactively.

At any time, you can write «reduction procedure»s to process your input data as its components are identified in the input stream. Each procedure is associated with a «grammar rule». The reduction procedures will be incorporated into your parser when you create it with the «Build Parser» command.

By default, unless you specify an input procedure, «parser input» will be read from stdin, using the default «GET_INPUT» macro. You will probably wish to redefine GET_INPUT, or configure your parser to use «pointer input» or «event driven» input.

## License, Copyright, Copying, Open Source, Warranty, No Warranty

AnaGram, A System for Syntax Directed Programming
Copyright 1993-2002 Parsifal Software
Copyright 2006, 2007 David A. Holland
All Rights Reserved.

AnaGram itself is released to the public under the traditional 4-clause BSD license:

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. All advertising materials mentioning features or use of this software must display the following acknowledgement:
   This product includes software developed by Parsifal Software, Jerome T. Holland, and their contributors.

4. Neither the name of Parsifal Software nor the name of Jerome T. Holland nor the names of their contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY PARSIFAL SOFTWARE, JEROME T. HOLLAND, AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL PARSIFAL SOFTWARE, JEROME T. HOLLAND, OR THE CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The AnaGram «parsing engine», that is, the code that is emitted by AnaGram and incorporated into programs developed using AnaGram, uses this less restrictive zlib-style license:

This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.

2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.

3. This notice may not be removed or altered from any source distribution.

##