view anagram/guisupport/helpdata.src @ 24:a4899cdfc2d6 default tip

Obfuscate the regexps to strip off the IBM compiler's copyright banners. I don't want bots scanning github to think they're real copyright notices because that could cause real problems.
author David A. Holland
date Mon, 13 Jun 2022 00:40:23 -0400 (2022-06-13)
parents 13d2b8934445
Accept Action

The accept action is one of the four actions of a
traditional �parsing engine�. The accept action is
performed when the �parser� has succeeded in identifying
the goal, or �grammar token� for the �grammar�.  When
the parser executes the accept action, it sets the �exit_flag�
field in the �parser control block� to AG_SUCCESS_CODE and returns
to the calling program. The accept action is thus the last action of
the parsing engine and occurs only once for each successful execution
of the parser.

If the grammar token has a non-void value, you may
obtain its value by calling the �parser value function�
whose name is given by <parser name>_value, that is,
by appending "_value" to the �parser name�.
##

Parser Value Function, Return Value

The value assigned to the �grammar token� in your parser
may be retrieved by calling the parser value function after
the parser has finished. The name of this function is given
by <�parser name�>_value. The return type of the function
is the type assigned to the grammar token.

If you have set the �reentrant parser� switch, the parser
value function takes a pointer to the �parser control block�
as its sole argument. Otherwise, it takes no arguments. The
value function is not defined if the grammar token has type "void".
##

AG_PLACEMENT_DELETE_REQUIRED

When the �wrapper� option is specified, the wrapper
template class that AnaGram defines uses a "placement
new" operator to construct the wrapper object on the
�parser value stack�. The MSVC++ 6.0 compiler requires,
in this situation, that a corresponding "placement
delete" operator be defined. Other C++ compilers,
notably MSVC++ 5.0, generate an error message if
they encounter the definition of a "placement delete"
operator.

Accordingly, AG_PLACEMENT_DELETE_REQUIRED is used to determine
whether a "placement delete" operator should be defined.

AG_PLACEMENT_DELETE_REQUIRED is defined to be 1 if you are using MSVC++
6.0 or greater, 0 otherwise. You can override the automatic definition of
AG_PLACEMENT_DELETE_REQUIRED by defining it in the �C prologue� section
of your grammar. Set it to a non-zero value to force the "placement
delete" definition, zero to skip the definition.
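
The override works because an automatic definition of this kind is
normally guarded so that it applies only when you have not defined the
macro yourself. The following is a minimal sketch of that pattern, not
AnaGram's actual generated code. It assumes the compiler check is made
with _MSC_VER (1200 is the value MSVC++ 6.0 reports); the helper
function placement_delete_required is hypothetical and exists only to
show the result.

```c
/* In the C prologue: uncomment to override the automatic choice. */
/* #define AG_PLACEMENT_DELETE_REQUIRED 0 */

/* Assumed shape of the automatic definition (illustrative only). */
#ifndef AG_PLACEMENT_DELETE_REQUIRED
#  if defined(_MSC_VER) && _MSC_VER >= 1200   /* MSVC++ 6.0 or greater */
#    define AG_PLACEMENT_DELETE_REQUIRED 1
#  else
#    define AG_PLACEMENT_DELETE_REQUIRED 0
#  endif
#endif

/* Hypothetical helper: reports the value chosen at compile time. */
static int placement_delete_required(void)
{
    return AG_PLACEMENT_DELETE_REQUIRED;
}
```

Because the guard is #ifndef, a definition placed in the C prologue is
seen first and the automatic branch is skipped entirely.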

##

ag_tcv

ag_tcv is an array AnaGram includes in your �parser�.
Your parser uses ag_tcv to translate external codes to
the internal token numbers that AnaGram uses. It uses
the actual input code to index the ag_tcv array to
fetch a �token number�. The token number is then used
to identify the input token.
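
The lookup described above amounts to a single array index. The sketch
below uses a hand-made table, demo_tcv, in place of the real ag_tcv;
the token numbers, the table contents and the function names are all
invented for illustration, and the sketch assumes ASCII input codes in
the range 0 to 255.

```c
#include <assert.h>

/* Hypothetical token numbers; a real parser gets these from AnaGram. */
enum { TOK_OTHER = 1, TOK_DIGIT = 2, TOK_LETTER = 3, TOK_EOF = 4 };

/* Stand-in for ag_tcv: one entry per input code in 0..255. */
static unsigned char demo_tcv[256];

/* Fill the demo table (assumes ASCII, where letters are contiguous). */
static void demo_tcv_init(void)
{
    int c;
    for (c = 0; c < 256; c++) demo_tcv[c] = TOK_OTHER;
    for (c = '0'; c <= '9'; c++) demo_tcv[c] = TOK_DIGIT;
    for (c = 'a'; c <= 'z'; c++) demo_tcv[c] = TOK_LETTER;
    for (c = 'A'; c <= 'Z'; c++) demo_tcv[c] = TOK_LETTER;
    demo_tcv[0] = TOK_EOF;
}

/* Translate an external input code to an internal token number. */
static int demo_token_number(int input_code)
{
    assert(input_code >= 0 && input_code < 256);
    return demo_tcv[input_code];
}
```

After calling demo_tcv_init(), demo_token_number('7') yields TOK_DIGIT
and demo_token_number('q') yields TOK_LETTER.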
##

Allow macros

"Allow macros" is a �configuration switch� which
defaults to on. When it is set, i.e., on, �reduction
procedure�s will be implemented as macros if they are
sufficiently simple. This makes your �parser� somewhat
more compact but makes it somewhat more difficult to
debug. It's a good idea to turn this switch off for
debugging.
##

Analyze Grammar

The Analyze Grammar command will scan and
analyze your �syntax file�, and create a number of
tables summarizing your grammar.

Analyze Grammar does not create any �output files�.
To create a �parser�, use the �Build Parser� command.
You would probably use Analyze Grammar, rather than Build Parser, during
initial development of your �grammar�.

You can use �File Trace� and �Grammar Trace� as soon as you have
analyzed your grammar. It is not necessary to build a parser first.
##

Attribute Statement

Attribute statements are used in �configuration
sections� of your �syntax file� to specify certain
properties for �token�s, �character set�s, or other
units of your grammar. The attribute statements
available are:
	�disregard�
	�distinguish keywords�
	�enum�
	�extend pcb�
	�hidden�
	�left�
	�lexeme�
	�nonassoc�
	�rename macro�
	�reserve keywords�
	�right�
	�sticky�
	�subgrammar�
	�wrapper�
##

Auto init

Auto init is a �configuration switch� which defaults to
on. It controls the initialization of any �parser� that
is not �event driven�. When it is set to on, your
parser is automatically initialized every time it is
called. This is the setting you will normally use. On
occasion, however, it is desirable to call a parser
several times without reinitializing it. In this case,
you may set the auto init parameter to off and then
call the �initializer� yourself whenever it is
appropriate.
##

Auto resynch

"Auto resynch" is a �configuration switch� which
defaults to off. You may use it to specify �automatic
resynchronization� as an �error recovery� mechanism.

Setting the "auto resynch" switch causes AnaGram to
include an automatic �resynchronization� procedure in
your �parser�. The resynchronization procedure will be
invoked when your parser encounters a �syntax error�
and will skip over input until it finds input
characters or �tokens� consistent with its state at the
time of the error.

An alternate technique, �error token resynchronization�,
uses an �error token� which you include in your grammar.
##

Automatic Resynchronization

Automatic �resynchronization� is one of several �error
recovery� options available as part of parsers built by
AnaGram. You enable automatic resynchronization by
setting the �auto resynch� �configuration switch�. If
your parser includes automatic resynchronization it will
incorporate a heuristic procedure which will skip over
input tokens until it finds a token which makes sense
with respect to one or another of the �production�s
active at the time of the �syntax error�.

The purpose of the resynchronization procedure is to
provide a simple way for your parser to proceed in the
event of syntax errors so that it can find more than one
syntax error on a given pass. The resynchronization
procedure uses a heuristic based on your own syntax.
AnaGram itself uses this technique to resynchronize
after syntax errors in its input.

A disadvantage to using this resynchronization technique
is that the resynchronization procedure turns off all
�reduction procedure�s. Because of the error, a number
of reduction procedures, which normally would be
executed, will be skipped. The parameters for any
reduction procedures that might be called later would be
suspect and could cause serious problems. It seems more
prudent simply to shut them down.

If you use the automatic resynchronization procedure,
you must also specify an �eof token� so that the
synchronizer doesn't inadvertently skip over the end of
file.
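
The need for an eof token can be seen in a much simplified sketch of a
skip loop. This is not AnaGram's heuristic, which works from the
productions active at the time of the error; here the set of tokens
that "make sense" is simply passed in as a list, and all names and
token numbers are hypothetical.

```c
#include <stddef.h>

/* Hypothetical token numbers for the sketch. */
enum { T_EOF = 0, T_JUNK = 1, T_SEMI = 2, T_IDENT = 3 };

/* Nonzero if 'tok' is one of the tokens acceptable after the error. */
static int acceptable(int tok, const int *ok, size_t n_ok)
{
    size_t i;
    for (i = 0; i < n_ok; i++)
        if (ok[i] == tok) return 1;
    return 0;
}

/* Skip input starting at 'pos' until an acceptable token or the eof
 * token is found; return the index where parsing may resume. */
static size_t resynch(const int *input, size_t pos,
                      const int *ok, size_t n_ok)
{
    while (input[pos] != T_EOF && !acceptable(input[pos], ok, n_ok))
        pos++;
    return pos;
}
```

Without the test against T_EOF, the loop would run past the end of the
input whenever no acceptable token remained.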

An alternative technique for resynchronization is called
�error token resynchronization�.
##

Auxiliary Trace

An Auxiliary Trace is a pre-built grammar trace which
you may select from the �Auxiliary Windows� popup menu for
most windows which display parser state information.
The Auxiliary Trace provides a path to the state
specified in the highlighted line of the primary window.

When obtained
from the Parser Stack pane of the �File Trace� or �Grammar Trace�, the
Auxiliary Trace is simply a copy of the current status of these
traces so you can explore your alternatives while still retaining the
status of the original trace for reference.
##

Auxiliary Windows

From most AnaGram windows you can pop up an Auxiliary Windows
menu by clicking the right mouse button or by pressing Shift F10.
Auxiliary Windows may
have Auxiliary Windows of their own.

 Windows with a cursor bar (highlighted line):
The windows available in the Auxiliary Windows menu depend on the
grammar elements identified by the cursor bar in the parent window. If
the cursor bar identifies a �parser state�, there will be windows that
describe the state. If the cursor bar identifies a �grammar rule�,
there will be windows that describe the rule. If the cursor bar
identifies a �token�, there will be windows that describe the token. In
the case of a �marked rule�, token windows will describe the marked
token, if any. In some cases, specialized pre-built grammar traces
such as the �Conflict Trace� or �Auxiliary Trace� are on the menu.

 Help windows:
For Help windows, the Auxiliary Windows menu will show all the
available links to other �Help topics� from this window. �Using Help�
is always available.
##

Backtrack

If your �parser� does not continue after encountering a
�syntax error�, you can speed it up and make it a
little smaller by turning off the backtrack
�configuration switch�. If backtrack is on, AnaGram
configures your parser so that in case of syntax error
it can undo any �default reductions� it might have made
as a consequence of the erroneous input. The purpose of
such an undo function is to identify the proper �error
frame� and to maximize the probability of being able to
recover gracefully.
##

Empty Recursion

This warning message tells you that the recursive step of the
specified �recursive rule� can be completely matched by �zero
length� tokens, i.e., by nothing at all.
The result is potentially an infinite loop in the generated �parser�.
The specified rule is an expansion rule of the specified token.

Because of the possibility of encountering an infinite loop while parsing,
AnaGram turns off its �keyword anomaly� analysis if empty recursion is
found. The �File Trace� function is also disabled for the same reason.

The �circular definition� of a token has the same effect as an
empty recursion, in that no additional input is required to match
the recursive rule.

##

Keyword Anomaly analysis aborted: empty recursion

The �keyword anomaly� analysis has been turned off, since the presence of
�recursive rule�s with �empty recursion� can cause infinite loops in the analysis.

##

Keyword Anomaly analysis aborted: circular definition

The �keyword anomaly� analysis has been turned off, since the presence of
a �circular definition� can cause infinite loops in the analysis.

##

File Trace disabled: empty recursion

Because of the presence of �recursive rule�s with �empty recursion� in this grammar and
the infinite loops that can ensue, the �File Trace� function has been
disabled.

##

File Trace disabled: circular definition

Because of the presence of a �circular definition� in this grammar and
the infinite loops that can ensue, the �File Trace� function has been
disabled.

##

Both Error Token Resynch and Auto Resynch Specified

This �warning� message indicates that your �grammar�
defines an �error token� and also requests �automatic
resynchronization�. AnaGram will ignore the request
for automatic resynchronization and will provide �error
token resynchronization�. If you named a token "error"
but do not wish �error token resynchronization�, you can
either rename "error", or, in a �configuration
section�, you may explicitly specify the error token to
be something you don't otherwise use in your grammar:
	[ error token = not used ]
##

Bottom Margin

"Bottom margin" is an �obsolete configuration parameter�.
##

Bright Background

"Bright background" is a �configuration switch� which
was used in the DOS version of AnaGram. It is no longer
used, but is still recognized for the sake of upward
compatibility with old �configuration file�s.
##

Build Parser

You use the Build Parser command to create a �parser� based on your
�grammar�. The parser is a C file consisting of the �embedded C� (which
may include C++) code in your �syntax file�, your �reduction
procedure�s, a number of tables derived from your grammar
specification, and a �parsing engine� customized to your requirements.

If you only wish to investigate your grammar and do not
wish to create �output files�, use the �Analyze
Grammar� command.
##

Build <file name>

This item on the �Action Menu� is available when you have analyzed a
�grammar� but you have not yet built it. It builds the grammar
without reloading the �syntax file� from the disk.
##

Cannot Make Wrapper for Default Token Type

This �warning� message occurs when AnaGram finds, listed in a
�wrapper� statement, a token type that has previously been
defined as the �default token type�. If a wrapper is needed for a
particular type, you must specify the �data type� explicitly
for each relevant �token�.

As a result, a wrapper class has not been created for the specified token type.
##

Token with Wrapper cannot be Default Token Type

This �warning� message indicates that an attempt has been made
to specify a class that has previously been listed in a �wrapper�
statement as the �default token type�.
If a wrapper is needed for a particular type, you must specify the
�data type� explicitly for each relevant �token�.

As a result, the default token type has not been set.
##

Case Sensitive

"Case sensitive" is a �configuration switch� which
defaults to on. When it is on, it instructs AnaGram to
build a parser for which all input is case sensitive.
When it is off, AnaGram builds a parser which
ignores case for all input.

If the �iso latin 1� configuration switch is turned
off, case conversion will be limited to characters
in the normal ascii range. When it is on, case
conversion will be done for all iso latin 1 characters.

If you have other requirements for case conversion,
you may provide your own definition in your �embedded c� for the
�CONVERT_CASE� macro which is invoked to perform case
conversion on input characters.

Note that the value of an input token is unaffected
by the case sensitive switch. When case sensitive is
off, 'a' and 'A' will be treated as the same input
token by the parser, but the �token value�s will
nevertheless be different.
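
The distinction between identifying a token and keeping its value can
be sketched as follows. The function fold_case is a hypothetical
stand-in for the conversion step; it is not the CONVERT_CASE macro,
whose definition is not given here, and folding to upper case within
the ASCII range is an assumption made for the example.

```c
#include <ctype.h>

/* Hypothetical case conversion, limited to the ASCII range. */
static int fold_case(int c)
{
    return (c >= 0 && c <= 127) ? toupper(c) : c;
}

/* What the sketch records for one input character. */
struct demo_input {
    int token_key;    /* folded code, used to identify the token */
    int token_value;  /* original character, kept as the value  */
};

static struct demo_input classify(int c)
{
    struct demo_input r;
    r.token_key = fold_case(c);
    r.token_value = c;
    return r;
}
```

Here classify('a') and classify('A') have the same token_key but
different token_value fields.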
##

C Prologue

If you include a block of �embedded C� code at the very
beginning of your syntax file, it is called the "C
prologue". It will be copied to your �parser file�
before any of the code generated by AnaGram. You can
use the C prologue to ensure that copyright notices,
#include directives, or type definitions, for example,
occur at the very beginning of your parser file.

If you specify a C or C++ type of your own definition,
you must provide a definition in the C prologue.
##

CHANGE_REDUCTION

CHANGE_REDUCTION(t) is a macro which AnaGram defines in
your �parser file� if your �parser� uses �semantically
determined productions�. In your �reduction procedure�,
when you need to change the �reduction token� you can
easily do so by calling CHANGE_REDUCTION with the name
of the desired token as the argument. If the token name
has embedded spaces, replace the embedded spaces with
underline characters.
##

Character Constant

You may represent single characters in your �grammar� by
using character constants. The rules for character
constants are the same as in C. The escape sequences
are as follows:
	\a		alert (bell) character
	\b		backspace
	\f		formfeed
	\n		newline
	\r		carriage return
	\t		horizontal tab
	\v		vertical tab
	\\		backslash
	\?		question mark
	\'		single quote
	\"		double quote
	\ooo	octal number
	\xhh	hexadecimal number

 AnaGram treats a single
character as a �character set�
which contains only the specified character. Therefore you
can use a character constant in a �set expression�.
##

Character Map

The Character Map table shows you the mapping of input
characters to �token numbers�. The �ag_tcv� table in
your parser is based on the information in this table.

The fields in this table are:
	character code
	display character, if any (what Windows displays for this code)
	�partition set number�
	�token number�
	�token representation�

The display character will be what Windows displays for the character
code in the Data Tables font you have chosen.
##

Character Range

A "character range" is a simple way to specify a
�character set�. There are two ways to represent a
character range in an AnaGram �syntax file�.

The first way is like a �character constant�: 'a-z'.

The second way allows somewhat greater freedom:
	'a'..'z'
	'a'..255
	^Z..037
	-1..0xff
Here you use two arbitrary �character representations�
separated by two dots. If the two characters are out of
order, AnaGram will reverse the order, but will give
you a �warning�.

More complex �character sets� may be specified by using
�union�, �difference�, �intersection�, or �complement�
operators.
##

Character Representation

In an AnaGram �syntax file� you may represent a
character literally with a �character constant� or
numerically using decimal, octal or hexadecimal
representations following the conventions for C. Thus
'A', 65, 0101, and 0x41 all represent the same
character. Control characters can be represented using
the '^' character and either an upper or lower case
letter. Thus ^j and ^J are acceptable representations
of the ascii newline code. The rules for character
constants are identical to those in C, and the same
escape sequences are recognized.
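
Since the numeric conventions are those of C, the equivalences can be
checked with a C compiler. The sketch below assumes an ASCII character
set. The control_code function shows the usual way a control character
is derived from a letter, by keeping only the low five bits; that
AnaGram computes ^J this way is an assumption, though the result
agrees with the text above.

```c
/* Four spellings of the same character code (ASCII assumed). */
static const int same_char[4] = { 'A', 65, 0101, 0x41 };

/* Control character for a letter, as in ^J: keep the low five bits,
 * so upper and lower case letters give the same result. */
static int control_code(int letter)
{
    return letter & 0x1f;
}
```

With these definitions control_code('j') and control_code('J') both
equal '\n'.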
##

Character Set

In AnaGram grammars you can conveniently specify whole
sets of characters at a time. This avoids
needless repetition and complexity.

Sets of characters may be defined in an AnaGram �syntax
file� in any of a number of ways. A single character is
taken to represent a character set consisting of a
single element. (See �character representation�.) You
can also specify a set consisting of a range of
characters (see �character range�) and perform the
familiar set operations, union, intersection, difference
and complement.

All the sets you define in your syntax file are
summarized in the �Character Sets� window.

The �union� of two character sets, represented by a '+',
contains all characters that are in one or another of
the two sets. Thus, 'A-Z' + 'a-z' represents the set of
all upper and lower case letters.

The �intersection� of two character sets, represented
by a '&', contains all characters that are in both
sets. Thus, suppose you have the �definitions�
	letter = 'A-Z' + 'a-z'
	hex digit = '0-9' + 'A-F' + 'a-f'
Then (letter & hex digit) contains precisely upper and
lower case a to f.

The �difference� of two character sets, represented by
a '-', contains all characters that are in the first
set but not in the second set. Thus, using the same
definitions as above, (letter - hex digit) contains
precisely upper and lower case g to z.

The �complement� of a character set, represented by a
preceding '~', represents all characters in the
�character universe� which are not in the given set.
Suppose you have defined a set, �eof�, which consists of
the characters which represent end of file. Then, in
your grammar where you wish to accept an arbitrary
character, what you really want is anything but an end
of file character. You can define it thus:
	anything = ~eof
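
The four operations behave like bitwise operations on a bit set with
one bit per character. The following sketch illustrates this for a
character universe of 0 to 255. The cset type and its functions are
invented for the example and say nothing about how AnaGram represents
sets internally.

```c
#include <string.h>

typedef struct { unsigned char bits[32]; } cset;  /* universe 0..255 */

/* The set of all characters from lo to hi inclusive. */
static cset cset_range(int lo, int hi)
{
    cset s;
    int c;
    memset(&s, 0, sizeof s);
    for (c = lo; c <= hi; c++)
        s.bits[c >> 3] |= (unsigned char)(1u << (c & 7));
    return s;
}

/* Nonzero if character c is a member of s. */
static int cset_has(cset s, int c)
{
    return (s.bits[c >> 3] >> (c & 7)) & 1;
}

static cset cset_union(cset a, cset b)        /* a + b */
{
    int i;
    for (i = 0; i < 32; i++) a.bits[i] |= b.bits[i];
    return a;
}

static cset cset_inter(cset a, cset b)        /* a & b */
{
    int i;
    for (i = 0; i < 32; i++) a.bits[i] &= b.bits[i];
    return a;
}

static cset cset_diff(cset a, cset b)         /* a - b */
{
    int i;
    for (i = 0; i < 32; i++) a.bits[i] &= (unsigned char)~b.bits[i];
    return a;
}

static cset cset_compl(cset a)                /* ~a */
{
    int i;
    for (i = 0; i < 32; i++) a.bits[i] = (unsigned char)~a.bits[i];
    return a;
}
```

Building letter and hex digit with cset_range and cset_union, then
applying cset_inter, gives exactly upper and lower case a to f, as in
the definitions above.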
##

Character Sets

This window lists all of the distinct �character set�s
which you defined, implicitly or explicitly, in your
�grammar�. Each line in the table describes one such
set.

The description takes the form of the internal set
number and the defining �expression�. The �Auxiliary
Windows� menu will allow you to see the �Partition
Sets� which cover the character set, and the �Set
Elements� which it comprises, as well as the �Token Usage�.
##

Character Universe, Universe

The character universe, or set of all expected input
characters to your parser, is defined as all characters
in the range given by a particular lower bound and a
particular upper bound, as described below.

The character universe is used for two things in
AnaGram. The first use is for calculating the
�complement� of a character set. The second use is in
the input processing of your parser. Input characters
will be used to index a �token conversion� table to
convert character codes to token numbers. The length of
this table will be given by the size of the character
universe. If you have set the �test range�
�configuration switch� your parser will verify that the
input character is within the range of the conversion
table. Otherwise, the character code will not be
checked for validity. In this case, an out-of-range
character will lead to undefined behavior.

If you have not used any characters with negative codes
in your grammar, the lower bound is zero. Otherwise, it
is the most negative such character.

If the highest character code you have used is less
than or equal to 255, the upper bound will be 255.

If you have used a character code greater than 255, the
upper bound will be the largest such code which appears
in your syntax file.
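
The rules for the bounds, and the kind of check the test range switch
implies, can be sketched in C. The names below are hypothetical, and
the assumption that a character is converted to a table index by
subtracting the lower bound is made only for the example.

```c
#include <assert.h>

struct universe { int lower; int upper; };

/* Bounds of the character universe, given the lowest and highest
 * character codes used in the grammar. */
static struct universe universe_bounds(int min_code_used, int max_code_used)
{
    struct universe u;
    u.lower = (min_code_used < 0) ? min_code_used : 0;
    u.upper = (max_code_used <= 255) ? 255 : max_code_used;
    return u;
}

/* Number of entries the conversion table needs. */
static int table_length(struct universe u)
{
    return u.upper - u.lower + 1;
}

/* The check a range test performs before indexing the table. */
static int in_universe(struct universe u, int c)
{
    return c >= u.lower && c <= u.upper;
}

/* Assumed indexing scheme: offset the code by the lower bound. */
static int table_index(struct universe u, int c)
{
    assert(in_universe(u, c));
    return c - u.lower;
}
```

A grammar that uses only 'a' to 'z' gets the bounds 0 and 255 and a
table of 256 entries; one that uses -1 and 0x1ff gets the bounds -1
and 511 and a table of 513 entries.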
##

Characteristic Rule

Each �parser state� is characterized by a particular
set of �grammar rules�, and for each such rule, a
marked token which is the next �token� expected. The
combination of a grammar rule and its marked token is often
called a �marked rule�. A marked rule which
characterizes a state is called a "characteristic
rule". In the course of doing �grammar analysis�,
AnaGram determines the characteristic rules for each
�parser state�. After analyzing your grammar, you may
inspect the �State Definition Table� to see the
characteristic rules for any state in your parser.
##

Characteristic Token

Every state in a �parser�, except state 0, can be
characterized by the one, unique �token� which causes a
jump to that state. That token is called the
�characteristic token� of the state, because to get to
that �parser state� you must have just seen precisely
that token in the input. Note that several states could
have the same characteristic token.

When you have a list of states, such as is given by the
�parser state stack�, it is equivalent to a list of
characteristic tokens. This list of tokens is the list
of tokens that have been recognized so far by the
parser.
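
The equivalence between a list of states and a list of tokens can be
sketched with a small lookup table. The table contents, the token
numbers and the function name are invented for the example; note that
states 1 and 3 share a characteristic token, which the definition
permits.

```c
#include <stddef.h>

/* Hypothetical table: characteristic token of each state.
 * State 0 has none. */
enum { NO_TOKEN = -1 };
static const int demo_char_token[] = { NO_TOKEN, 10, 11, 10, 12 };

/* Write the characteristic tokens for a state stack into 'out',
 * skipping state 0, and return how many were written. 'out' must
 * have room for 'depth' entries. */
static size_t tokens_seen(const int *stack, size_t depth, int *out)
{
    size_t i, n = 0;
    for (i = 0; i < depth; i++)
        if (demo_char_token[stack[i]] != NO_TOKEN)
            out[n++] = demo_char_token[stack[i]];
    return n;
}
```

For the state stack 0, 1, 3, 4 the function produces the tokens
10, 10, 12.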
##

Circular Definition

If the �expansion rule�s for a �token� contain a �grammar rule� that
consists only of the token itself, the definition of the
token is circular. A circular definition is an extreme
case of �empty recursion�.

As in cases of empty recursion, the generated parser may contain
infinite loops. When such a condition is detected, therefore,
�keyword anomaly� analysis and the �File Trace� option are disabled.

##

column

"column" is an integer field in your �parser control
block� used for keeping track of the column number of
the current character in your input. Line and column
numbers are tracked only if the �lines and columns�
�configuration switch� has been set.
##

Command Line

If you provide the name of a syntax file on the
command line when you start AnaGram, it will open
the file and run either �Analyze Grammar� or �Build
Parser� depending on the setting of the �Autobuild�
switch.
##

Command Line Version, agcl.exe

The command line version of AnaGram, agcl.exe, can be
used in make files. It takes the name of a single syntax
file on the command
line. Error and �warning� messages are written to stdout.

Normally you would only use the command line version once you
have finished developing your �parser� and are integrating
it with the rest of your program.

The command line version of AnaGram is not included with
trial copies.
##

Comment

You may incorporate comments in your syntax file using
either of two conventions. The first is the normal C
convention for comments which begin with "/*" and end
with "*/". Such comments may be of arbitrary length. By
setting or resetting the �nest comments� switch, you
may control whether they may be nested or not.

The second convention for comments is the C++ comment
convention. In this case the comment begins with "//"
and ends with a newline.

When writing a �grammar�, you may wish to allow a user
to comment his input freely without your having to
explicitly allow for comments in your grammar. You may
accomplish this by using the �disregard� statement.
##

Compile Command

"Compile command" is a �configuration parameter� which
takes a string value. This parameter was used in the
DOS version of AnaGram, but is ignored in the Windows
version.
##

Complement

In set theory, the complement of a set, S, is the set
of all elements of the �universe� which are not members
of the set S.

In AnaGram, the complement operator for �character
sets� is given by '~' and has higher precedence than
�difference�, �intersection�, or �union�.

In AnaGram, the most useful complement is that of the
end of file character set. For ordinary ascii files it
is often convenient to read the entire file into
memory, append a zero byte to the end, and define the
end of file set thus:
	eof = 0 + ^Z
Then, ~�eof� represents all legitimate input characters.

You can then use set operations to specify certain
useful sets without tedious enumeration. For example, a
comment that is to be terminated by the end of line
then consists of characters from the set
	comment char = ~'\n' & ~eof
This set could also be written
	comment char = ~('\n' + eof)
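
That the two forms describe the same set follows from De Morgan's
law. The sketch below checks this over the codes 0 to 255 by writing
each form as a membership test. The function names are hypothetical;
the eof set is the one defined above, 0 and ^Z (code 26).

```c
/* eof = 0 + ^Z */
static int is_eof(int c)
{
    return c == 0 || c == 26;
}

/* comment char = ~'\n' & ~eof */
static int comment_char_a(int c)
{
    return !(c == '\n') && !is_eof(c);
}

/* comment char = ~('\n' + eof) */
static int comment_char_b(int c)
{
    return !(c == '\n' || is_eof(c));
}

/* Count the codes in 0..255 on which the two forms disagree. */
static int count_differences(void)
{
    int c, n = 0;
    for (c = 0; c < 256; c++)
        if (comment_char_a(c) != comment_char_b(c)) n++;
    return n;
}
```

The count is zero: both forms accept every character except newline,
0 and ^Z.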
##

Completed Rule

A "completed rule" is a �characteristic rule� which has no �marked
token�.  In other words, it has been completely matched and will be
reduced by the next input.

If there is more than one completed rule in a state,
the decision as to which to reduce is made based on the
next input token. If there is only one completed rule
in a state, it will be reduced by default unless the
�default reductions� switch has been reset, i.e.,
turned off.
##

Configuration File

If it can find them, AnaGram reads two configuration
files to set up �configuration parameter�s. At program
initialization, it will first attempt to read a
configuration file in the directory that contains
the AnaGram executable file you are running. Then it
will read a configuration file in your working
directory. Both files should have the name
"AnaGram.cfg" if they exist. Neither is necessary.

If a parameter is specified in both files, the
specification in the file from the working directory
takes precedence.

The effect of this two stage process is to allow you to
set your standard preferences in the principal
directory, with specific overrides in your working
directories.

The values for configuration parameters in �syntax
files� override those read from configuration files.

AnaGram does not save configuration parameters in
the Windows registry, nor does it provide any
mechanism for setting or changing the values of
configuration parameters within AnaGram itself.
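
The order of precedence can be summarized as: built in default, then
the configuration file in the AnaGram directory, then the one in the
working directory, then the syntax file, with each later level
overriding the earlier ones. The sketch below models this for a single
numeric parameter. The types, names and values are hypothetical and do
not reflect AnaGram's internal representation.

```c
/* One parameter as seen at one level; 'set' is 0 when unspecified. */
struct setting { int set; int value; };

/* Apply the levels in order; a later level that specifies the
 * parameter replaces whatever came before. */
static int effective_value(int built_in_default,
                           struct setting exe_dir_cfg,
                           struct setting working_dir_cfg,
                           struct setting syntax_file)
{
    int v = built_in_default;
    if (exe_dir_cfg.set)     v = exe_dir_cfg.value;
    if (working_dir_cfg.set) v = working_dir_cfg.value;
    if (syntax_file.set)     v = syntax_file.value;
    return v;
}
```

If the file in the AnaGram directory gives 40 and the file in the
working directory gives 50, the effective value is 50 unless the
syntax file specifies something else.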
##

Configuration Parameter

Configuration parameters may be specified either in
�configuration files� or in your �syntax file�. In your
syntax files, configuration parameters are specified,
one per line, in a �configuration section�.

AnaGram ignores case when identifying a configuration
parameter, so that "ALLOW MACROS", "Allow Macros", and
"allow macros" are all equivalent forms.

There may be any number of configuration sections in a
�syntax file�. Any parameter may be specified any
number of times. Since AnaGram maintains only one value
in storage for these parameters, whenever it refers to
one it will see the most recently specified value.
Every configuration parameter has a default value which
has been chosen to correspond to a standard if it
exists, customary usage if such can be determined, or
otherwise to the most likely usage.

Before executing an Analyze Grammar or Build Parser command, AnaGram
resets configuration parameters to their initial values, as
determined by the built in defaults and the configuration files read
at program initialization.

The �Configuration Parameters Window� shows the current settings of all
of the configuration parameters. When this window is active you may
press �F1� or click with the �help cursor� to pop up a help window
describing the parameter under the cursor bar.

There are several varieties of configuration
parameters. Some simply set or reset a condition. These
need simply be stated to set the condition or negated
with the tilde (~) to reset the condition. Thus
	[ nest comments ]
causes AnaGram to allow nested comments, and
	[ ~nest comments ]
causes AnaGram to disallow nested comments.

If you prefer you may explicitly specify a switch value as on or off:
	[ nest comments = on]

 A second kind
of configuration parameter takes a value
which is the name of a token. Thus
	[ grammar token = c grammar]
specifies that the token, c grammar, is the �grammar
token� which is to be analyzed.

A third variety of configuration parameter takes a
value which is a C data type. Thus
	[ default token type = unsigned char *]
signifies that the �semantic value� of a token, unless
otherwise specified, is a pointer to an unsigned char.

A fourth variety of configuration parameter takes a
string value to set some ascii string used by AnaGram.
Thus
	[ header file name = "widget.h" ]
signifies that the header file created by AnaGram
should be called "widget.h".

In string-valued parameters used to specify the names
of output files or the name of your parser, you may use
the '#' character to indicate the name of your syntax
file. When the string is actually used, AnaGram will
substitute the syntax file name for the '#'.

In string-valued parameters used to specify the names
of functions or variables that AnaGram generates, you
may use '$' to specify the name of your parser. When
the string is actually used, AnaGram will substitute
the name of your parser for the '$'.

In the "�enum constant name�" configuration parameter
you may use '%' to specify where a token name is to be
substituted.
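
The substitution itself is a simple character replacement. The
following sketch shows one way it could be done. The function
substitute is hypothetical, and the patterns "#.h" and "$_value" are
used only as examples; they are not necessarily the defaults of any
parameter.

```c
#include <stddef.h>
#include <string.h>

/* Copy 'pattern' to 'out', replacing each occurrence of 'marker'
 * with 'name'. Returns 0 on success, -1 if 'out' is too small. */
static int substitute(const char *pattern, char marker, const char *name,
                      char *out, size_t out_size)
{
    size_t n = 0, name_len = strlen(name);
    for (; *pattern; pattern++) {
        if (*pattern == marker) {
            if (n + name_len >= out_size) return -1;
            memcpy(out + n, name, name_len);
            n += name_len;
        } else {
            if (n + 1 >= out_size) return -1;
            out[n++] = *pattern;
        }
    }
    if (n >= out_size) return -1;   /* no room for the terminator */
    out[n] = '\0';
    return 0;
}
```

With a syntax file named widget, the pattern "#.h" becomes "widget.h";
with a parser named calc, the pattern "$_value" becomes "calc_value".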

The final variety of configuration parameter takes a
numeric value. The value may be decimal, octal
or hexadecimal, following the C conventions, and may
have an optional sign. Thus
	[parser stack size = 50]
tells AnaGram to allocate space for at least fifty stack entries
when it creates your parser.
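
The C conventions referred to here are the ones the standard library
function strtol applies when it is called with a base of zero: a
leading "0x" means hexadecimal, a leading "0" means octal, and
anything else is decimal. The sketch below uses strtol to show that
three spellings give the same value. The wrapper parse_numeric is
hypothetical and is not how AnaGram reads its parameters.

```c
#include <stdlib.h>

/* Parse a numeric value using the C conventions for the radix.
 * Base 0 tells strtol to infer the radix from the prefix; an
 * optional leading sign is accepted. */
static long parse_numeric(const char *text)
{
    return strtol(text, NULL, 0);
}
```

Thus "50", "062" and "0x32" all denote the same stack size.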
##

Configuration Parameters Window

The Configuration Parameters window lists the
�configuration parameter�s AnaGram accepts with their
current values, as set by the �configuration files� it
has read and by the most recent �syntax file� it has
analyzed. Configuration parameters cannot be changed
from within AnaGram.
##

Configuration Section

A configuration section is one of the main divisions of
your �syntax file�. It begins with a left square
bracket on a fresh line. It then contains definitions
of �configuration parameter�s, �configuration switch�
settings and �attribute statement�s. These
specifications must each start on a new line. The
configuration section is closed with a right bracket.
Any further component of your syntax file, other than a
�comment�, must start on a fresh line.

There can be any number of configuration sections in a
syntax file.
##

Configuration Switch

A configuration switch is a �configuration parameter�
which can take on only the two values true and false,
or on and off. You set a configuration switch, or turn
it on, by simply naming it in your �configuration file�
or in a �configuration section� of your �syntax file�.
You turn it off, or "reset" it, by use of the tilde:
"~nest comments", for example, resets, or turns off,
the �nest comments� switch. If you prefer, you may
assign the value "on" to set the switch, or "off" to
reset it. For example:
	nest comments = on
##

Conflict

"Conflicts" arise during the �grammar analysis� when
AnaGram cannot determine how to treat a given input
token. There are two sorts of conflicts: �shift-reduce
conflicts� and �reduce-reduce conflicts�. Conflicts may
arise either because the grammar is inherently
ambiguous, or simply because the grammar analyzer
cannot look far enough ahead to resolve the conflict.
In the latter case, it is often possible to rewrite the
grammar in such a way as to eliminate the conflict. In
particular, �null productions� are a common source of
conflicts.

When AnaGram analyzes your grammar, it lists all
unresolved conflicts in the �Conflicts� window. A number
of �Auxiliary Windows� available from the Conflicts window
provide help in identifying the source of the conflict.

There are a number of ways to deal with conflicts. If
you understand the conflict well, you may simply choose
to ignore it. When AnaGram encounters a shift-reduce
conflict while building parse tables it resolves it by
choosing the �shift action�. When AnaGram encounters a
reduce-reduce conflict while building parse tables, it
resolves it by selecting the �grammar rule� which
occurred first in the grammar.

A second way to deal with conflicts is to set �operator
precedence� parameters. If you set these parameters,
AnaGram will use them preferentially to resolve
conflicts. Any conflicts so resolved will be listed in
the �Resolved Conflicts� window.

A third way to resolve a conflict is to declare some
tokens as �sticky�. This is particularly useful for
�production�s whose sole purpose is to skip over
uninteresting input.

A fourth way to resolve conflicts is to declare a token
to be a �subgrammar�. When you do this, AnaGram does
not look beyond the definition of the subgrammar token
itself for reducing tokens. This is not a particularly
selective way to resolve conflicts and should be used
only when the subgrammar token is naturally defined
only by internal criteria. The tokens identified by
lexical scanners are prime examples of this genre.

The fifth way to deal with conflicts is to rewrite the
grammar to eliminate them. Many people prefer this
approach since it yields the highest level of
confidence in the resulting program.

Please refer to the AnaGram User's Guide for more information about
dealing with conflicts.
##

Conflicts

If there are �conflict�s in your grammar which are not
resolved by �precedence rules�, they will be listed in
the Conflicts window. The Conflicts window will also be
listed in the �Browse Menu�. Conflicts which have been
resolved by �precedence rules� are listed in the
�Resolved Conflicts� window.

The Conflicts window lists the conflicts, or
ambiguities, which AnaGram found in your grammar. The
table identifies the �parser states� in which it found
conflicts, the �conflict token�s for which it had more
than one option, and the �marked rules� for each such
option. If one of the rules for a particular conflict
has a �marked token�, the conflict is
a �shift-reduce conflict�. The marked token is the token
to be shifted. If none of the rules has a marked token the conflict is
a �reduce-reduce conflict�.

AnaGram provides a number of �Auxiliary Windows� to help
you find and fix the source of the conflict. The
�Conflict Trace� window is a pre-built �Grammar Trace�
window which shows you one of perhaps many ways to
encounter the conflict. The �Reduction Trace� window
shows the result of reducing a particular ambiguous
rule.

In addition, the �Rule Derivation� and �Token
Derivation� windows show you why the conflict token is a
�reducing token�. They are particularly useful for
shift-reduce conflicts.

The �Expansion Chain� window is helpful for understanding
reduce-reduce conflicts.

Other Auxiliary Windows which are often useful are the
�State Definition� window, the �Reduction States�
window, and the �Problem States� window.

Please refer to the AnaGram User's Guide for more information on how to
deal with conflicts.
##

Conflicts Resolved by Precedence Rules

This �warning� message indicates that AnaGram has
resolved conflicts in your grammar by using �precedence
rules�: guidelines you supplied either by explicit
�precedence declarations�, by using a �sticky�
statement or �distinguish lexemes� statement, or
implicitly by using a �disregard� statement. These
conflicts are listed in the �Resolved Conflicts�
window, and are not listed in the �Conflicts� window.
##

Conflict Token

In any given �conflict�, there is a �token� for which
an unambiguous �parser action� cannot be determined.
This token is called the "conflict token".
##

Conflict Trace

The Conflict Trace is a ready-made �Grammar Trace�
which shows you one of perhaps many ways to get to the
state which has the �conflict� selected by the cursor
bar. The Conflict Trace window is an option in the
�Auxiliary Windows� menu for the �Conflicts� window and
the �Resolved Conflicts� window.
##

Const Data

The const data �configuration switch� controls the use
of CONST qualifiers in generated code. If the switch is
set, all fixed data arrays in the �parser file� will be
qualified as CONST, unless the �old style� switch is
set. The default setting is ON. Other configuration
switches which control declaration qualifiers in the
parser file are �near functions� and �far tables�.
##

CONTEXT

"CONTEXT" is a macro which AnaGram defines for you if
you have defined a �context type�. It provides access
to the top value of the �context stack�. Your
�GET_CONTEXT� macro may store the current context by
assigning a value to CONTEXT. Suppose your parser uses
�pointer input�, and you wish to know the value of the
�pointer� for every production. You could define
GET_CONTEXT thus:
	#define GET_CONTEXT CONTEXT = PCB.pointer

 In �reduction procedure�s, you may use the CONTEXT
macro to find the context for the rule you are
reducing, that is to say, the value the context
variables had when the first token in the rule was
encountered.
##

Context Stack

It is often convenient, when writing �reduction
procedure�s, to know the actual context of the �grammar
rule� your procedure is reducing. To do this you need
to know the values that certain variables, such as
stack pointers, or input pointers, in your program had
at various stages as your parser matched the rule. You
can accomplish this by maintaining a context stack.

If you wish, AnaGram will keep track, on a stack, of any
context variables you wish. To do so, define a structure
which can hold all the values you need to stack. Use the
�context type� �configuration parameter� to tell AnaGram
how to declare the stack. Then define the �GET_CONTEXT�
macro to gather the appropriate values and store them on
the stack. The �CONTEXT� macro evaluates to the proper
location into which the GET_CONTEXT macro should store
the context value. AnaGram will invoke the GET_CONTEXT
macro whenever necessary to make sure the right values
are stacked. In a reduction procedure, you can then use
the macro �RULE_CONTEXT� to find the value of the
context structure as of the beginning of each token in
the rule you are reducing.

If your parser is �event driven�, store the context of
the input token in PCB.input_context. The default
version of GET_CONTEXT will stack the context as
appropriate.

If your parser should encounter an error, you may use
�ERROR_CONTEXT� to determine the values of the context
variables at the beginning of the aborted grammar rule.
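
As a rough illustration of the steps above, the following C sketch
stacks a line and column for each token. Only the location type and
GET_CONTEXT are things you would write yourself. The mock_pcb
structure, its field names, the stand-in CONTEXT definition, and the
mock_shift and mock_rule_context helpers are assumptions made so the
fragment compiles on its own; a generated parser supplies its own
versions.

```c
/* Hypothetical context type: the values to be stacked for each token. */
typedef struct {
    int line;
    int column;
} location;

/* Mock of the parts of a parser control block used here. AnaGram
   generates the real structure; this layout is an assumption. */
#define MOCK_STACK_DEPTH 8
struct mock_pcb {
    location cs[MOCK_STACK_DEPTH];  /* context stack */
    int ssx;                        /* index of the next free stack slot */
    int line, column;               /* current input position */
};
struct mock_pcb mock;

/* Stand-ins: a generated parser defines PCB and CONTEXT for you. */
#define PCB mock
#define CONTEXT (PCB.cs[PCB.ssx])

/* The macro you would write: store the current position in CONTEXT. */
#define GET_CONTEXT (CONTEXT.line = PCB.line, CONTEXT.column = PCB.column)

/* Simulates the parser stacking one token seen at (line, column).
   Returns 0 on success, -1 if the mock stack is full. */
int mock_shift(int line, int column) {
    if (PCB.ssx >= MOCK_STACK_DEPTH) return -1;
    PCB.line = line;
    PCB.column = column;
    GET_CONTEXT;
    PCB.ssx++;
    return 0;
}

/* Context of token k (counting from 0) of a rule whose first token
   was stacked at index base; imitates what RULE_CONTEXT[k] reports. */
location mock_rule_context(int base, int k) {
    return PCB.cs[base + k];
}
```

After three calls to mock_shift, mock_rule_context(0, 1) returns the
position recorded for the second token of a rule that began at the
bottom of the stack.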
##

context type

"Context type" is a �configuration parameter� whose
value is a C type name, possibly as defined by a
typedef statement. By default, "context type" is
undefined. If you define it, AnaGram will set up a
�context stack� in your �parser control block� so you
can track the context of �production�s.

Each time your parser pushes values onto the state
stack and value stack it will invoke the �GET_CONTEXT�
macro to store the current context on the context
stack. The macro �CONTEXT� names the current stack
location. In your GET_CONTEXT macro you can use it as
the destination for the current context. In a
�reduction procedure�, CONTEXT names the context as of
the beginning of the production. Two other macros are
available to inspect the values of the context stack.
In a reduction procedure, you may use �RULE_CONTEXT�[k]
to determine the value of the context variable as it
was as of the (k+1)th token in the rule. In particular,
RULE_CONTEXT[0] is the value the context variable had
when the first token in the rule was seen.

If you enable the �error frame� �configuration switch�,
you may use �ERROR_CONTEXT� to determine the context of
the production your parser was trying to identify at
the time of the error.
##

CONVERT_CASE

CONVERT_CASE is a user definable macro which AnaGram
invokes to convert the case of input characters when
the �case sensitive� switch has been turned off. If
you do not define the macro yourself, AnaGram will
provide a macro which will convert case correctly
for characters in the ASCII character range and
also for �ISO latin 1� characters if the corresponding
�configuration switch� is on.

##

Coverage File Name

If you have set the �rule coverage� �configuration
switch� to include coverage analysis in your parser,
AnaGram uses the value of the coverage file name
�configuration parameter� to find the results of your
testing. The value of the parameter is a string. The
default value is "#.nrc", where '#' represents the name
of your syntax file.
##

cs

cs is a field in a �parser control block� which
contains your �context stack�. cs will be defined only
if you have defined the �configuration parameter�
�context type�.
##

Current Grammar

The Current Grammar is the �grammar� you presently have
loaded. Its name is displayed on the title bar of
each AnaGram window.

A status field at the right center of the �Control Panel�
indicates the state of processing that has been
carried out on the grammar.

"Loaded" means that the �syntax file� has been read
into memory, but that syntax errors have been found.

"Parsed" means that AnaGram has tried to analyze the
grammar, but got into some kind of difficulty and did
not complete the job. The explanation should be
apparent from the messages in the �Warnings� window.

"Analyzed" means that a �grammar analysis� has been
completed, but no �output files� have been written.

"Built" means that an analysis has been completed and
output files have been written.
##

Data Type

The �tokens� in your �parser� usually have �semantic
values�. The data types for these values will be
determined by the �default input type� and �default
token type� �configuration parameter�s unless you
explicitly provide �token declarations� in your grammar.
You may also define the data type for any �nonterminal�
token by preceding the token name with an ordinary C
cast when you write a production. For example:

	(int) integer
		-> '0-9':d						=d-'0';
		-> integer:n, '0-9':d	=10*n + d - '0';

The data type may be any simple C or C++ data type, with
arbitrary indirection and qualification. You may also
use any type you have defined by means of typedef,
struct or class definitions. Template classes may also
be used. If you specify a type of your own definition,
you must provide a definition in the �C prologue� at the
beginning of your �syntax file�.

A token may have the type "void" if its value has no
interest for the parser. Since your parser will not
stack a value for a void token, your parser may run
somewhat faster when tokens are declared as void.
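
For the integer productions shown earlier in this entry, the
following C sketch traces what the two reduction procedures compute
as digits arrive. The function reduce_integer is hypothetical and is
not generated by AnaGram; it only mirrors the arithmetic of the two
rules.

```c
/* Hypothetical trace of the two reduction procedures in the example:
   the first rule yields d - '0'; the second yields 10*n + d - '0'.
   Returns -1 if the text does not begin with a digit. */
int reduce_integer(const char *digits) {
    int n;
    if (*digits < '0' || *digits > '9') return -1;
    n = *digits++ - '0';                  /* integer -> '0-9':d */
    while (*digits >= '0' && *digits <= '9')
        n = 10 * n + *digits++ - '0';     /* integer -> integer:n, '0-9':d */
    return n;
}
```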
##

Declare pcb

"Declare pcb" is a �configuration switch� that defaults
to on. If this switch is set when you invoke the �Build
Parser� command, AnaGram will automatically declare a
�parser control block� for you, at the beginning of
your parser file. If you have used data types that you
define yourself, the typedef statements need to precede
the parser control block declaration. In this case, you
should turn "declare pcb" off and declare it yourself.

For more information, see the AnaGram User's Guide.
##

Default Input Type

The default input type is a �configuration parameter�
which determines the �data type� for the �semantic
value�s of �terminal tokens� if they are not explicitly
declared. Normally, you would explicitly declare
terminal tokens only when you have set the �input
values� �configuration switch�. If you do not set the
default input type, it will default to "int".

The default data type for the values of �nonterminal
tokens� is given by the �default token type�
configuration parameter.
##

Default Reduction

"Default reductions" is a �configuration switch� which
defaults to on.

A "default reduction" is a �parser action� which may be
used in your parser in any state which has precisely
one �completed rule�.

If a given �parser state� has, among its �characteristic
rules�, exactly one completed rule, it is usually faster
to reduce it on any input than to check specifically for
correct input before reducing it. The only time this
default reduction causes trouble is in the event of a
�syntax error�. In this situation you may get an
erroneous reduction. Normally when you are parsing a
file, this is inconsequential because you are not going
to continue semantic action in the presence of error.
But, if you are using your parser to handle real-time
interactive input, you have to be able to continue
semantic processing after notifying your user that he
has entered erroneous input. In this case you would want
default reductions to have been turned off so that
�production�s are reduced only when there is correct
input.
##

Default reduction value

If a �grammar rule� does not have a �reduction procedure�
the �semantic value� of the first token in the rule will
be taken as the semantic value of the token on the left
hand side. If these tokens do not have the same �data type�
a �warning� will be given.
##

Default Token Type

"Default token type" is a �configuration parameter�
which determines the �data type� for the �semantic
value� of a �nonterminal token� if no other type is
explicitly specified. It defaults to void. Therefore, if
any �reduction procedure� returns a value, you must
either explicitly set the type of the �reduction token�
or you must set default token type to an appropriate
value.

The default token type cannot have a �wrapper� class
defined.

The default data type for the value of a �terminal
token� is given by the �default input type�
configuration parameter.
##

Definition, Definition Statement

AnaGram syntax files may contain definition statements
which assign new names to �character sets�, �virtual
productions�, �keyword strings�, �immediate actions�,
or �tokens�. Definitions have the form
	name = <character set>
	name = <virtual production>
	name = <keyword string>
	name = <immediate action>
	name = <token name>

For example,
	letter = 'a-z' + 'A-Z'
	statement list = statement?...
	include = "include"

The symbols thus defined may be used anywhere the
expression on the right hand side might be used. Such
definitions, in and of themselves, do not define tokens.
Tokens are defined only by their usage in productions.

##

DELETE_WRAPPERS

If your parser uses �wrapper�s and exits with an error condition, there
may be objects remaining on the �parser value stack�. The DELETE_WRAPPERS macro
can be used to delete any remaining objects on the stack.
If you have enabled �auto resynch�, DELETE_WRAPPERS will be invoked
automatically.
##

Diagnose Errors

"Diagnose errors" is a �configuration switch� which
defaults to on. When this switch is on, AnaGram includes a
function, ag_diagnose(), in your parser which provides simple
syntax error diagnoses. When your parser encounters a
syntax error, this function will be called immediately prior
to the invocation of the �SYNTAX_ERROR� macro. A pointer to the message will be
stored in the �error_message� field of the �parser control block�.

If you wish to implement your own �error diagnosis�, you
should turn this switch off, and include a call to your
own diagnostic procedure in your SYNTAX_ERROR macro.

ag_diagnose() provides three possible error messages,
governed by three macros: �MISSING_FORMAT�, �UNEXPECTED_FORMAT�, and
�UNNAMED_TOKEN�. You may override the definitions of
these macros with your own definitions if you wish
to provide diagnostics in another language.
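
A minimal sketch of such an override, here with German wording. The
three macro names come from the text above. The function
format_diagnostic is a hypothetical helper that only imitates the
choice between the formats; it is not the code AnaGram generates.

```c
#include <stdio.h>

/* Overrides as you might place them in embedded C. */
#define MISSING_FORMAT "Fehlt: %s"
#define UNEXPECTED_FORMAT "Unerwartet: %s"
#define UNNAMED_TOKEN "Eingabe"

/* Hypothetical helper: if a unique missing token is known, report it;
   otherwise report the unexpected token, falling back to
   UNNAMED_TOKEN when the input cannot be named. */
const char *format_diagnostic(char *buf, size_t size,
                              const char *missing, const char *unexpected) {
    if (missing != NULL)
        snprintf(buf, size, MISSING_FORMAT, missing);
    else
        snprintf(buf, size, UNEXPECTED_FORMAT,
                 unexpected != NULL ? unexpected : UNNAMED_TOKEN);
    return buf;
}
```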

If you have set the �error frame� switch, ag_diagnose()
will also set the �error_frame_token� field.
The "error_frame_token" is the non-terminal token which
the parser was trying to complete when the error was
encountered.

When the "diagnose errors" switch is set, AnaGram also
includes a �token names� table in the parser which
contains the ASCII names of the tokens in the grammar,
including entries for character constants and keywords.

Use the �token names only� switch to limit the table
to explicitly named tokens only.
##

MISSING_FORMAT

MISSING_FORMAT is a macro that is used by the error
diagnostic function created by the �diagnose errors�
switch. If you do not define it in your parser,
AnaGram will define it thus:
	#define MISSING_FORMAT "Missing %s"

 This format is used when the diagnostic function can
identify a unique terminal or nonterminal token that
would satisfy the syntactic rules and is named
in the �token names� table.
##

UNEXPECTED_FORMAT

UNEXPECTED_FORMAT is a macro that is used by the error
diagnostic function created by the �diagnose errors�
switch. If you do not define it in your parser,
AnaGram will define it thus:
	#define UNEXPECTED_FORMAT "Unexpected %s"

 This format is used when the diagnostic function cannot
identify a named, unique terminal or nonterminal token that
would satisfy the syntactic rules and finds an
incorrect token, the name of which can be found
in the �token names� table.
##

UNNAMED_TOKEN

UNNAMED_TOKEN is a macro that is used by the error
diagnostic function created by the �diagnose errors�
switch. If you do not define it in your parser,
AnaGram will define it thus:
	#define UNNAMED_TOKEN "input"

 This macro is used as argument for the �UNEXPECTED_FORMAT�
macro when the actual, erroneous input cannot be identified.
##

Difference

In set theory, the difference of two sets, A and B, is
defined to be the set of all elements of A that are not
elements of B. In an AnaGram �syntax file�, you
represent the difference of two �character sets� by
using the '-' operator. Thus the difference of A and B
is A - B. The difference operator is �left
associative�.
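
The definition can be sketched in C with a simple membership table.
The charset type and the two functions below are hypothetical and
say nothing about how AnaGram stores character sets internally.

```c
#include <string.h>

#define CHARSET_SIZE 256

/* Hypothetical representation: member[c] is 1 when c is in the set. */
typedef struct {
    unsigned char member[CHARSET_SIZE];
} charset;

/* Builds the set of all character codes from lo to hi inclusive. */
charset charset_range(int lo, int hi) {
    charset s;
    int c;
    memset(&s, 0, sizeof s);
    for (c = lo; c <= hi && c < CHARSET_SIZE; c++)
        if (c >= 0) s.member[c] = 1;
    return s;
}

/* A - B: every element of A that is not an element of B. */
charset charset_difference(charset a, charset b) {
    int c;
    for (c = 0; c < CHARSET_SIZE; c++)
        if (b.member[c]) a.member[c] = 0;
    return a;
}
```

Because the operator is left associative, A - B - C groups as
(A - B) - C, which in this sketch corresponds to
charset_difference(charset_difference(a, b), c).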
##

Disregard

The purpose of the "disregard" statement is to skip over
uninteresting �white space� and comments in your input
file. It allows you to specify a token that should be
passed over in the input to your parser. The statement
takes the form:
	disregard ws
where "ws" is a token name or character set. Disregard
statements, like other �attribute statement�s, may be
placed in any �configuration section�.

You may have more than one disregard statement in your
�grammar�. If you do, AnaGram will create a shell
production. For example, suppose you write:
	[ disregard alpha
	   disregard beta ]
AnaGram will proceed as though you had written:
	gamma -> alpha | beta
	[ disregard gamma ]

 It frequently happens that you wish your �parser� to
disregard blanks or comments, except that �white space�
within names, numbers, strings, and other elementary
constructs is subject to special rules and thus should
not be disregarded blindly. In this case, you can use
the "�lexeme�" statement to declare these constructs off
limits for the disregard statement. Within these
constructs, the disregard statement will be inoperative
and the admissibility of white space is determined
solely by the productions which define these constructs.

Outside those productions which define lexemes, you
should not generally use a token which is supposed to be
disregarded. If you do, your grammar will have
�conflict�s, since the token could satisfy both the
explicit usage, as well as the implicit rules set up by
the disregard statement. Such conflicts, however, are
resolved automatically in favor of your explicit use of
the token. The conflicts will appear in the �Resolved
Conflicts� window.

If you have "open ended" lexemes in your grammar such
as variable names or numeric constants, your grammar
will detect a conflict if one of these lexemes may
follow another such lexeme immediately. To deal with
these conflicts, you should turn on the "�Distinguish
Lexemes�" configuration switch. It will cause white
space to be required as a separator between the
lexemes.

In order to implement the "disregard" statement AnaGram
will redefine some tokens in your grammar. For example,
'+' may be redefined to consist of a simple plus sign
followed by optional white space:
	'+' -> '+'%, white space?...
The �percent sign� is used to indicate the original,
simple plus without the optional white space attached.
You will probably notice the percent sign appearing in
some windows and traces.
##

distinguish keywords

"distinguish keywords" is an �attribute statement�
which you may include in a �configuration section�. It
is used to tell AnaGram how to distinguish �keyword�s
from similar sequences of characters in your input
stream. For example, you may want your parser to
recognize "int" as a keyword when it appears in the
following context:
	int x;
but not when it appears in the middle of such words as
"integral" and "intolerant". The operand of
"distinguish keywords" is a list of character set
�expression�s separated by commas and enclosed in braces
({ }).

Once AnaGram has read your entire syntax file, it
evaluates all of these character sets and tests each
keyword string against the character sets in the order
in which they were encountered in the program. If all
the characters which constitute a particular keyword
are members of the specified set, the keyword logic is
set up so that it will recognize the keyword only if
the immediately following character is not in the set.

In the example above,
	[distinguish keywords {'a-z'} ]
will do the trick.
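
The rule can be sketched in C as follows. The functions
in_lowercase_set and keyword_matches are hypothetical illustrations
of the test described above, not the logic AnaGram actually
generates.

```c
#include <string.h>

/* The character set 'a-z' from the example. */
int in_lowercase_set(int c) {
    return c >= 'a' && c <= 'z';
}

/* Returns 1 if input begins with keyword and the keyword should be
   recognized. If every character of the keyword belongs to the set,
   the character that follows it must not belong to the set. */
int keyword_matches(const char *input, const char *keyword,
                    int (*in_set)(int)) {
    size_t n = strlen(keyword);
    size_t i;
    int guarded = 1;
    if (strncmp(input, keyword, n) != 0) return 0;
    for (i = 0; i < n; i++)
        if (!in_set((unsigned char)keyword[i])) guarded = 0;
    if (!guarded) return 1;
    return !in_set((unsigned char)input[n]);
}
```

With these definitions, "int" is accepted at the start of "int x;"
but rejected at the start of "integral".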

The "�sticky�" statement also affects the recognition
of keywords.
##

Distinguish Lexemes

The "distinguish lexemes" �configuration switch� is
used in conjunction with the "�disregard�" statement
and the "�lexeme�" statement to resolve the
�shift-reduce conflict�s which often crop up when
suppressing white space.

The difficulty with suppressing white space is that you
wish it to be optional in cases like "x+y", where it is
not necessary in order to parse correctly, but you want
to require it in situations such as "mytype x", where
it is necessary to separate otherwise indistinguishable
constructs. If the white space were optional, it would
be necessary to allow for "mytypex", but it would be
impossible to determine if this were to be interpreted as
"mytype x", "mytyp ex", or any of the many other
possibilities.

The distinguish lexemes switch causes AnaGram to make
the white space optional where doing so causes no
ambiguity and mandatory where making it optional
would lead to ambiguity. In the example given
above, "mytypex" would be treated as a single name, and
another name would have to follow separating white
space.

The default value for distinguish lexemes is OFF. It is
anticipated that this will be changed to ON in future
releases of AnaGram.
##

Duplicate Production

This �warning� message appears when a �production�
appears twice in your �grammar�. You will have a
number of �reduce-reduce conflict�s as a consequence.
Eliminate the duplicate, and the conflicts it caused
will go away.
##

Edit Command

"Edit command" is a �configuration parameter� which
accepts a string value. It is no longer used and is
retained only for file compatibility with the DOS
version of AnaGram.
##

Embedded C

You may encapsulate pieces of C or C++ code in your �syntax
file� more or less arbitrarily. Such pieces of code will
simply be copied to the �parser file� in the order in
which they are encountered. Each such piece of code must
be enclosed with braces ({}). The left brace must be on a
new line, and nothing except comments may follow the
right brace. AnaGram does not inspect the interior of
such a piece of C code except to identify character
constants, strings, comments and blocks surrounded with
braces so that it does not identify the end of the
embedded C prematurely. Note that AnaGram will use the
status of the �nest comments� �configuration switch� in
effect at the beginning of the embedded C.

AnaGram, of course, can be confused by unterminated
strings, unbalanced brackets, and unterminated comments.
The most likely outcome, in such a situation, is that
AnaGram will encounter an end of file looking for the
end of the embedded C. Should this happen, AnaGram will
identify the beginning of the piece of embedded C which
caused the problem.

If your syntax file begins with a block of embedded C,
called the "�C prologue�", it will be copied to the very
beginning of the parser file, preceding all of AnaGram's
output. You may use such an initial block of embedded C
to guarantee that program title comments, copyright
notices and important definitions are at the very
beginning of your parser file.

The code you include as embedded C, of course, has to
coexist with the code AnaGram generates. In order to
keep the potential for name conflicts to a minimum, all
variables and functions which AnaGram defines begin with
the letters "ag_". You should avoid variable names which
begin with these letters.

If AnaGram finds no embedded C in a syntax file, and you
ask it to build a parser, it will automatically generate
a main program that calls your parser. If you don't want
it to do this, you may turn off the �main program�
�configuration switch�.
##

Empty Keyword String

This �warning� appears when you have a keyword string
that contains no characters whatsoever. �Keyword
strings� must contain at least one character. If you
wish a null match, use a �null production� instead.
##

Enable Mouse

"Enable mouse" is a �configuration switch� that defaults
to on. It is not used in the Windows version of AnaGram
and has been retained only for file compatibility with
the DOS version.
##

Enum Constant Name

The "enum constant name" �configuration parameter�
allows you to select the name AnaGram will use for the
set of enumeration constants it defines in the �parser
header� file for your �parser�. The value of "enum
constant name" should be a string containing the '%'
character. AnaGram will substitute each token name in
turn into this template as it creates the list of
enumeration constants. If it finds a '$' character it
will substitute the name of your parser. The default
value of "enum constant name" is "$_%_token".
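
The substitution can be pictured with the following C sketch. The
function expand_enum_name is a hypothetical helper, not part of
AnaGram, and the parser name "calc" and token name "integer" used in
the note after it are likewise assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: expands a template such as "$_%_token" by
   replacing '$' with the parser name and '%' with the token name.
   Returns 0 on success, -1 if the buffer is too small. */
int expand_enum_name(char *out, size_t size, const char *pattern,
                     const char *parser_name, const char *token_name) {
    size_t used = 0;
    for (; *pattern != '\0'; pattern++) {
        const char *piece;
        char single[2];
        if (*pattern == '$') {
            piece = parser_name;
        } else if (*pattern == '%') {
            piece = token_name;
        } else {
            single[0] = *pattern;
            single[1] = '\0';
            piece = single;
        }
        for (; *piece != '\0'; piece++) {
            if (used + 1 >= size) return -1;  /* keep room for '\0' */
            out[used++] = *piece;
        }
    }
    if (size == 0) return -1;
    out[used] = '\0';
    return 0;
}
```

Expanding the default template for parser "calc" and token "integer"
produces calc_integer_token.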
##

Enumeration Constants

In your �parser header� file, AnaGram includes a typedef
enum statement which provides enumeration constants
corresponding to all the named constants in your
grammar. The names of the enumeration constants
themselves are defined by the �enum constant name�
�configuration parameter�. These constants are useful
when dealing with �semantically determined productions�.
##

Enum

Within a �configuration section�, you may use an "enum"
statement to define numeric values for any number of
tokens just as you define enumeration constants in C.
The syntax is effectively the same as the enum statement
in C:

  [
    enum {
      first = 60,
      second,
      third,
      fourth = 'a',
      fifth,
    }
  ]

is exactly equivalent to
  first = 60
  second = 61
  third = 62
  fourth = 'a'
  fifth = 'b'
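
Since the numbering follows the same rule as C, the list can be
checked by writing it as an ordinary C enum. The tag sample_tokens is
a hypothetical name, and the value noted for fifth assumes an ASCII
character set.

```c
/* The same list written as a C enum; names mirror the example above. */
enum sample_tokens {
    first = 60,
    second,         /* one more than first: 61 */
    third,          /* 62 */
    fourth = 'a',
    fifth           /* 'a' + 1, which is 'b' in ASCII */
};
```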
##

eof

"eof" is a quasi reserved word in AnaGram, used to
specify an end of file token. You may use another token
as an end of file delimiter by setting the �Eof Token�
�configuration parameter�. eof is not required unless
you use �automatic resynchronization� in your �parser�.

If you have not defined eof or specified an Eof Token
parameter, �File Trace� may show a syntax error when it
encounters the end of a test file.

There are various ASCII values that are commonly used
to represent an end of file. The end of a string in
memory is commonly 0, DOS uses ^Z, Unix uses ^D, and
Unix style stream I/O uses -1. It is often convenient
then to define

	eof = -1 + 0 + ^D + ^Z
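
Expressed as a C test, the definition above covers the four codes
shown below. The function name is_eof_code is hypothetical; the
numeric values of ^D and ^Z are the ASCII control codes 4 and 26.

```c
/* Codes named in the definition above: -1 (stream I/O), 0 (end of a
   string in memory), ^D (4) and ^Z (26). */
int is_eof_code(int c) {
    return c == -1 || c == 0 || c == 4 || c == 26;
}
```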
##

Eof Token

"Eof token" is a �configuration parameter� which accepts
a token name as a value. There is no default value.
AnaGram does not need a specification for the eof token
unless you are using its �automatic resynchronization�
facility.

If you use the �automatic resynchronization� capability
of AnaGram, you must specify explicitly an end of file
token. You can do this either by defining a �terminal
token� in your �grammar� called eof or by using the "eof
token" parameter to identify some other terminal token
to be used as the end of file marker. You would do this
only if you must use the name "�eof�" for some other
purpose.

Note that "eof" is case sensitive. Neither Eof nor
EOF will qualify as end of file tokens unless you
explicitly specify them using the eof token parameter.
##

Eof Token Not Defined

This �warning� appears if you have requested either
�error token resynchronization� or �automatic
resynchronization� and you have not defined an �eof
token�. The resynchronization procedure will not work
correctly at end of file.
##

Error Action

The error action is one of the four �parser action�s of a
traditional �parsing engine�. The error action is
performed when the parser has encountered an input
token which is not admissible in the current state.
The further behavior of a traditional parser is
undefined.
##

Error Defining

"Error defining TXXX: <token representation>" is a
�warning� message which appears if errors are encountered
while attempting to evaluate the �character set� for
the specified �token�. This warning is always generated
in addition to more detailed warnings that are made
when the actual errors are encountered.
##

Error frame

"Error frame" is a �configuration switch� which defaults
to off. You use this switch to specify the �error
diagnosis� capabilities of your parser. If both this
switch and the �diagnose errors� switch are on, your
parser will include a function which will
determine the "context" of any �syntax error�, that is,
the token the parser was trying to complete.

To determine the context of an error, your parser will
scan backwards through the �parser state stack�,
examining �characteristic rules� until it finds a state
which can accept a unique �nonterminal� reduction token
that you have not marked as �hidden�. It will then set
PCB.�error_frame_ssx� to the �parser stack index� for
that level.
##

ERROR_CONTEXT

ERROR_CONTEXT is a macro AnaGram defines for you. If
your parser encounters a �syntax error�, you have
enabled the �error frame� �configuration switch�, and
you have defined a �context type�, ERROR_CONTEXT will
enable you to access the �context� as of when the parser
encountered the beginning of the �error_frame_token�.
##

Error Diagnosis

"Error diagnosis" and �error recovery� are the two
aspects of �error handling�. If in the �embedded C�
portion of your syntax file you define a macro called
�SYNTAX_ERROR�, it will be invoked by the parser when a
�syntax error� is encountered. If you have set the
�diagnose errors� �configuration switch�, the
�error_message� field of the �parser control block� will
contain a pointer to a string containing a diagnostic
message. The diagnostic is of the form "Missing <token
name>" or "Unexpected <token name>".

If you do not define SYNTAX_ERROR it will be
automatically defined so that a message will be written
to stderr.

If the �lines and columns� switch has been set you will
have the current line number and column number available
for your diagnostic message.

If you have set the �error frame� switch as well as the
diagnose errors switch, the variable
PCB.�error_frame_token� will identify the �nonterminal
token� the parser was trying to recognize when the
error was encountered.

Of course, if your parser is controlling direct keyboard
input, a diagnosis might be unnecessary. In this case
you might define SYNTAX_ERROR so that it simply beeps at
the user and let it go at that.
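
A minimal sketch of a SYNTAX_ERROR definition that reports the
diagnostic together with the line and column. The mock_pcb structure,
its sample contents, and the report_syntax_error helper are
assumptions made so that the fragment stands alone; in a real parser
PCB is supplied by AnaGram and the fields exist only when the
corresponding switches are set.

```c
#include <stdio.h>

/* Mock of the parser control block fields used below. A generated
   parser declares the real one; this layout is an assumption. */
struct mock_pcb {
    const char *error_message;  /* set when "diagnose errors" is on */
    int line, column;           /* available with "lines and columns" */
};
struct mock_pcb mock_instance = { "Unexpected ';'", 12, 7 };
#define PCB mock_instance

/* Hypothetical helper: builds the message text, writes it to stderr,
   and returns the buffer so the caller can inspect it. */
const char *report_syntax_error(char *buf, size_t size) {
    snprintf(buf, size, "%s, line %d, column %d",
             PCB.error_message, PCB.line, PCB.column);
    fprintf(stderr, "%s\n", buf);
    return buf;
}

/* The definition you might place in embedded C. */
#define SYNTAX_ERROR \
    do { char msg[128]; report_syntax_error(msg, sizeof msg); } while (0)
```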
##

Error Handling

Rarely is a parser built to read an arbitrary input
file. The normal situation is that the parser is built
to read files that conform to the rules specified in a
grammar, rules that describe a class of input files
rather than all possible input files. If the input file
does not conform to the grammar, the parser will detect
a �syntax error�.

There are two aspects to error handling in your parser:
�error diagnosis� and �error recovery�. Error diagnosis
consists in informing your user that something
unexpected has happened. Error recovery consists in
either aborting the parse, or getting it started again
in some reasonable manner. AnaGram provides several
options for both error diagnosis and error recovery.

When a syntax error is encountered, first your error
diagnosis option is executed and then your error
recovery option is executed.
##

error_message

error_message is a field in a �parser control block� to
which your �error handling� procedures may refer. If you
have set the �diagnose errors� �configuration switch�,
on encountering a �syntax error� your �parser� will
create a string containing an appropriate diagnostic
message and store a pointer to it into
PCB.error_message.
##

Error Trace

"Error Trace" is both a �configuration switch� and the
name of an option in the �Action Menu�. If the switch
is on, AnaGram adds code to your parser to capture
state information to a file in case of a �syntax error�. The Error
Trace option can then read this information and prepare a pre-built
�Grammar Trace� showing you the state of the parser at the time of
the error.

The name of the file is determined by the macro
�AG_TRACE_FILE_NAME�. AnaGram will provide a default
definition for the macro consisting of the name of
your �syntax file� plus the extension ".etr". You
may override this definition by defining AG_TRACE_FILE_NAME
in your �embedded C�.

If error trace is enabled, AnaGram will also enable the
Error Trace option on the �Action Menu�. If you select
Error Trace AnaGram will initialize a �Grammar Trace�
window from the error trace file you select. The parser
stack of the trace will be as it was when the error
occurred. The last line of the parser stack pane will
show the �lookahead token� that caused the syntax error. You may
then use the Grammar Trace to explore the nature of
the syntax error your parser encountered.

AnaGram will
warn you if the error trace file is older than
the syntax file, since under those conditions, the
error trace file might be invalid.
##

AG_TRACE_FILE_NAME

AG_TRACE_FILE_NAME is a C macro used to determine the
name of the file your parser will write when it
encounters a �syntax error� if you have enabled
the �error trace� �configuration switch�.

You may define AG_TRACE_FILE_NAME in your �embedded C�.
AnaGram provides a default definition given by the
name of your �syntax file� with the extension ".etr".
##

Error Recovery

Error recovery is the process of continuing after a
�syntax error�. AnaGram offers several options. These
are controlled by �configuration parameter�s and by
your grammar.

If you do not specify any error recovery, your parser
will simply return to the calling program when it
encounters a syntax error. �PCB�.�exit_flag� will be set
to AG_SYNTAX_ERROR_CODE (2), to indicate termination on syntax error.

If you wish your parser to simply ignore the erroneous
token and continue, set PCB.exit_flag to AG_RUNNING_CODE (0) in your
�SYNTAX_ERROR� macro. You might use this option if your
parser is dealing directly with keyboard input.
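
A sketch of such a macro follows. It beeps at the user and clears the exit flag so that the parse continues. The struct is a stand-in for the generated parser control block, included only so the fragment is self-contained, and the two exit codes are copied from the values documented for exit_flag.

```c
#include <stdio.h>

/* Exit codes as documented for exit_flag; normally these come from the
   header file AnaGram generates. */
enum { AG_RUNNING_CODE = 0, AG_SYNTAX_ERROR_CODE = 2 };

/* Stand-in for the generated parser control block (assumption). */
struct demo_pcb { int exit_flag; };
static struct demo_pcb demo_pcb_instance;
#define PCB demo_pcb_instance

/* Beep (ASCII BEL) and reset the flag so the erroneous token is
   ignored and parsing carries on. */
#define SYNTAX_ERROR do {                 \
    fputc('\a', stderr);                  \
    PCB.exit_flag = AG_RUNNING_CODE;      \
  } while (0)
```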

You may wish to use YACC type error handling. To do
this, simply incorporate a token called "error" in your
grammar, or specify some other token as an �error
token�. On syntax error, your parser will back up to
the most recent state where "error" was acceptable
input, treat the bad input as an instance of error, and
then skip all input until it finds an acceptable input
token. At that point it will proceed as though nothing
had happened.

AnaGram also provides an �automatic resynchronization�
option, which uses a complex heuristic to compare input
tokens against all stacked states in order to find the
best state from which to continue.
##

Error Token Resynchronization

One of your options for �error recovery� after a �syntax
error� is a technique similar to that provided in YACC.
You include a terminal token called "error" in your
grammar. (Or, use the �error token� configuration
parameter to specify some other token to serve this
purpose.) When the parser encounters an error in the
input, after invoking the �SYNTAX_ERROR� macro, it backs
up the �parser state stack� to the most recent state in
which "error" was an acceptable input. It then shifts to
the new state as though it had seen an actual "error"
token. At this point, it skips over any character in the
input which is not an acceptable input character for
this state. Once it does find an acceptable input
character, it continues processing as though nothing had
happened.
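
The steps above can be sketched in C as follows. This is an illustration of the technique only, not the code AnaGram generates: the tables, the state numbers and the function names are all hypothetical, and a toy rule stands in for the real test of acceptable input.

```c
#define STATE_COUNT 4

/* Hypothetical table for a toy parser with states 0..3.
   error_goto[s] is the state reached by shifting "error" in state s,
   or -1 if "error" is not acceptable input in that state. */
static const int error_goto[STATE_COUNT] = { 3, -1, -1, -1 };

/* Toy rule: only state 3 accepts input, and only ';'. */
int acceptable(int state, int ch) {
  return state == 3 && ch == ';';
}

/* Returns 1 if the parse can resume, 0 if no stacked state accepts
   "error". The stack must have room for one more entry. */
int resynchronize(int *stack, int *depth, const char *input, int *pos) {
  /* Back up to the most recent state where "error" is acceptable. */
  while (*depth > 0 && error_goto[stack[*depth - 1]] < 0)
    (*depth)--;
  if (*depth == 0)
    return 0;

  /* Shift to the new state as though an "error" token had been seen. */
  stack[*depth] = error_goto[stack[*depth - 1]];
  (*depth)++;

  /* Skip input characters that are not acceptable in this state. */
  while (input[*pos] != '\0' && !acceptable(stack[*depth - 1], input[*pos]))
    (*pos)++;
  return 1;
}
```

With the state stack holding 0, 1, 2 and the remaining input "@@;x", the sketch backs up to state 0, shifts to state 3, and skips the two '@' characters so that parsing resumes at the ';'.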
##

error_frame_ssx

error_frame_ssx is a field in a �parser control block�
to which your �error handling� routines may refer. When
your �SYNTAX_ERROR� macro is called, if you have set
both the �diagnose errors� and �error frame�
configuration switches, error_frame_ssx will contain the
value of the �parser stack index� at the beginning of
the �error_frame_token�. For example, if in a syntax
file, you fail to close a comment, AnaGram will
encounter an illegal end of file in the comment. In this
situation, error_frame_token is the token for a comment,
and error_frame_ssx gives the parser stack depth at the
beginning of the comment.
##

error_frame_token

error_frame_token is a field in a �parser control block�
to which your �error handling� routines may refer. If
you have set both the �diagnose errors� and �error
frame� �configuration switch�es, when your
�SYNTAX_ERROR� macro is called, it will contain the
�token number� of the error_frame_token.
##

error, Error Token

"Error token" is a �configuration parameter� that takes
a token name for a value. It has no default value. If
you do not specify it, and your grammar has a terminal
token called "error", it will be used as the error
token. If you have an error token defined your parser
will presume that you wish to use the �error token
resynchronization� method of �error recovery�.
##

Escape Backslashes

"�Escape backslashes�" is a �configuration switch� that
defaults to off. When it is turned on, the #line directives written
under the �line numbers� switch contain pathnames with doubled
backslashes. The switch is no longer necessary, since AnaGram now
uses forward slashes rather than backslashes in the pathnames in
#line directives.
##

Event Driven

It is often convenient to configure your parser to be
"event driven". In this situation, instead of calling
your parser once to process the entire input, you call
an �initializer� to initialize the parser, and then you
call the parser once for each input token. Each time you
call it, the parser processes the single input token
until it can do no more.

You can interrogate the �exit_flag� field of the
�parser control block� to determine whether the parse is
complete or whether the parser encountered an error.

Event driven parsers are especially convenient for
dealing with terminal input or communications protocols.
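
The calling pattern described above can be sketched as follows. Everything here is a stand-in so that the fragment compiles on its own: init_demo and demo play the roles of the initializer and the parser that AnaGram would generate, the struct stands in for the parser control block, and the "grammar" is simply one or more digits followed by a newline. Passing the input through an input_code field is an assumption of the sketch.

```c
/* Exit codes as documented for exit_flag. */
enum { AG_RUNNING_CODE = 0, AG_SUCCESS_CODE = 1, AG_SYNTAX_ERROR_CODE = 2 };

/* Stand-in for the generated parser control block (assumption). */
struct demo_pcb { int input_code; int exit_flag; int digits; };
static struct demo_pcb demo_pcb_instance;
#define PCB demo_pcb_instance

/* Stand-in for the initializer. */
void init_demo(void) {
  PCB.exit_flag = AG_RUNNING_CODE;
  PCB.digits = 0;
}

/* Stand-in for an event driven parser: handles the single input in
   PCB.input_code and then returns to the caller. */
void demo(void) {
  int c = PCB.input_code;
  if (c >= '0' && c <= '9')
    PCB.digits++;
  else if (c == '\n' && PCB.digits > 0)
    PCB.exit_flag = AG_SUCCESS_CODE;
  else
    PCB.exit_flag = AG_SYNTAX_ERROR_CODE;
}

/* Driver: initialize once, then call the parser once per input,
   checking exit_flag after each call. */
int feed(const char *text) {
  init_demo();
  for (; *text != '\0' && PCB.exit_flag == AG_RUNNING_CODE; text++) {
    PCB.input_code = (unsigned char)*text;
    demo();
  }
  return PCB.exit_flag;
}
```

A return value of AG_RUNNING_CODE from feed means the input ran out before the parse was complete, which is the normal condition between events.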
##

Event Driven Parser Cannot Use Pointer Input

This �warning� message appears if you specify pointer
input for your �parser� and also specify that it should
be event driven. If you are going to use �pointer
input�, you should not specify your �parser� as event
driven.  Conversely, if you really want an �event
driven� parser, you cannot specify pointer input.
##

Excessive Recursion

This �warning� message appears if an internal stack in
AnaGram overflows because of the complexity of an
expression in your �grammar�. Simplify your grammar by
using �definition� statements to name subexpressions.
##

exit_flag

exit_flag is a field in the �parser control block�.
When your parser returns, PCB.exit_flag contains an exit
code describing the outcome of the parse.  Mnemonic
values for the exit codes are defined in the parser
header file AnaGram generates. These mnemonics, their
values and their meanings are:
	AG_RUNNING_CODE    				= 0:	Parse is not yet complete
	AG_SUCCESS_CODE        				= 1:  	Parse terminated successfully
	AG_SYNTAX_ERROR_CODE   		= 2:	Syntax error encountered
	AG_REDUCTION_ERROR_CODE 	= 3:	Bad reduction token encountered
	AG_STACK_ERROR_CODE   			= 4: 	Parser stack overflowed
	AG_SEMANTIC_ERROR_CODE 		= 5: 	Semantic error, user defined

 An AnaGram parser checks exit_flag on return
from every �reduction procedure�. AnaGram will exit with
the flag unchanged if it is non-zero. To halt a parse
from a reduction procedure, then, you need only set the
exit_flag to AG_SEMANTIC_ERROR_CODE, or any other unused value
greater than zero that suits your needs.
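
As a minimal sketch, a reduction procedure can halt the parse by setting exit_flag, and the calling program can translate the flag into a message once the parser returns. The enum repeats the values listed above only so that the fragment is self-contained; the struct is a stand-in for the generated parser control block, and checked_divide and describe_exit are hypothetical names.

```c
/* Exit codes as listed above. A real program would take these from the
   header file AnaGram generates rather than redefining them. */
enum {
  AG_RUNNING_CODE         = 0,
  AG_SUCCESS_CODE         = 1,
  AG_SYNTAX_ERROR_CODE    = 2,
  AG_REDUCTION_ERROR_CODE = 3,
  AG_STACK_ERROR_CODE     = 4,
  AG_SEMANTIC_ERROR_CODE  = 5
};

/* Stand-in for the generated parser control block (assumption). */
struct demo_pcb { int exit_flag; };
static struct demo_pcb demo_pcb_instance;
#define PCB demo_pcb_instance

/* Hypothetical reduction procedure: refuses to divide by zero and
   halts the parse by setting exit_flag to a non-zero value. */
int checked_divide(int numerator, int denominator) {
  if (denominator == 0) {
    PCB.exit_flag = AG_SEMANTIC_ERROR_CODE;
    return 0;
  }
  return numerator / denominator;
}

/* Hypothetical helper: describe the outcome after the parser returns. */
const char *describe_exit(int exit_flag) {
  switch (exit_flag) {
  case AG_RUNNING_CODE:         return "parse is not yet complete";
  case AG_SUCCESS_CODE:         return "parse terminated successfully";
  case AG_SYNTAX_ERROR_CODE:    return "syntax error encountered";
  case AG_REDUCTION_ERROR_CODE: return "bad reduction token encountered";
  case AG_STACK_ERROR_CODE:     return "parser stack overflowed";
  case AG_SEMANTIC_ERROR_CODE:  return "semantic error, user defined";
  default:                      return "other user-defined exit code";
  }
}
```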
##

Expansion, Expansion Rule

In analyzing a �grammar�, we are often interested in the
full range of input that can be expected at a certain
point. The expansion of a �token� or state shows us
all the expected input. An expansion yields a set of
�marked rule�s. The �marked token� in each rule
shows us what input to expect.

The set of expansion rules of a (�nonterminal�) token
shows all the expected input that can occur whenever the
token appears in the grammar. The set consists of all
the �grammar rule�s produced by the token, plus all the
rules produced by the first token of any rule in the
set. A �marked token� for an expansion rule of a token
is the first element in the rule.

The expansion of a state consists of its �characteristic
rule�s plus the expansion rules of the marked token in each
characteristic rule.
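
The set of expansion rules of a token, as defined above, is a closure and can be computed by repeated passes over the rules. The following C sketch does so for a small expression grammar. The grammar, the token names and the function name are made up for the illustration, and each rule is reduced to its left side and first element, which is all the computation needs. Rules with an empty right side are not modeled.

```c
#define RULE_COUNT 6

/* Tokens of a toy grammar (hypothetical). */
enum { EXPR, TERM, FACTOR, NUMBER, PLUS, STAR, LPAREN, RPAREN };

/* Each rule is reduced to the token that produces it and the first
   token on its right side. */
struct rule { int lhs; int first; };

static const struct rule rules[RULE_COUNT] = {
  { EXPR,   TERM   },   /* expr   -> term             */
  { EXPR,   EXPR   },   /* expr   -> expr, '+', term  */
  { TERM,   FACTOR },   /* term   -> factor           */
  { TERM,   TERM   },   /* term   -> term, '*', factor */
  { FACTOR, NUMBER },   /* factor -> number           */
  { FACTOR, LPAREN }    /* factor -> '(', expr, ')'   */
};

/* Marks in_set[i] = 1 for every expansion rule of the given token and
   returns how many rules were marked. */
int expansion_rules(int token, int in_set[RULE_COUNT]) {
  int i, j, changed, count = 0;

  /* Start with the rules produced by the token itself. */
  for (i = 0; i < RULE_COUNT; i++)
    in_set[i] = (rules[i].lhs == token);

  /* Add the rules produced by the first token of any rule already in
     the set, until a pass adds nothing new. */
  do {
    changed = 0;
    for (i = 0; i < RULE_COUNT; i++) {
      if (!in_set[i])
        continue;
      for (j = 0; j < RULE_COUNT; j++) {
        if (!in_set[j] && rules[j].lhs == rules[i].first) {
          in_set[j] = 1;
          changed = 1;
        }
      }
    }
  } while (changed);

  for (i = 0; i < RULE_COUNT; i++)
    count += in_set[i];
  return count;
}
```

For this grammar the expansion of expr contains all six rules, the expansion of term contains the four term and factor rules, and the expansion of factor contains only its own two rules.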
##

Expansion Chain

You may select an Expansion Chain window from the
�Auxiliary Windows� popup menu of most windows that contain
�expansion rule�s.

The Expansion Chain window is extremely useful for
indicating why a particular �grammar rule� is an
�expansion rule� in a particular state. To see a chain
of productions that produces a desired expansion rule,
select the expansion rule with the cursor bar, press
the right mouse button for the Auxiliary Windows menu, and select
Expansion Chain.

The Expansion Chain window will then present a sequence
of expansion rules, using the same format as the
Expansion Rules window, but subject to the constraint
that each rule is produced by the �marked token� in the previous line.

The first rule in the window is a �characteristic rule�
for the given state.  The last rule in the window is
the rule selected by the cursor bar in the window from
which you chose the Expansion Chain. Note that the chain
shown is not necessarily unique. There may be other
derivations of the same expansion rule.
##

Expansion Rules

You may select an Expansion Rules window from the
�Auxiliary Windows� popup menu of most windows which display
�marked rules�. The Expansion Rules window shows the
complete set of �expansion rule�s for the �marked
token� in the highlighted rule.

In other windows, including all trace windows, the
Expansion Rules window shows the expansion of the token
on the highlighted line.
##

F1

Use the F1 key to bring up a context sensitive help window. Because of
various peculiarities of the Windows API, there are a few contexts
where the F1 key does not work; however, generally the �help cursor�
works where F1 does not and vice versa.

�Help� windows have hypertext links to related help windows.
In a help window, the right mouse button pops up a menu of
all the links for the window.
##

extend pcb

The "extend pcb" statement is an �attribute statement� that allows you to
add declarations of your own to the �parser control block�. With this
feature, data needed by �reduction procedure�s can be stored in the pcb
rather than in global or static storage. This capability greatly
facilitates the construction of �thread safe parsers�.

The extend pcb statement may be used in any configuration section.
The format is as follows:
	extend pcb { <C or C++ declaration>... }

It may, of course, extend over multiple lines and may contain any
C or C++ declarations. AnaGram will append it to the end of the parser
control block declaration in the generated parser �header file�.  There may
be any number of extend pcb statements. The extensions are appended to
the pcb in the order in which they occur in the syntax file.

The extend pcb statement is compatible with both C and C++ parsers. Note
that even if you are deriving your own class from the parser control
block, you might want to use the extend pcb to provide virtual function
definitions or other declarations appropriate to a base class.
##

Far Tables

"Far tables" is a �configuration switch� which defaults
to off. If it is set, when AnaGram builds a �parser� it
will declare the larger tables it builds as FAR. This
can be a convenience when using some memory models with
8086 architecture.
##

Fatal Syntax Errors

This �warning� message occurs when AnaGram cannot
complete the �Analyze Grammar� command on your �syntax
file� because of errors in your syntax file.
##

File Trace

You can use the File Trace facility to verify your grammar,
even before you have implemented �reduction procedures� or
any other code. Thus you can defer writing procedural code
until you have the grammar working to your specifications.

To run File Trace, select
File Trace from the �Action Menu� or click on the File Trace button.

Select a test file. When the �File Trace Window� appears,
double click at any point in the �test file pane�, or
click the �Parse File� button to parse the entire file.
AnaGram will parse up to the point you have selected
according to the rules in your �grammar�. If the test file does not
conform to the rules of the grammar, the parse will halt with a
�syntax error�. You can then inspect the �Parser Stack pane� and the
�Rule Stack pane� to get an idea of the nature of the problem.


AnaGram uses different colors to
distinguish the portion of the test file that has
been parsed from the portion that has not been parsed,
so the location of the error should be readily apparent.

Since the syntax error often occurs somewhat downstream
from the actual error, you may need to back the parse up
and approach the error slowly. In the Test File pane,
double click at any point prior to the error to back
the parse up to that point. You can then click on the
�Single Step� button to perform a single parser action.

You may also use the cursor keys to control the parse.
As long as no error is encountered, the parse is locked
to the blinking cursor. If you cursor past the syntax
error, however, the parse can no longer track the cursor
so the cursor location will differ from the parse location. The
cursor and parse locations will also differ after you single click
at any point other than the current parse location.

When the cursor and the parse location are thus out of synch, the
Single Step button is replaced with a �Synch Parse� button. You
can click on Synch Parse to get the parse back in synch with the
cursor.

The File Trace option will be greyed out on the �Action Menu�
if your grammar has �empty recursion�, since
such a grammar may cause infinite loops in the parser.

Because a File Trace is based on character codes, it will also be greyed out
on the �Action Menu� if your parser uses �token input� rather than
character input.

All parser actions performed by a File Trace update the �trace
coverage� counts, enabling you to verify the extent to which
your test files exercise your parser.

Normally, AnaGram reads test files in "text" mode,
discarding carriage return characters. If your parser
needs to recognize carriage return characters
explicitly, you should turn the "�test file binary�"
switch on.
##

File Trace Window

The �File Trace� window normally consists of three panes:
	The �Parser Stack pane�
	The �Test File pane�
	The �Rule Stack pane�

 If your grammar uses �semantically determined productions�,
the �Reduction Choices pane� will appear when necessary
to allow you to select a �reduction token�. The choice that
you make will be remembered and reused if you should back up
the parse and parse past this point again. The remembered choice
is not made automatically when you use �Single Step�. Thus,
if you wish to
change your choice, position the cursor at the location where
the choice must be made and Single Step past the choice.

If you �reload� the test file, the choices you have made will
be discarded.

The active pane has
a distinctively colored title panel and cursor bar. You can
use the tab key to tab among the panes. The function of
other keyboard keys depends on which pane is active.

Along the bottom of the File Trace Window is a toolbar with
two status boxes:
	�Parse Location�
	�Parse Status�
and five buttons:
	�Single Step�
	�Parse File�
	�Reset�
	�Reload�
	�Help�

 If the blinking cursor loses synch with the current
parse location, the Single Step button is replaced with
the �Synch Parse� button.
##

Grammar Trace Window

The �Grammar Trace� window normally consists of three panes:
	The �Parser Stack pane�
	The �Allowable Input pane�
	The �Rule Stack pane�

 If your grammar uses �semantically determined productions�,
the �Reduction Choices pane� will appear when necessary
to allow you to select a �reduction token�.

The active pane has
a distinctively colored column header and cursor bar. You
can use the tab key to tab among the panes. The function of other
keyboard keys depends on which pane is active.

Along the bottom of the Grammar Trace Window is a toolbar with
a �Parse Status� box, a �text entry� field
and four buttons:
	�Proceed�
	�Single Step�
	�Reset�
	�Help�

 In the �Parser Stack pane� you can see a
representation of the �parser state stack� and �parser state� as they
might appear in the course of execution of your �parser�. You can
examine the �allowable input� tokens and see the changes to the
state and the state stack caused by any input token you
choose. The �Rule Stack pane� shows the relationship between the
contents of the parser stack and your �grammar�. If your grammar
uses �semantically determined productions�, you can select the
appropriate �reduction token� from the �Reduction Choices pane�.

You can enter text characters directly in the �text entry�
field. This means you can run a Grammar Trace like a �File Trace�
where the test file is replaced by the characters you type in the
text entry field.  This is a very convenient way to check out your
grammar.
##

Test File, Test File Pane

In the �File Trace�, the file under test is displayed in the
upper right pane. To parse to a specific point, double
click at that point.

As long as the parse location and the cursor are synchronized,
when you use the cursor keys to
move the cursor, the parse will track the cursor.

If the parse encounters a �syntax error�, it will not be able
to go beyond the location of the error. In this situation,
moving the cursor right or down will cause the cursor position to
differ from the parse location. The parse and cursor positions can also
differ if you single click anywhere in the Test File pane.

If the
parse location and the cursor are thus not synchronized, the
�Single Step� button will be replaced with a �Synch Parse�
button. Click on the Synch Parse button to get the cursor
and the parse back in synch. Of course, the parse will still
not be able to proceed past a syntax error.

In the default color scheme, parsed text is shown on a lighter
background than is unparsed text.

If your grammar uses �semantically determined production�s,
the parse will halt when one is encountered and the �reduction
choices pane� will be displayed so you may select the appropriate
�reduction token�.

At any time you can click on the �Reset button� to reset the parse to
the beginning of the test file. If you modify the test file, you
can click on the �Reload button� to load the modified file and
reset the parse.

Normally, AnaGram reads test files in "text" mode, discarding carriage
return characters. If your parser needs to recognize carriage return
characters explicitly, you should turn the �test file binary�
�configuration switch� on.

Sample test files are provided with the FFCALC and FC �examples�.
##

Parse Location

The current location of the �File Trace� parser in the
�test file pane�. The format is <line number>:<column number>.
##

Parse Status

The current state of the �File Trace� or �Grammar Trace� parser.

 Ready: The parser is ready for input.
 Running: The parser is processing input.
 Parse Complete: The parser has reached the end of the input. Click
on �reset� or �reload� to restart the parse.
 Syntax error: A syntax error has been encountered. The parser cannot
go any further.
 Unexpected end of file: The parser has reached the end of the actual
input but the grammar still expects more.
 Select reduction token: The parser encountered a �semantically determined
production�. Select a �reduction token� from the �Reduction Choices pane�.
 Selection error: The reduction token selected from the Reduction Choices
pane was not allowable input in the present state. Select another
reduction token.
##

Parse File

Use the Parse File button in the �File Trace� to parse all the way
to the end of file. The parse will not stop until it encounters a
�syntax error�, a �semantically determined production�, or the end of file.
##

Reset

Use the Reset button in the �File Trace� or �Grammar Trace� to reset
the parse to its initial state. This is most convenient when using
a �Conflict Trace�, �Error Trace�, or other �Auxiliary Trace�
since these traces seldom begin at state 0.
##

Reload

The Reload button in the �File Trace Window� rereads the test file.
This is convenient if you modify the test file while you are testing
the �grammar�.
##

Lookahead Token

In an �LALR-1 parser� the "lookahead token" is the next token to be
processed. For each �parser state� there is a list of tokens that
may be seen in this state. For each token there is a corresponding
�parser action�. The parser scans the list looking for the lookahead
token and then performs the corresponding parser action. If the
lookahead token cannot be found and there is no �default reduction�,
the parser signals a �syntax error�.
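
The lookup described above can be sketched as follows. The data layout is hypothetical and is not the table format AnaGram actually generates; it only illustrates the order of the decisions: scan the token list for the state, fall back on a default reduction if there is one, and otherwise report a syntax error.

```c
enum action_kind { ACT_SHIFT, ACT_REDUCE, ACT_SYNTAX_ERROR };

/* arg is the next state for a shift, or the rule number for a
   reduction. */
struct action { enum action_kind kind; int arg; };

struct entry { int token; struct action action; };

struct state {
  const struct entry *entries;  /* tokens that may be seen in this state */
  int count;
  int default_rule;             /* rule for the default reduction, -1 if none */
};

/* Find the parser action for the lookahead token in the given state. */
struct action find_action(const struct state *s, int lookahead) {
  struct action result;
  int i;

  for (i = 0; i < s->count; i++) {
    if (s->entries[i].token == lookahead)
      return s->entries[i].action;
  }
  if (s->default_rule >= 0) {
    result.kind = ACT_REDUCE;
    result.arg = s->default_rule;
  } else {
    result.kind = ACT_SYNTAX_ERROR;
    result.arg = 0;
  }
  return result;
}
```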

In File Trace, and in some circumstances in Grammar Trace, the
lookahead token can be seen on the last line of the
�Parser Stack pane�.
##

GET_CONTEXT

If you have defined a "�context type�" �configuration
parameter�, and wish to maximize the performance of your
parser, you should write a GET_CONTEXT macro which
stores the context of the input token directly in
�CONTEXT�, the current stack location. Otherwise, you
can write your �GET_INPUT� macro so that it stores
context into �PCB�.�input_context�. The default
definition for GET_CONTEXT will then copy
PCB.input_context to the �context stack� at the
appropriate time.
##

GET_INPUT

GET_INPUT is a macro which you should define to control
�parser input� if your
parser is not �event driven� and you are not using
�pointer input�. If you don't define it, AnaGram will
define it by default to read a single character from
stdin:

	#define GET_INPUT (PCB.input_code = getchar())

 �PCB�.�input_code� is an integer field in the �parser control block�
which is used to hold the current character code. You
may also want GET_INPUT to set the values of �input_context� or
�input_value�. It may call an input function, or it may execute
in-line code when it is invoked.
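
As a sketch of the input-function approach, the following GET_INPUT reads from a string in memory instead of stdin. The struct is a stand-in for the generated parser control block, and input_text and next_char are hypothetical names. Returning EOF at the end of the string mirrors what getchar() does in the default definition.

```c
#include <stdio.h>

/* Stand-in for the generated parser control block (assumption). */
struct demo_pcb { int input_code; };
static struct demo_pcb demo_pcb_instance;
#define PCB demo_pcb_instance

/* Hypothetical input source: a NUL-terminated string. */
static const char *input_text = "";

/* Hypothetical input function: returns the next character code, or
   EOF once the string is exhausted. */
int next_char(void) {
  return *input_text != '\0' ? (unsigned char)*input_text++ : EOF;
}

#define GET_INPUT (PCB.input_code = next_char())
```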
##

iso latin 1

The "iso latin 1" �configuration switch� controls case
conversion on input characters when the �case sensitive�
switch is set to off. When "iso latin 1" is set, the
default �CONVERT_CASE� macro is defined to convert
correctly all characters in the latin 1 character set.
When the switch is off, only characters in the ASCII
range (0-127) are converted.
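
The difference between the two settings can be illustrated with a pair of conversion functions. This is a sketch of the idea, not the macro AnaGram defines: the function names are hypothetical, and conversion to upper case is an assumption, since the direction of the conversion is not stated here.

```c
/* Convert a Latin 1 (ISO 8859-1) character code to upper case. */
int latin1_to_upper(int c) {
  if (c >= 'a' && c <= 'z')
    return c - 0x20;                 /* ASCII letters */
  if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
    return c - 0x20;                 /* accented letters; 0xF7 is the
                                        division sign, not a letter */
  return c;                          /* 0xDF and 0xFF have no upper
                                        case form in Latin 1 */
}

/* ASCII-only variant: codes above 127 are left unchanged. */
int ascii_to_upper(int c) {
  return (c >= 'a' && c <= 'z') ? c - 0x20 : c;
}
```

For example, the code 0xE9 (e with acute accent) becomes 0xC9 under the first function and is left unchanged by the second.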
##

Dragon Book

The "dragon book" is the classic reference on formal parsing:
	Compilers: Principles, Techniques, and Tools
	Aho, Sethi, and Ullman
	Addison-Wesley, 1986.

 It is called the "dragon book" because of its
colorful cover illustration showing a knight in
armour ("data flow analysis") armed with sword
("�LALR parser generator�") and shield ("syntax
directed translation") at his PC attacking a
bright red dragon ("complexity of compiler design").
##

LALR-1 Parser

An LALR-1 parser is a �parser� created from a
�grammar� by an �LALR parser generator�.
##

LALR Parser Generator

LALR(k) (LookAhead Left-to-right Rightmost derivation)
parser generators are
programs that create parsers algorithmically from
formal grammars. The (k) refers to the number of
lookahead symbols used to make parsing decisions.
Normally, k = 1.

LALR parsers are a subset of the class of
so-called LR parsers. LALR parsers are generally more compact
and less costly to create. These advantages are
obtained at a slight sacrifice in generality. Although
it is possible to contrive an LR grammar which has
�conflict�s when analyzed with the LALR algorithm,
such situations rarely occur in practice, and can
be easily resolved by rewriting a few rules.

In the �dragon book�, section 4.7, the authors list the following
attractive properties of LR parsing:
 LR parsers can be constructed to recognize virtually
all programming-language constructs for which context-free
grammars can be written.
 The LR parsing method is the most general nonbacktracking
shift-reduce parsing method known, yet it can be implemented as
efficiently as other shift-reduce methods.
 The class of grammars that can be parsed using LR methods is
a superset of the class of grammars that can be parsed with
predictive parsers.
 An LR parser can detect a syntactic error as soon as it is
possible to do so on a left-to-right scan of the input.
##

Getting Started

AnaGram is an �LALR parser generator�. Its input is
a �syntax file�, which you prepare with an ordinary
programming editor. Its output is a �parser file�, which
you can compile with a C or C++ compiler on any platform
and link into your program. To compile on Unix platforms, set
the �no cr� �configuration switch�.

AnaGram has extensive context-sensitive hypertext
�help�. In any AnaGram window, press �F1� or select an item with the
�Help Cursor�. Further documentation in HTML format, including
documentation of examples, is found in the html subdirectory. AnaGram
also has a comprehensive hard-copy manual, the AnaGram User's Guide.

If you are new to AnaGram, you might begin by reviewing the Help
Topics �How AnaGram Works� and �Program Development�, and looking at
An Annotated Example and Summary of AnaGram Notation in the HTML
documentation.

If you are not already familiar with formal parsing techniques, you
may want to read Introduction to Syntax Directed Parsing in the HTML
documentation. Note also the Fahrenheit to Celsius conversion
examples in the examples/fc directory, which comprise a graded
sequence of syntax files illustrating most of the basic
principles of �syntax directed parsing� in easy steps. Documentation
is in html/fc.html.

AnaGram has many features, many of which are not
commonly found in parser generators:
 the �configuration section�
 �thread safe parsers�
 C++ support
 the �disregard� and �lexeme� statements
 �event driven� parsers
 �character sets�
 �virtual productions�
 �File Trace�, �Grammar Trace�
 �automatic resynchronization�
 �error token resynchronization�

To familiarize yourself with the many options available for configuring
your parsers, select �Configuration Parameters� from the �Browse Menu�.
Use �F1� or the �Help Cursor� to pop up explanations of the various
parameters.


If you don't find the information you need, please visit the
AnaGram web page at http://www.parsifalsoft.com for further
information and support.

##

How AnaGram Works

AnaGram contains an �LALR Parser Generator� which creates a
table driven �LALR-1 parser� from a �grammar� written in a variant
of Backus-Naur Form. AnaGram works in two steps. In the
first step, or analysis phase, it reads a �syntax file� and
compiles a number of tables describing the grammar. In the
second step, or build phase, it writes two output files:
a �parser file� written in C or C++ and a �header file�.

Syntax files normally have the extension .syn. The rules for
writing syntax files are given in the AnaGram User's Guide
and in the Summary of AnaGram Notation in the HTML documentation.

The header file contains definitions and declarations, including
the definition of a �parser control block�.

The parser file consists of:
 The �C prologue�, if any.
 Definitions and declarations provided by AnaGram.
 �Reduction procedure�s.
 a customized �parsing engine�.
 a �parse function� to be called when input is to be parsed.

 The name of the parser file is controlled by the �parser
file name� �configuration parameter�. By default, the parser
file has the same name as the syntax file, with the
extension .c. The name of the parse function is controlled
by the �parser name� parameter, which defaults to the name
of the syntax file.
##

Examples

The EXAMPLES directory of the AnaGram distribution disk
contains a number of examples to help you get started.
Documentation for the examples, in HTML format, is located
in the html directory (start at index.html or examples.html).

The traditional Hello, World, in examples/hw, is a good
example for getting familiar with the mechanical
procedures of building both C and C++ parsers from
�syntax file�s.

The Fahrenheit/Celsius conversion examples in the
examples/fc directory on your AnaGram diskette comprise
a graded sequence of syntax files which illustrate
most of the basic principles of �syntax directed
parsing� in easy steps. In addition, these examples
demonstrate many features of AnaGram which are not
found in other parser generators:
 the �configuration section�
 �character sets�
 �virtual productions�
 �error token resynchronization�
 �File Trace�
 the �disregard� and �lexeme� statements
 �event driven� parsers

The Four Function Calculator (examples/ffcalc) is used
traditionally to demonstrate parser generators. If you
are already familiar with �syntax directed parsing� this
example will give you a good overview of the basics of
AnaGram. An annotated version of this example may be
found in AnaGram's HTML documentation.
The FFCALC example illustrates the use of �precedence
rules� to resolve �conflicts�.

Other examples are available to demonstrate additional
features of AnaGram.

RCALC (examples/rcalc) is a simple four function
calculator which accepts roman numeral input. It
illustrates the following AnaGram features:
 �pointer input�
 �SYNTAX_ERROR� macro
 �context stack�

DSL (examples/dsl) is a complete DOS script language,
which provides capabilities well in excess of DOS batch
files. DSL is a complete working program, used in the
past to create AnaGram's install program. Some of the
specific features of AnaGram which it illustrates are:
 �distinguish lexemes�
 �distinguish keywords�
 �far tables�

MPP is a fully functional macro preprocessor for C or
C++. Included with MPP are two C grammars, either of
which may be incorporated into MPP. MPP uses several
parsers that work together:
 TS.SYN is the primary token scanner parser that
identifies tokens, and handles preprocessor
commands.
 MAS.SYN is used to do macro argument substitution.
 CT.SYN is used to identify tokens that result from
string concatenation during macro argument
substitution.
 EX.SYN is used to evaluate constant expressions in
#if preprocessor statements.

Among the more powerful features of AnaGram that MPP
illustrates are:
 �semantically determined productions�
 �event driven� parsers
##

Goal, Goal Token, Start Token

The �grammar token� is the token which represents the
"top level" in your grammar. Some people refer to it as
the "goal" or "goal token" and others as the "start
token". Whichever it is called, it is the single token
which describes the complete input to your parser.

The most common way to specify a grammar token is as
follows:
	grammar -> statements?..., eof
This production tells AnaGram that the input to your
parser consists of a (possibly empty) sequence of
statements followed by an end of file token.

There are a number of ways of specifying which token in
your �syntax file� represents the top level of your
grammar. You may simply name it "grammar", or you may
tag it with a '$' character when you define it, or you
may set the �grammar token� �configuration parameter�.

If you should inadvertently tag several tokens with the
'$' character and/or set the grammar token parameter,
it is the last such specification in the file which
wins. Some people develop their grammars bottom up,
gradually adding new levels of complexity. In the
course of development, they may specify a number of
tokens as grammar tokens and forget to remove the old
specifications.

Notice that if you define the token "grammar" anywhere
in your syntax and specify the grammar token otherwise,
"grammar" will not be the grammar token. This is to
keep "grammar" from being a reserved word. If you need
to use it in your syntax for something other than the
whole grammar, you are free to do so.
##

Grammar

Traditionally, a "grammar" is a set of �production�s
which taken together specify precisely a set of
acceptable input streams, in terms of an abstract set
of �terminal tokens�. The set of acceptable input
streams is often called the "language" defined by the
grammar.

In AnaGram, the term "grammar" also includes
�configuration sections� as well as the �definitions�
of �character sets� and �virtual productions� which
augment the collection of productions. The term is
often used in contrast to the term "�syntax file�",
which is used to signify the complete AnaGram source
file including reduction procedures and embedded C, or
the term "�parser�", which refers to AnaGram's output
file.

A grammar is often called a "syntax", and the rules of
the grammar are often called syntactic rules.
##

Grammar Analysis

The major function of AnaGram is the analysis of
context-free grammars written in a particular variant
of Backus-Naur Form.

The analysis of a grammar proceeds in four stages. In
the first, the input grammar is analyzed and a number
of tables are built which describe all of the
�production�s and components of the �grammar�.

In the second stage, AnaGram analyzes all of the
character sets defined in the grammar, and where
necessary, defines auxiliary tokens and productions.

In the third stage, AnaGram identifies all of the
states of the parser and builds the go-to table for the
parser.

In the fourth stage, AnaGram identifies �reduction
tokens� for each completed �grammar rule� in each state
and checks for �conflict�s.

Use the �Analyze Grammar� command to cause AnaGram to
analyze your grammar.
##

Grammar Is Ambiguous

This �warning� message appears if your �grammar�
contains �conflict�s. AnaGram will resolve �shift-reduce
conflicts� by selecting the shift option. It will
resolve �reduce-reduce conflicts� by selecting from the
conflicting �grammar rule�s the one which appears first
in the �syntax file�.
##

Grammar Rule

A "grammar rule" is the right hand side of a production.
It is a sequence of �rule elements�. Each rule element
identifies some token, which can be either a �terminal
token� or �nonterminal token�.

A grammar rule is "matched" by a
corresponding sequence of tokens in the input stream to
the parser. The rule elements in the grammar rule may be
�token name�s, �set expressions�, �character constants�,
�immediate action�s, �keyword strings�, or �virtual
productions�.

A grammar rule may be followed by an
optional �reduction procedure�. The �semantic values� of
the tokens that comprise the rule may be passed to the
reduction procedure by using �parameter assignments�.

A grammar rule always makes up the right hand side of a
production. The left hand side of the production
identifies one or more �nonterminal tokens�, or
�reduction tokens�, to which the rule reduces when
matched. If there is more than one reduction token,
the production is called a �semantically determined production� and
there should be a �reduction procedure� to select
the correct reduction token at run time.
##

Grammar Token

The "grammar token" �configuration parameter� may be
used to specify the �goal�, or "start" token for the
syntax analyzer portion of AnaGram. Alternatively, you
could simply call the token "grammar", or you could
append a '$' character to it when you define it.

Each grammar must have a grammar token specified before
it can be analyzed or before a parser can be built. The
grammar token is the single token to which the grammar
finally condenses. When this token is identified by the
parser, the parse is complete.
##

Grammar Trace

AnaGram's Grammar Trace facility lets you examine the workings of your
�parser� in detail. You can use the Grammar Trace as soon as you have
analyzed your �grammar�, even before you have written any �reduction
procedure�s or other code. Thus you can defer writing procedural code
until you have the grammar working to your specifications.

Select the �Grammar Trace Window�
from the �Action Menu� or click on the Grammar Trace
button.

In the �Parser Stack pane� you can see a representation of the
�parser state stack� and �parser state� as they might appear in the
course of execution of your �parser�. The �Rule Stack pane� shows the
relationship between the contents of the parser stack and your
�grammar�. If your grammar uses �semantically determined
productions�, you can select the appropriate �reduction token� from
the �Reduction Choices pane�.

At any stage, the �Parser Stack� represents a parse
in progress. It shows the sequence of �token�s that have
been input so far and the states in which they were
seen. When a production is complete and the grammar rule
is reduced, the tokens that make up the rule are removed
from the stack and replaced by the token on the left
side of the production. Initially, the Parser Stack contains
only a �lookahead line�.

To explore your grammar, choose �token�s one by one from
the �Allowable Input�
pane. This pane shows the tokens allowable at the current state of the
grammar, and the actions that result when the tokens are chosen.

You can also enter text characters directly in the �text entry�
field. This means you can run a Grammar Trace like a �File Trace�
where the test file is replaced by the characters you type in the
text entry field.  This is a very convenient way to check out your
grammar. Text entry is, of course, not appropriate for grammars that
expect �token input�.

In a �File Trace� you can advance the parse no matter which pane is
active. In a Grammar Trace there is a question as to whether input is
intended to come from the Allowable Input pane or the text entry
field.  Therefore the parse can only be advanced when one of these
two is active to indicate that it is the source of input.

Specialized prebuilt Grammar Traces such as the �Conflict Trace� and
the �Auxiliary Trace� can be selected from �Auxiliary Windows� popup
menus where appropriate.

All Grammar Trace activity updates the �trace coverage� counts.
##

Text Entry

It is sometimes more convenient to enter text in the
text entry box on the �Grammar Trace� toolbar than to
select individual tokens from the �Allowable Input pane�.

By entering text you can proceed quickly to a troublesome
state without having to choose each individual token
en route.

After entering text, press Enter or click on the Proceed
button to parse the text. Click on the single step button
to work slowly through the text step by step.
##

header file name

The "header file name" parameter names the �parser
header� file that AnaGram will generate when it builds
your parser. This header file can be used with your
parser or with other modules in your program. The
header file contains a number of typedef statements and
a number of macro definitions which are needed in your
parser and may be useful in other modules.

If the value of this parameter contains a '#' character,
AnaGram will substitute the name of your syntax file for
the '#'. The default value of "header file name" is
"#.h".
##

Help, Using Help

There are three main ways to access AnaGram Online Help:
 Press F1 for context-sensitive help from most windows and menu items.
 Similarly, use the �Help Cursor� from most windows and menu items.
 From the Help menu, you can bring up �Help Topics� and choose a topic.

You can also get fly-over help for the toolbar buttons on the �Control
Panel�. File and Grammar Traces have a Help button.

AnaGram's Help windows, unlike most others, remain on-screen until you
dismiss them. This means you can refer to several topics at once. They
have hypertext links to other Help topics. Also, right-clicking
the mouse on a Help window or pressing F1 will pop up an Auxiliary
Windows menu of all linked topics in the window. "Using Help" is always
available from this popup menu.

Note that, for the �Warnings�, �Configuration Parameter�s and �Help
Topics� windows, F1 will give you help for the item
on the highlighted line, whereas the Help Cursor allows you
to select any line by clicking on it.

AnaGram also has documentation in HTML format, indexed in the index.html
file. This documentation covers Getting Started, examples, and some
further topics mainly condensed from the User's Guide. Hard copy
documentation is in the AnaGram User's Guide, which has the most
detail.
##

Hidden

In a �configuration section� of your grammar you may use
an �attribute statement� to declare one or more tokens
to be "hidden". Tokens that are "hidden" do not appear
in the �token names� table, and thus do not appear in syntax error
diagnoses. When your parser attempts to determine the
�error frame� of a �syntax error�, it will disregard the
tokens that have been declared hidden. The hidden
declaration consists simply of the keyword hidden
followed by a list of tokens, separated by commas and
enclosed in braces ({ }):
	[ hidden { widget, wombat, foo, bar } ]

 You would use the "hidden" attribute primarily for
tokens whose name would not mean anything to your users.
##

Immediate Action

Immediate actions are snippets of C code which are to
be executed in the middle of a �grammar rule�. Immediate
actions are denoted by a '!' character followed by
either a C expression, terminated by a semicolon; or a
block of C code enclosed in braces. For example, in a
simple desk calculator example one might write the
following:
	transaction
	    -> !printf("#");, expression:x  =printf("%d\n",x);

 Notice that the only apparent difference between an
immediate action and a �reduction procedure� is that the
immediate action is preceded by '!' instead of '='.
Notice also that the immediate action must be followed by a
comma to separate it from the following �rule element�.

Immediate actions may also be used in �definition�s:
	prompt = !printf("#");

The above example, using this definition, would then be:
	transaction
	  -> prompt, expression:x  =printf("%d\n",x);

 You could accomplish the same result by writing a �null
production� and a reduction procedure:
  prompt
   ->   =printf("#");

This is exactly how AnaGram implements immediate
actions.
##

Implementation Errors

"Implementation errors" are errors your parser detects
which are not the immediate result of bad input.  When
it encounters an implementation error, your parser will
call a macro which you can define to deal with the
problem in a manner suitable to your needs. If you don't
provide these macros, AnaGram will make default
definitions. There are two macros corresponding to two
implementation errors:
	�PARSER_STACK_OVERFLOW�
	�REDUCTION_TOKEN_ERROR�
##

Inappropriate Value

This �warning� message appears when the value assigned to
a �configuration parameter� is not appropriate to that
parameter. Check the definition of the parameter, by
opening the �Configuration Parameters Window�,
selecting the parameter and pressing F1.
##

Initializer

For every �parser� it generates, AnaGram generates an
"initializer" function to initialize the parser. AnaGram
names the initializer by prefixing the �parser name�
with "init_". If your parser is �event driven�, you must
call the initializer before you call the parser.

If your parser is not event driven, AnaGram will
normally include a call to the initializer in the
parser. If you wish to be able to call your parser more
than once without its being re-initialized, you may turn
off the �auto init� �configuration switch�. When you do
this, you assume responsibility for calling the
initializer. If your parser is event driven, you must
always call the initializer function.

If the �reentrant parser� switch is set, the initializer takes
a pointer to the �parser control block� as its sole argument. Otherwise
it takes no arguments. The initializer returns no value. All
communication is by means of the �parser control block�.
##

Input Character

The actual unit of �parser input� is usually a
single character. Note that you are not limited to
eight-bit characters. Your parser will use the input
character to index a translation table, �ag_tcv�, to
determine the �token number� for that character. The
�token number� identifies the actual syntactic token.
The character code itself will be the �semantic value�
of the token. Note that AnaGram groups together all
input characters that are syntactically
indistinguishable into a single input token.
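
The table lookup described above can be sketched in C as
follows. This is only an illustration: the table name, the
token numbers and the grouping of characters are assumptions
made for the example, not the contents of the table AnaGram
actually generates for your grammar.

```c
/* Hypothetical token numbers; AnaGram assigns its own. */
enum { TOKEN_OTHER = 0, TOKEN_DIGIT = 1, TOKEN_LETTER = 2 };

/* Stand-in for a translation table indexed by character code. */
static unsigned char translation[256];

/* Fill the table so that syntactically indistinguishable
   characters share one token number (assumes ASCII codes). */
void build_translation_table(void)
{
    int c;
    for (c = 0; c < 256; c++) {
        if (c >= '0' && c <= '9')
            translation[c] = TOKEN_DIGIT;
        else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            translation[c] = TOKEN_LETTER;
        else
            translation[c] = TOKEN_OTHER;
    }
}

/* The character code indexes the table to give the token number. */
int token_number_for(int input_character)
{
    return translation[(unsigned char)input_character];
}
```

In this sketch every digit maps to the same token number,
while the character code itself remains available to serve
as the semantic value.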
##

input_code

input_code is a field in the �parser control block�
which contains the current �input character�, or, if your
�GET_INPUT� macro supplies �token number�s directly, the
token number.

If you write your own �GET_INPUT� macro, you must make
sure that you store the input character, or token
number, you get into �PCB�.input_code.
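
As a rough sketch, a GET_INPUT macro that reads from a
string in memory might look like the following. The
structure shown is only a stand-in for the generated parser
control block, and the input_text variable, the next_input
helper and the use of -1 to mark end of input are all
assumptions made for the example.

```c
/* Stand-in for the parser control block; the real structure
   is generated by AnaGram and has many more fields. */
typedef struct {
    int input_code;
} pcb_sketch;

static pcb_sketch PCB;

/* Hypothetical input source: a string held in memory. */
static const char *input_text = "";

/* Store the next character, or -1 at end of input, into
   PCB.input_code. The choice of -1 is an assumption. */
#define GET_INPUT \
    (PCB.input_code = (*input_text != '\0') \
        ? (unsigned char)*input_text++ : -1)

/* Helper for the example: fetch one unit of input. */
int next_input(void)
{
    GET_INPUT;
    return PCB.input_code;
}
```

The essential point is only that the macro leaves the
character, or token number, in the input_code field.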
##

INPUT_CODE(t)

If you set both the �pointer input� and the �input
values� �configuration parameter�s, you must provide an
INPUT_CODE macro for your parser. In this situation,
your parser will use the pointer to load the
�input_value� field of the �parser control block� and
use the INPUT_CODE macro to extract the appropriate
value for the �input_code� field. For example, if the
input_value is a structure and the appropriate member
field is called "id" you would write:

	#define INPUT_CODE(t) (t).id
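
The sequence described above, loading the input value
through a pointer and then extracting the code, can be
sketched as follows. The token_record and pcb_sketch
structures and the load_input function are hypothetical
stand-ins written for this example; the generated parser
does the equivalent work internally.

```c
/* Hypothetical input value type with an "id" member. */
typedef struct {
    int id;
    double value;
} token_record;

#define INPUT_CODE(t) (t).id

/* Stand-in for the two relevant control block fields. */
typedef struct {
    token_record input_value;
    int input_code;
} pcb_sketch;

/* Load the value through the pointer, then extract the code. */
void load_input(pcb_sketch *pcb, const token_record *pointer)
{
    pcb->input_value = *pointer;
    pcb->input_code = INPUT_CODE(pcb->input_value);
}
```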
##

input_context

"input_context" is a field which AnaGram adds to the
definition of the �parser control block� structure when
you define a �context type� �configuration parameter�.
If you choose, you can write your GET_INPUT macro so
that it stores the context value in �PCB�.input_context.
The default definition for �GET_CONTEXT� will then stack
the context value at the appropriate time. You can think
of PCB.input_context as a sort of temporary "parking
place" for the context value.
##

Input Scan Aborted

This �warning� message appears if AnaGram is unable to
finish scanning your �syntax file� because of previous
errors.
##

input values

"Input values" is a �configuration switch� which
defaults to off. If your �parser input� includes
explicit �token value�s which are not simply the ASCII
values of corresponding ASCII input characters, you must
set the "input values" switch to inform AnaGram. Unless
your parser is �event driven� or uses �pointer input�,
you must also provide your own �GET_INPUT� macro.

If your parser uses pointer input, you must provide an
�INPUT_CODE(t)� macro.

The semantic value of an input token is to be stored in the
�input_value� field of the parser control block.
##

input_value

input_value is a field in the �parser control block�
which is used to store the semantic value of the input
token.

If you write your own �GET_INPUT� macro, and you have
set the �input values� �configuration switch�, you
should make sure that you store the value of the �input
character� or token into �PCB�.input_value.
##

Internal Error

"AnaGram internal error: ..." is a �warning� message which
appears if one of AnaGram's internal consistency tests
fails. This message should never appear if AnaGram is
working properly. Usually AnaGram will abort on
encountering an internal error, although under
a small set of circumstances it may continue. Should
this happen, it would be wise to close AnaGram and
restart it.

If you do get an internal error, please note the complete
message identifying the problem and file a bug report,
following the directions posted on the AnaGram web page
at http://www.parsifalsoft.com.
A copy of the relevant
syntax file and a summary of the circumstances surrounding
the problem would be greatly appreciated.
##

Intersection

In set theory, the intersection of two sets, A and B, is
defined to be the set of all elements of A which are
also elements of B. In an AnaGram �syntax file�, the
intersection of two �character sets� is represented with
the '&' operator. The intersection operator has lower
�precedence� than the �complement� operator, but higher
precedence than the �union� and �difference� operators.
The intersection operator is �left associative�.
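
To make the set-theoretic definition concrete, here is a
small C sketch that represents a character set as a table of
256 membership flags. The CharSet type and the function
names are invented for this example and say nothing about
how AnaGram stores character sets internally. In a syntax
file the same intersection would be written with the '&'
operator, for instance 'a-m' & 'h-z'.

```c
#include <stdbool.h>
#include <string.h>

#define CHARSET_SIZE 256

/* One membership flag per character code. */
typedef struct {
    bool member[CHARSET_SIZE];
} CharSet;

/* Build the set of all characters from first through last. */
CharSet charset_range(unsigned char first, unsigned char last)
{
    CharSet set;
    int c;
    memset(&set, 0, sizeof set);
    for (c = first; c <= last; c++)
        set.member[c] = true;
    return set;
}

/* Elements of a which are also elements of b. */
CharSet charset_intersection(const CharSet *a, const CharSet *b)
{
    CharSet result;
    int c;
    for (c = 0; c < CHARSET_SIZE; c++)
        result.member[c] = a->member[c] && b->member[c];
    return result;
}

bool charset_contains(const CharSet *set, unsigned char c)
{
    return set->member[c];
}
```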
##

Keyboard Support

AnaGram can be controlled entirely from the keyboard. In the Control
Panel, you can tab to any button and press Enter to select it. In
addition to the conventional Windows keyboard functions, the following
keys have been implemented:
	Escape closes any AnaGram window except the Control Panel.
	F8 toggles between an active AnaGram window and the Control Panel.
	F10 accesses the Control Panel menu from any AnaGram window.
	Shift F10 pops up the Auxiliary Windows menu.
##

Keyword, Keyword String

Keywords are a very important feature of AnaGram. They
provide an easy way to pick up special character
sequences in your input, thereby eliminating the need
for a lot of tedious �production�s.

If AnaGram finds, on the right hand side of one of your
�grammar� productions, a string enclosed in double
quotes, such as "IF", it automatically creates from the
string a "keyword" which is incorporated into your
parser. You may have any number of keywords. A keyword
is treated as a single terminal token. Recognition of
keywords is governed by the �case sensitive� switch.

Your parser will look for a keyword in its input stream
wherever you have defined this particular keyword to be
legitimate input. It will do whatever lookahead is
necessary in order to pick up the entire keyword. If
several keywords match the input, such as IF and IFF,
it will select the longest match, IFF in this example.
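
The longest match rule can be illustrated with a short C
function. This is a simplified sketch, not the code AnaGram
generates: the function name is invented, matching is always
case sensitive, and every keyword in the list is treated as
legitimate input, whereas a real parser considers only the
keywords allowed in its current state.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the longest keyword that matches the
   start of input, or -1 if none matches. */
int longest_keyword(const char *input,
                    const char *const keywords[], size_t count)
{
    int best = -1;
    size_t best_length = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        size_t length = strlen(keywords[i]);
        if (length > best_length
            && strncmp(input, keywords[i], length) == 0) {
            best = (int)i;
            best_length = length;
        }
    }
    return best;
}
```

Given the keywords IF and IFF, input beginning with "IFF"
selects IFF, while input beginning with "IF" followed by
anything other than F selects IF.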

Important points to notice about keywords:
 1) Keywords take precedence over ordinary
characters in the input stream - thus if the character
I and the keyword IF are both legitimate input at some
point, IF will be selected, if present, in preference
to I.
 2) Keywords are not reserved words. Your parser
will only look for a keyword when it is in a state
where that keyword is legitimate input.
 3) Keywords do not participate in character sets
and should not appear in definitions of character sets.
In particular, they are not considered as belonging to
the complement of a character set. Thus
a keyword would not be considered legitimate input
for the production
		next char -> ~( '/' + '*' )

 4) Keywords may appear in virtual productions.

 5) Keywords may be named by means of a definition.

AnaGram will list all the keywords in your grammar in
the �Keywords� window. In addition, in numerous
windows where the cursor bar selects a state, the
�Auxiliary Windows� popup menu will list a Keywords option.
This window will provide a list of the keywords
acceptable in the selected �parser state�.

On occasion, a kind of conflict, called a �keyword
anomaly� may occur. If so, such conflicts will be listed
in the �Keyword Anomalies� window. The "�sticky�"
�attribute statement� is useful in dealing with keyword
anomalies.
##

Keyword Anomalies Found

This �warning� message indicates that AnaGram has found
at least one �keyword anomaly� in your �grammar�. Open
the �Keyword Anomalies� window to see a list of those
that have been found.
##

Keyword Anomaly

In �syntax directed parsing�, it is assumed that input
�token�s can be uniquely identified. In the case of
�keyword�s, however, there is the possibility that the
individual characters making up the keyword, as well as
the keyword taken as a whole, could constitute
legitimate input under some circumstances. Thus
�keywords�, though a powerful and useful tool, are not
completely consistent with the assumptions that underlie
�syntax directed parsing�. This can occasionally give
rise to a type of conflict, diagnosed by AnaGram,
called a "keyword anomaly". AnaGram is quite
conservative in its diagnoses, so that many keyword
anomalies it reports are actually innocuous and can be
safely ignored.

Basically, a keyword anomaly is a situation where a
keyword is recognized, causes a reduction, and the
parser arrives in a state where the keyword is not
legal input. If the keyword, seen simply as a sequence
of characters, might have been legal input in the
original state, AnaGram notes the existence of a
keyword anomaly.

If you have a keyword that causes a keyword anomaly and
it is actually a reserved word in your grammar, the
anomaly is by definition innocuous. You should use the
�reserve keywords� statement to inform AnaGram that the
keyword is reserved and the anomaly need not be
diagnosed.

To help identify and correct any problems associated
with keyword anomalies, AnaGram provides the �Keyword
Anomalies� window to identify the anomalies, and the
�Keyword Anomaly Trace� to help you understand a
particular anomaly.
##

Keyword Anomaly Trace

A Keyword Anomaly Trace is a ready made �grammar trace�
window which you may select from the �Auxiliary Windows�
menu of the �Keyword Anomalies� window. The anomaly
trace provides a path to a state which illustrates the
�keyword anomaly�. In this state, the keyword is a
reducing token, but after the reduction, it is not
allowable input.
##

Keyword Anomalies

The Keyword Anomalies window is available only if your
grammar has �keyword� anomalies.

Each entry in the Keyword Anomalies window consists of
two lines. The first line identifies the �parser state�
at which the �keyword anomaly� occurs and the offending
keyword. The second line identifies the �grammar rule�
which the keyword may erroneously reduce.

The �Auxiliary Windows� menu provides three auxiliary
windows keyed directly to the anomaly to help you
determine the nature of the problem: The �Keyword
Anomaly Trace� window, the �Reduction Trace� window, and
the �Rule Derivation� window. Three other windows provide
supporting information: the �Reduction States� window,
the �Rule Context� window and the �State Definition�
window.
##

Keywords

The Keywords entry in the �Browse Menu� pops up a
window which lists all of the keywords defined in your
�grammar�. The �token number� is also specified.

A Keywords window is also an option in the �Auxiliary
Windows� popup menu for any window which distinguishes
various states of your parser. The Keywords window will
show all of the �keyword�s which will be recognized in
the state selected by the cursor bar in the parent
window.

The �Auxiliary Windows� menu for a Keywords window
provides a �Token Usage� option which will allow you to
see all the uses of a particular keyword in your grammar.
##

left

"left" controls a �precedence declaration�, indicating
that all of the listed �rule elements� are to be
considered �left associative�.
##

Left Associative

A binary operator is said to be left associative if
an expression with repeated instances of the operator
is to be evaluated from the left. Thus, for example,
  x = a/b/c

is normally taken to mean x = (a/b)/c. The division
operator is said to be left associative.

In �grammar�s with �conflict�s, you may use �precedence
declaration�s to specify that an operator should be left
associative.
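
The difference the grouping makes can be seen in C itself,
where the division operator is left associative. The
function names below are invented for the illustration.

```c
/* Written without parentheses: C groups from the left. */
double divide_plain(double a, double b, double c)
{
    return a / b / c;
}

/* Explicit left grouping, the same as the plain form. */
double divide_left(double a, double b, double c)
{
    return (a / b) / c;
}

/* Explicit right grouping, generally a different value. */
double divide_right(double a, double b, double c)
{
    return a / (b / c);
}
```

With a = 8, b = 4 and c = 2, the first two functions give 1
and the third gives 4.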
##

Lexeme

The "lexeme" �attribute statement� is used to fine-tune
the "�disregard�" statement. The lexeme statement takes
the form:
    lexeme { T1, T2,....Tn }

where T1,...Tn is a list of �nonterminal� tokens
separated by commas. Lexeme statements may be placed in
any �configuration section�, and there may be any number
of them.

When you specify that a �token� is to be disregarded,
AnaGram rewrites your �grammar� so that the token will be
passed over whenever it occurs at the beginning of a
file or following a lexical unit, or "lexeme". If you
have no lexeme statement, then the lexemes in your
grammar are just the terminal tokens.

The lexeme statement allows you to specify that certain
nonterminal tokens are also to be treated as lexemes.
This means that the disregard token will be skipped
following the lexeme, but not between the characters
that constitute the lexeme.

Lexemes correspond to the tokens that a lexical scanner,
if you were using one, would commonly identify and pass
to a parser as single tokens. You don't usually wish to
disregard �white space� within these tokens. For
example, in a grammar for a conventional programming
language where blank characters are to be disregarded,
you might include:
  [
    lexeme {string, character constant, name, number}
  ]

since blank characters must not be overlooked within
strings and constants, and should not be permitted
within names or numbers.

If your grammar allows for situations where successive
lexemes could run together if they were not separated
by space, a name followed by a number, for example, you
may use the "�distinguish lexemes�" �configuration
switch� to force a separation between the tokens.

White space may be used explicitly within definitions of
lexeme tokens in your grammar if desired, without
causing conflicts. Thus, if you wish to allow embedded
space in variable names, you might write:
  [
    disregard space
    lexeme {variable name}
  ]
  space = ' ' + '\t'
  letter = 'a-z' + 'A-Z'
  digit = '0-9'

  variable name
   -> letter
   -> variable name, letter + digit
   -> variable name, space..., letter + digit
##

line

line is a field in your �parser control block� used for
keeping track of the line number of the current
character in your input. Line and column numbers are
tracked only if the �lines and columns� �configuration
switch� has been set.
##

line length

Line length is an �obsolete configuration parameter�.
##

Line Numbers

"Line numbers" is a �configuration switch� which
defaults to off. If it is on, the �Build Parser�
command will put "#line" directives into the generated
C code file so that your compiler diagnostics will
refer to lines in the �syntax file� rather than in the
generated C code file. For more information on the
"#line" directive, see Kernighan and Ritchie, second
edition, section A12.6.
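
As a small illustration of what a "#line" directive does,
consider the following C fragment. The file name
"example.syn" and the line number 120 are made up for the
example; the directives AnaGram writes refer to your own
syntax file.

```c
/* The directive renumbers the line that follows it and
   changes the file name the compiler reports. */
#line 120 "example.syn"
int reported_line(void) { return __LINE__; }
const char *reported_file(void) { return __FILE__; }
```

Here reported_line returns 120 and reported_file returns
"example.syn", regardless of where the fragment actually
sits in the C file. Compiler diagnostics are redirected in
the same way.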

If the "line numbers" switch is off, AnaGram will put
comments into your parser file to help you find
reduction procedures and embedded C in your syntax
file.

Prior to AnaGram 2.01, if your C or C++ compiler required that the
backslashes in the pathname in the #line directive be doubled, you
would have used AnaGram's �escape backslashes� switch to make this
happen. Although you may still use �escape backslashes�, it should no
longer be necessary because AnaGram now puts forward slashes into #line
pathnames instead of backslashes.

If you wish, you may specify the pathname in the #line
directives explicitly by using the �Line Numbers Path�
configuration parameter.

You may also wish to change the "�parser file name�"
parameter to provide a full path name for your parser
file.
##

Line Numbers Path

"Line Numbers Path" is a �configuration parameter�
which takes a string value. It defaults to NULL.

When you have set the �Line Numbers� �configuration
switch� and Line Numbers Path is not NULL, AnaGram
uses it in the #line directive in place of the full
path name of your �syntax file�.

Note that Line Numbers Path should be the complete
pathname for your syntax file.

Line Numbers Path is useful when using AnaGram in cross
platform development. When parsers are to be compiled
and tested on a platform different from that used to run
AnaGram, you may use Line Numbers Path to provide a
pathname on the platform used for compiling and
testing.
##

Lines and Columns

"Lines and columns" is a �configuration switch� which
defaults to on. When set, i.e., on, it causes the
�Build Parser� command to incorporate code into your
parser which will automatically track the line number
and column number of the input token.

You would normally set the "lines and columns" switch
when you are planning to build a parser which will read
an input file and which will need to diagnose �syntax
errors� with some precision.

Your parser will store the line and column numbers in
the �line� and �column� fields respectively in the
�parser control block�.

If the input to your parser includes tab characters, you
should either set the �tab spacing� �configuration
parameter� appropriately or provide a �TAB_SPACING�
macro for your parser.

Your parser will count line and column numbers beginning
with one.
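
The bookkeeping involved can be sketched in C as follows.
This is not the code AnaGram generates: the position type
and the function names are invented, and the tab rule shown
(advance to the next tab stop) is an assumption about what
tab spacing is meant to accomplish.

```c
/* Hypothetical position record; the generated parser keeps
   its line and column in the parser control block. */
typedef struct {
    int line;
    int column;
} position;

/* Position of the first character: counting starts at one. */
position start_position(void)
{
    position p = { 1, 1 };
    return p;
}

/* Advance past character c. tab_spacing is the distance
   between tab stops. */
void advance(position *p, int c, int tab_spacing)
{
    if (c == '\n') {
        p->line++;
        p->column = 1;
    } else if (c == '\t') {
        p->column += tab_spacing - (p->column - 1) % tab_spacing;
    } else {
        p->column++;
    }
}
```

With a tab spacing of 8, a tab encountered in column 3 moves
the position to column 9 in this sketch.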
##

Main Program

The "main program" �configuration switch� determines
what AnaGram does if you invoke the �Build Parser�
command, but have no �embedded C� in your �syntax
file�. If the switch is on and you have not specified
�pointer input� or an �event driven� parser, AnaGram
creates a main program which does nothing but call your
�parser�. The "main program" switch defaults to on.

This feature, along with the default definitions for
�GET_INPUT� and �error handling�, makes it possible
to write a grammar with no �embedded C� or �reduction
procedure�s whatsoever and still get an executable
program which will read input from stdin and parse it
according to your grammar.
##

Marked Rule

A "marked rule" is a �grammar rule� together with a
marked token that indicates how much of the rule has already
been matched. The �marked token� and any tokens following it
indicate the input that should be expected if the
remainder of the rule is to be matched.

When marked rules are displayed in AnaGram windows, the
marked token is represented by a difference in the font. The token may
be in bold face, underlined, italicized, shown with a different point
size, or in a different font altogether. Since AnaGram allows you to
change fonts to suit your own preferences, you should be careful that
the font you choose for the marked tokens allows them to be readily
distinguished from the other tokens in your grammar rules. An
underlined font is often suitable.
##

Max conflicts

The "max conflicts" �configuration parameter� limits the
number of �conflict�s AnaGram will record.  Sometimes, a
simple error editing your syntax file can cause hundreds
of conflicts, which you don't need to see in gory
detail. The default value of max conflicts is 50. If you
have a grammar that is in serious trouble and you want
to see more conflicts, you may change max conflicts to
suit your needs.
##

Missing

The �warning� message Missing <element 1> in <element 2>
indicates that AnaGram expects to see an instance of
syntactic element 1 at the specified location, internal
to an instance of syntactic element 2. AnaGram cannot
reliably continue parsing its input after an error of
this type. Therefore, it limits further analysis of
your grammar to scanning for syntax errors.
##

Missing Production

"Missing production, TXXX: <token name>" is a �warning�
message which indicates that the specified �token�
appears to be defined recursively, but there is no
initial �production� to get the recursion started. If
you get this warning, check your �grammar� closely.
##

Missing Reduction Procedure

"Missing reduction procedure, RXXX" is a �warning�
message which appears either when the �grammar rule� indicated
specifies a �parameter assignment� but does not have a
�reduction procedure� to use it, or when the rule has no reduction
procedure but the value of the token on the left hand side is used
as an argument for some other reduction procedure and the �default reduction value�
does not have the same type as the token on the left hand side.
In this latter case, a reduction procedure may be needed to effect
correct type conversion.

This warning is
provided in case the lack of a reduction procedure is an
oversight.
##

Multiple Definitions

"Multiple definitions for TXXX: <token name>" is a
�warning� message which indicates that the specified
�token� has been defined both as a �character set� and
as a �nonterminal token�. It cannot be both.
##

Near Functions

"Near Functions" is a �configuration switch� that
defaults to off. It controls the use of the "near"
keyword for static functions in your parser. If your
parser is to run on an 80x86 processor you might wish
to turn it on. Your parser will then be slightly
smaller and will run slightly faster.

If you are going to run your parser on some other
processor or use a C or C++ compiler that does not
support the "near" keyword you should make sure "near
functions" is set to off.
##

Negative Character Code in Pointer Mode

This �warning� message appears if your �grammar� defines
negative character codes and uses �pointer input�. If
your grammar uses the default definition for �pointer
type� it will be reading unsigned characters so that
the parser will never see the negative codes that have
been defined. You may correct the problem by providing
your own definition of pointer type.
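
The effect can be seen in plain C, independent of AnaGram. The
following is a minimal sketch, not generated parser code: the two
function names are hypothetical, and it assumes 8 bit chars on a
two's complement machine. It reads the same byte first through
"unsigned char *", the default pointer type, and then through
"signed char *", as a user-supplied pointer type might.

```c
#include <assert.h>
#include <stddef.h>

/* Read one input code through the default pointer type,
   unsigned char *. The result is always in the range 0..255. */
int code_via_unsigned(const void *input, size_t i) {
  const unsigned char *p = (const unsigned char *) input;
  assert(input != NULL);
  return p[i];
}

/* Read the same byte through signed char *. Bytes of 0x80 and
   above come out as negative codes. */
int code_via_signed(const void *input, size_t i) {
  const signed char *p = (const signed char *) input;
  assert(input != NULL);
  return p[i];
}
```

With the unsigned pointer, a byte of 0xFF is seen as 255, so a
grammar rule written for the code -1 could never match it.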
##

Nest Comments

"Nest comments" is a �configuration switch� which
controls the treatment of �comments�
while scanning your �syntax file�. It defaults to off,
in accordance with the ANSI standard for C which
disallows �nested comments�. Note that AnaGram scans
comments in any �embedded C� code as well as in the
grammar specification. You may turn this switch on and
off as many times as necessary in a single file.
##

Nested Comment

As delivered, AnaGram treats C style �comments�
according to the ANSI standard: They do not nest. For
those who prefer nested comments, however, the �nest
comments� �configuration switch� allows them to nest.
##

Nesting too deep

This �warning� message indicates that �set
expression�s or �virtual productions� are
nested so deeply they have exhausted the available
stack space and AnaGram cannot continue its analysis.

Use a �definition� statement to name an intermediate
level.
##

no cr

"no cr" is a �configuration switch� which
defaults to off. When this switch is set, it will
cause the �parser file� and �header file� to be
written without carriage returns. This is convenient
if you wish to use the generated parser files in a
Unix environment.
##

No Grammar Token Specified

This �warning� message appears if your �grammar� does not
specify a �grammar token�. Edit your �syntax file� to
specify one.
##

No Productions in Syntax File

This �warning� message appears if AnaGram did not find
any �productions� at all in your �syntax file�. Check
to see you have the right file.
##

No Such Parameter

This �warning� message appears when AnaGram does not
recognize the name of a �configuration parameter� you
have tried to set in your �syntax file�. Check the
spelling of the parameter you wish to set in the
�Configuration Parameters Window�.
##

No Terminal Tokens in Expansion

"No terminal tokens in expansion of TXXX" is a �warning�
message indicating that there are no terminal tokens
to be found in an expansion of the specified token.
Although there are a few circumstances where this could
be legitimate, it is more likely that there is a missing
rule in the grammar.
##

Not a Character Set

"Not a character set, TXXX: <token name>" is a �warning�
message which indicates that the specified �token� has
been used both on the left side of a �production� and in
a �character set� expression defining some other token.
AnaGram will use an empty set in place of the
specified token in evaluating the �character set�. You
will get another warning, �Error defining� token, when
AnaGram finishes its evaluation of the character set.
##

Nothing Reduces

"Nothing reduces TXXX -> RYYY" is a �warning� message
which indicates that the �grammar� does not specify any
input to follow an instance of the indicated �grammar
rule�. In all probability, the grammar does not have
any explicit end of file, or �eof token�. If the grammar
does not have any conflicts with �token� T000, then an
explicit end of file indicator is not necessary.
Otherwise you should modify your grammar to require an
explicit end of file.
##

Null Character in String

This �warning� message appears when AnaGram finds an
explicit null character in a quoted string. If you must
allow for a null in a �keyword string�
you will have to rewrite your
�grammar rule�. For instance, instead of

  widget
    -> "abc\0def"

write

  widget
    -> "abc", 0, "def"
##

nonassoc

"nonassoc" controls a �precedence declaration�,
indicating that all of the listed �rule elements� are
to be considered non-associative.
##

Nonterminal Token, Nonterminal

A nonterminal token is one which is constructed from a
series of other tokens as specified by one or more
�production�s. Nonterminal tokens are to be
distinguished from �terminal token�s, which are the
basic input units appearing in your input stream.
Terminal tokens most often represent single characters
or a character belonging to a �character set� such as
'a-z'.
##

Null Production

A "null production" is one that has no tokens on the
right hand side whatsoever. Null �production�s
essentially are identified by the first following input
token. Null productions are extremely convenient
syntactic elements when you wish to make some input
optional. For example, suppose that you wish to allow an
optional semicolon at some point in your grammar. You
could write the following pair of productions:
  optional semicolon -> | ';'
Note that a null production can never follow a '|'.

This could also be written on multiple lines thus:
  optional semicolon
    ->
    -> ';'

You can always rewrite your grammar to eliminate null
productions if you wish, but you usually pay a price in
conciseness and clarity. Sometimes, however, it is
necessary to do such a rewrite in order to avoid
�conflict�s, to which null productions are especially
prone. For example suppose you have the following
production:
  foo -> wombat, optional semicolon, widget

You can rewrite this as two productions:
  foo
    -> wombat, widget
    -> wombat, ';', widget

This rewrite specifies exactly the same input language,
but is less prone to conflicts. On the other hand, it
may require significantly more table space in your
parser.

If you have a null production with no �reduction
procedure� specified, your parser will automatically
assign the value zero to the �reduction token�.

Null productions can also be generated by �virtual
productions�.

A token that has a null production is a "�zero length�"
token.
##

Old Style

"Old Style" is a �configuration switch� which defaults
to off. It controls the function definitions in the code
AnaGram generates. When "old style" is off, it generates
ANSI style calling sequences with prototypes as
necessary. When "old style" is on, it generates old
style function definitions.
##

Output Files

When you use the �Build Parser� command to request
output from AnaGram, it creates two files: a �parser
file� and a �parser header� file.
##

Page Length

"Page length" is an �obsolete configuration parameter�.
##

Obsolete Configuration Parameter, Obsolete Configuration Switch

A number of �configuration parameter�s and �configuration switch�es
which were used in the DOS version of AnaGram are no longer
used, but are still recognized for the sake of upward
compatibility. These parameters include:
 �bottom margin�
 �line length�
 �page length�
 �top margin�
 �quick reference�
 �video mode�

##

Parameter

"Parameter <name> has type void" is a �warning� message
which appears when a �parameter assignment� is attached
to a �token� that has been defined to have the void
�data type�.
##

Parameter Assignment

In any �grammar rule�, the �semantic value� of any
�rule element� may be passed to a �reduction procedure�
by means of a parameter assignment. Simply follow the
rule element with a colon and a C variable name. The C
variable name can then be used in the reduction
procedure to reference the semantic value of the token
it is attached to. AnaGram will automatically provide
necessary declarations.

Here are some examples of rule elements with parameter
assignments:

  '0-9':d
  integer:n
  expression:x
  declaration : declaration_descriptor

##

Parameter Not Defined

AnaGram does not have a �configuration parameter�
with the specified name. Please check the spelling.
##

Parameter Takes Integer Value

The specified �configuration parameter� takes
an integer value only.
##

Parameter Takes String Value

The specified �configuration parameter� takes
a string value only.
##

Parse Function

To run your parser, you call the parse function.
The name of the parse function is given by
the �parser name� �configuration parameter� and defaults to the
name of your parser file.

If your parser uses �pointer input�, you should set the �pointer�
field of the �parser control block� before calling the parse
function.

If your parser is �event driven�, you should first call the
�initializer�, and then you should call the parse function
for each input token.

If the �reentrant parser� switch is set, the parse function takes
a pointer to the �parser control block� as its sole argument. Otherwise
it takes no arguments. The parse function returns no value. All
communication is by means of the �parser control block�.

To retrieve the value of the �grammar token�, once the parse is complete,
use the �parser value function�.
##

Parser

A parser is a program or, more commonly, a procedure within
a program, which scans a sequence of �input characters�
or input tokens and accumulates them in an input
buffer or stack as determined by a set of �production�s
which constitute a �grammar�.

When the parser discovers
a sequence of tokens as defined by a �grammar rule�, or
right hand side of a production, it "reduces" the
sequence to a single �reduction token� as defined by the
left hand side of the grammar rule. This �nonterminal
token� now replaces the tokens which matched the grammar
rule and the search for matches continues.

If an input
token is encountered which will not yield a match for
any rule, it is considered a �syntax error� and some
kind of �error recovery� may be required to continue. If
a match, or �reduce action�, yields the �grammar token�,
sometimes called the �goal token� or �start token�, the
parser deems its work complete and returns to whatever
procedure may have called it.

The �Grammar Trace� and �File Trace� functions in
AnaGram provide a convenient means for understanding the
detailed operation of a syntax directed parser.

�Tokens� may have �semantic values�. If the �input
values� �configuration switch� is on, your parser will
expect semantic values to be provided by the input
process along with the token identification code. If the
input values switch is off, your parser will take the
ascii value of the input character, that is, the actual
input code, as the value of the character.

When the
parser reduces a production, it can call a �reduction
procedure� or �semantic action� to analyze the values of
the constituent tokens. This reduction procedure can
then return a value which characterizes the reduced
token.
##

Parser Control Block

A "Parser Control Block" is a structure which contains
all of the data necessary to describe the instantaneous
state of a parser. The typedef statement which defines
the structure is included in the �parser header� file
for your parser. AnaGram creates the name of the data
type for the structure by appending "_pcb_type" to the
�parser name�.

You may add your own declarations to the parser control
block by using the �extend pcb� statement.

If the �declare pcb� �configuration switch� is on, its
normal state, AnaGram will declare a parser control
block for you at the beginning of your parser file.
AnaGram will determine the name of the parser control
block by appending "_pcb" to the �parser name�. AnaGram
will also define the macro PCB as a short hand notation
for use within the parser. All references to the parser
control block within the code that AnaGram generates
are made using the PCB macro.

If you wish to declare your own parser control block,
you must include the �parser header� file for your
parser before your declaration. Then you declare a
control block and define PCB to refer to the control
block you have declared.

Suppose your grammar is called widget. You would then
write the following statements in your �embedded C�:
  #include "widget.h"
  widget_pcb_type widget_control_pcb;
  #define PCB widget_control_pcb

Alternatively, you could write the following:
  #include "widget.h"
  widget_pcb_type *widget_control_pcb_pointer;
  #define PCB (*widget_control_pcb_pointer)

and then allocate storage for the structure when
necessary.

Some fields of interest in the parser control block are
as follows:
	�input_code�
	�input_value�
	�input_context�
	�pointer�
	�token_number�
	�reduction_token�
	�ssx�
	�sn�
	�ss�[�parser stack size�]
	�vs�[parser stack size]
	�cs�[parser stack size]
	�line�
	�column�
	*�error_message�
	�error_frame_ssx�
	�error_frame_token�
##

PCB

"PCB" is a macro AnaGram defines for use in the code it
generates to refer to the �parser control block� for
your �parser�. Normally, AnaGram automatically declares
storage for a parser control block and defines PCB for
you. If you turn off the �declare PCB� switch, you may
define PCB yourself.
##

PCB_TYPE

If you are writing your parser in C++, you may prefer to derive
a class from the �parser control block� rather than use the
�extend pcb� statement. In this case you may define the
PCB_TYPE macro in your syntax file to specify your derived
class.

For instance, suppose you have defined

class MyPcb : public parser_pcb_type {...};

You would then add the following line:

#define PCB_TYPE MyPcb

If you do not define PCB_TYPE, AnaGram will define it as the
type of your parser control block.
##

Parser File

The "parser file" is the C (or C++) file output by AnaGram when
you execute the �Build Parser� command. It contains all
of the �embedded C� from your �syntax file�, all of the
�reduction procedure�s defined in your �grammar�,
syntax tables which represent, in a condensed form, all
of the intricacies of your grammar, and a customized
�parsing engine�. The name of the parser file is given
by the �parser file name� �configuration parameter�. The
name of the �parser� itself is given by the �parser
name� configuration parameter.

If you wish the parser file to be written without carriage
returns, suitable for a Unix environment, set the �no cr�
configuration switch.
##

Parser File Name

"Parser file name" is a �configuration parameter� which
takes a string value. The default value is "#.c".
AnaGram uses this parameter to generate the name of the
output C file, or �parser file�, created by the �Build
Parser� command.  The '#' character is used in this
string as a wild card to indicate the name of the
current �syntax file�. If the first character of the
parser file name string is a '.' character, AnaGram
will substitute the name of the current working
directory for the dot. Thus ".\\#.c" will create the
file name as a complete path. This can sometimes be
important when using the �line numbers� switch to
enable a debugger to find code in your parser file.
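
The '#' substitution can be pictured with a short sketch. The
helper below is hypothetical and is not AnaGram's own code. It
replaces each '#' in a pattern with a given syntax file name and
leaves the handling of a leading '.' aside.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy pattern into out, replacing each '#'
   with name. Returns 0 on success, -1 if out is too small. */
int expand_file_name(const char *pattern, const char *name,
                     char *out, size_t out_size) {
  size_t used = 0;
  size_t name_len;
  assert(pattern != NULL && name != NULL && out != NULL);
  name_len = strlen(name);
  for (; *pattern != '\0'; pattern++) {
    if (*pattern == '#') {
      /* Leave room for the name and the terminating null. */
      if (used + name_len >= out_size) return -1;
      memcpy(out + used, name, name_len);
      used += name_len;
    } else {
      if (used + 1 >= out_size) return -1;
      out[used++] = *pattern;
    }
  }
  if (used >= out_size) return -1;
  out[used] = '\0';
  return 0;
}
```

Under these assumptions, the default pattern "#.c" and a syntax
file named widget give the parser file name "widget.c".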

Note that the parser file name is not the same as the
�parser name�.
##

Parser Generator

A parser generator, such as AnaGram, is a program that
converts a �grammar�, a rule-based description of the
input to a program, into a conventional, procedural
module called a �parser�. The parsers AnaGram generates
are simple C modules which can be compiled on almost
any platform. AnaGram parsers are also compatible with
C++.
##

Header File, Parser Header

When you use the command �Build Parser� to generate
source code for a parser, AnaGram creates two files, a
header file and a C source file. Unless different
paths are specified in the �parser file name� and
�header file name� parameters, both files will be
written to the directory that contains the �syntax file�.

The header file contains a number of typedef statements,
including the definition of the �parser control block�,
and a number of macro
definitions which may be useful in your parser
or in other modules of your program.

If you do not alter
the �header file name� parameter, the
name of the header file will be the same as the name of
your �syntax file� and it will have the extension ".h".

If you wish the header file to be written without carriage
returns, suitable for a Unix environment, set the �no cr�
configuration switch.
##

Parser Input

AnaGram �parser�s may be configured to accept input in any of
three different ways:

 By default, a �parse function� gets its input by invoking the
�GET_INPUT� macro each time it is ready for another input token. The
default implementation of GET_INPUT reads �input character�s from stdin.  For
most practical problems, you will want to override this definition of
GET_INPUT, storing the current input character in PCB.input_code.

 Alternatively, you may configure a parser to read input from an
array in memory. Set the �pointer input� switch and load the
�pointer� field of the parser control block before calling the
parse function. The parser will then run, incrementing the
pointer, until it finishes or encounters an error.

 The third alternative is to set the �event driven� switch. The
parser will be configured as a callback routine. Begin by calling
the �initializer�. Then, for each input character, store the
character in the �input_code� field of the parser control block and
call the parse function. Each time
you call the parse function it will continue until it needs more
input.  You can check its status by inspecting the �exit_flag� in the
parser control block.

The input to your parser may be either text characters or �tokens�
accumulated by a pre-processor, or �lexical scanner�. The latter
case is referred to as �token input�. If you use a lexical scanner,
you may find it convenient to configure your parser as event driven.

Although lexical scanners are often not necessary
when you use AnaGram, if you do need one you can write it in AnaGram.
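
The calling pattern for the event driven alternative can be
sketched as follows. Everything here is a mock written for
illustration: the parser name "mock", the MOCK_ constants that
stand in for the exit_flag values, the digits_seen field, and the
trivial "grammar" (one or more digits ended by a zero code) are
all assumptions. Only the sequence of calls follows the
description above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the values a generated parser stores in exit_flag. */
enum { MOCK_RUNNING, MOCK_SUCCESS, MOCK_ERROR };

/* Mock parser control block with just the fields used here. */
typedef struct {
  int input_code;
  int exit_flag;
  int digits_seen;  /* mock parser state, not an AnaGram field */
} mock_pcb_type;

mock_pcb_type mock_pcb;
#define PCB mock_pcb

/* Mock initializer, named "init_" plus the parser name. */
void init_mock(void) {
  PCB.exit_flag = MOCK_RUNNING;
  PCB.digits_seen = 0;
}

/* Mock parse function: consumes the one token in PCB.input_code
   and returns to the caller when it needs more input. */
void mock(void) {
  if (PCB.exit_flag != MOCK_RUNNING) return;
  if (PCB.input_code >= '0' && PCB.input_code <= '9')
    PCB.digits_seen++;
  else if (PCB.input_code == 0 && PCB.digits_seen > 0)
    PCB.exit_flag = MOCK_SUCCESS;
  else
    PCB.exit_flag = MOCK_ERROR;
}

/* Caller side: call the initializer, then store each input code
   and call the parse function, checking exit_flag as you go. */
int feed_string(const char *text) {
  size_t i = 0;
  assert(text != NULL);
  init_mock();
  do {
    PCB.input_code = (unsigned char) text[i];
    mock();
  } while (PCB.exit_flag == MOCK_RUNNING && text[i++] != '\0');
  return PCB.exit_flag;
}
```

Because the caller owns the loop, the input codes could just as
well come from a lexical scanner as from a string.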
##

Parser Name

"Parser Name" is a �configuration parameter� which
defaults to "#", where "#" represents the name of your
�syntax file�. AnaGram uses this parameter to name your
�parse function�. The �initializer� for your parser will have the
same name preceded by "init_". Note that "�parser file
name�" is not the same configuration parameter as "parser
name".
##

Parser Stack

Your �parser� uses a "parser stack" to keep track of the
�grammar rules� it is trying to match and its progress
in matching them. Normally, there are two separate
stacks defined by AnaGram: �PCB�.�ss�, the �parser state
stack� which maintains �parser state� numbers, and
PCB.�vs�, the �parser value stack� which maintains the
�semantic value�s of tokens that have been identified so
far. If you wish to maintain a stack tracking other
variables you may set the �context type� �configuration
parameter�, and AnaGram will define a third stack,
PCB.�cs�. All are indexed by the same stack index,
PCB.�ssx�.

To see how tokens accumulate on the parser stack, run
the �Grammar Trace� or the �File Trace�.

Normally, when the return value of a �reduction procedure�
is stored on the parser value stack, it is stored by
simply coercing the stack pointer to the correct type.
If the return value is a C++ object, this can cause
serious problems. These problems can be avoided by
using the �wrapper� statement.
##

Parser stack alignment

Parser stack alignment is a �configuration parameter� whose
value is a C or C++ data type. It defaults to "long". If
any tokens have type "double", it will be automatically set
to double. Thus, you will normally not need to change this
parameter if your parser is to run on a PC or compatible
processor. It provides alignment control for processors
which restrict addresses for multibyte data access. The
default setting provides for correct operation on 64 bit
processors.

To control byte alignment of the parser stack,
�PCB�.�vs�, AnaGram normally adds a field of the
specified data type to the "union" statement which
defines the data type for the �parser stack�. This
parameter can be used to deal with byte alignment
problems when a �parser� is to be run on a processor
with byte alignment restrictions. For instance, if your
�grammar� has �token�s of type "long double" and your
processor requires long double variables to be
properly aligned, you can include the following
statement in a �configuration section� in your grammar
or in your �configuration file�:

  parser stack alignment = long double

If the data type specified is "void", no alignment declaration
will be made.
##

Parser Stack Index, Stack Index

The parser stack index, �PCB�.�ssx�, tracks the depth
of the �parser state stack�, the �parser value stack�,
and the �context stack� if you defined one. The parser
stack index is incremented by �shift actions� and
reduced by �reduce actions�.
##

Parser Stack Overflow

Your �parser� uses a �parser stack� to keep track of the
�grammar rules� it is trying to match and its progress
in matching them. If your grammar has any �recursive
rule�s that are not strictly left recursive, then no
matter how big you make the parser stack, it will be
possible to create a syntactically correct input which
will cause the stack to overflow. As a practical matter,
however, it is usually possible to set the �parser stack
size� to a value large enough so that an overflow is a
freak occurrence. Nevertheless, it is necessary to check
for overflow, and in the case overflow should occur,
your parser has to do something.  What it does is invoke
the �PARSER_STACK_OVERFLOW� macro. If you don't define
it, AnaGram will define it for you, although not
necessarily to your taste.
##

Recursive rule, Recursion

A �grammar rule� is said to be "recursive" if the �token� on the left side
of the rule also appears on the right side of the rule, or
in an �expansion rule� of any token on the right side of the rule.

If the token on the left side is the
first token on the right side, the rule is said to be "left recursive".
If it is the last token on the right side, the rule is said to be
"right recursive". Otherwise, the rule is "center recursive".

For example:
  statement list
    -> statement
    -> statement list, statement  // left recursive

  fraction part
    -> digit
    -> digit, fraction part       // right recursive

  expression
    -> factor
    -> expression, '+' + '-', factor

  factor
    -> primary
    -> factor, '*' + '/', primary

  primary
    -> number
    -> name
    -> '(', expression, ')'       // center recursive

Note that if all the tokens in the rule other than the recursive token itself
are �zero length� tokens, it is possible for the
rule to be matched arbitrarily many times without any input whatsoever. In
other words, such a rule creates an infinite loop in the parser. AnaGram can
detect this condition and issues an �empty recursion� diagnostic if it occurs.

##

PARSER_STACK_OVERFLOW

PARSER_STACK_OVERFLOW is a user definable macro. If you
do not define it, AnaGram will define it so that it
will print a message on stderr and abort the �parser� in
case of a �parser stack overflow�.
##

Parser Stack Size

"Parser stack size" is a �configuration parameter� with
a default value of 128. It is used to define the sizes
of your �parser stacks� in your �parser control block�.
When analyzing your grammar, AnaGram will determine the
minimum amount of stack space required for the deepest
left �recursion�. To this depth it will add one half the
value of the parser stack size parameter. It will then
set the actual stack size to the larger of this value
and the parser stack size parameter.
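
The rule above amounts to a small calculation, restated here as a
sketch. The function and argument names are hypothetical, and
"one half" is taken to mean integer division.

```c
#include <assert.h>

/* Hypothetical helper restating the rule in the text:
   actual size = larger of (min_depth + parameter / 2)
   and the parameter itself. */
int actual_stack_size(int stack_size_param, int min_depth) {
  int padded;
  assert(stack_size_param > 0 && min_depth >= 0);
  padded = min_depth + stack_size_param / 2;
  return padded > stack_size_param ? padded : stack_size_param;
}
```

With the default parameter of 128, a grammar that needs a depth
of 10 keeps a stack of 128 entries, while one that needs a depth
of 100 gets 100 + 64 = 164 entries.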
##

Parser State, State Number

The essential part of your �parser� is a group of tables
which describe in detail what to do for each "state" of
the parser.

The states of a parser are determined by sets of
"�characteristic rules�". The �State Definition Table�
shows the characteristic rules for each state of your
parser.

AnaGram numbers the states of a parser as it identifies
them, beginning with zero. In all windows, state numbers
are displayed as three digit numbers prefixed with the
letter 'S'.
##

Parser State Stack, State Stack

The parser state stack is a stack maintained by your
�parser� that is an integral part of the parsing
process. At any point in the parse of your input
stream, the parser state stack provides a summary of
what has been found so far. The parser state stack is
stored in �PCB�.�ss� and is indexed by PCB.�ssx�, the
�parser stack index�.
##

Parser Value Stack, Value Stack

In parallel with the �parser state stack�, your parser
maintains a "value stack", �PCB�.�vs�, each entry of
which corresponds to the �semantic value� of the token
identified at that state. Since the semantic values of
different tokens might well have different �data type�s,
AnaGram gives you the opportunity, in your �syntax
file�, to define the data type for any token. AnaGram
then builds a typedef statement creating a data type
which is a union of all the types you have defined.
AnaGram creates the name for this �data type� by
appending "_vs_type" to the �parser name�. AnaGram uses
this data type to define the value stack.
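
As an illustration only, suppose a parser named "widget" has
tokens of types int, double, and char *. The value stack type
might then look roughly like the sketch below. The member names
are hypothetical; the names AnaGram actually generates are not
shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value stack type for a parser named "widget".
   One member per token data type defined in the syntax file. */
typedef union {
  int    int_value;
  double double_value;
  char  *string_value;
} widget_vs_type;
```

Each stack entry is large enough for the largest member, so one
array of this type can hold the value of any token.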
##

Parser Action

In a traditional LR parser, there are only four actions: the �shift
action�, the �reduce action�, the �accept action� and the �error
action�. AnaGram, in doing its �grammar analysis�, identifies a
number of special cases, and creates a number of extra actions which
make for faster processing, but which can be represented as
combinations of these primitive actions.

When a shift action is performed, the current state
number is pushed onto the �parser state stack� and the
new state number is determined by the current state
number and the current input token. Different tokens
cause different new states.

When a reduce action is performed, the length of the
rule being reduced is subtracted from the �parser stack
index� and the new state number is read from the top of
the parser state stack. The �reduction token� for the
rule being reduced is then used as an input token.
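
The bookkeeping for the shift and reduce actions can be sketched
in a few lines of C. This is a simplified model, not the
generated parsing engine: the type and function names are
hypothetical, only the �sn�, �ssx� and �ss� fields are modeled,
and the exact indexing convention is an assumption.

```c
#include <assert.h>

#define DEMO_STACK_SIZE 16

/* Minimal stand-in for the relevant parser control block fields. */
typedef struct {
  int sn;                   /* current state number */
  int ssx;                  /* parser stack index */
  int ss[DEMO_STACK_SIZE];  /* parser state stack */
} demo_pcb_type;

/* Shift: push the current state, then enter the new state. */
void demo_shift(demo_pcb_type *pcb, int new_state) {
  assert(pcb->ssx < DEMO_STACK_SIZE);
  pcb->ss[pcb->ssx++] = pcb->sn;
  pcb->sn = new_state;
}

/* Reduce: subtract the rule length from the stack index and
   resume in the state found at the new top of the stack. */
void demo_reduce(demo_pcb_type *pcb, int rule_length) {
  assert(rule_length >= 0 && rule_length <= pcb->ssx);
  if (rule_length == 0) return;  /* null production: state unchanged */
  pcb->ssx -= rule_length;
  pcb->sn = pcb->ss[pcb->ssx];
}
```

After a reduce, a real parser would go on to treat the reduction
token as input in the restored state; that table lookup is left
out of this sketch.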
##

Parsing Engine

A parser consists of three basic components: A set of
syntax tables, a set of �reduction procedure�s and a
parsing engine. The parsing engine is the body of code
that interprets the parsing table, invokes input
functions, and calls the reduction procedures. The
�Build Parser� command configures a parsing engine
according to the implicit requirements of the syntax
specification and according to the explicit values of
the �configuration parameter�s.

The parsing engine itself is a simple automaton,
characterized by a set of states and a set of inputs.
The inputs are the tokens of your grammar. Each state
is represented by a list of tokens which are admissible
in that state and for each token a �parser action� to perform
and a parameter which further defines the action.

Each state in the grammar, with the exception of state
zero, has a �characteristic token� which must have been
recognized in order to jump to that state. Therefore,
the �parser state stack�, which is essentially a list
of state numbers, can also be thought of as a list of
token numbers. This is the list of tokens that have
been seen so far in the parse of your input stream.
##

Partition

If you use �character sets� in your grammar, AnaGram
will compute a "partition" of the �character universe�.
This partition is a collection of non-overlapping
character sets such that every one of the sets you have
defined can be written as a �union� of partition sets.

Each partition set is assigned a unique �token�. If one
of your character sets requires more than one partition
set to represent it, AnaGram will create appropriate
�production�s and add them to your grammar so your parser
can make the necessary distinctions.
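
One way to picture the computation is sketched below. It is not
AnaGram's algorithm, and all names are hypothetical. It assumes
each character set is a single inclusive range, that character
codes run from 0 to 255, and that only characters belonging to at
least one set need a partition set. Two characters land in the
same partition set exactly when they are members of the same
character sets.

```c
#include <assert.h>
#include <stddef.h>

#define UNIVERSE 256

/* A character set given as one inclusive range of codes. */
typedef struct { int lo, hi; } char_range;

/* Count the partition sets needed for up to 32 character sets. */
int count_partition_sets(const char_range *sets, int nsets) {
  unsigned long seen[UNIVERSE];  /* distinct membership masks */
  int nseen = 0;
  int c, s, k;
  assert(sets != NULL && nsets >= 0 && nsets <= 32);
  for (c = 0; c < UNIVERSE; c++) {
    unsigned long mask = 0;
    /* Bit s of mask is set when character c is in set s. */
    for (s = 0; s < nsets; s++)
      if (c >= sets[s].lo && c <= sets[s].hi) mask |= 1UL << s;
    if (mask == 0) continue;     /* character is in no set */
    for (k = 0; k < nseen; k++)
      if (seen[k] == mask) break;
    if (k == nseen) seen[nseen++] = mask;
  }
  return nseen;
}
```

For the sets 'a-z' and 'a-f' this gives two partition sets, 'a-f'
and 'g-z'. The set 'a-z' then needs both of them, which is the
case where extra productions are required.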

To see how AnaGram has partitioned the character
universe, you may inspect the �Partition Sets� window
found in the �Browse Menu�.
##

Partition Set Number

Each �partition� set is identified by a unique
reference number called the partition set number.
Partition set numbers are displayed in the form Pxxx.
Partition sets are numbered starting with zero, so the
first set is P000.

To see the elements of a given partition set, call up
the �Partition Sets� window from the �Browse Menu�,
then, after selecting a partition set, call up the �Set
Elements� window from the �Auxiliary Windows� popup menu.
##

Partition Sets

The Partition Sets option in the �Browse Menu� pops up
a window which shows the complete �partition� of the
�character universe� for your parser.

The Partition Sets option in the �Auxiliary Windows� popup menu
for the �Character Sets� window lets you see the
partition sets which cover the specified character set.

Each entry in a Partition Sets window identifies a
token number and a �partition set number�. The �Auxiliary
Windows� menu provides a �Set Elements� entry which
enables you to see precisely which characters belong to
the partition set. It also has a Token Usage entry to show you
what rules the set is used in.
##

PCONTEXT

PCONTEXT is an alternate form of the �CONTEXT� macro
which takes an explicit argument to specify the
�parser control block�. PCONTEXT is defined in the �parser
header� file.
##

PERROR_CONTEXT

PERROR_CONTEXT is an alternative form of the
�ERROR_CONTEXT� macro. It differs only in that it takes
an argument so you can specify the appropriate
�parser control block� explicitly. PERROR_CONTEXT is defined in
the �parser header� file.
##

pointer

"pointer" is a field which will be included in the
�parser control block� for your parser if you have set
the �pointer input� �configuration switch�. Your main
program should set PCB.pointer before it calls your
parser. Thereafter, your parser will increment it
appropriately. When you are executing a �reduction
procedure� or a �SYNTAX_ERROR� macro PCB.pointer will
always point to the next input character to be read.
##

Pointer input

"Pointer input" is a �configuration switch� which you
may set to control �parser input�. It defaults to off. When you set
pointer input, you tell AnaGram that the input to your parser is in
memory and can be scanned simply by incrementing a pointer. Before
calling your parser you should make sure that �PCB�.�pointer� is
properly initialized to point to the first character or token in your
input.

Use the �configuration parameter� "�pointer type�" to
specify the type of the pointer. The default value of
"pointer type" is "unsigned char *".

There is no particular reason why pointer type should
be limited to variants on char. It could define a
pointer to int or a structure just as well.

If you use pointer input with structures or C++
classes, you should set the �input values� switch and
define an �INPUT_CODE�(t) macro.

If you are using a 16 bit compiler and your input array
is so large that you need "huge"
pointers, make sure that "pointer type" is properly
defined.
##

Pointer Type

"Pointer Type" is a �configuration parameter� which
defaults to "unsigned char *". When you have specified
�pointer input�, AnaGram uses the value of pointer type
to declare a pointer field in your �parser control
block�.
##

Precedence, Operator Precedence

In expressions of the form a+b*c, the convention is to
perform the multiplication before the addition.
Multiplication is said to take precedence over
addition. In general the rank order in which operations
are to be performed if there are no parentheses forcing
an order of computation is called the precedence of the
operators.

If you have an ambiguous �grammar�, that is, a grammar
with a number of �conflict�s, you may use �precedence
declaration�s to resolve the conflicts and to set
operator precedence.
##

Precedence Declaration

Precedence declarations are �attribute statements� which
may be used to resolve �conflict�s in your grammar by
assigning precedence and associativity to operators.
Precedence declarations must be made inside
�configuration sections�. Each declaration consists of
the keyword �left�, �right�, or �nonassoc� followed by a
list of �rule elements�. The rule elements in the list
must be separated by commas and the entire list must be
enclosed in braces ({ }).

Each of the rule elements is assigned the same
precedence level, which is higher than that assigned in
all previous precedence declarations and lower than that
in all subsequent declarations. The rule elements are
defined to be left associative, right associative, or
nonassociative, depending on whether the keyword was
"left", "right", or "nonassoc".

All conflicts which are resolved by precedence
declarations are listed in the �Resolved Conflicts�
window.
##

Precedence Rules

AnaGram can resolve certain types of �conflict�s in your
grammar by applying precedence rules. There are three
classes of rules available:  explicit �precedence
declarations�, the "�sticky�" statement, and the
implicit rule associated with the use of a "�disregard�"
token outside a �lexeme�.

Whenever AnaGram uses a precedence rule of any kind to
resolve a conflict, it produces a �warning� message and
lists the conflict in the �Resolved Conflicts� window.
##

Previous States

The Previous States window can be accessed via the
�Auxiliary Windows� popup menu from any window that identifies
�parser state�s. It shows the �characteristic rule�s
for all of the states which jump to the presently
selected state.
##

Print File Name

"Print file name" is a configuration parameter which
is not used in the Windows version of AnaGram. It is
retained only for compatibility with pre-existing
�configuration file�s.
##

Problem States

The Problem States window is essentially a trimmed
version of the �Reduction States� window. It is
available in the �Auxiliary Windows� popup menu for the
�Conflicts� and �Resolved Conflicts� windows.

The Problem States window has the same format as the
Reduction States window, and differs only in that it
shows only those reduction states for which the
�conflict token� is acceptable input.
##

Production

Productions are the mechanism you use to describe how
complex input structures are built up out of simpler
ones. Each production has a left hand side and a right
hand side. The right hand side, or �grammar rule�, is a
sequence of �rule elements�, which may represent either
�terminal tokens� or �nonterminal tokens�. The left
hand side is a list of �reduction tokens�. In most
cases there would be only a single reduction token.
Productions with more than one �token� on the left hand
side are called �semantically determined productions�.

The "->" symbol is used to separate the left hand side
from the right hand side. If you have several
productions with the same left hand side, you can avoid
rewriting the left hand side either by using '|' or by
using another "->".

A �null production�, or empty right hand side, cannot
follow a '|'.

Productions may be written thus:
  name
   -> letter
   -> name, digit

This could also be written
  name -> letter | name, digit

In order to accommodate semantic analysis of the data,
you may attach to any grammar rule a �reduction
procedure� which will be executed when the rule is
identified. Each token may have a �semantic value�. By
using �parameter assignment�s, you may provide the
reduction procedure with access to the semantic values of
tokens that comprise the grammar rule. When it finishes, the
reduction procedure may return a value which will be
saved on the �parser value stack� as the semantic value of the
�reduction token�.
##

Productions

The �Production�s window is available via the �Auxiliary
Windows� popup menu in any window which identifies tokens.
If the token identified by the highlighted line is
�nonterminal�, the Productions window will show the
rules produced by that �token�.
##

PRULE_CONTEXT

PRULE_CONTEXT is an alternative form of the
�RULE_CONTEXT� macro. It differs only in that it takes
an argument so you can specify the appropriate �parser control block�
explicitly. PRULE_CONTEXT is defined in
the �parser header� file.
##

Quick Reference

"Quick reference" is an �obsolete configuration switch�.
##

Range Bounds Out of Order

This is a �warning� message that appears when you have a
�character range� of the form 'z-a'. AnaGram interprets
this range as being equal to 'a-z', but provides a
warning in case the unusual order was the result of a
clerical error.
##

Recursive Definition of Char Set

This �warning� appears when AnaGram discovers a
recursively defined �character set�. Character sets
cannot be defined recursively.
##

Redefinition

"Redefinition of <name>" is a �warning� message which
appears when AnaGram discovers a redefinition of a
�symbol�. The new �definition� is ignored.
##

Redefinition of Grammar Token

This �warning� appears when AnaGram encounters a new
definition of the �grammar token�. AnaGram discards the
old definition. The last definition in the syntax file
wins. If you get this warning, check your �syntax file�
to make sure you have the grammar token you want.
##

Redefinition of token

"Redefinition of token, TXXX: <name>" is a �warning�
message which occurs when AnaGram encounters a
�definition� statement and the specified �grammar token�
has already been seen on the left side of a
�production�. AnaGram will ignore the definition
statement.
##

Reduce Action, Reduction

The reduce action, or reduction, is one of the four
actions of a traditional �parsing engine�. The reduce
action is performed when the parser has succeeded in
matching all the elements of a �grammar rule�, and the
next input token is not erroneous. Reducing the grammar
rule amounts to subtracting the length of the rule from
the �parser stack index�, identifying the �reduction
token�, stacking its �semantic value� and then doing a
�shift action� with the reduction token as though it had
been input directly.
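
The bookkeeping described above can be sketched in C. This
is a toy model, not the code AnaGram generates: the toy_pcb
structure, the fixed stack depth and the function names are
all assumptions made for illustration, and the caller
supplies the next state, which a real parser would look up
in its tables. Restoring the state number from the state
stack is ordinary LR practice rather than something stated
above.

```c
#define TOY_STACK_DEPTH 32   /* hypothetical depth; no overflow checks */

/* Hypothetical stand-in for a parser control block. */
typedef struct {
  int sn;                      /* current state number */
  int ssx;                     /* parser stack index */
  int ss[TOY_STACK_DEPTH];     /* state stack */
  int vs[TOY_STACK_DEPTH];     /* value stack */
} toy_pcb;

/* Shift: stack the current state and the token's value,
   increment the stack index, then enter the next state. */
void toy_shift(toy_pcb *pcb, int value, int next_state)
{
  pcb->ss[pcb->ssx] = pcb->sn;
  pcb->vs[pcb->ssx] = value;
  pcb->ssx++;
  pcb->sn = next_state;
}

/* Reduce: subtract the rule length from the stack index, then
   shift the reduction token as though it had been input. */
void toy_reduce(toy_pcb *pcb, int rule_length, int value, int next_state)
{
  pcb->ssx -= rule_length;
  if (rule_length > 0) {
    /* Return to the state that was current when the rule began. */
    pcb->sn = pcb->ss[pcb->ssx];
  }
  toy_shift(pcb, value, next_state);
}
```

After two shifts followed by a reduction of a rule of length
two, the stack index is back to one and the single remaining
entry holds the value of the reduction token.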
##

Reduce-Reduce Conflict

A grammar has a "reduce-reduce" �conflict� at some
state if a single token turns out to be a �reducing
token� for more than one �completed rule�.
##

Reducing Token

In a �parser state� with more than one �completed rule�,
your parser must be able to determine which one was
actually found. Therefore, during analysis of your
grammar, AnaGram examines each completed rule in order
to determine all the states the �parser� will branch to
once the rule is reduced. These states are called the
"reduction states" for the rule. In any window that
displays �marked rule�s, these states may be found in
the �Reduction States� window listed in the �Auxiliary
Windows� popup menu.

The acceptable input tokens for those states are the
"reducing tokens" for the completed rules in the state
under investigation. If there is a single token which is
a reducing token for more than one rule, then the
grammar is said to have a �reduce-reduce conflict� at
that state. If in a particular state there is both a
�shift action� and a �reduce action� for the same token
the grammar is said to have a �shift-reduce conflict� in
that state.

Note that a "reducing token" is not the same as a
"�reduction token�".
##

Reduction Choices

"Reduction choices" is a �configuration switch� which
defaults to off. If it is set, AnaGram will include in
your �parser file� a function which will identify the
acceptable choices for �reduction token� in the current
state. This function, of course, is useful only if you
are using �semantically determined productions�. The
prototype of this function is:
	int $_reduction_choices(int *);
where '$' represents the name of your parser. You must
provide an integer array whose length is at least the
maximum number of reduction choices you might have. The
function will fill the array with the token numbers of
the reduction tokens that are acceptable in the current
state and will return a count of the acceptable choices
it found.
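
A minimal sketch of how you might call such a function
follows. It assumes a parser named "myparser". The body of
myparser_reduction_choices shown here is only a stand-in so
that the example is complete; in a real program AnaGram
supplies that function. The token numbers 17 and 23 and the
bound MAX_CHOICES are invented for illustration.

```c
#define MAX_CHOICES 8   /* assumed upper bound on reduction choices */

/* Stand-in for the generated function (hypothetical contents). */
int myparser_reduction_choices(int *choices)
{
  choices[0] = 17;
  choices[1] = 23;
  return 2;
}

/* Return 1 if token is an acceptable reduction token in the
   current state, 0 otherwise. */
int is_acceptable_reduction(int token)
{
  int choices[MAX_CHOICES];
  int count = myparser_reduction_choices(choices);
  int i;

  for (i = 0; i < count; i++) {
    if (choices[i] == token) return 1;
  }
  return 0;
}
```

A reduction procedure could use a check of this kind before
it selects a reduction token other than the default.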
##

reduction_token

"reduction_token" is a field in your �parser control
block�. If your grammar uses �semantically determined
productions�, your �reduction procedure�s need a
mechanism to specify which token the rule is to reduce
to. �PCB�.reduction_token names the variable which
contains the �token number� of the �reduction token�.
Prior to calling your reduction procedure, your parser
will set this field to the token number of the default
�reduction token�, i.e., the leftmost syntactically correct token in the
reduction token list for the production being reduced.
If the reduction procedure establishes that a different
reduction token is appropriate, it should store the
appropriate token number in PCB.reduction_token.
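
The following sketch shows the idea. Everything in it apart
from the field name reduction_token is an assumption: the
structure is a stand-in that models only that one field,
PCB is defined here by hand, and the token numbers and the
is_type_name lookup are hypothetical.

```c
#include <string.h>

/* Hypothetical token numbers for two reduction tokens. */
enum { VARIABLE_NAME_TOKEN = 41, TYPE_NAME_TOKEN = 42 };

/* Stand-in for the parser control block. */
struct toy_pcb { int reduction_token; };
static struct toy_pcb toy_control_block;
#define PCB toy_control_block

/* Hypothetical symbol table lookup. */
static int is_type_name(const char *name)
{
  return strcmp(name, "size_t") == 0;
}

/* Body of a reduction procedure for a production whose left
   hand side lists both a variable name and a type name. The
   default is left alone unless the name is a known type. */
void classify_identifier(const char *name)
{
  if (is_type_name(name)) {
    PCB.reduction_token = TYPE_NAME_TOKEN;
  }
}
```

If the parser has set the field to VARIABLE_NAME_TOKEN
before the call, the procedure leaves it unchanged for
"count" and replaces it with TYPE_NAME_TOKEN for "size_t".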
##

Reduction Procedures

The Reduction Procedures window lists the C function
prototypes for the �reduction procedure�s in your grammar.

When this window is active, the �syntax file� window, if
visible, is synchronized with it so you can see the body of
the reduction procedure as well as its usage.
##

REDUCTION_TOKEN_ERROR

REDUCTION_TOKEN_ERROR is a user definable macro which your �parser�
invokes when it encounters an inadmissible reduction
token. This error should occur only if your parser uses
�semantically determined productions� and your
�reduction procedure� provides an incorrect �token
number�. If you do not define it, AnaGram will define
it so that it will print an error message on stderr and
abort the parse.

##

Reduction Procedure, Semantic Action

A "reduction procedure", or "semantic action", is a
function you write which your �parser� executes when it
has identified the grammar rule to which the reduction
procedure is attached in your grammar.

When your parser has identified a particular �grammar
rule�, that is to say, a particular sequence of �tokens�
that you have specified in your grammar, it "reduces"
the production to the token at the head of the
production, or �reduction token�.

If you choose, you can
specify a "reduction procedure" which your parser will
call so that your program can do semantic analysis on
the production just identified. Your reduction procedure
will be called using, as arguments, the �semantic
values� of tokens on the right side of the production.

Your reduction procedure may, if you choose, return a
value which will become the semantic value of the
reduction token. Since many of the tokens in
�production�s are there for only syntactic purposes, you
may specify, when you write your grammar, the tokens
whose values are needed as arguments for your reduction
procedure.

To attach a reduction procedure to a grammar rule, just
write it immediately following the rule. There
are two formats for reduction procedures,
depending on the size and complexity of the procedure.

The first form consists of an equal sign followed by a C
expression and a semicolon. When the rule is matched the
expression will be evaluated and its value will be
stacked on the �parser value stack� as
the value of the reduction token. For example:
    =-a;
    =myProcedure(x, q);

The second form consists of an equal sign followed by a
block of C code enclosed in curly braces. If you wish to
return a value for the reduction token you have to use a
return statement. For example:
    ={
      if (x > y) return x;
      return x+2*y;
     }

In both forms of the reduction procedure, �parameter
assignment�s may be attached to �rule element�s in
order to make their semantic values available to the reduction
procedure. When the reduction procedure is executed,
local variables will be defined with the names specified
in the parameter assignments. The values of these
variables will have been set to the values of the
corresponding tokens.

If the return value of your reduction procedure is a
C++ object, you may wish to specify that AnaGram
enclose it in a �wrapper� so that constructor calls
and destructor calls are made. Otherwise the object is
pushed onto and popped from the parser value stack simply by
coercing the stack pointer to the appropriate type.

The reduction procedures in your grammar are summarized
in the �Reduction Procedures� window.
##

Reduction States

The Reduction States window can be accessed via the
�Auxiliary Windows� popup menu from any window which displays
�parser state� numbers and �marked rule�s. If the highlighted
�grammar rule� has no marked token, the Reduction States window will
show the states the parse could reach by reducing the rule and
processing the resultant �reduction token�.
##

Reduction Token

A �token� which appears on the left hand side of a
�production� is called a reduction token. It is so
called because when the �grammar rule� on the right side
of the production is matched in the input stream, your
�parser� will "reduce" the sequence of tokens which
matches the rule by replacing the sequence of tokens
with the reduction token.

If more than one
reduction token is specified,
the production is called a �semantically determined production�
and your �reduction procedure�
should choose the appropriate reduction token. If it does not, your parser
will use the first token in the list that is syntactically
correct as the default.

The �CHANGE_REDUCTION� macro can be used to specify the reduction
token.

Note that a "reduction token" is not the same as a
"�reducing token�".
##

Reduction Trace

The Reduction Trace window is available from the
�Conflicts� window and the �Resolved Conflicts� window.
It can be used in conjunction with the �Conflict Trace�
to study �conflict�s. The Reduction Trace represents the
result of taking the reduce option in the conflict state
of the Conflict Trace.
##

Reentrant Parser

"Reentrant parser" is a �configuration switch� which defaults to off.
If it is on when AnaGram builds a parser, AnaGram will generate code that
passes the pointer to the �parser control block� via calling sequences,
rather than using static references to the pcb.

You can use the reentrant parser switch to help make �thread safe
parsers�.

The reentrant parser switch is compatible with both C and C++.

The reentrant parser switch cannot be used in conjunction with
the �old style� switch.

When you have enabled the reentrant parser switch, the �parse function�,
the �initializer� function, and the �parser value function�
will be defined to take a pointer to the parser control block as
their sole argument.
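
The difference between the two calling styles can be
sketched with a toy example. Nothing here is generated by
AnaGram: toy_pcb stands in for a parser control block, and
the two functions stand in for a parse function that merely
counts characters.

```c
/* Stand-in for a parser control block. */
typedef struct { int count; } toy_pcb;

/* Static style: every call works on the same control block. */
static toy_pcb shared_pcb;

void count_static(const char *text)
{
  while (*text++) shared_pcb.count++;
}

/* Reentrant style: the caller says which control block to use,
   so independent parses do not disturb one another. */
void count_reentrant(toy_pcb *pcb, const char *text)
{
  while (*text++) pcb->count++;
}
```

With the reentrant style, two control blocks can be in use
at once, for example one per thread, because no state is
kept in static storage.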
##

Reload Button

The �File Trace� window includes a reload button to allow
you to reread your �test file� after you have modified
it without having to start a new file trace. After the
file has been reread, the file trace is reset.
##

rename macro

AnaGram uses a number of macros in its generated code.
It is possible, therefore, to run into naming
collisions with other components of your program. The
rename macro �attribute statement� allows you to change
the name AnaGram uses for a particular macro to avoid
these problems.

For example, in the Microsoft
Foundation Classes, V4.2, there is a class called
"CONTEXT". If you use the �context stack� option in
AnaGram, your �parser� will have a macro called
�CONTEXT�. To avoid the name collision, add the
following attribute statement to any configuration
section in your grammar:
	rename macro CONTEXT AG_CONTEXT
Then, simply use "AG_CONTEXT" where you would otherwise
have used "CONTEXT".
##

reserve keywords

"reserve keywords" is an �attribute statement� which
can be used to specify a list of �keyword�s that are
reserved and cannot be used except as explicitly
specified in the grammar. In particular this switch
enables AnaGram to avoid issuing meaningless �keyword
anomaly� warnings.

AnaGram does not automatically presume that keywords
are also reserved words, since in many grammars there
is no need to specify reserved words.

Reserve keywords statements must be made inside
�configuration sections�. Each statement consists of
the keyword "reserve keywords" followed by a list of
keyword �tokens�.  The tokens must be separated by
commas and the list must be enclosed in braces ({ }).
Each keyword listed will then be treated as a reserved
word.
##

Reset Button

The Reset button, found on �File Trace� and �Grammar
Trace� windows, restores the initial configuration of
the trace. This is especially convenient for �Conflict
Trace� or other �Auxiliary Trace�s.
##

Resolved Conflicts

AnaGram creates the Resolved Conflicts window only when
the grammar it is analyzing has �conflict�s and when
those conflicts have been resolved by �precedence
declaration�s, by "�sticky�" statements, or in
connection with the explicit use of a token specified in
a �disregard� statement. The Resolved Conflicts window
shows the conflicts that have been resolved, using the
same format as that of the �Conflicts� Window. The rule
chosen is marked with an asterisk in the leftmost column
of the window.
##

Resynchronization

"Resynchronization" is the process of getting your
parser back in step with its input after encountering a
�syntax error�. As such, it is one method of �error
recovery�. Of course, you would resynchronize only if it
is necessary to continue after the error. There are
several options available when using AnaGram. You could
use the �auto resynch� option, which causes AnaGram to
incorporate an automatic resynchronizing procedure into
your parser, or you could use the �error token
resynchronization� option, which is similar to the
technique used by YACC programmers.
##

right

"right" controls a �precedence declaration�, indicating
that all of the listed �rule elements� are to be
considered �right associative�.
##

Right Associative

A binary operator is said to be right associative if
an expression with repeated instances of the operator
is to be evaluated from the right. Thus, for example,
when '=' is used as an assignment operator
	x = a = b
is normally taken to mean a = b followed by x = a. The
assignment operator is said to be right associative.
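
Since C treats '=' this way, a short sketch can show the
grouping, with subtraction included as a left associative
contrast. The function names are illustrative only.

```c
/* Right associative: x = a = b groups as x = (a = b).
   Returns 1 if both variables end up equal to b. */
int assign_chain(int b)
{
  int a = 0;
  int x = 0;
  x = a = b;
  return a == b && x == b;
}

/* Left associative, for contrast: a - b - c groups as (a - b) - c. */
int subtract_chain(int a, int b, int c)
{
  return a - b - c;
}
```

For example, subtract_chain(10, 4, 3) yields 3, whereas
grouping from the right, 10 - (4 - 3), would give 9.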

In �grammar�s with �conflict�s, you may use �precedence
declaration�s to specify that an operator should be
right associative.
##

Rule Context

The Rule Context window can be accessed via the
�Auxiliary Windows� menu in any window that displays
�grammar rule�s. AnaGram displays all occurrences in the
�grammar� of all the �reduction token�s for the rule.
##

RULE_CONTEXT

RULE_CONTEXT is a macro you may use if you have defined
a �context stack�. In any reduction procedure,
RULE_CONTEXT will be a pointer to the context value
stacked before the first token of the rule being
reduced. Since the context stack contains an entry for
each token in the rule, you may inspect the context
value for each token in the rule by subscripting
RULE_CONTEXT. RULE_CONTEXT[k] is the context of the
(k+1)th token in the rule.
##

Rule Coverage

"Rule Coverage" is the name of both a �configuration
switch� and a window. The configuration switch
defaults to off. If you set it, AnaGram will include
code in your �parser� to count the number of times your
parser identifies each �grammar rule� in your grammar.
To maintain the counts, AnaGram declares, at the
beginning of your parser, an integer array, whose name
is created by appending "_nrc" to your �parser name�.
The array contains one counter for each rule you have
defined in your grammar. There are no entries for the
auxiliary rules that AnaGram creates to deal with set
overlaps or �disregard� statements. In order to identify
positively all the rules that the parser reduces,
AnaGram has to turn off certain optimization features in
your parser. Therefore a parser that has rule coverage
enabled will run slightly slower than one with the
switch off.
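
As a rough sketch of how you might inspect the counters from
your own code, the function below counts the entries that
are still zero. It assumes a parser named "myparser". The
array definition shown is only a stand-in with invented
length and contents; in a real parser AnaGram declares the
array and you would omit this definition.

```c
#define RULE_COUNT 5   /* hypothetical number of counters */

/* Stand-in for the array AnaGram declares (invented contents). */
int myparser_nrc[RULE_COUNT] = { 3, 0, 12, 0, 1 };

/* Count the rules that have never been identified. */
int count_unused_rules(const int *counts, int n)
{
  int unused = 0;
  int i;

  for (i = 0; i < n; i++) {
    if (counts[i] == 0) unused++;
  }
  return unused;
}
```

A nonzero result after running your test files suggests
that the tests do not yet exercise every rule.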

In addition, AnaGram creates a pair of functions to
write the counters to a file and to initialize the
counters from a file. The names of these functions are
given by appending "_write_counts" and "_read_counts" to
the name of your parser. The name of the file is given by the
�coverage file name� parameter which defaults
to the name of your �syntax file� but with the extension ".nrc".

If rule coverage is enabled, AnaGram will also enable the
Rule Coverage option on the �Browse Menu�. If you select
Rule Coverage, AnaGram will initialize a �Rule Coverage�
window from the rule count file you select.

AnaGram will
warn you if the rule count file is older than
the syntax file, since under those conditions, the
coverage file might be invalid.
##

Rule Derivation, Token Derivation

You can use the Rule Derivation and Token Derivation
windows to understand the nature of �conflict�s in your
grammar. To create these windows, open the �Conflicts�
window. Move the cursor bar to a �completed rule�, that
is, one which has no marked token. Press the right mouse button to pop
up the �Auxiliary Windows� menu. You may then select the Rule
Derivation or the Token Derivation.

The Rule Derivation window and the Token Derivation
window, together, show how a �conflict�, or ambiguity,
has arisen in your grammar. Both windows contain a
sequence of rules, and both begin with the same rule,
the rule which is the root cause of the conflict.

Each subsequent line in the rule derivation is an
�expansion� of the marked token in
the previous rule. The last rule in the derivation
window is the rule you selected in the Conflicts
window. Thus the rule derivation window shows you how
the rule involved in the conflict derives from the
root.

Each subsequent line in the token derivation window
shows an expansion of the marked token in the previous rule. The first
token of the last rule in the derivation window is the token that
causes the conflict. This is the usage that is inconsistent with other
usages of this token in the conflict state.

The Rule Derivation and Token Derivation windows each
have five auxiliary windows. The �Rule Context� window
is keyed to the highlighted rule. The other four
windows, the �Expansion Rules� window, the
�Productions� window, the �Set Elements� window and the
�Token Usage� window are keyed to the marked token.
Remember that there is no marked token on the last
line of the Rule Derivation window.
##

Rule Element

A �grammar rule� is a list of "rule elements", separated
by commas. Rule elements may be �token name�s,
�character sets�, �keyword�s, �immediate action�s, or
�virtual productions�. When AnaGram encounters a rule
element for which no token presently exists, it creates
one.

Any rule element may be followed by a �parameter assignment�
in order to make the �semantic value� of
the rule element available to a �reduction procedure�.
##

Rule Number

AnaGram assigns a unique rule number to each �grammar
rule� that you specify in your grammar. Rules are
numbered sequentially as they are encountered in the
�syntax file�. AnaGram constructs rule 0 itself. Rule
zero has a single �rule element�, the �grammar token�,
unless you have a �disregard� statement in your
grammar. In this case, there will be two elements.

In AnaGram displays, rule numbers are displayed with a
prefixed 'R' and a three digit decimal number.
##

Rule Stack, Rule Stack Pane

The Rule Stack pane appears across the bottom of a �Grammar
Trace� or �File Trace� window. It provides an alternate view of the
parser stack for the trace, showing, for each state, rules instead of
the tokens that you see in the �Parser Stack pane�. Because it is
synched with the syntax file window, the Rule Stack makes it easy to
see the relationship between the trace and your grammar.

For each level of the parser stack, the Rule Stack shows the �parser
state� number and all the active rules. The active rules at any
state consist of all the �expansion rule�s for the state that are
consistent with the input at all subsequent states.

Except for the last level
of the stack, each rule has a �marked token�, which in the default
configuration is displayed in bold, italic type. The significance of
the marked token is that all tokens in the rule to the left of the
marked token have already been matched in the input, and the input
in subsequent levels is consistent so far with the marked
token. As more input is processed, rules
that are inconsistent with the new input are deleted from the display.

The last level of the stack shows the current state of the parser and
the rules against which the �lookahead token� will be matched. At
this level, there may be rules with no marked tokens. These are
rules which have been matched exactly in the input. If there is
more than one such rule, at the next parser step the parser will use
the lookahead token to determine which rule to reduce.

In the last level of the stack, marked tokens represent the input the
parser expects to see.

The Rule Stack pane is synched with the �syntax file� window if it is
visible so that the rule highlighted in the Rule Stack can be seen
in context in the syntax file.
For rules that AnaGram
generated automatically (to implement �virtual productions�
or the �disregard� statement), the cursor bar will move to the
top of the syntax file window.

The Rule Stack pane is also synched with the other panes in the trace.
As you move the cursor bar in the Rule Stack, the cursor bar in the
Parser Stack pane will track the stack level in the Rule Stack. In
a File Trace, text will be highlighted in the �Test File� pane
corresponding to the selected token in the Parser Stack pane. In a
Grammar Trace, the marked token in the highlighted rule will be
highlighted in the �Allowable Input pane�.

Clicking the right mouse button pops up an �Auxiliary Windows� menu to
give you more information about the highlighted rule.
##

Rule Table

The Rule Table lists, in numerical order, all the
�grammar rule�s defined in your �grammar�. Each rule is
preceded by the �nonterminal� tokens which produce it.
If you are not using �semantically determined
production�s, then there will be precisely one token
line per rule. The Rule Table is synched to your �syntax
file� to show the rule in context.
##

Semantic Value, Token Value

A �token� generally has a "semantic value", or "token
value", as well as the �token number� which identifies
it syntactically.  Each instance of the token in the
input stream can have a different value. For example,
you might have a token called "variable name". In one
instance the variable name might be "widget" and in
another, "wombat". Then "widget" and "wombat" would be
the semantic values in the two instances. Another token
might have numeric semantic values.

You can specify the C or C++ �data type� of the token value.
The data type of "variable name" could be "char *"
where the value is a pointer to a string holding the name. There
are separate default types for the values of �terminal�
and �nonterminal� tokens. In the usual case of ordinary
character input, the value of a terminal token is just
the ASCII character code.

The value of a nonterminal token is determined by the �reduction procedure�s
attached to the rules the token produces. If there is no reduction
procedure, the value of the token is the value of the first token
in the rule.

It should be noted that the stack operations have been
implemented in such a way that a C++ object that belongs
to a class for which the assignment operator has been
overridden will encounter serious problems. This shortcoming
will be addressed in a future version of AnaGram. Note that
there is no problem with using a pointer to any C++ object.
##

Semantically Determined Production

A "semantically determined production" is one which has
more than one �reduction token� specified on the left
side of the �production�. You would write such a
production when the reduction tokens are syntactically
indistinguishable. The �reduction procedure� may then
specify which of the listed reduction tokens the grammar
rule is to reduce to based on semantic considerations.
If there is no reduction procedure, or the reduction
procedure does not specify a reduction token, the parser
will use the first syntactically correct one in the list.

To simplify changing the reduction token, AnaGram
provides a predefined macro, �CHANGE_REDUCTION�.

The �semantic value�s of all the reduction tokens for a
given semantically determined production must have the
same �data type�.

�File Trace� and �Grammar Trace� have a �Reduction Choices pane� which
appears when a semantically determined production is invoked and
you need to choose a reduction token.
##

Set Elements

The Set Elements window is available via the �Auxiliary
Windows� popup menu from windows which specify character sets,
partition sets or tokens. It displays the actual
characters which make up the set, or which map to the
specified token. For each character, the numeric code as
well as its display symbol is given.
##

Set Expression, Expression

A set expression is an algebraic expression used to
define a �character set� in terms of individual
characters, ranges of characters, or other sets of
characters as constructed using �complements�, �unions�,
�intersections�, and �differences�.
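
The set algebra itself can be modeled in C. The sketch below
is not AnaGram notation and says nothing about how AnaGram
represents sets internally. The charset type and the cs_
function names are hypothetical, and the complement is taken
relative to character codes 0 through 255, which is an
assumption of this sketch.

```c
#include <string.h>

#define NCHARS 256   /* assumed universe: character codes 0..255 */

typedef struct { unsigned char member[NCHARS]; } charset;

/* The set of all codes from lo through hi, inclusive. */
charset cs_range(int lo, int hi)
{
  charset s;
  int c;
  memset(&s, 0, sizeof s);
  for (c = lo; c <= hi; c++) s.member[c] = 1;
  return s;
}

/* Characters in a or in b. */
charset cs_union(charset a, charset b)
{
  int c;
  for (c = 0; c < NCHARS; c++) a.member[c] = a.member[c] || b.member[c];
  return a;
}

/* Characters in both a and b. */
charset cs_intersection(charset a, charset b)
{
  int c;
  for (c = 0; c < NCHARS; c++) a.member[c] = a.member[c] && b.member[c];
  return a;
}

/* Characters in a but not in b. */
charset cs_difference(charset a, charset b)
{
  int c;
  for (c = 0; c < NCHARS; c++) a.member[c] = a.member[c] && !b.member[c];
  return a;
}

/* Characters of the assumed universe that are not in a. */
charset cs_complement(charset a)
{
  int c;
  for (c = 0; c < NCHARS; c++) a.member[c] = !a.member[c];
  return a;
}
```

For instance, the union of the ranges 'a' to 'z', 'A' to
'Z' and '0' to '9' contains 'q' and '7' but not '_', and
removing the digits by a difference leaves only the letters.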
##

Shift Action

The shift action is one of the four actions of a
traditional �parsing engine�. The shift action is
performed when the input token matches one of the
acceptable input tokens for the current �parser state�.
The �semantic value� of the token and the current
�state number� are stacked, the �parser stack index� is
incremented and the state number is set to a value
determined by the previous state and the input token.
##

Shift-Reduce Conflict

A "shift-reduce" �conflict� occurs if in some �parser
state� there exists a �terminal token� that should be
shifted, because it is legitimate input for one of the
�grammar rule�s of the state, but should also be used to
reduce some other rule because it is a �reducing token�
for that rule.
##

sn

sn is a field in a �parser control block� to which your
�error handling� routines and your �reduction
procedure�s may refer. Its value is the current �state
number� of your �parser�. sn is modified every time
your parser "shifts" (performs a �shift action� on) a
token or reduces (performs a �reduce action� on) a
�production�.
##

ss

ss is a field in a �parser control block� to which your
�error handling� and �reduction procedure�s may refer.
It is the �state stack� for your �parser�. Before every
�shift action�, the current �state number�, �sn�, is
stored in PCB.ss[PCB.ssx], where �ssx� is the �parser
stack index�. PCB.ssx is then incremented.
##

ssx

ssx is a field in a �parser control block� to which
your �error handling� routines and �reduction
procedure�s may refer. It is the �parser stack index�
for your �parser�. On every �shift action� it is
incremented. On every �reduce action� the length of
the �grammar rule� being reduced is subtracted from
PCB.ssx.
##

State Definition

The State Definition window can be accessed via the
�Auxiliary Windows� popup menu from any window that specifies
states. It displays the �characteristic rules� that
define the state. The rules are displayed with a marked token, which is
the next token needed in the input if the particular �grammar rule� is
to be matched. If the rule is a completed rule, no token will be
marked.

Each line contains the state number, blank if it is the
same as the state number of the previous line, the �rule
number�, and finally the �marked rule�.

The �State Definition Table�, found in the �Browse
Menu�, displays the characteristic rules for all states
in the �grammar�.
##

State Definition Table

The State Definition Table lists, for each �parser
state�, all of the �characteristic rules� which define
that state. The rules are displayed with a �marked token�, which is the
next token needed in the input if the particular �grammar rule� is to
be matched. If the rule is a completed rule, no token will be
marked.

Each line contains the state number, blank if it is the
same as the state number of the previous line, the �rule
number�, and finally the �marked rule�.

In the �Auxiliary Windows� menu for many states there is
a �State Definition� entry which provides the
characteristic rules for the �parser state� identified by
the cursor bar.
##

State Expansion

The State Expansion window may be accessed using the
�Auxiliary Windows� menu from any window that identifies
a particular �parser state�. It shows the complete set
of �expansion rule�s for the state, consisting of the
union of the set of �characteristic rule�s and, for each
characteristic rule, the set of expansion rules for the
marked token. Thus the State
Expansion window shows all possible legal input to your
parser in the given state.
##

Sticky

"Sticky" statements are �attribute statement�s and may
be used just like a �precedence declaration� to resolve
�conflict�s. If a �shift-reduce conflict� occurs in a
state where the �characteristic token� is "sticky", the
shift action will always be chosen.

Sticky statements must be made inside �configuration
sections�. Each statement consists of the keyword
"sticky" followed by a list of �tokens�. The tokens must
be separated by commas and the list must be enclosed in
braces ({ }). Each token will then be treated as sticky.

All conflicts which are resolved by sticky statements
are listed in the �Resolved Conflicts� window.
##

subgrammar

Declaring a nonterminal token to be a "subgrammar"
changes the way AnaGram searches for reducing tokens.

Normally, if there is a completed rule in a particular
state, AnaGram investigates all states to which the
parser could jump on reducing the rule. It then
considers all terminal tokens that are acceptable input
in these states to be reducing tokens for the given
rule. If this set of tokens overlaps the set of tokens
for which there are shift actions, or the set of tokens
which reduce a different rule, there is a �conflict�.

Now consider a particular nonterminal token T and all
the rules it produces, whether directly or indirectly.
What the preceding remarks mean is that in determining
the reducing tokens for any of these rules, AnaGram
considers not only the definition, but also the usage
of T.

There are circumstances when it is inappropriate to
consider the usage of T. The most common example occurs
when building a lexical scanner for a language such as
C. In this case, you can write a complete grammar for a
C token with no difficulty. But if you try to extend it
to a sequence of tokens, you get scores of conflicts.
This situation arises because you specify that any C
token can follow another, when in actual practice, an
identifier, for example, cannot follow another
identifier without some intervening space or
punctuation. While it is theoretically possible to write
a grammar for a sequence of tokens that has no
conflicts, it is not usually pretty.

The subgrammar declaration resolves this problem by
telling AnaGram that when it is looking for reducing
tokens for any rule produced directly or indirectly by a
subgrammar token, it should disregard the usage of the
token and only consider usage internal to the definition
of the subgrammar token, as though the subgrammar token
were the start token of the grammar.

The subgrammar declaration is made in a �configuration
section� and consists of the keyword "subgrammar"
followed by a list of token names separated by
commas and enclosed in braces ({ }). For example:
	subgrammar { name, number }
##

Suspicious Production

This �warning� message appears when AnaGram finds a
�production� of the form x -> x. There is probably a
typo somewhere in your �syntax file�. This production
causes a �conflict� in your grammar. AnaGram leaves
this production in your �grammar�, but if you build a
parser, it will never succeed in recognizing this
production.
##

Switch Takes on/off Values Only

The specified parameter is a �configuration switch�. The
only values it may be assigned are ON and OFF.

##

Symbol

In writing your �grammar� you use symbols, or names, to
represent most of your �tokens�. You may also use
symbols to represent �character set�s, �virtual
production�s, �immediate action�s, or �keyword�s.

A symbol, or name, must begin with a letter or an
underscore. It may then contain any number of these
characters as well as digits and embedded white space
(including comments). For identification purposes all
adjacent white space characters within a symbol name
are considered to be a single blank.

Upper case and lower case letters are considered to be
different.

Examples:
	token name
	token/*embedded comment*/name

 All symbols used in your grammar are listed in
the �Symbol Table� window found in the �Browse Menu�.
##

Symbol Table

The Symbol Table lists all the symbols, or names, you
used in your grammar. �Symbol�s may be used, of course,
to identify �tokens�, �definitions�, �virtual
productions�, �immediate action�s, or �keyword�s.

Each line in this table identifies a single symbol. The
first field is the token number, if any. This is
followed by the name. If the name identifies an
�expression� or virtual production, it is followed by an
equal sign and the expression or virtual production.
##

Syntax Analysis Aborted

This �warning� message appears if, because of previous
errors, AnaGram is unable to complete the �Analyze
Grammar� command on your �syntax file�.
##

Syntax Directed Parsing

Syntax directed parsing, or formal parsing, is an
approach to building �parsers� based on formal language
theory. Given a suitable description of a language,
called a �grammar�, there are algorithms which can be
used to create parsers for the language automatically.
In this context, the set of all possible inputs to a
program may be considered to constitute a language, and
the rules for formulating the input to the program
constitute the grammar for the language.

The parsers built from a grammar have the advantage
that they can recognize any input that conforms to the
rules, and can reject as erroneous any input that fails
to conform.

Since the program logic necessary to parse input is
often extremely intricate, programs which use formal
parsing are usually much more reliable than those built
by hand. They are also much easier to maintain, since
it is much easier to modify a grammar specification
than it is to modify complex program logic.
##

Syntax Error

When you specify a �grammar�, you specify a set of
input character or token sequences which your �parser�
will "recognize". Usually there are other sequences of
input tokens which deviate from the rules set down by
your grammar. Should your parser find in its input a
sequence which is not explicitly allowed for in your
grammar, it is said to
have found a "syntax error". The general treatment of
syntax errors is called �error handling�, of which there
are two distinct aspects: �error diagnosis� and �error
recovery�. AnaGram allows you to make provision for
error handling to fit your needs, but should you not do
so, it will provide simple default error handling.
##

Statements

AnaGram source files, or �syntax file�s, consist of
the following types of statements:
	 �production�s
	 �configuration section�s
	 �embedded C�
	 �definition�s
	 �token declaration�s

 Statements may be in any order. Each statement must
begin on a new line. If a statement cannot be
construed as complete, it may continue onto another
line.

Statements may contain spaces, tabs or comments, but
may not contain blank lines.
##

Syntax File

Input files to AnaGram are called syntax files. The
default extension for syntax files is .syn. A
syntax file contains a "�grammar�" and supporting C or
C++ code.  The file consists of several distinct types
of statements. These are �token declarations�,
�production�s, �definitions�, �embedded C�, and
�configuration sections�. There may be as many of each
as you need, in whatever order you find convenient.

Each such statement begins on a new line.
##

SYNTAX_ERROR

SYNTAX_ERROR is a macro which your parser will invoke
when it encounters a syntax error in its input stream.
If you have set the �diagnose errors� �configuration
switch�, the static variable �PCB�.�syntax_error� will
contain a pointer to a diagnostic message when
SYNTAX_ERROR is invoked. If you have also set the
�error frame� switch, �PCB�.�error_frame_ssx� and
�PCB�.�error_frame_token� will also be set
appropriately.
##

Tab Spacing

"tab spacing" is a �configuration parameter� which
controls the expansion of tabs when AnaGram displays
your source file or test files in the �File Trace� window.

The value of "tab spacing" is also used to set the
default value of the �TAB_SPACING� macro in your parser.

The default value of "tab spacing" is 8. If you prefer
a different value, you should probably include an
appropriate statement in your �configuration file�. For
example:

	tab spacing = 2
##

TAB_SPACING

If you have enabled the �lines and columns� switch, your
parser needs to know tab spacing in order to increment
the column count when it encounters a tab character. It
is set up to use the value given by the TAB_SPACING
macro. If you do not define TAB_SPACING in your parser,
AnaGram will provide a default definition, setting it to
the value of the �tab spacing� �configuration
parameter�.
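
To illustrate why the parser needs this value, here is a minimal
sketch of column tracking in C. It assumes that columns are counted
from zero and that tab stops fall at every multiple of TAB_SPACING.
The helper name advance_column is hypothetical, and the code AnaGram
actually generates may differ.

```c
#ifndef TAB_SPACING
#define TAB_SPACING 8   /* assumed here to match the "tab spacing" default */
#endif

/* Hypothetical helper: return the column that follows character c.
   Columns are counted from zero in this sketch. */
static int advance_column(int column, int c) {
  if (c == '\t') {
    /* advance to the next multiple of TAB_SPACING */
    return column + TAB_SPACING - column % TAB_SPACING;
  }
  if (c == '\n') {
    return 0;           /* a new line starts over at column zero */
  }
  return column + 1;    /* any other character occupies one column */
}
```

With the default of 8, a tab seen at column 3 moves the count to
column 8, and a tab seen at column 8 moves it to column 16.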
##

Terminal, Terminal Token

A "terminal token" is a token which does not appear on
the left side of a �production�. It represents,
therefore, a basic unit of input to your �parser�.  If
the input to your parser consists of ASCII characters,
you may define terminal tokens explicitly as ASCII
characters or as sets of ASCII characters. If you have a
lexical scanner, or preprocessor, which produces numeric
codes, you may define the terminal tokens directly in
terms of these numeric codes.
##

Test File Binary

"Test file binary" is a �configuration switch� which
defaults to off. When it is off, and you select the
�File Trace� option, AnaGram will read your test files
in "text" mode, discarding carriage return characters.
When "test file binary" is on, AnaGram will read test
files in "binary" mode, preserving carriage return
characters.

If your parser needs to recognize carriage return
characters explicitly, you should turn "test file
binary" on.
##

Test File Mask

"Test file mask" is a string-valued �configuration
parameter� which AnaGram uses to set up the file dialog
for the �File Trace� command. It defaults to "*.*". If
there is a conventional file name format for the input
to the �parser� you are developing, you will probably
want to set "test file mask" in a �configuration
section� in your �syntax file� so it is easier to pick
out your test files.
##

Test range

"Test range" is a �configuration switch� which defaults
to on. When it is set, i.e., on, AnaGram will configure
your parser so that it checks input characters to
verify that they are within the range given by the
�character universe� before it indexes the �token
conversion� table. If range testing is not necessary
for your parser, you may turn test range off and get a
slight improvement in the performance of your parser.
##

Thread Safe Parsers

AnaGram 2.01 incorporates several changes designed to make it
easier to write thread safe parsers.

First, the �parser�s generated by AnaGram 2.01 no longer use static or global
variables to store temporary data. All nonconstant data have been
moved to the �parser control block�.

Second, two new features which make it substantially
easier to build thread safe parsers have been added. The �reentrant parser� switch
makes the entire parser reentrant, by passing the pointer to the parser control
block as an argument on all function calls. The �extend pcb� statement allows
you to add your own variable declarations to the �parser control
block� so you can avoid references to global or static variables in
your �reduction procedure�s.

Third, new support has been added for C++ classes, including
the �wrapper� statement and the �PCB_TYPE� macro.
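
The benefit of keeping all state in the control block, as described
in the second point above, can be sketched in C as follows. The
structure demo_pcb_type and its line_count field are hypothetical
stand-ins, not the declarations AnaGram generates. The point is only
that two control blocks passed by pointer never share data.

```c
/* Hypothetical control block: exit_flag stands in for the generated
   fields, line_count for a field added by the user. */
typedef struct {
  int exit_flag;
  int line_count;
} demo_pcb_type;

/* A reentrant-style helper: all state arrives through the pointer,
   so no static or global variable is touched. */
static void demo_count_line(demo_pcb_type *pcb) {
  pcb->line_count++;
}
```

Two threads that each own a separate demo_pcb_type can call
demo_count_line at the same time without interfering with each other.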
##

token_number

token_number is a field in a �parser control block� to
which your �error handling� procedures and �reduction
procedure�s may refer. It contains the actual �token
number� of the current input token. Unless you are supplying
token numbers directly, it is the result of using the
actual input character to index the �token conversion�
array, ag_tcv.
##

Token

Tokens are the units with which your parser works.
There are two kinds of tokens: �terminal tokens� and
�nonterminal tokens�. These latter are identified by the
parser as sequences of tokens. The grouping of tokens
into more complex tokens is governed by the �grammar
rules�, or �production�s in your grammar. In your
grammar, tokens are denoted by �token name�s, �virtual
productions�, explicit �character representations�,
�keyword�s, �immediate action�s, or �expression�s which
yield �character sets�.
##

Token Conversion

By using �character set� �expression�s, you may in your
�syntax file� define a number of input characters as
being syntactically equivalent. When your �parser� gets
an input character, it uses the character code to index
a table called �ag_tcv�. The value it extracts from this
table is the �token number� for the input character. The
actual character code of the input character becomes the
�token value�.
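
The lookup described above can be sketched in C as follows. The
table demo_tcv, the token numbers, and the helper names are all
hypothetical; a generated parser has its own table, built from your
grammar. The range check is an assumption about what a guarded lookup
might look like, and the sketch assumes an ASCII character set.

```c
/* Hypothetical token numbers, used only in this sketch. */
enum { DEMO_OUT_OF_RANGE = 0, DEMO_LETTER = 1, DEMO_DIGIT = 2, DEMO_OTHER = 3 };

/* Stand-in for the generated conversion table. */
static unsigned char demo_tcv[256];

typedef struct {
  int token_number;   /* what the parser analyzes */
  int token_value;    /* the original character code */
} demo_token;

/* Make all letters syntactically equivalent, and likewise all digits.
   The loops assume ASCII, where each range is contiguous. */
static void demo_init_tcv(void) {
  int c;
  for (c = 0; c < 256; c++) demo_tcv[c] = DEMO_OTHER;
  for (c = 'a'; c <= 'z'; c++) demo_tcv[c] = DEMO_LETTER;
  for (c = 'A'; c <= 'Z'; c++) demo_tcv[c] = DEMO_LETTER;
  for (c = '0'; c <= '9'; c++) demo_tcv[c] = DEMO_DIGIT;
}

/* Convert one input code: index the table for the token number and
   keep the character code itself as the token value. */
static demo_token demo_convert(int input_code) {
  demo_token t;
  t.token_value = input_code;
  if (input_code < 0 || input_code > 255) {
    t.token_number = DEMO_OUT_OF_RANGE;   /* never index outside the table */
  } else {
    t.token_number = demo_tcv[input_code];
  }
  return t;
}
```

After demo_init_tcv() has run, 'q' and 'Q' yield the same token
number but different token values.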
##

Token Declaration

A token declaration is simply a �production� with no
right hand side. Token declarations can be used to
define the �data type�s of tokens. To define the data type
of a token, simply put the data type in parentheses
preceding the name of the token. You can use a list of
tokens joined by commas, if you wish.  Thus:
	(char *) variable name, function name
could be used to specify that the �semantic value�s of
the tokens "variable name" and "function name" are both
character pointers.

Of course, token types may be specified as part of any
production the token generates, but sometimes, in the
interest of clarity, it is advisable to group all
declarations together.
##

Token Name

All �nonterminal tokens� that you define in your
�grammar� by means of explicit �production�s must have
names by which they may be referenced. Token names are
�symbols� which represent the token syntactically in
your grammar specification.
##

Token Names

"Token names" is a �configuration switch� that defaults
to off. If it is set, it causes AnaGram to include in
the �parser file� a static array of character strings, indexed by
token number, which provides ASCII representations of token
names. The name of this array is given by "<parser name>_token_names",
where <parser name> is the name of the parser function as
given by the value of the �parser name� parameter.

AnaGram also defines a macro, �TOKEN_NAMES�, which evaluates
to the name of the array.

The array contains strings for all grammar tokens which have
been explicitly named in the syntax file as well as tokens
which represent �keyword�s or single character constants.

The array is useful in creating �syntax error� diagnostics.
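
As a rough illustration, a diagnostic might be built from the array
along the following lines. The array demo_token_names, its contents,
and the helper demo_describe are hypothetical; in a real parser the
TOKEN_NAMES macro refers to the generated array. The fallback for
unnamed tokens borrows the T-number notation used in AnaGram's own
tables.

```c
#include <stdio.h>

/* Hypothetical stand-in for the generated token names array.
   Token 0 has no name, so its entry is an empty string. */
static const char *const demo_token_names[] = { "", "expression", "number" };
#define TOKEN_NAMES demo_token_names
#define DEMO_TOKEN_COUNT (sizeof demo_token_names / sizeof demo_token_names[0])

/* Write a short message naming the offending token into buf. */
static void demo_describe(char *buf, size_t size, unsigned token_number) {
  const char *name = "";
  if (token_number < DEMO_TOKEN_COUNT) {
    name = TOKEN_NAMES[token_number];
  }
  if (name[0] != '\0') {
    snprintf(buf, size, "unexpected %s", name);
  } else {
    /* no name available: fall back to the token number */
    snprintf(buf, size, "unexpected token T%03u", token_number);
  }
}
```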

Prior to version 2.01 of AnaGram, the TOKEN_NAMES array contained
strings only for explicitly named tokens. If this restriction
is required, set the �token names only� switch.

Token names are also included if the �diagnose errors�
switch is set.
##

TOKEN_NAMES

"TOKEN_NAMES" is the name of a macro that AnaGram defines to
provide access to a static array of character strings indexed by
token number, which provides ASCII representations of token
names. The array is generated if any of the �token names�,
�token names only� or �diagnose errors� switches are ON.

If �token names only� is set, the array contains non-empty
strings only for those tokens which are explicitly named
in the syntax file. Otherwise, the array also contains
strings for tokens which represent keywords or single
character constants.
##


token names only

"Token names only" is a �configuration switch� that defaults to
off. If it is set, it will cause AnaGram to include in the
parser file a static array containing the names of the tokens
in your grammar. This array will include only those tokens
to which you have assigned names explicitly and will not
include character constants or keywords. "Token names only"
takes precedence over �token names�.
##

Token Not Used

"Token not used, TXXX: <token name>" is a �warning�
message which appears if AnaGram finds an unused �token�
in your �grammar�. Often an unused token is the result
of an oversight of some kind and indicates a problem in
the grammar.
##

Token Number

AnaGram assigns a unique number, called the "token
number" to each token in the grammar, no matter whether
it is a �terminal token� or a �nonterminal token�. Your
parser does all of its analysis of your input stream
using token numbers as its primary material.

You may need to know the values of token numbers that
AnaGram has assigned, either so a lexical scanner can
output correct token numbers, or so a �reduction
procedure� can correctly resolve a �semantically
determined production�.

To help you, AnaGram defines enumeration constants for
each of the named tokens in your grammar. The definition
of these constants is in the �parser header� file.
##

Token Representation

Not all of the �tokens� in your grammar have a �token
name�. Some of the tokens may represent �character sets�
which you spelled out explicitly, �virtual productions�,
�immediate action�s, or �keyword�s. In its analysis
tables, AnaGram tries to provide a meaningful
representation for tokens whenever it can. Its first
choice is to use the name, if it has one. Otherwise it
will use the set definition or the definition of the
virtual production if one exists. If AnaGram cannot
otherwise represent your token, it will resort to using
the token number which it normally represents using the
letter T followed by a three digit, zero-padded token
number.
##

Token Table

The Token Table lists all the tokens of your grammar.
The first field is the token number. It is followed by a
flag field which is "zl" if the token is a �nonterminal
token� and is �zero length�. If the token is nonterminal
and not zero length, the flag field contains "nt". If
the token is a �terminal token�, the field is blank.

The next field is blank unless the token has been
declared �sticky� or has had a �precedence� level
assigned. If the token is sticky, this field will
contain 's'. If a precedence level has been assigned,
this field will contain the letter 'l', 'r', or 'n' to
indicate associativity followed by the precedence
level. Finally there is the �data type� of the �semantic
value� of this token and the �token representation�.
##

Token Usage

The Token Usage table may be accessed via the �Auxiliary
Windows� menu from any window that identifies tokens. It
shows all the rules in the grammar that use the token.
##

Top Margin

"Top margin" is an �obsolete configuration parameter�.
##

Trace Coverage

Trace Coverage is a table which is built whenever you
run �Grammar Trace�, one of its pre-built versions, or a �File
Trace�.  You can access it from the �Browse Menu�. It shows the number
of times each rule in your grammar has been reduced. Unless you have
set the �Rule Coverage� �configuration switch�, some �null production�s
and some rules that consist of only one element will not be counted
because of speed optimizations in the parser tables.

The Trace Coverage tables are reset to zero when you load a new syntax
file or start AnaGram.
##

Compound Action

Traditionally, �LALR-1 parser�s use only four simple
�parser action�s: shift, reduce, accept and error.
AnaGram parsers use a number of compound actions
in order to reduce the size of parse tables and
speed up processing. A single compound action
may replace several simple shift or reduce actions.

The �Traditional Engine� �configuration switch� may
be used to force AnaGram to use only the simple
actions.
##

Traditional Engine

"Traditional engine" is a �configuration switch� that
defaults to off. Traditional �LALR-1 parser�s use a
�parsing engine� which has only four actions:
 �shift action�
 �reduce action�
 �accept action�
 �error action�


AnaGram, in the interest of
faster execution and more compact parse tables,
uses a parsing engine with a number of
short-cut, or �compound action�s. The "traditional engine" switch tells
AnaGram not to use the short-cut actions.

You would turn this switch on if you wished to use the �Grammar Trace�
or �File Trace� to see how the standard four parser actions work for
a particular combination of grammar and input. Note that to see the
effects of single parser actions, you must use the �Single Step�
button. Remember that in the Grammar Trace, when you single step and
the token you have selected causes a reduce action, it will appear
on the �lookahead line� of the �parser stack pane� and will be preselected
in the �allowable input pane� until it is finally shifted in to
the parser stack.

Normally, you should leave the "traditional engine" switch off. Then
AnaGram will, whenever possible, compress several parsing actions into
one compound action in order to speed execution of the parser.

Unfortunately, use of the term "traditional" has sometimes created the
impression that there is a conservative aspect to the operation of
traditional engine parsers. This is not the case. They have the same
effect, but are slower and have much larger tables.
##

Type Redefinition

"Type Redefinition of TXXX: <token name>" is a �warning�
message which appears when AnaGram finds a conflicting
�data type� definition for a �token� in your �grammar�.
The new definition will override the previous one. If
you intend to use different type definitions, you should
use extreme caution and check the generated code to
verify that your �reduction procedure�s are getting the
values you intended.
##

Undefined Symbol

"Undefined symbol: <name>" is a �warning� message which
appears when AnaGram encounters an undefined �symbol�
while evaluating a �character set� expression. The
following warning in the �Warnings� window identifies
the particular �token� AnaGram was trying to evaluate.
##

Undefined Token

"Undefined token TXXX: <name>" is a �warning� message
which appears when the indicated �token� has been used
in the �grammar�, but there is no definition of it as a
�terminal token� nor does any �production� define it as
a �nonterminal token�.
##

Unexpected

"Unexpected <element 1> in <element 2>" is a �warning�
message which you may get when AnaGram analyzes your
grammar. It appears when AnaGram unexpectedly encounters an instance of
syntactic element 1 at the specified location in an instance of
syntactic element 2.  AnaGram cannot reliably continue parsing its
input.  Therefore, it limits further analysis to scanning for syntax
errors. If this error is not the result of a prior error, you should
correct your �syntax file�.  Remember that this error could result from
something missing just as well as from something extraneous.

If element 1 is �eof�, it often means that you have
an unbalanced brace or comment delimiter in the code
following the indicated location.
##

Union

The union of two sets is the set of all elements that
are to be found in one or another of the two sets. In an
AnaGram syntax file the union of two �character sets� A
and B is represented using the plus sign, as in A + B.
The union operator has the same precedence as the
�difference� operator: lower than that of �intersection�
and �complement�. The union operator is �left
associative�.

Watch out! In an AnaGram syntax file 65 + 97 represents
the character set which consists of the lower case 'a'
and upper case 'A'. It does not represent 162, the sum
of 65 and 97.
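
The set interpretation can be made concrete with a small C sketch.
The type demo_char_set and the helper functions are hypothetical and
say nothing about how AnaGram represents character sets internally;
they only show that the union of the sets {65} and {97} has two
members and does not contain 162.

```c
#include <string.h>

/* Hypothetical character set: member[c] is 1 if code c is in the set. */
typedef struct {
  unsigned char member[256];
} demo_char_set;

/* The set containing a single character code. */
static demo_char_set demo_single(int code) {
  demo_char_set s;
  memset(&s, 0, sizeof s);
  s.member[code & 0xFF] = 1;
  return s;
}

/* Union: every code that is in one or another of the two sets. */
static demo_char_set demo_union(demo_char_set a, demo_char_set b) {
  int i;
  for (i = 0; i < 256; i++) {
    a.member[i] = (unsigned char)(a.member[i] | b.member[i]);
  }
  return a;
}

/* Number of character codes in the set. */
static int demo_count(demo_char_set s) {
  int i, n = 0;
  for (i = 0; i < 256; i++) {
    n += s.member[i];
  }
  return n;
}
```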
##

Video mode

"Video mode" is an �obsolete configuration parameter�.
##

Virtual Production

Virtual productions are a special short hand
representation of �grammar rules� which can be used to
indicate a choice of inputs. They are an important
convenience, especially useful when you are first
building a grammar.

Here are some examples of virtual productions:
	name?						// optional name
	name?...					// 0 or more instances of name
	{name | number}			// exactly one name or number
	{name | number}...			// one or more instances of name or number
	[name | number]			// optional choice of name or number
	[name | number]...			// zero or more instances of name or number

 AnaGram rewrites virtual productions, so that when you
look at the syntax tables in AnaGram, there will be
actual �production�s replacing the virtual productions.

A virtual production appears as one of the rule
elements in a grammar rule, i.e. as one of the members
of the list on the right side of a production.

The simplest virtual production is the "optional"
token. If x is an arbitrary token, x? can be used to
indicate an optional x.

Related virtual productions are x... and x?...  where
the three dots indicate repetition. x... represents an
arbitrary number of occurrences of x, but at least one.
x?... represents zero or more occurrences of x.

The remaining virtual productions use curly or square
brackets to enclose a sequence of rules. The brackets
may be followed variously by nothing, a string of three
dots, or a slash and three dots, to indicate the choices to be made
from the rules. Note that rules may be used, not merely
tokens.

If r1 through rn are a set of �grammar rules�, then
	{r1 | r2 | ... | rn}
is a virtual production that allows a choice of exactly
one of the rules. Similarly,
	{r1 | r2 | ... | rn}...
is a virtual production that allows a choice of one or
more of the rules. And, finally,
	{r1 | r2 | ... | rn}/...
is a virtual production that allows a choice of one or
more of the rules subject to the side condition that
rules must alternate, that is, that no rule can follow
itself immediately without the interposition of some
other rule. This is a case that is not particularly
easy to write by hand, but is quite useful in a number
of contexts.
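
The alternation condition can be stated precisely with a small C
check. This is only a sketch of the side condition, not of the
productions AnaGram generates. The function name is_alternating is
hypothetical, and each matched rule is identified by an arbitrary
integer.

```c
#include <stddef.h>

/* rules[i] identifies the rule matched at position i of the input.
   Returns 1 if the sequence is acceptable to {r1 | ... | rn}/... ,
   that is, if it has at least one rule and no rule immediately
   follows itself. Returns 0 otherwise. */
static int is_alternating(const int *rules, size_t count) {
  size_t i;
  if (count == 0) {
    return 0;               /* the {} form requires at least one rule */
  }
  for (i = 1; i < count; i++) {
    if (rules[i] == rules[i - 1]) {
      return 0;             /* a rule followed itself */
    }
  }
  return 1;
}
```

The sequence r1 r2 r1 r3 passes the check; r1 r2 r2 does not.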

If the above virtual productions are written with []
instead of {}, they all become optional. [] is an
optional choice, []... is zero or more choices, and
[]/... is zero or more alternating choices.

Null productions are not permitted in virtual
productions in those cases where they would cause an
intrinsic ambiguity.

You may use a �definition� statement to assign a name to
a virtual production.
##

Void token

"Void token, <token name>, used as parameter" is a
�warning� message which appears if AnaGram encounters a
�data type� definition declaring a �token� to have type
void when the token has previously been used in a
�parameter assignment� for a �reduction procedure�. Your
C or C++ compiler will complain when it tries to compile
the call to the reduction procedure.
##

vs

vs is a field in a �parser control block� to which your
�error handling� procedures and �reduction procedure�s
may refer. It is the �parser value stack� for your
parser. The �semantic values� of the �tokens� identified
by the parser are stored in the value stack.  The value
stack, like the other �parser stacks�, is indexed by
�PCB�.�ssx�. When you are executing a reduction
procedure, PCB.vs[PCB.ssx] contains the semantic value
of the first token in the grammar rule you are reducing,
PCB.vs[PCB.ssx+1] contains the second, and so forth. The
return value from your reduction procedure will be
stored in turn in PCB.vs[PCB.ssx].
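
The indexing can be pictured with a toy model in C. Everything here
is a simplified assumption: the structure demo_stack_pcb stands in
for the real control block, plain ints replace the value union, the
rule is a hypothetical "sum -> term, '+', term", and the final
increment models the reduced token taking its place on the stack.
There is no overflow check.

```c
/* Toy stand-in for the parser control block. */
typedef struct {
  int ssx;      /* stack index */
  int vs[32];   /* value stack, simplified to ints */
} demo_stack_pcb;

/* Shift: store the token value, then increment the stack index. */
static void demo_shift(demo_stack_pcb *pcb, int value) {
  pcb->vs[pcb->ssx] = value;
  pcb->ssx++;
}

/* Reduce by the hypothetical rule of length 3. */
static void demo_reduce_sum(demo_stack_pcb *pcb) {
  int first, third;
  pcb->ssx -= 3;                      /* subtract the rule length */
  first = pcb->vs[pcb->ssx];          /* value of the first token */
  third = pcb->vs[pcb->ssx + 2];      /* value of the third token */
  pcb->vs[pcb->ssx] = first + third;  /* result replaces the first */
  pcb->ssx++;                         /* assumed: reduced token now on the stack */
}
```

Shifting 2, '+' and 5 leaves the index at 3. The reduction then
leaves the index at 1, with 7 stored in the first slot.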

vs is defined to be of type $_vt, where "$" represents
the name of your parser. AnaGram defines $_vt to
be a union of fields of sizes corresponding to all the
different data types declared in your syntax for the
semantic values of your tokens. In order to avoid
restrictions on the use of C++ classes, the fields are
defined as character arrays. On some processors which
have byte alignment restrictions for multibyte data,
you might encounter a bus error. To correct this
problem, set the �parser stack alignment� parameter to
an appropriate data type.
##

Warning

If, while analyzing your syntax file, AnaGram finds
something suspicious, it is likely to issue a warning.
The Warnings window will pop up automatically when the
analysis has been completed. If the warning is for a
�syntax error� in your input file, you will have to fix
it, because AnaGram cannot successfully interpret it.
Otherwise, AnaGram will be able to create a �parser� for
you, if you wish, no matter how serious the warnings may
be.

You can bring up the Help topic associated with a highlighted warning
by pressing F1 or by clicking with a �Help Cursor�.

If you have syntax errors, AnaGram will synchronize the
cursor in the �syntax file� window with the cursor in the
Warnings window so that whenever the Warnings window is
active, the cursor bar in the syntax file window will
identify the location of the error.

##

What's New

Changes in AnaGram 2.40

Most of the changes in AnaGram 2.40 are under the hood - cleanup of
source files, reorganization of the source tree, revision of build and
test procedures, and so forth, in preparation for the open source
release. All of this will, with luck, be invisible to the end user.

Open Source

AnaGram is now �open source�. AnaGram itself
uses the 4-clause BSD �license�; the �parsing engine�, and thus the output
files, are licensed with the less restrictive zlib �license�. Source
distributions are available from http://www.parsifalsoft.com.

The manual has been re-typeset using LaTeX instead of WordPerfect.
The typographic consistency and formatting have been considerably
improved; unfortunately, the pagination is now completely different,
so page numbers are not portable to the new version.

All the logic dealing with registration, trial copies, serial numbers,
and so forth has been removed.

Unix Support

The Unix build of the �command line version� of AnaGram (agcl) is now
supported and available to the public. There is at present no GUI for
the Unix version. The long-term goal is to migrate the AnaGram GUI
away from the closed (and orphaned) IBM Visual Age class library to
something else, probably GTK, so as to support both Windows and Unix.

Improved Functionality

 Examples. The examples have been adjusted to the current dialect of
C++ and are now compilable again. The legacy "classlib" code some
still depend on is being phased out.

Increased Convenience

 File names. File names in the AnaGram distribution and source
tree are no longer limited to 8+3 characters, and quite a few now have
less cryptic names. Additionally, all HTML files are now named ".html",
not ".htm".

 Installed files. The AnaGram.cgb and AnaGram.hlp files found in
older releases of AnaGram no longer exist; their contents are compiled
into the AnaGram executables instead.

Bug Fixes

 Engine compiler error. The �error_message� field of the PCB has
been changed to const char * so current C++ compilers will accept the
code generated when �diagnose errors� is turned off.

 Multiple output header files. Including more than one AnaGram
output header file at once used to cause some compilers to issue a
warning, because an #ifndef directive was checking the wrong
symbol. This has been corrected.

 Wrappers and error tokens. AnaGram 2.01 generated uncompilable
code if you tried to use the �wrapper� feature and error token
resynchronization at the same time. This has been corrected.

 More than 256 keywords. Build 8 of AnaGram 2.01 fixed certain
problems with large keyword tables, but in the process introduced
another, which is now fixed.

For changes in the previous versions of AnaGram, see �What's New in AnaGram 
2.01� and �What's New in AnaGram 2.0�.

##

What's New in AnaGram 2.01

Changes in AnaGram 2.01

Improved Functionality

  Improved support for building �thread safe parsers�. All
nonconstant parser data previously declared as static variables has been
moved to the �parser control block�. When the �reentrant parser� switch
is set, all references to the parser control block are passed to functions
via calling sequences. The �extend pcb� statement provides a mechanism to
add user-defined variables to the parser control block.

  Improved support for C++ parsers. The �wrapper� statement
provides C++ wrapper classes for objects to be stored on the �parser value stack�.
The �PCB_TYPE� macro allows you to derive a C++ class from the parser control
block and to access its members from your �reduction procedures�.

  Support for the �ISO Latin 1� character set. When using
the �case sensitive� switch, case conversion is performed for all ISO-Latin-1
characters, not just those in the ASCII range.

  Improved support for error diagnostics. It is now possible for users
to provide their own text for the error messages created by the �diagnose errors�
switch. In addition, the �token names� table option now includes ASCII representations
of individual characters and keywords instead of only named tokens. The �token names
only� switch can be used for compatibility with previous versions of AnaGram.

  More precise determination of error context. The tables used by the �error frame�
option to provide the context of a syntax error have been reworked and now provide
a substantially more precise localization of the error.
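
The ISO Latin 1 case conversion mentioned above can be pictured with a
short C++ sketch. It is an illustration only, not the code AnaGram
generates: the function names are hypothetical, and folding to upper
case (rather than lower case) is an assumption. The character ranges
themselves come from the ISO Latin 1 code table.

```cpp
#include <cassert>

// Map one ISO Latin 1 character code to upper case. Beyond a..z, the
// lower case letters occupy 0xE0..0xFE, except 0xF7 (the division sign).
// 0xDF (sharp s) and 0xFF (y with diaeresis) have no upper case form in
// ISO Latin 1 and are returned unchanged.
inline unsigned char latin1_upper(unsigned char c) {
  if (c >= 'a' && c <= 'z') return static_cast<unsigned char>(c - 0x20);
  if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
    return static_cast<unsigned char>(c - 0x20);
  return c;
}

// Compare two NUL-terminated ISO Latin 1 strings, ignoring case, the way
// a keyword match might when case sensitivity is turned off.
inline bool latin1_equal_ignore_case(const char *a, const char *b) {
  assert(a != nullptr && b != nullptr);
  while (*a && *b) {
    if (latin1_upper(static_cast<unsigned char>(*a)) !=
        latin1_upper(static_cast<unsigned char>(*b)))
      return false;
    ++a;
    ++b;
  }
  return *a == *b;  // equal only if both strings ended together
}
```

A conversion limited to the ASCII range would treat the code pair
0xE9/0xC9 (e with acute accent, lower and upper case) as two unrelated
characters; the sketch treats them as the same letter.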

Improved error diagnostics in AnaGram

 �Missing reduction procedure� diagnostic.
In addition to warning that there is a �parameter assignment�
without a �reduction procedure�, this
diagnostic is now provided if the �default reduction value�
does not have the same �data type� as the �reduction token�.

 �Command line version�. Diagnostics have been reformatted so
they can be recognized by the Microsoft Visual C++ IDE.

 Refined �keyword anomaly� diagnostics. There should
now be fewer false alarms.

Increased Convenience

 �File Trace�. If your grammar uses �semantically determined productions�,
the File Trace feature will now remember the choices you have
made for �reduction token�s, so that you do not have to make
the same choices over and over again as you work with an example.

 File Paths. The file paths in the #line directives created by the �line numbers�
switch now use forward slashes instead of backslashes.

Changed Defaults

 �Parser stack alignment�. Now defaults to long instead of int.
 �Parser stack size�. Now defaults to 128 instead of 32.

Bug Fixes

 Interaction between context tracking and error token. In previous
versions of AnaGram, if the first token in a rule was the �error token�,
the value of �CONTEXT� was the value that corresponded to the location
of the error. CONTEXT now correctly shows the context at which the
aborted rule began. For instance, in the following example, if a
syntax error is encountered while parsing the expression, the error
rule will skip over remaining characters to the terminating semicolon.
When invoked from handleError(), the CONTEXT macro will return the
context as it was at the beginning of the expression.
        expression statement
          -> expression, ';'
          -> error, ~(eof + ';')?..., ';'        =handleError();

 �Distinguish lexemes�. Several minor bugs in the implementation of distinguish lexemes have been
corrected.

 Set partition logic. Corrected problems in the interaction between the set �partition� logic
and the implementation of the �disregard� statement.

 Table size. Fixed a data sizing problem which occurred when one particular parse table
had precisely 256 entries.

 Keyword recognition. Fixed a problem that could cause difficulties with �keyword�
recognition when the �case sensitive� switch was turned off.

 Default conflict resolution. With unresolved �shift-reduce conflict�s, the shift case was
not always being selected. This problem has been corrected.

 Lockup. It was possible to write an erroneous grammar that would cause
AnaGram to lock up. This problem has been corrected.

 Potential bus error. The error diagnostic function created by the �diagnose errors�
switch could, under some circumstances, access an uninitialized value
on the �parser value stack�. This problem has been corrected.

 Internal errors. Fixed a number of minor bugs which could cause �internal error�s
while running �File Trace�.

For changes in the previous version of AnaGram, see �What's New in AnaGram 2.0�.
##

What's New in AnaGram 2.0

AnaGram's user interface has been completely revamped to make it more
convenient and easier to use. However, the same tried and true AnaGram
algorithms are still in place to build your parsers. The rules for
syntax files are also unchanged.

The �File Trace� and �Grammar Trace� facilities have each had their
windows combined into a single unit, and a �Rule Stack� synched with
these windows and with your syntax file window has been added. The
Rule Stack is particularly convenient for relating the progress of the
parse to the �grammar rules� in your �syntax file�.

A �text entry� field has also been added to the Grammar Trace. This
means you can provide character input to your parser in much the same
way you can with a �test file� in File Trace, but with instant control
over the input.

Some further controls have been added to both File and Grammar Traces.
In particular there is a Reset button to reset the trace to its initial
state. This is particularly useful for �Conflict Trace�s.

AnaGram now has a small �Control Panel� (default position is at the
upper right of the screen) from which you can conveniently control
operation.  A menu bar provides access to the various commands and
tables. There are toolbar buttons for Analyze Grammar, Build Parser,
File Trace, and so on. The panel also has a data entry field for
entering search keys.

You can set both colors and fonts in AnaGram windows to suit your own
preferences. We suggest you check Help for �Colors� or �Fonts� before
making changes to make sure that all information will still be properly
displayed.

AnaGram's �Help� has been updated to provide hypertext-type links. But
you can still keep multiple Help windows on view at once. A popup menu
shows all the links in a window. New topics have been added. Also,
further documentation topics are provided in HTML format in the html
subdirectory.

A �Help Cursor� on the Control Panel toolbar can be used to get help for
most AnaGram windows, buttons and menu items. F1 can also be used.

On the �Action Menu� you will find a list of your most recently used
syntax files. Just click on the file of your choice to have AnaGram
analyze it (or build it if �Autobuild� is on).
##

White Space

In many grammars it is desirable to pass over blanks,
tabs, and similar characters, as well as comments,
collectively termed "white space", as though they were
not there. The "�disregard�" statement in AnaGram may
be optionally used to accomplish this. The "�lexeme�"
statement may be used to exercise fine control over the
scope of the disregard statement.
##

Wrapper

The wrapper �attribute statement� provides correct handling of C++
objects returned by �reduction procedure�s.

If you specify a wrapper for a C++ object, then, when a reduction
procedure returns an instance of the object, a copy of the object will
be constructed on the �parser value stack� and the destructor will be
called when the object is removed from the stack.

Without a wrapper, objects are stored on the value stack simply
by coercing the stack pointer to the appropriate type.
There is no constructor call when the object is stored nor
a destructor call when it is removed from the stack.

Classes which use reference counts or otherwise overload the
assignment operator should always have wrappers in order to
function correctly.
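
The difference between the two storage modes can be illustrated with a
minimal C++ sketch. This is not the wrapper class AnaGram generates:
ValueSlot, Tracked and demo_wrapper_lifetime are hypothetical names, and
the slot merely stands in for one entry of the parser value stack.

```cpp
#include <cassert>
#include <new>  // placement new

// Hypothetical class that counts its live instances, standing in for a
// reference counted type.
struct Tracked {
  static int live;
  int id;
  explicit Tracked(int i) : id(i) { ++live; }
  Tracked(const Tracked &other) : id(other.id) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Hypothetical stand-in for one slot of a value stack: raw, suitably
// aligned storage in which an object is constructed only on demand.
template <class T>
class ValueSlot {
 public:
  ValueSlot() : obj(nullptr) {}

  // Copy construct the value in place, as a wrapper would on a push.
  T *store(const T &value) {
    assert(obj == nullptr);
    obj = new (raw) T(value);
    return obj;
  }

  // Run the destructor explicitly, as a wrapper would on a pop.
  void remove() {
    assert(obj != nullptr);
    obj->~T();
    obj = nullptr;
  }

 private:
  alignas(T) unsigned char raw[sizeof(T)];
  T *obj;
};

// Store a copy of a short-lived object, then remove it. Returns the
// number of live objects observed while the copy was in the slot.
inline int demo_wrapper_lifetime() {
  ValueSlot<Tracked> slot;
  {
    Tracked result(7);   // value "returned" by a reduction procedure
    slot.store(result);  // copy constructed inside the slot
  }                      // the original is destroyed; the copy survives
  int while_stored = Tracked::live;
  slot.remove();         // destructor runs when the slot is released
  return while_stored;
}
```

Because every construction is paired with a destruction, the instance
count returns to zero. Copying the bytes of the object into the slot
without a constructor call would leave a count of this kind out of step
with the objects actually in existence.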

Wrapper statements, like other �attribute statements�, must appear in
configuration sections. The syntax is simply
  wrapper { <comma delimited list of data types> }

For example:
  [
    wrapper {CString, CFont}
  ]

You cannot specify a wrapper for the �default token type�.

If your parser exits with an error condition, there may be
objects remaining on the stack. The �DELETE_WRAPPERS� macro
may be used to delete these objects. If you have enabled
�auto resynch�, DELETE_WRAPPERS will be invoked automatically.

The �AG_PLACEMENT_DELETE_REQUIRED� macro is used to control
definition of a "placement delete" operator in the wrapper
class AnaGram defines.
##

Zero Length

A zero length �token� is a �reduction token� which can
be matched by a void, i.e. by nothing at all. It
represents an optional item, or a sequence of optional
items, in the input. Since the matching process can
involve several levels of reductions, it is most precise
to use the following recursive definition: A zero length
token is one which either has at least one �null
production� or has at least one grammar rule defining it
such that all the tokens in the rule are zero length
tokens.
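
The recursive definition can be evaluated by repeating it until no
further tokens qualify. The C++ sketch below does this for a grammar
held in a simple map. The representation and all names in it are
hypothetical and chosen for illustration; this is not how AnaGram stores
a grammar internally.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical grammar representation: each reduction token maps to its
// rules, and each rule is the list of tokens on its right side. An empty
// rule is a null production.
typedef std::vector<std::string> Rule;
typedef std::map<std::string, std::vector<Rule> > Grammar;

// Apply the recursive definition repeatedly until the set stops growing.
inline std::set<std::string> zero_length_tokens(const Grammar &grammar) {
  std::set<std::string> zero;
  bool changed = true;
  while (changed) {
    changed = false;
    for (const auto &entry : grammar) {
      if (zero.count(entry.first)) continue;  // already known
      for (const Rule &rule : entry.second) {
        bool all_zero = true;  // trivially true for a null production
        for (const std::string &token : rule) {
          if (!zero.count(token)) { all_zero = false; break; }
        }
        if (all_zero) {
          zero.insert(entry.first);
          changed = true;
          break;
        }
      }
    }
  }
  return zero;
}

// Small example grammar used to exercise the function.
inline Grammar sample_grammar() {
  Grammar g;
  g["opt sign"].push_back(Rule());                         // null production
  g["opt sign"].push_back(Rule{"'-'"});
  g["blanks"].push_back(Rule());                           // null production
  g["blanks"].push_back(Rule{"blanks", "' '"});
  g["prefix"].push_back(Rule{"opt sign", "blanks"});       // all zero length
  g["number"].push_back(Rule{"prefix", "digit"});          // needs a digit
  assert(g.size() == 4);
  return g;
}
```

In the sample grammar, "opt sign" and "blanks" qualify through their null
productions, "prefix" qualifies because every token in its rule is zero
length, and "number" does not qualify because "digit" must always match
real input.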

Care should be taken when using �zero length� tokens in
�recursive rule�s. If all the tokens in the rule other than
the recursive token itself are zero length tokens
the rule will generate an infinite loop in the generated
parser.

The �Token Table� identifies zero length tokens because
the use of such tokens sometimes inadvertently causes
�conflict�s.
##

Control Panel

The AnaGram Control Panel appears at the upper right of your monitor
when you start AnaGram. It has a menu bar, command buttons, a button
which enables a �help cursor�, and a �status indicator�. At the lower
left you will see a data entry field for entering �search�
keys, with neighboring search forward and search backward buttons.

Notice that the �Options Menu� has a "Stay On Top" entry which
allows you to specify whether the Control Panel stays on top of
other AnaGram windows.
##

Status Indicator

The status indicator at the right of the AnaGram
Control Panel shows the status of the �current grammar�:
  Ready
  Loaded
  Error
  Parsed
  Analyzed
  Built

"Ready" appears only when no grammar has been selected.

"Loaded" and "Parsed" are normally transitory.

"Error" means at least one syntax error has been detected
in your grammar and AnaGram cannot continue. Check the
Warnings window to determine the nature of the problem.

"Analyzed" means that a �grammar analysis� has been
completed, but no �output files� have been written.

"Built" means that an analysis has been completed and
output files have been written.
##

Help Cursor

The Help Cursor is accessed via the button with the question mark on
AnaGram's �Control Panel�. It is convenient for getting help on
�Warning�s, browse tables, menu items and so on.

If you click on the button you enable the Help Cursor, which you can
then drag with the mouse. A further mouse click will provide help
for the item underneath the cursor.

Note further that AnaGram also has F1 help which you may find
simpler and faster than the Help Cursor.
##

Search

AnaGram has a simple search facility to let you search for text strings
in AnaGram windows. A data entry field on the �Control Panel� is
provided for you to enter text. Left-clicking on the neighboring
buttons lets you search either forward or backward for a line in the
active window which contains at least one instance of the text.

Note that a forward search begins at the line after the highlighted
line, and a backward search begins at the line preceding the
highlighted line.
##

Search Key

To find a text string in an AnaGram window, enter the
string in the Search Key field in the �Control Panel�
and press Enter.

To find another instance of the string click on the
�Find Next� button or press F3.

To find a previous instance of the string click on
the �Find Previous� button or press F4.

In windows that have a cursor bar, a forward search
begins on the line following the cursor and a backward
search begins on the line preceding the cursor.
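
The starting points described above can be sketched in C++ as follows.
The function names are hypothetical, and two details are assumptions not
stated in this topic: matching is taken to be case sensitive, and the
search stops at the first or last line instead of wrapping around.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return the index of the first line after `cursor` that contains `key`,
// or -1 if no following line contains it.
inline int find_next(const std::vector<std::string> &lines, int cursor,
                     const std::string &key) {
  const int count = static_cast<int>(lines.size());
  assert(cursor >= -1 && cursor <= count);
  for (int i = cursor + 1; i < count; ++i) {
    if (lines[i].find(key) != std::string::npos) return i;
  }
  return -1;
}

// Return the index of the nearest line before `cursor` that contains
// `key`, or -1 if no preceding line contains it.
inline int find_previous(const std::vector<std::string> &lines, int cursor,
                         const std::string &key) {
  const int count = static_cast<int>(lines.size());
  assert(cursor >= -1 && cursor <= count);
  for (int i = cursor - 1; i >= 0; --i) {
    if (lines[i].find(key) != std::string::npos) return i;
  }
  return -1;
}
```

Because the line under the cursor is never examined, repeating the same
search moves on to the next match instead of finding the current line
again.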
##

Find Next

The Find Next key, on the �Control Panel� immediately
to the right of the �Search Key� field, locates
the next instance of the search key in the most recently
active AnaGram window. F3 is the keyboard equivalent.
##

Find Previous

The Find Previous key, on the �Control Panel� immediately
to the right of the �Find Next� key, searches
backwards for the search key in the most recently
active AnaGram window. F4 is the keyboard equivalent.
##

Fonts, Set Fonts

The Set Fonts dialog allows you to use the fonts of your choice in
AnaGram windows. You should make sure that the �marked token�s font is
very distinctive so that marked tokens will show up clearly even if
they are only 1 or 2 characters long. Sometimes it is helpful to use an
underlined font for marked tokens.

A Default button at the bottom of the dialog lets you revert to
AnaGram's original fonts if you wish.
##

Colors, Set Colors

The Set Colors dialog allows you to change the colors of
AnaGram windows. Notice that in the �File Trace� the �test file pane�
requires three different sets of text and background colors. You
should make sure that the backgrounds, at least, can be easily
distinguished from each other so the trace information can be
properly displayed. You also want to take care that an active pane in
a File Trace or Grammar Trace can be distinguished from inactive
panes.

The Default button at the bottom of the dialog lets you revert to
AnaGram's original colors if you wish.

Color changes pertain only to the client areas of AnaGram windows. The
remaining parts of your windows will have the customary colors you have
chosen for your system.
##

Marked Token

Some tables and trace panes display each rule with one token marked to
show how far parsing has progressed in the rule. The marked token is
the next input expected in the input stream. It is shown in a different
font to distinguish it from other tokens in the rule. If no token is
marked, the rule is a �completed rule�, i.e. it has been completely
matched and will be reduced by the next input.

You can set the font for marked tokens by choosing Fonts from the
�Options Menu�. You should make sure that the font is very distinctive so
that marked tokens will show up clearly even if they are only 1 or 2
characters long.  Sometimes it is helpful to use an underlined font for
marked tokens.
##

Synch Parse

The Synch Parse button replaces the �Single Step� button on the
toolbar of the �File Trace window� when, for some reason, the
location of the blinking cursor in the �test file pane� differs from
the current parse position. This can occur when you single click in
the test file pane or when the parse cannot track the cursor because
of a �syntax error� or a �semantically determined production�.

Click the synch parse button to resynch the parse with the cursor.
##


Single Step

The Single Step button is one of the control buttons for the �File
Trace� and �Grammar Trace�. It advances the parse one �parser
action� at a time. In the File Trace, it is replaced with the "�Synch
Parse�" button whenever the blinking cursor loses synch with
the current parse location.

In the Grammar Trace, the Single Step button takes its input from the
Allowable Input pane, the Reduction Choices pane, or the �text entry�
field, depending on which is active.
##

Proceed

The Proceed button is one of the control buttons for the
�Grammar Trace�. If the �Reduction Choices pane� or the �Allowable
Input pane� is active, Proceed parses the highlighted token
until it is shifted in to the �parser stack�. If the �text entry�
field is active, Proceed parses all text in the field. If a
�syntax error� is encountered, the parse stops and all �reduce
action�s are undone.

Note that selecting a token in Allowable Input can cause a syntax
error under certain circumstances. This can happen only if the
following conditions are all true:
 the indicated operation is a �reduction�,
 the reduction token for the rule being reduced has been used in several
different contexts in the grammar,
 and the specified token may
follow it in some contexts and not in others.
##

Reduction Choices Pane

The �File Trace� and �Grammar Trace� display a Reduction Choices
pane when they need to reduce a �semantically determined production�.

The rule to be reduced is highlighted in the �rule stack pane�.
If the �syntax file� window is visible, it shows the rule in
context in your grammar.

The Reduction Choices pane lists all possible �reduction token�s for
the specified rule. The first reduction token that is admissible in
the current context is highlighted and it appears
as the �lookahead token� in the �parser stack pane�. The text that
comprises the entire rule is highlighted in the �test file pane�.

Select the desired reduction token before continuing with the parse.

If you select a token and it does not appear as the lookahead token,
it is not syntactically correct in the current context. If you try
to proceed with the parse, you will get a �selection error�.
##

Selection Error

The �Parse Status� field indicates a "selection error" if you
choose a �reduction token� from the �Reduction Choices pane� of
a �File Trace� or �Grammar Trace� and the selected token is not
syntactically correct in the current context.
##

Parser Stack Pane

The Parser Stack pane, the upper left pane of the �File Trace� and
�Grammar Trace� windows, displays the �parser stack� for the current
trace.

Each line corresponds to one level in the parser state stack. It shows
the stack index, the �parser state� for that level, and the �token� which
was seen at that state. The last line of the stack, the �lookahead
line�, corresponds to the current state of the parser. Since no input
has yet been processed for this state, the token, if any, which
appears at this level is a �lookahead token�.

If you move the cursor in the Parser Stack pane of a File Trace,
the text that makes up the selected token will be
highlighted in the �Test File pane�. You can back the parse up to
any desired stack level by double clicking at the beginning of the
token text in the Test File pane.

Similarly, if you move the cursor bar in the Parser Stack pane of a
Grammar Trace, the �Allowable Input pane� will change to display the
allowable tokens in the selected state. The previously
selected token will be highlighted. Then, double click on any token in
the Allowable Input pane to back the parse up and choose a token
a second time.

The �Rule Stack pane� of the File or Grammar Trace is also synched
to the Parser Stack pane. If the �syntax file� window is visible, it
will be synched to show the rule currently selected in the rule
stack pane. Note that rules that have been automatically generated
by the expansion of �virtual productions� cannot be synched, so the
top line of the syntax file will be highlighted instead.

In the Grammar Trace, the last line of the Parser Stack may or may not
display a �lookahead token�, depending on the last �parser action�
performed. If input was taken from Allowable Input and the last
action was a simple �reduce action�, the last input token selected
will be displayed as the lookahead input. But if the last action
performed shifted the token in, the lookahead field will be empty.

If you right-click on a highlighted line in the Parser Stack pane, you will
get a pop-up menu to give you more information. In particular you can
get an �Auxiliary Trace� starting at the current point in your File or
Grammar Trace, so you can explore various possibilities without losing
your position in the old trace.
##

Exit

Select this entry from the �Action Menu� to terminate AnaGram.
##

Allowable Input, Allowable Input Pane

The upper right pane of the �Grammar Trace� window lists the
allowable input tokens for the current state of the �grammar�.

The tokens in the Allowable Input pane are listed in two groups:
first, the �terminal tokens� allowable in this state, and
second, the �nonterminal tokens�. Between these two groups of tokens
is inserted a line which is either an option for a �default reduction�,
or declares that there is no default action.

Double click, press Enter, or click the �Proceed� button to
parse the highlighted token. When all parse actions triggered
by the highlighted token have been completed, all panes of the trace
will be redrawn to show the new state of the parser.

Note that selecting a token in Allowable Input can cause a syntax
error under certain circumstances. This can happen only if the
following conditions are all true:
 the indicated operation is a �reduction�,
 the reduction token for the rule being reduced has been used in several
different contexts in the grammar,
 and the specified token may
follow it in some contexts and not in others.

If you wish to see the results of a single parser action, click
on the �single step� button. The parser will perform a single
parser action. If the token you selected was not shifted in, it will
now be displayed as the �lookahead token� on the �lookahead line�, the
last line of the �Parser Stack pane�, and will be preselected in the
Allowable Input pane.

Because AnaGram, by default, uses a number of compound
parser actions, this situation does not arise very often unless you
have set the �traditional engine� switch or reset the �default
reductions� switch. Usually you will want to select the same token to
proceed, but it is not necessary.

The Allowable Input pane also displays
the �parser action� associated with a specific token. If it is
not a �compound action�, the action and its result are also shown.

The �parser action� field for a token may be interpreted as follows: If
this token would cause a shift to a new state, the action field is ">>"
followed by the new state number. If the token would cause a
�reduction�, the action field is "<<" followed by a �rule number� to
show the rule reduced.  If the parser action is a compound action, the
action field is blank.  If the token would cause the grammar to be
accepted, the action field is "Accept".
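
As a reading aid, the four cases above can be expressed as a small C++
function that turns the text of an action field into a description. The
function names are hypothetical, and the handling of blanks between the
marker and the number is an assumption, since the exact spacing of the
display is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the text that follows a two character marker such as ">>",
// skipping any blanks between the marker and the number.
inline std::string number_part(const std::string &field) {
  assert(field.size() >= 2);
  std::size_t start = field.find_first_not_of(' ', 2);
  return start == std::string::npos ? std::string() : field.substr(start);
}

// Describe the contents of a parser action field.
inline std::string describe_action(const std::string &field) {
  if (field.empty()) return "compound action";
  if (field == "Accept") return "accept the grammar";
  if (field.compare(0, 2, ">>") == 0)
    return "shift to state " + number_part(field);
  if (field.compare(0, 2, "<<") == 0)
    return "reduce by rule " + number_part(field);
  return "unrecognized action field";
}
```

For example, a field reading ">>12" describes a shift to state 12, and
an empty field describes a compound action.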


The �text entry� field at the bottom of the Grammar Trace can be
used as a convenient alternative to the Allowable Input pane. It
accepts characters rather than tokens. Most non-printing characters
such as newline are only available from Allowable Input.
##

Copy

The Copy command on the �Windows Menu� copies the currently active
table or Help topic to the clipboard.
##

Statistical Summary

While your grammar is being analyzed, a Statistical Summary window
pops up to show you the progress of the analysis. Unless you have
turned off �Show Statistics� on the �Options Menu�, this window will remain
on-screen for your reference. Among other things, it shows you the
number of rules and states in your grammar, and the number of conflicts
and warnings, if any.

Note that if your grammar is small and you have Show Statistics turned
off, the appearance of this window on your monitor may be exceedingly
brief - you may just see a flash.

If the window is turned off or you have closed it, you can get it from
the �Browse Menu�.
##

Stay On Top

The Stay On Top entry in the �Options Menu� allows you to specify whether
the �Control Panel� stays on top of other AnaGram windows.
##

Show Syntax

If this entry in the �Options Menu� is checked, AnaGram will display the
�syntax file� when it has analyzed your �grammar�. If this entry is not checked
or you have closed the syntax file window, you can select the window
from the �Browse Menu�.
##

Show Statistics

If this entry in the �Options Menu� is checked, AnaGram will leave the
�Statistical Summary� on the screen after it has analyzed your �grammar�. If
this entry is not checked or you have closed the Statistical Summary
window, you can select the window from the �Browse Menu�.
##

About AnaGram

Select this entry from the �Help Menu� to find out the version and
serial numbers of your copy of AnaGram, and how to contact Parsifal
Software.
##

Help Topics

Select Help Topics from the �Help Menu� to get a complete list of AnaGram
Help Topics titles. You can bring up the window for a highlighted topic
by double-clicking with the left mouse button, pressing F1, or using
the �Help Cursor�.
##

Cascade Windows

Select this entry from the �Windows Menu� to cascade your open windows
starting at top left of the screen.
##

Close Windows

Select this entry from the �Windows Menu� to close all open windows
except the �Control Panel�. You may also close the active window
by pressing the Escape key.
##

Hide Windows

Select this entry from the �Windows Menu� to hide all open windows
except the �Control Panel�. Restore them to the screen with �Restore
Windows�.
##

Restore Windows

Use this command on the �Windows Menu� to restore to the screen
any windows you have previously hidden with �Hide Windows�.
##

Token Input, Preprocessor, Lexical Scanner

AnaGram makes it unnecessary, in most cases, to have a separate
preprocessor to provide the �tokens� which are fed to your parser.

However in some cases you may want to use a preprocessor, or lexical
scanner, to provide input to your parser. The preprocessor may
or may not be written in AnaGram. If it sends the parser token
numbers, as opposed to character codes, this is referred to as token
input, as opposed to character input. Please refer to the AnaGram
User's Guide for information on identifying the tokens to the parser
and providing their semantic values, if any.

Since a �File Trace� is based on character codes, it will be greyed out
on the �Action Menu� if you have token input. For a �Grammar Trace�,
entering characters in the �text entry� field is not appropriate and
will simply cause a syntax error.
##

Lookahead Line

The last line of the �Parser Stack pane�, the "lookahead" line,
will sometimes show a �lookahead
token�, and sometimes not. In a �File Trace�, you will always see a
lookahead token because it is available from the �test file�.

In a �Grammar Trace� you will usually see a lookahead token only when
you have used the �Single Step� button or if there is available
input in the �text entry� field. In the latter case the token
corresponding to the first character of the input will appear on the
lookahead line.

If you click Single Step after selecting a token from �Allowable
Input� and it causes only a simple �reduce action� (as opposed to a
shift or a compound action), then, upon completion of the reduction,
the token you selected will appear on the lookahead line and also
will be preselected in Allowable Input.

Usually you would select
this token for the next parse step.  However, if there are other
possible inputs in this state, the parse theoretically could have
arrived at this state by a different sequence of input tokens. Thus,
if you are more interested in the behavior of the parser at this
state than in the response of the parser to a particular sequence of
inputs, it is perfectly valid to select a different input token, and
AnaGram will let you do it.

Note that if you have enabled the �traditional engine� switch or
disabled the �default reductions� switch, the
probability of finding a token which does a simple reduction is
noticeably higher than otherwise.
##

Action Menu

The Action menu begins with the �Analyze Grammar� and �Build Parser�
commands. If a grammar has already been analyzed, but not yet built,
there will also be an extra Build command bearing the name of your
syntax file.

There are also �Reanalyze� and �Rebuild� commands which are
initially greyed out. They become available if you change the
current syntax file.

The next section has �File Trace� and �Grammar Trace�
commands. If you have enabled the �Error Trace�
�configuration switch�, this section also shows an
Error Trace command.

The menu ends with an �Exit� command
and a list of recently used syntax files, if any. Just
click on a syntax file name to have AnaGram analyze it, or
build it if the �Autobuild� option is on.
##

Browse Menu

Initially, the Browse Menu shows only a single entry,
�Configuration Parameters�, which lets you see the
state of the configuration parameters before your
syntax file has set any of them. Once you have
analyzed a grammar, this menu fills up with many tables
containing information about your grammar. You can also
bring up a window showing your �syntax file� from this menu.
If your grammar has generated �syntax error�s or warnings, or
contains conflicts, there will be �Warning�s or �Conflict�s
entries.
##

Options Menu

From this menu you can select a �Fonts� or �Colors� dialog so you can
set AnaGram's fonts and colors to suit your own tastes. You can set
�Autobuild� if you want AnaGram to automatically build your �grammar�
when you select a �syntax file� from the �Action Menu�.  You can also
choose whether or not to automatically show the �Statistical Summary�
window or your syntax file window when you open a grammar, or make
the �Control Panel� stay on top of other AnaGram windows.
##

Windows Menu

The Windows menu lets you cascade, close, or hide all AnaGram
windows except the �Control Panel�, or restore them if they
have been hidden. It also has a list of open windows (even
if hidden) so you can select the one you want. The Copy command will
copy most windows to the clipboard.
##

Help Menu

The Help Menu has the following entries:

�Getting Started� provides a brief description of AnaGram and
introductory suggestions.

�Help Topics� brings up a list of all help topics.

�Using Help� tells you how to use AnaGram's help facilities.

�What's New� has information on new features of this version of AnaGram.

�About AnaGram� tells you what version of AnaGram you are using, and also
provides contact information for Parsifal Software.
##

Autobuild

When Autobuild (�Options Menu�) is checked, selecting a file
from the list of most recently used files on the �Action Menu�
invokes the �Build Parser� command. Otherwise, the �Analyze
Grammar� command is invoked.
##

Reanalyze, Rebuild

The Reanalyze and Rebuild commands on the �Action Menu� are
initially greyed out.

Reanalyze becomes available if
you have a syntax file currently analyzed or built
in AnaGram and change it while AnaGram is still running.

Rebuild becomes available if
you have a syntax file currently built
and change it while AnaGram is still running.
##

Percent Sign

The percent sign ( % ) is used to mark certain tokens in your grammar
which AnaGram must redefine in order to implement the �disregard�
statement. If you have used this statement in your grammar, you will
probably notice the percent sign appearing in some windows and traces.

The percent sign indicates the original token, without the optional
white space attached. Early versions of AnaGram used the degree sign
instead, but this character is not generally available in Windows.
##

Program Development

The first step in writing a program is to write a �grammar� in
AnaGram notation which describes the input the program expects.

The file containing the grammar, called the �syntax file�, should
have the extension ".syn". You could also make up a few sample input
files at this time, but it is not necessary to write �reduction
procedure�s at this stage.

Run AnaGram and use the �Analyze Grammar� command to create parse
tables. If there are �syntax errors� in the grammar at this point,
you will have to correct them before proceeding, but you do not
necessarily have to eliminate �conflicts�, if there are any, at this
time. There are, however, many aids available to help you with
conflicts. These aids are described in the AnaGram User's Guide, and
somewhat more briefly in the Online Help topics.

Once syntax errors are corrected, you can try out your grammar on the
sample input files using the �File Trace� facility.
With File Trace, you can see interactively just how your grammar
operates on your test files. You can also use �Grammar Trace� to
answer "what if" questions concerning input to the grammar. The
Grammar Trace does not use a test file, but rather allows you to make
input choices interactively.

At any time, you can write �reduction procedure�s to process your
input data as its components are identified in the input stream. Each
procedure is associated with a �grammar rule�. The reduction
procedures will be incorporated into your parser when you create it
with the �Build Parser� command.

By default, unless you specify an input procedure, �parser input�
will be read from stdin, using the default �GET_INPUT� macro.
You will probably wish to redefine GET_INPUT, or configure your
parser to use �pointer input� or �event driven� input.
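
As a rough sketch of redefining GET_INPUT, the C++ fragment below reads
from a memory buffer instead of stdin. Several things in it are
assumptions made so that the fragment stands on its own: DemoPcb is a
stand-in for the generated parser control block, the field name
input_code should be checked against the header AnaGram generates for
your parser, and signalling end of input with EOF only suits a grammar
whose end of file token is defined accordingly. In a real project the
macro definition would go in the embedded C of your syntax file.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the generated parser control block.
struct DemoPcb {
  int input_code;  // assumed name of the field that receives input
};
static DemoPcb demo_pcb;
#define PCB demo_pcb

// Input source for the demonstration: a NUL-terminated memory buffer.
static const char *input_cursor = "";

// Redefined GET_INPUT: deliver the next character of the buffer, or EOF
// once the buffer is exhausted.
#define GET_INPUT                                                          \
  (PCB.input_code =                                                        \
       (*input_cursor ? static_cast<unsigned char>(*input_cursor++) : EOF))

// Drive the macro the way a parser loop might, returning the number of
// characters delivered before end of input.
inline int count_input(const char *text) {
  assert(text != nullptr);
  input_cursor = text;
  int count = 0;
  for (GET_INPUT; PCB.input_code != EOF; GET_INPUT) ++count;
  return count;
}
```

The cast to unsigned char keeps character codes above 127 from being
mistaken for negative values such as EOF.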
##

License, Copyright, Copying, Open Source, Warranty, No Warranty

AnaGram, A System for Syntax Directed Programming

Copyright 1993-2002 Parsifal Software

Copyright 2006, 2007 David A. Holland

All Rights Reserved.

AnaGram itself is released to the public under the traditional 4-clause BSD
license:

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:

   1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

   2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

   3. All advertising materials mentioning features or use of this software
   must display the following acknowledgement:
      This product includes software developed by Parsifal Software,
      Jerome T. Holland, and their contributors.

   4. Neither the name of Parsifal Software nor the name of Jerome T.
   Holland nor the names of their contributors may be used to endorse or
   promote products derived from this software without specific prior written
   permission.

   THIS SOFTWARE IS PROVIDED BY PARSIFAL SOFTWARE,
   JEROME T. HOLLAND, AND CONTRIBUTORS ``AS IS'' AND ANY
   EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
   AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
   IN NO EVENT SHALL PARSIFAL SOFTWARE, JEROME T. 
   HOLLAND, OR THE CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
   INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR 
   CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
   USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
   HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
   WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
   OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
   POSSIBILITY OF SUCH DAMAGE.

The AnaGram �parsing engine�, that is, the code that is emitted by
AnaGram and incorporated into programs developed using AnaGram, uses
this less restrictive zlib-style license:

   This software is provided 'as-is', without any express or implied warranty.
   In no event will the authors be held liable for any damages arising from
   the use of this software.

   Permission is granted to anyone to use this software for any purpose,
   including commercial applications, and to alter it and redistribute it
   freely, subject to the following restrictions:

   1. The origin of this software must not be misrepresented; you must not
   claim that you wrote the original software. If you use this software in a
   product, an acknowledgment in the product documentation would be
   appreciated but is not required.

   2. Altered source versions must be plainly marked as such, and must not
   be misrepresented as being the original software.

   3. This notice may not be removed or altered from any source distribution.

##