Token
is a terminal symbol in the grammar for the source language. When the character
sequence ‘pi’ appears in the source program, a token representing an identifier is
returned to the parser. A token is a pair consisting of a token name and an
optional attribute value. The token name
is an abstract symbol representing a kind of lexical unit, e.g., a particular
keyword, or a sequence of input characters denoting an identifier. The token
names are the input symbols that the parser processes. In what follows, we
shall generally write the name of a token in boldface. We will often refer to a
token by its token name.
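The pair structure described above can be sketched in a few lines of Python. This is only an illustration: the class name `Token`, the field names, and the choice to store the lexeme itself as the attribute are assumptions made for the example, not something the text prescribes.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    """A token as a (name, attribute) pair; names here are illustrative."""
    name: str                        # abstract token name, e.g. "id" or "if"
    attribute: Optional[str] = None  # optional attribute value


# The character sequence "pi" yields an identifier token. Using the lexeme
# as the attribute is an assumption; a real compiler might instead store a
# pointer to a symbol-table entry.
pi_token = Token("id", "pi")

# A keyword token usually needs no attribute: its name says everything.
if_token = Token("if")
```

The parser only looks at `name`; the attribute is carried along for later phases.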
Pattern
is a rule describing the set of lexemes that can represent a particular token
in source programs, that is, a description of the form that the lexemes of a
token may take. In the case of a keyword as a token, the pattern is just the
sequence of characters that form the keyword. For identifiers and some other
tokens, the pattern is a more complex structure that is matched by many strings.
Lexeme
is a sequence of characters in the source program that matches the pattern for
a token and is identified by the lexical analyzer as an instance of that token.
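To make the relationship between the three terms concrete, here is a minimal sketch that writes patterns as regular expressions and checks whether a given lexeme matches the pattern for a token. The table `PATTERNS`, the helper `matches`, and the specific regular expressions are hypothetical choices for illustration; actual languages define their own patterns.

```python
import re

# Hypothetical patterns, keyed by token name.
PATTERNS = {
    "if": r"if",                          # keyword: the pattern is the keyword itself
    "id": r"[A-Za-z_][A-Za-z0-9_]*",      # identifier: matched by many strings
    "number": r"[0-9]+(?:\.[0-9]+)?",     # integer or simple decimal constant
}


def matches(token_name: str, lexeme: str) -> bool:
    """Return True if the whole lexeme matches the pattern for token_name."""
    return re.fullmatch(PATTERNS[token_name], lexeme) is not None
```

With these assumed patterns, `matches("id", "pi")` and `matches("number", "3.14")` hold, while `matches("id", "3x")` does not. Note that the string `if` matches both the keyword pattern and the identifier pattern, so a lexical analyzer needs some rule for giving keywords priority.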
In many programming languages, the following classes cover most or all of the
tokens:
1. One token for each keyword. The pattern for a keyword is the same as the
keyword itself.
2. Tokens for the operators, either individually or in classes such as the token
comparison.
3. One token representing all identifiers.
4. One or more tokens representing constants, such as numbers and literal
strings.
5. Tokens for each punctuation symbol, such as left and right parentheses, comma,
and semicolon.
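The five classes above can be illustrated with a small regular-expression tokenizer. This is a sketch under stated assumptions, not a definitive lexer: the keyword set, the token names (`number`, `literal`, `id`, `comparison`, `assign`), and the patterns are all invented for the example, and tokens are returned as plain (name, attribute) tuples.

```python
import re

# Hypothetical keyword set: one token per keyword (class 1).
KEYWORDS = {"if", "else", "while"}

# Hypothetical patterns; order matters because alternatives are tried left to right.
TOKEN_SPEC = [
    ("number",     r"\d+(?:\.\d+)?"),      # class 4: numeric constants
    ("literal",    r'"[^"\n]*"'),          # class 4: literal strings
    ("id",         r"[A-Za-z_]\w*"),       # class 3: all identifiers (and keywords)
    ("comparison", r"<=|>=|==|!=|<|>"),    # class 2: a class of operators
    ("assign",     r"="),                  # class 2: an individual operator
    ("punct",      r"[(),;]"),             # class 5: punctuation symbols
    ("ws",         r"\s+"),                # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token_name, attribute) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "ws":
            continue                        # whitespace produces no token
        if kind == "id" and lexeme in KEYWORDS:
            tokens.append((lexeme, None))   # keyword: token name is the keyword
        elif kind == "punct":
            tokens.append((lexeme, None))   # one token per punctuation symbol
        else:
            tokens.append((kind, lexeme))   # attribute records the lexeme
    return tokens
```

For example, `tokenize("if (x <= 10) y = 2;")` yields a keyword token `if`, punctuation tokens for the parentheses and semicolon, `id` tokens for `x` and `y`, a `comparison` token whose attribute is `<=`, an `assign` token, and `number` tokens. Keywords are recognized here by matching the identifier pattern first and then checking the keyword set, which is one common way to give keywords priority over identifiers.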