An
alphabet is any finite set of symbols.
Typical examples of symbols are letters, digits, and punctuation. The set {0,1}
is the binary alphabet.
A
string over an alphabet is a finite sequence of
symbols drawn from that alphabet. In language theory, the terms "sentence"
and "word" are often used as synonyms for "string." The
length of a string s, usually written |s|,
is the number of occurrences of symbols in s. For
example, banana is a string of length six. The empty
string, denoted e, is the string of length zero.
The
following string-related terms are commonly used:
1.
A prefix of string s is
any string obtained by removing zero or more. symbols from the end of s.
For example, ban, banana, and e are prefixes of banana.
2.
A suffix of string s is
any string obtained by removing zero or more symbols from the beginning of s.
For example, nana, banana, and e are suffixes of banana.
3.
A substring of s
is obtained by deleting any prefix and any suffix from s.
For instance,banana, nan, and e are substrings of banana.
4.
The proper prefixes,
suffixes, and substrings of a string s are
those, prefixes, suffixes, and substrings, respectively, of s
that are not e or not equal to s itself.
5.
A subsequence of s
is any string formed by deleting zero or more not necessarily consecutive
positions of s. For example, baan is
a subsequence of banana.
A
language is any countable set of strings
over some fixed alphabet. This definition is very broad. Abstract languages
like 0, the empty set, or
{e}, the set containing only the empty string, are languages under this
definition. So too are the set of all syntactically well-formed C programs and
the set of all grammatically correct English sentences, although the latter two
languages are difficult to specify exactly.
Operations
on languages
LUD:
Union operation, where L=set of alphabets
{A..Z,a..z}
and D=set of digits {0..9}
LD:
Concatenation
L4:exponentiation:
set of strings with 4 letters
L0=
_
Li=Li-1L
L*=all
strings with _: Kleene closure of L
D+:
set of all strings of digits of one or more
1.
L U D is
the set of letters and digits — strictly speaking the language with 62 strings
of
length
one, each of which strings is either one letter or one digit.
2.
LD is the set of strings of length two, each
consisting of one letter followed by one digit.
3.
L4 is the set of all 4-letter strings.
4.
L* is the set of all strings of letters,
including e, the empty string.
5.
L(L U D)* is
the set of all strings of letters and digits beginning with a letter.
6. D+ is the set of all
strings of one or more digits.
0 comments