#include <tokenizer.h>
Public Types | |
| enum | constraints { DEFAULT_MAX_BITS = 7 } |
Public Member Functions | |
| tokenizer (int max_bits=DEFAULT_MAX_BITS) | |
| creates a tokenizer with the default characters. | |
| tokenizer (const istring &separator, const istring &assignment, int max_bits=DEFAULT_MAX_BITS) | |
| creates an empty list of tokens and uses the specified sentinel chars. | |
| tokenizer (const istring &separator, const istring &assignment, const istring &quotes, bool nesting=true, int max_bits=DEFAULT_MAX_BITS) | |
| similar to the constructor above, but supports quoting. | |
| tokenizer (const tokenizer &to_copy) | |
| builds a tokenizer that is identical to "to_copy". | |
| virtual | ~tokenizer () |
| IMPLEMENT_CLASS_NAME ("tokenizer") | |
| void | set_comment_chars (const istring &comments) |
| establishes a set of characters in "comments" as the comment items. | |
| tokenizer & | operator= (const tokenizer &to_copy) |
| makes this tokenizer identical to "to_copy". | |
| int | symbols () const |
| returns the number of entries in the tokenizer. | |
| void | reset () |
| clears all of the entries out. | |
| const string_table & | table () const |
| provides a constant peek at the string_table holding the values. | |
| string_table & | table () |
| provides direct access to the string_table holding the values. | |
| void | parse (const istring &to_tokenize) |
| parses the string using our established sentinel characters. | |
| istring | find (const istring &name) const |
| locates the value for a variable named "name" if it exists. | |
| bool | exists (const istring &name) const |
| returns true if the "name" exists in the tokenizer. | |
| istring | text_form () const |
| creates a new token list as a string of text. | |
| void | text_form (istring &to_fill) const |
| like text_form() above, but stores into "to_fill". | |
| bool | add_spaces () const |
| void | add_spaces (bool add_them) |
| bool | okay_for_variable_name (char to_check) const |
| true if "to_check" is a valid variable name character. | |
| const istring & | assignments () const |
| provides a peek at the assignments list. | |
| const istring & | separators () const |
| provides a peek at the separators list. | |
| const istring & | quotes () const |
| provides a peek at the quotes list. | |
| bool | assignment (char to_check) const |
| true if "to_check" is a valid assignment operator. | |
| bool | separator (char to_check) const |
| true if "to_check" is a valid separator. | |
| bool | comment_char (char to_check) const |
| true if "to_check" is a registered comment character. | |
| bool | is_eol_a_separator () const |
| reports whether any of the separators are an EOL character. | |
| bool | quote_mark (char to_check) const |
| true if "to_check" is a member of the quotes list. | |
Manipulates strings containing variable definitions where a variable is syntactically defined as a name, an assignment operator, and a value. The string can optionally define many variables by placing a separator character between the definitions. The assignment and separator are referred to as sentinels in the following docs. This class also supports quoted values if the appropriate constructor is used.
Definition at line 36 of file tokenizer.h.
| tokenizer::tokenizer | ( | int | max_bits = DEFAULT_MAX_BITS |
) |
creates a tokenizer with the default characters.
this will not look for quote characters. the "max_bits" establishes the hashing width for the internal table of strings; there will be 2 ^ "max_bits" of space in the table. the default assignment operator is '=' and the default separator is ','.
Definition at line 35 of file tokenizer.cpp.
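As a quick check of the "2 ^ max_bits" sizing described above, the following sketch computes the slot count for a given width. The helper name `table_slots` is hypothetical and is not part of tokenizer.h; it only illustrates the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of tokenizer.h: the number of hash slots
// implied by "max_bits", i.e. 2 raised to the power max_bits.
std::size_t table_slots(int max_bits)
{
  assert(max_bits >= 0 && max_bits < 31);  // keeps the shift well defined.
  return static_cast<std::size_t>(1) << max_bits;
}
```

With DEFAULT_MAX_BITS = 7, this gives 128 slots.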
| tokenizer::tokenizer | ( | const istring & | separator, | |
| const istring & | assignment, | |||
| int | max_bits = DEFAULT_MAX_BITS | |||
| ) |
creates an empty list of tokens and uses the specified sentinel chars.
the character that is expected to be between name/value pairs is "separator". the "assignment" character is expected to be between each name and its value. note that if the "separator" or "assignment" are more than one character long, these will be taken as a set of valid characters that can be used for those purposes.
Definition at line 46 of file tokenizer.cpp.
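The "set of valid characters" rule above can be pictured with a small standalone sketch. It uses std::string rather than istring, and `in_sentinel_set` is a hypothetical name chosen for illustration; it is not how tokenizer.cpp is necessarily written.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the separator()/assignment() checks: a sentinel
// string is treated as a set, so any one of its characters is a match.
bool in_sentinel_set(const std::string &sentinels, char to_check)
{
  assert(!sentinels.empty());  // this sketch assumes at least one sentinel.
  return sentinels.find(to_check) != std::string::npos;
}
```

For example, with a separator string of ";," both ';' and ',' would end a name/value pair, while '=' would not.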
| tokenizer::tokenizer | ( | const istring & | separator, | |
| const istring & | assignment, | |||
| const istring & | quotes, | |||
| bool | nesting = true, | |||
| int | max_bits = DEFAULT_MAX_BITS | |||
| ) |
similar to the constructor above, but supports quoting.
if the "quotes" list is not empty, then those characters will be treated as quoting characters that must be matched in pairs. inside a quote, separators are ignored. if "nesting" is not true, then only one level of quotes will be considered; the occurrence of other types of quotes will be ignored until the original type is completed.
Definition at line 58 of file tokenizer.cpp.
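One possible reading of the quoting rules above is sketched below with std::string. The function `split_outside_quotes` is hypothetical and only splits on separators; it is an interpretation of the documented behavior under the assumption that a quote is closed by the same character that opened it, not a copy of the logic in tokenizer.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Splits "text" at separator characters that are not inside quotes.
// Illustrative only; assumes no character is both a separator and a quote.
std::vector<std::string> split_outside_quotes(const std::string &text,
    const std::string &separators, const std::string &quotes,
    bool nesting = true)
{
  assert(separators.find_first_of(quotes) == std::string::npos);
  std::vector<std::string> pieces;
  std::string current;
  std::vector<char> open;  // stack of quote characters awaiting a match.
  for (char c : text) {
    if (quotes.find(c) != std::string::npos) {
      if (!open.empty() && open.back() == c) {
        open.pop_back();  // closes the innermost open quote.
      } else if (open.empty() || nesting) {
        open.push_back(c);  // opens a new quote level.
      }
      // with nesting off, other quote types inside a quote are plain text.
      current += c;
    } else if (open.empty() && separators.find(c) != std::string::npos) {
      pieces.push_back(current);  // separator outside quotes ends a piece.
      current.clear();
    } else {
      current += c;
    }
  }
  if (!current.empty()) pieces.push_back(current);
  return pieces;
}
```

In this sketch, the input `a="it's, ok",b=1` with quotes `"` and `'` splits into two pieces when nesting is false, because the apostrophe is ignored until the double quote closes. With nesting true, the apostrophe opens a second level that is never closed, so the whole input stays in one piece.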
| tokenizer::tokenizer | ( | const tokenizer & | to_copy | ) |
| tokenizer::~tokenizer | ( | ) | [virtual] |
| tokenizer::IMPLEMENT_CLASS_NAME | ( | "tokenizer" | ) |
| void tokenizer::set_comment_chars | ( | const istring & | comments | ) |
establishes a set of characters in "comments" as the comment items.
comments will be specially handled by being added to the string table with the comment prefix. this allows them to be regenerated uniquely later.
Definition at line 93 of file tokenizer.cpp.
Referenced by ini_parser::add().
| tokenizer & tokenizer::operator= | ( | const tokenizer & | to_copy | ) |
makes this tokenizer identical to "to_copy".
Definition at line 111 of file tokenizer.cpp.
References _add_spaces, _assignments, _implementation, _nesting, _quotes, and _separators.
| int tokenizer::symbols | ( | ) | const |
returns the number of entries in the tokenizer.
Definition at line 91 of file tokenizer.cpp.
References symbol_table< contents >::symbols().
Referenced by write_build_config::execute(), and write_build_config::output_definition_macro().
| void tokenizer::reset | ( | ) |
clears all of the entries out.
Definition at line 105 of file tokenizer.cpp.
References symbol_table< contents >::reset().
| const string_table & tokenizer::table | ( | ) | const |
provides a constant peek at the string_table holding the values.
Definition at line 107 of file tokenizer.cpp.
Referenced by ini_parser::add(), write_build_config::execute(), ini_configurator::get_section(), write_build_config::output_definition_macro(), and ini_configurator::put_section().
| string_table & tokenizer::table | ( | ) |
provides direct access to the string_table holding the values.
Definition at line 109 of file tokenizer.cpp.
| void tokenizer::parse | ( | const istring & | to_tokenize | ) |
parses the string using our established sentinel characters.
attempts to extract as many name/value pairs from "to_tokenize" as possible by using the current separator and assignment characters. e.g., if the separator is ';' and the assignment character is '=', then the string would look something like:
TEMP=c:\tmp; GLOB=c:\glob.exe; ....
Definition at line 166 of file tokenizer.cpp.
References symbol_table< contents >::add(), assignment(), CHOP, comment_char(), COOL, istring::end(), FUNCDEF, parser_bits::is_eol(), is_eol_a_separator(), stack< contents >::kind(), okay_for_variable_name(), stack< contents >::pop(), stack< contents >::push(), quote_mark(), istring::reset(), separator(), stack< contents >::size(), SPECIAL_VALUE, STRTAB_COMMENT_PREFIX, istring::t(), stack< contents >::top(), parser_bits::white_space(), and parser_bits::white_space_no_cr().
Referenced by ini_parser::add(), write_build_config::execute(), ini_configurator::get_section(), write_build_config::output_definition_macro(), and application_config::parse_startup_entry().
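To make the parse() description concrete, here is a minimal standalone sketch of splitting a string into name/value pairs. It uses std::string and std::map instead of istring and string_table, and the names `trim` and `parse_pairs` are hypothetical. It ignores quoting and comments, and it skips chunks that have no assignment character, which is an assumption of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <map>
#include <string>

// Trims leading and trailing spaces and tabs.
static std::string trim(const std::string &s)
{
  const std::string blanks = " \t";
  const std::size_t first = s.find_first_not_of(blanks);
  if (first == std::string::npos) return "";
  const std::size_t last = s.find_last_not_of(blanks);
  return s.substr(first, last - first + 1);
}

// Hypothetical simplification of parse(): no quotes, no comment handling.
std::map<std::string, std::string> parse_pairs(const std::string &to_tokenize,
    const std::string &separators = ",", const std::string &assignments = "=")
{
  assert(!separators.empty() && !assignments.empty());
  std::map<std::string, std::string> found;
  std::size_t start = 0;
  while (start <= to_tokenize.size()) {
    // find the end of the current definition.
    std::size_t stop = to_tokenize.find_first_of(separators, start);
    if (stop == std::string::npos) stop = to_tokenize.size();
    const std::string chunk = to_tokenize.substr(start, stop - start);
    // split the definition at its first assignment character.
    const std::size_t split = chunk.find_first_of(assignments);
    if (split != std::string::npos) {
      const std::string name = trim(chunk.substr(0, split));
      if (!name.empty()) found[name] = trim(chunk.substr(split + 1));
    }
    start = stop + 1;
  }
  return found;
}
```

Called with the example string above, a separator of ";" and an assignment of "=", the sketch yields two entries, TEMP and GLOB.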
| istring tokenizer::find | ( | const istring & | name | ) | const |
locates the value for a variable named "name" if it exists.
if "name" doesn't exist, then it returns an empty string. note that an empty string might also indicate that the value is blank; exists() is the way to tell if a field is really missing. also note that when a variable name is followed by an assignment operator and an empty value (e.g., "avversione=" has no value), then a value of a single space character will be stored. this ensures that the same format is used on the output side, but it also means that if you access the table directly, then you will get a space as the value. however, this function returns an empty string for those entries to stay consistent with expectations.
Definition at line 123 of file tokenizer.cpp.
References symbol_table< contents >::find(), and SPECIAL_VALUE.
Referenced by application_config::parse_startup_entry().
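The blank-value convention described for find() can be sketched as follows. This is a standalone illustration using std::map; `BLANK_MARKER`, `store`, `find_value`, and `has_name` are hypothetical names, with `BLANK_MARKER` standing in for the single space that the documentation says is stored for empty values.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the single space stored for an empty value.
const std::string BLANK_MARKER = " ";

// Stores a value, substituting the marker when the value is empty.
void store(std::map<std::string, std::string> &table,
    const std::string &name, const std::string &value)
{
  assert(!name.empty());
  table[name] = value.empty() ? BLANK_MARKER : value;
}

// Mirrors the documented find() contract: missing and blank both give "".
std::string find_value(const std::map<std::string, std::string> &table,
    const std::string &name)
{
  const auto it = table.find(name);
  if (it == table.end() || it->second == BLANK_MARKER) return "";
  return it->second;
}

// Mirrors exists(): the only way to tell a blank entry from a missing one.
bool has_name(const std::map<std::string, std::string> &table,
    const std::string &name)
{
  return table.count(name) != 0;
}
```

After storing "avversione" with an empty value, direct table access shows a single space, `find_value` returns an empty string, and `has_name` still reports the entry as present.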
| bool tokenizer::exists | ( | const istring & | name | ) | const |
returns true if the "name" exists in the tokenizer.
Definition at line 102 of file tokenizer.cpp.
References symbol_table< contents >::find().
| istring tokenizer::text_form | ( | ) | const [virtual] |
creates a new token list as a string of text.
the first separator and assignment characters in each set are used to generate it. note that the whitespace that existed in the original parsed string might not be exactly the same in the generated string.
Reimplemented from object_base.
Definition at line 381 of file tokenizer.cpp.
Referenced by ini_parser::add(), and ini_configurator::put_section().
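The rule that text_form() uses the first separator and first assignment character of each set can be illustrated with the standalone sketch below. It works on a std::map rather than a string_table, `to_text` is a hypothetical name, and it does not model comments, add_spaces(), or EOL separators.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical simplification of text_form(): joins entries using only the
// first character of each sentinel set.
std::string to_text(const std::map<std::string, std::string> &table,
    const std::string &separators, const std::string &assignments)
{
  assert(!separators.empty() && !assignments.empty());
  std::string out;
  for (const auto &entry : table) {
    if (!out.empty()) out += separators[0];  // separator between pairs only.
    out += entry.first;
    out += assignments[0];
    out += entry.second;
  }
  return out;
}
```

Because only the first sentinel of each set is written, a string parsed with ',' separators and ':' assignments may come back with ';' and '=' if those are listed first, which is one reason the regenerated text can differ from the original input.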
| void tokenizer::text_form | ( | istring & | to_fill | ) | const |
like text_form() above, but stores into "to_fill".
Definition at line 347 of file tokenizer.cpp.
References istring::end(), istring::get(), string_table::is_comment(), is_eol_a_separator(), symbol_table< contents >::name(), log_base::platform_ending(), istring::reset(), symbol_table< contents >::symbols(), and istring::zap().
| bool tokenizer::add_spaces | ( | ) | const [inline] |
Definition at line 142 of file tokenizer.h.
| void tokenizer::add_spaces | ( | bool | add_them | ) | [inline] |
Definition at line 143 of file tokenizer.h.
| bool tokenizer::okay_for_variable_name | ( | char | to_check | ) | const |
true if "to_check" is a valid variable name character.
this includes any characters besides separators and assignments.
Definition at line 133 of file tokenizer.cpp.
References assignment(), and separator().
Referenced by parse().
| const istring & tokenizer::assignments | ( | ) | const |
| const istring & tokenizer::separators | ( | ) | const |
| const istring & tokenizer::quotes | ( | ) | const |
| bool tokenizer::assignment | ( | char | to_check | ) | const |
true if "to_check" is a valid assignment operator.
Definition at line 148 of file tokenizer.cpp.
References istring::matches().
Referenced by okay_for_variable_name(), and parse().
| bool tokenizer::separator | ( | char | to_check | ) | const |
true if "to_check" is a valid separator.
Definition at line 139 of file tokenizer.cpp.
References parser_bits::is_eol(), and istring::matches().
Referenced by okay_for_variable_name(), and parse().
| bool tokenizer::comment_char | ( | char | to_check | ) | const |
true if "to_check" is a registered comment character.
Definition at line 154 of file tokenizer.cpp.
References istring::matches().
Referenced by parse().
| bool tokenizer::is_eol_a_separator | ( | ) | const |
reports whether any of the separators are an EOL character.
Definition at line 337 of file tokenizer.cpp.
References istring::get(), parser_bits::is_eol(), and istring::length().
Referenced by parse(), and text_form().
| bool tokenizer::quote_mark | ( | char | to_check | ) | const |
true if "to_check" is a member of the quotes list.
Definition at line 151 of file tokenizer.cpp.
References istring::matches().
Referenced by parse().