tokenizer Class Reference

Manages a bank of textual definitions of variables.

#include <tokenizer.h>


Public Types

enum  constraints { DEFAULT_MAX_BITS = 7 }

Public Member Functions

 tokenizer (int max_bits=DEFAULT_MAX_BITS)
 creates a tokenizer with the default characters.
 tokenizer (const istring &separator, const istring &assignment, int max_bits=DEFAULT_MAX_BITS)
 creates an empty list of tokens and uses the specified sentinel chars.
 tokenizer (const istring &separator, const istring &assignment, const istring &quotes, bool nesting=true, int max_bits=DEFAULT_MAX_BITS)
 similar to the constructor above, but supports quoting.
 tokenizer (const tokenizer &to_copy)
 builds a tokenizer that is identical to "to_copy".
virtual ~tokenizer ()
 IMPLEMENT_CLASS_NAME ("tokenizer")
void set_comment_chars (const istring &comments)
 establishes a set of characters in "comments" as the comment items.
tokenizer & operator= (const tokenizer &to_copy)
 makes this tokenizer identical to "to_copy".
int symbols () const
 returns the number of entries in the tokenizer.
void reset ()
 clears all of the entries out.
const string_table & table () const
 provides a constant peek at the string_table holding the values.
string_table & table ()
 provides direct access to the string_table holding the values.
void parse (const istring &to_tokenize)
 parses the string using our established sentinel characters.
istring find (const istring &name) const
 locates the value for a variable named "name" if it exists.
bool exists (const istring &name) const
 returns true if the "name" exists in the tokenizer.
istring text_form () const
 creates a new token list as a string of text.
void text_form (istring &to_fill) const
 like text_form() above, but stores into "to_fill".
bool add_spaces () const
void add_spaces (bool add_them)
bool okay_for_variable_name (char to_check) const
 true if "to_check" is a valid variable name character.
const istring & assignments () const
 provides a peek at the assignments list.
const istring & separators () const
 provides a peek at the separators list.
const istring & quotes () const
 provides a peek at the quotes list.
bool assignment (char to_check) const
 true if "to_check" is a valid assignment operator.
bool separator (char to_check) const
 true if "to_check" is a valid separator.
bool comment_char (char to_check) const
 true if "to_check" is a registered comment character.
bool is_eol_a_separator () const
 reports whether any of the separators are an EOL character.
bool quote_mark (char to_check) const
 true if "to_check" is a member of the quotes list.

Detailed Description

Manages a bank of textual definitions of variables.

Manipulates strings containing variable definitions where a variable is syntactically defined as a name, an assignment operator, and a value. The string can optionally define many variables by placing a separator character between the definitions. The assignment and separator are referred to as sentinels in the following docs. This class also supports quoted values if the appropriate constructor is used.

Definition at line 36 of file tokenizer.h.


Member Enumeration Documentation

enum tokenizer::constraints

Enumerator:
DEFAULT_MAX_BITS 

Definition at line 39 of file tokenizer.h.


Constructor & Destructor Documentation

tokenizer::tokenizer ( int  max_bits = DEFAULT_MAX_BITS  ) 

creates a tokenizer with the default characters.

this will not look for quote characters. the "max_bits" establishes the hashing width for the internal table of strings; there will be 2 ^ "max_bits" slots in the table. the default assignment operator is '=' and the default separator is ','.
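
The sizing arithmetic can be checked with a small sketch. The helper name table_slots is hypothetical and is not part of tokenizer; it only illustrates the 2 ^ "max_bits" relationship, so DEFAULT_MAX_BITS = 7 corresponds to 128.

```cpp
#include <cassert>
#include <cstddef>

// hypothetical helper, not part of the tokenizer class: computes the table
// size implied by "max_bits", i.e. 2 ^ max_bits.
std::size_t table_slots(int max_bits)
{
  // keep the shift well inside the width of std::size_t on common platforms.
  assert(max_bits >= 0 && max_bits < 31);
  return static_cast<std::size_t>(1) << max_bits;
}
```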

Definition at line 35 of file tokenizer.cpp.

tokenizer::tokenizer ( const istring & separator,
const istring & assignment,
int  max_bits = DEFAULT_MAX_BITS 
)

creates an empty list of tokens and uses the specified sentinel chars.

the character that is expected to be between name/value pairs is "separator". the "assignment" character is expected to be between each name and its value. note that if the "separator" or "assignment" are more than one character long, these will be taken as a set of valid characters that can be used for those purposes.
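
A minimal sketch of the "set of valid characters" idea follows. It assumes std::string in place of istring, and the names is_sentinel and sentinel_example are hypothetical; the real class performs its own membership checks through assignment() and separator().

```cpp
#include <cassert>
#include <string>

// hypothetical stand-in for a sentinel test: true if "to_check" is any one
// of the characters in "sentinels". std::string replaces istring here.
bool is_sentinel(const std::string &sentinels, char to_check)
{
  return sentinels.find(to_check) != std::string::npos;
}

// usage: with ";," as the separator set, either character ends a definition.
void sentinel_example()
{
  const std::string separators = ";,";
  assert(is_sentinel(separators, ';'));
  assert(is_sentinel(separators, ','));
  assert(!is_sentinel(separators, '='));
}
```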

Definition at line 46 of file tokenizer.cpp.

tokenizer::tokenizer ( const istring & separator,
const istring & assignment,
const istring & quotes,
bool  nesting = true,
int  max_bits = DEFAULT_MAX_BITS 
)

similar to the constructor above, but supports quoting.

if the "quotes" list is not empty, then those characters will be treated as quoting characters that must be matched in pairs. inside a quote, separators are ignored. if "nesting" is not true, then only one level of quotes will be considered; the occurrence of other types of quotes will be ignored until the original type is completed.
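
The non-nesting case can be sketched as below. This is an illustration under assumptions, not the library's code: split_outside_quotes is a hypothetical function, std::string stands in for istring, and the quote characters are kept in the output because the text does not say whether the class strips them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// hypothetical sketch of the non-nesting case: splits "text" at any character
// in "separators", except while inside a quoted run. the quote character that
// opened the run is the only one that can close it.
std::vector<std::string> split_outside_quotes(const std::string &text,
    const std::string &separators, const std::string &quotes)
{
  std::vector<std::string> pieces;
  std::string current;
  char open_quote = '\0';  // '\0' means we are not inside a quote.
  for (std::string::size_type i = 0; i < text.size(); i++) {
    const char c = text[i];
    if (open_quote != '\0') {
      // inside a quote: everything is literal until the matching quote.
      if (c == open_quote) open_quote = '\0';
      current += c;
    } else if (quotes.find(c) != std::string::npos) {
      open_quote = c;
      current += c;
    } else if (separators.find(c) != std::string::npos) {
      pieces.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  pieces.push_back(current);
  return pieces;
}

// usage: the comma inside the double quotes does not split the definition,
// and the double quote inside the single quotes is ignored.
void quoting_example()
{
  const std::vector<std::string> pieces
      = split_outside_quotes("a=\"x,y\",b='it\"s'", ",", "\"'");
  assert(pieces.size() == 2);
  assert(pieces[0] == "a=\"x,y\"");
  assert(pieces[1] == "b='it\"s'");
}
```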

Definition at line 58 of file tokenizer.cpp.

tokenizer::tokenizer ( const tokenizer & to_copy  ) 

builds a tokenizer that is identical to "to_copy".

Definition at line 70 of file tokenizer.cpp.

tokenizer::~tokenizer (  )  [virtual]

Definition at line 82 of file tokenizer.cpp.

References WHACK().


Member Function Documentation

tokenizer::IMPLEMENT_CLASS_NAME ( "tokenizer"   ) 

void tokenizer::set_comment_chars ( const istring & comments  ) 

establishes a set of characters in "comments" as the comment items.

comments will be specially handled by being added to the string table with the comment prefix. this allows them to be regenerated uniquely later.

Definition at line 93 of file tokenizer.cpp.

Referenced by ini_parser::add().

tokenizer & tokenizer::operator= ( const tokenizer & to_copy  ) 

makes this tokenizer identical to "to_copy".

Definition at line 111 of file tokenizer.cpp.

References _add_spaces, _assignments, _implementation, _nesting, _quotes, and _separators.

int tokenizer::symbols (  )  const

returns the number of entries in the tokenizer.

Definition at line 91 of file tokenizer.cpp.

References symbol_table< contents >::symbols().

Referenced by write_build_config::execute(), and write_build_config::output_definition_macro().

void tokenizer::reset (  ) 

clears all of the entries out.

Definition at line 105 of file tokenizer.cpp.

References symbol_table< contents >::reset().

const string_table & tokenizer::table (  )  const

provides a constant peek at the string_table holding the values.

Definition at line 107 of file tokenizer.cpp.

Referenced by ini_parser::add(), write_build_config::execute(), ini_configurator::get_section(), write_build_config::output_definition_macro(), and ini_configurator::put_section().

string_table & tokenizer::table (  ) 

provides direct access to the string_table holding the values.

Definition at line 109 of file tokenizer.cpp.

void tokenizer::parse ( const istring & to_tokenize  ) 

parses the string using our established sentinel characters.

attempts to snag as many name/value pairs from "to_tokenize" as possible by using the current separator and assignment characters. e.g., if the separator is ';' and the assignment character is '=', then one's string would look something like:

      TEMP=c:\tmp; GLOB=c:\glob.exe; ....  

whitespace is ignored if it's found (1) after a separator and before the next variable name, (2) after the variable name and before the assignment character, or (3) after the assignment character and before the value. this unfortunately implies that white space cannot begin or end a value.

NOTE: unpredictable results will occur if one's variables are improperly formed, if assignment operators are missing or misplaced, or if the separator character is used within the value.

NOTE: carriage returns are considered white space and can exist in the string as described above.

NOTE: parse is additive; if multiple calls to parse() occur, then the symbol_table will be built from the most recent values found in the parameters to parse(). if this is not desired, the symbol table's reset() function can be used to empty out all variables.
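
The whitespace and additive rules can be sketched roughly as follows. This is a simplified stand-in, not the tokenizer implementation: trim, parse_into and parse_example are hypothetical names, a std::map replaces the symbol_table, only one separator and one assignment character are handled, and the sketch chooses to skip definitions that lack an assignment and to let a later value replace an earlier one for the same name.

```cpp
#include <cassert>
#include <map>
#include <string>

// hypothetical helper: strips leading and trailing white space.
std::string trim(const std::string &s)
{
  const char *blanks = " \t\r\n";
  const std::string::size_type first = s.find_first_not_of(blanks);
  if (first == std::string::npos) return "";
  const std::string::size_type last = s.find_last_not_of(blanks);
  return s.substr(first, last - first + 1);
}

// hypothetical sketch of additive parsing with one separator and one
// assignment character. definitions lacking an assignment are skipped.
void parse_into(std::map<std::string, std::string> &table,
    const std::string &to_tokenize, char separator, char assignment)
{
  std::string::size_type start = 0;
  while (start <= to_tokenize.size()) {
    std::string::size_type stop = to_tokenize.find(separator, start);
    if (stop == std::string::npos) stop = to_tokenize.size();
    const std::string definition = to_tokenize.substr(start, stop - start);
    const std::string::size_type split = definition.find(assignment);
    if (split != std::string::npos) {
      const std::string name = trim(definition.substr(0, split));
      // a later definition of the same name replaces the earlier value.
      if (!name.empty()) table[name] = trim(definition.substr(split + 1));
    }
    start = stop + 1;
  }
}

// usage: two calls accumulate into the same table.
void parse_example()
{
  std::map<std::string, std::string> table;
  parse_into(table, "TEMP=c:\\tmp; GLOB = c:\\glob.exe", ';', '=');
  parse_into(table, "TEMP=d:\\scratch", ';', '=');
  assert(table.size() == 2);
  assert(table["TEMP"] == "d:\\scratch");
  assert(table["GLOB"] == "c:\\glob.exe");
}
```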

Definition at line 166 of file tokenizer.cpp.

References symbol_table< contents >::add(), assignment(), CHOP, comment_char(), COOL, istring::end(), FUNCDEF, parser_bits::is_eol(), is_eol_a_separator(), stack< contents >::kind(), okay_for_variable_name(), stack< contents >::pop(), stack< contents >::push(), quote_mark(), istring::reset(), separator(), stack< contents >::size(), SPECIAL_VALUE, STRTAB_COMMENT_PREFIX, istring::t(), stack< contents >::top(), parser_bits::white_space(), and parser_bits::white_space_no_cr().

Referenced by ini_parser::add(), write_build_config::execute(), ini_configurator::get_section(), write_build_config::output_definition_macro(), and application_config::parse_startup_entry().

istring tokenizer::find ( const istring & name  )  const

locates the value for a variable named "name" if it exists.

if "name" doesn't exist, then it returns an empty string. note that an empty string might also indicate that the value is blank; exists() is the way to tell if a field is really missing.

also note that when a variable name is followed by an assignment operator and an empty value (e.g., "avversione=" has no value), a single space character is stored as the value. this ensures that the same format is used on the output side, but it also means that direct access to the table yields a space as the value. however, this function returns an empty string for those entries, to stay consistent with expectations.
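
The difference between a blank value and a missing name can be sketched as below. The names value_table, find_value, name_exists and lookup_example are hypothetical, and a std::map stands in for the string_table; only the single-space placeholder behavior is taken from the description above.

```cpp
#include <cassert>
#include <map>
#include <string>

// hypothetical stand-in for the internal table; blank values are stored as a
// single space, as the text above describes.
typedef std::map<std::string, std::string> value_table;

// returns the value for "name", or an empty string if the name is missing
// or its stored value is the single-space placeholder.
std::string find_value(const value_table &table, const std::string &name)
{
  const value_table::const_iterator found = table.find(name);
  if (found == table.end() || found->second == " ") return "";
  return found->second;
}

// true if "name" is present at all, even with a blank value.
bool name_exists(const value_table &table, const std::string &name)
{
  return table.find(name) != table.end();
}

// usage: both lookups give an empty string, but only one name exists.
void lookup_example()
{
  value_table table;
  table["avversione"] = " ";  // as if parsed from "avversione=".
  assert(find_value(table, "avversione").empty());
  assert(find_value(table, "missing").empty());
  assert(name_exists(table, "avversione"));
  assert(!name_exists(table, "missing"));
}
```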

Definition at line 123 of file tokenizer.cpp.

References symbol_table< contents >::find(), and SPECIAL_VALUE.

Referenced by application_config::parse_startup_entry().

bool tokenizer::exists ( const istring & name  )  const

returns true if the "name" exists in the tokenizer.

Definition at line 102 of file tokenizer.cpp.

References symbol_table< contents >::find().

istring tokenizer::text_form (  )  const [virtual]

creates a new token list as a string of text.

the first separator and assignment characters in each set are used to generate it. note that the whitespace that existed in the original parsed string might not be exactly the same in the generated string.
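
A rough sketch of the generation rule follows, assuming a std::map in place of the string_table. The function to_text is hypothetical, and the map's sorted iteration order is a property of this sketch only; the ordering used by text_form() is not stated here.

```cpp
#include <cassert>
#include <map>
#include <string>

// hypothetical sketch: rebuilds a definition string from "table", using only
// the first character of each sentinel set, as described above.
std::string to_text(const std::map<std::string, std::string> &table,
    const std::string &separators, const std::string &assignments)
{
  assert(!separators.empty() && !assignments.empty());
  std::string out;
  std::map<std::string, std::string>::const_iterator it;
  for (it = table.begin(); it != table.end(); ++it) {
    if (!out.empty()) out += separators[0];
    out += it->first;
    out += assignments[0];
    out += it->second;
  }
  return out;
}
```

With separators ";," and assignments "=:", every pair is written with ';' and '=', regardless of which characters appeared in the parsed input.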

Reimplemented from object_base.

Definition at line 381 of file tokenizer.cpp.

Referenced by ini_parser::add(), and ini_configurator::put_section().

void tokenizer::text_form ( istring & to_fill  )  const

like text_form() above, but stores into "to_fill".

Definition at line 347 of file tokenizer.cpp.

References istring::end(), istring::get(), string_table::is_comment(), is_eol_a_separator(), symbol_table< contents >::name(), log_base::platform_ending(), istring::reset(), symbol_table< contents >::symbols(), and istring::zap().

bool tokenizer::add_spaces (  )  const [inline]

Definition at line 142 of file tokenizer.h.

void tokenizer::add_spaces ( bool  add_them  )  [inline]

Definition at line 143 of file tokenizer.h.

bool tokenizer::okay_for_variable_name ( char  to_check  )  const

true if "to_check" is a valid variable name character.

this includes any characters besides separators and assignments.

Definition at line 133 of file tokenizer.cpp.

References assignment(), and separator().

Referenced by parse().

const istring & tokenizer::assignments (  )  const

provides a peek at the assignments list.

Definition at line 96 of file tokenizer.cpp.

const istring & tokenizer::separators (  )  const

provides a peek at the separators list.

Definition at line 98 of file tokenizer.cpp.

const istring & tokenizer::quotes (  )  const

provides a peek at the quotes list.

Definition at line 100 of file tokenizer.cpp.

bool tokenizer::assignment ( char  to_check  )  const

true if "to_check" is a valid assignment operator.

Definition at line 148 of file tokenizer.cpp.

References istring::matches().

Referenced by okay_for_variable_name(), and parse().

bool tokenizer::separator ( char  to_check  )  const

true if "to_check" is a valid separator.

Definition at line 139 of file tokenizer.cpp.

References parser_bits::is_eol(), and istring::matches().

Referenced by okay_for_variable_name(), and parse().

bool tokenizer::comment_char ( char  to_check  )  const

true if "to_check" is a registered comment character.

Definition at line 154 of file tokenizer.cpp.

References istring::matches().

Referenced by parse().

bool tokenizer::is_eol_a_separator (  )  const

reports whether any of the separators are an EOL character.
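
The check might look roughly like the sketch below. It assumes that an EOL character means a line feed or carriage return; the actual definition lives in parser_bits::is_eol() and is not shown in this reference. The names is_eol_char, any_eol_separator and eol_example are hypothetical.

```cpp
#include <cassert>
#include <string>

// assumption: an EOL character is a line feed or a carriage return.
bool is_eol_char(char c)
{
  return c == '\n' || c == '\r';
}

// hypothetical sketch: true if any character in "separators" is an EOL.
bool any_eol_separator(const std::string &separators)
{
  for (std::string::size_type i = 0; i < separators.size(); i++) {
    if (is_eol_char(separators[i])) return true;
  }
  return false;
}

// usage: a line-oriented format would include '\n' in its separator set.
void eol_example()
{
  assert(any_eol_separator(";\n"));
  assert(!any_eol_separator(";,"));
}
```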

Definition at line 337 of file tokenizer.cpp.

References istring::get(), parser_bits::is_eol(), and istring::length().

Referenced by parse(), and text_form().

bool tokenizer::quote_mark ( char  to_check  )  const

true if "to_check" is a member of the quotes list.

Definition at line 151 of file tokenizer.cpp.

References istring::matches().

Referenced by parse().


The documentation for this class was generated from the following files:
tokenizer.h
tokenizer.cpp
Generated on Fri Sep 5 04:30:54 2008 for HOOPLE Libraries by  doxygen 1.5.1