Warehouses some functions that are often useful during text parsing.
#include <parser_bits.h>
Public Types

| Type | Member | Description |
|---|---|---|
| enum | line_ending { LF_AT_END = -15, CRLF_AT_END, NO_ENDING } | line_ending is an enumeration of the separator character(s) used at the ends of lines in text files. |

Static Public Member Functions

| Return type | Member | Description |
|---|---|---|
| static const char * | eol_to_chars (line_ending ending) | returns the C string form for the "ending" value. |
| static line_ending | platform_eol () | provides the appropriate ending on the current OS platform. |
| static const char * | platform_eol_to_chars () | provides the characters that make up this platform's line ending. |
| static void | translate_CR_for_platform (basis::astring &to_translate) | converts embedded EOL characters to this platform's line ending. |
| static basis::astring | substitute_env_vars (const basis::astring &text, bool leave_unknown=true) | resolves embedded environment variables in "text". |
| static bool | is_printable_ascii (char to_check) | returns true if "to_check" is a normally visible ASCII character. |
| static bool | white_space_no_cr (char to_check) | reports if "to_check" is white space but not a carriage return or line feed. |
| static bool | is_eol (char to_check) | returns true if "to_check" is part of an end-of-line sequence. |
| static bool | white_space (char to_check) | returns true if the character "to_check" is considered white space. |
| static bool | is_alphanumeric (char look_at) | returns true if "look_at" is one of the alphanumeric characters. |
| static bool | is_alphanumeric (const char *look_at, int len) | returns true if the char ptr "look_at" is all alphanumeric characters. |
| static bool | is_alphanumeric (const basis::astring &look_at, int len) | returns true if the string "look_at" is all alphanumeric characters. |
| static bool | is_numeric (char look_at) | returns true if "look_at" is a valid numerical character. |
| static bool | is_numeric (const char *look_at, int len) | returns true if "look_at" is all valid numerical characters. |
| static bool | is_numeric (const basis::astring &look_at, int len) | returns true if the "look_at" string has only valid numerical chars. |
| static bool | is_hexadecimal (char look_at) | returns true if "look_at" is one of the hexadecimal characters. |
| static bool | is_hexadecimal (const char *look_at, int len) | returns true if "look_at" is all hexadecimal characters. |
| static bool | is_hexadecimal (const basis::astring &look_at, int len) | returns true if the string "look_at" is all hexadecimal characters. |
| static bool | is_identifier (char look_at) | returns true if "look_at" is a valid identifier character. |
| static bool | is_identifier (const char *look_at, int len) | returns true if "look_at" is composed of valid identifier characters. |
| static bool | is_identifier (const basis::astring &look_at, int len) | like the const char * version of is_identifier(), but operates on a string. |
Warehouses some functions that are often useful during text parsing.
Definition at line 24 of file parser_bits.h.
line_ending is an enumeration of the separator character(s) used at the ends of lines in text files.
on Unix, every line in a text file ends with a line feed (LF) character. on MS-DOS and MS-Windows, each line instead ends with a carriage return (CR) followed by a line feed (LF). a synonym for line_ending is "eol", which stands for "end of line".
| Enumerator | Description |
|---|---|
| LF_AT_END | Unix standard is LF_AT_END ("\n"). |
| CRLF_AT_END | DOS standard is CRLF_AT_END ("\r\n"). |
| NO_ENDING | No additional characters added as line endings. |
Definition at line 31 of file parser_bits.h.
const char * textual::parser_bits::eol_to_chars (line_ending ending) [static]
returns the C string form for the "ending" value.
Definition at line 45 of file parser_bits.cpp.
References CRLF_AT_END, LF_AT_END, and NO_ENDING.
Referenced by loggers::eol_aware::get_ending(), and platform_eol_to_chars().
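As an illustration of the mapping eol_to_chars() is documented to perform, here is a minimal standalone sketch. It is not the library's implementation: it mirrors the documented enum inside a hypothetical `sketch` namespace, and returning an empty string for NO_ENDING is an assumption based on that enumerator's description.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {  // hypothetical namespace; not part of the real library

// mirrors the documented enumeration, including its starting value of -15,
// so CRLF_AT_END is -14 and NO_ENDING is -13.
enum line_ending { LF_AT_END = -15, CRLF_AT_END, NO_ENDING };

// returns the characters that terminate a line for the given ending.
// assumption: NO_ENDING maps to an empty string.
inline const char *eol_to_chars(line_ending ending)
{
  switch (ending) {
    case LF_AT_END: return "\n";
    case CRLF_AT_END: return "\r\n";
    case NO_ENDING: return "";
  }
  return "";  // not reached for valid enumerators
}

}  // namespace sketch
```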
bool textual::parser_bits::is_alphanumeric (const basis::astring &look_at, int len) [static]
returns true if the string "look_at" is all alphanumeric characters.
Definition at line 135 of file parser_bits.cpp.
References is_alphanumeric(), and basis::astring::observe().
bool textual::parser_bits::is_alphanumeric (const char *look_at, int len) [static]
returns true if the char ptr "look_at" is all alphanumeric characters.
Definition at line 128 of file parser_bits.cpp.
References is_alphanumeric().
bool textual::parser_bits::is_alphanumeric (char look_at) [static]
returns true if "look_at" is one of the alphanumeric characters.
This includes a to z in either case and 0 to 9.
Definition at line 121 of file parser_bits.cpp.
References basis::range_check().
Referenced by is_alphanumeric().
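The three is_alphanumeric() overloads layer naturally: the pointer version loops over the single-character test, and the string version forwards to the pointer version (the reference lists above show the astring overload calling observe()). The following is a minimal sketch of that layering, assuming std::string as a stand-in for basis::astring; the `sketch` namespace is hypothetical.

```cpp
#include <cassert>
#include <string>

namespace sketch {  // hypothetical namespace for illustration only

// true for 'a'..'z', 'A'..'Z' and '0'..'9', as documented. explicit range
// checks avoid the locale dependence of std::isalnum.
inline bool is_alphanumeric(char look_at)
{
  return (look_at >= 'a' && look_at <= 'z')
      || (look_at >= 'A' && look_at <= 'Z')
      || (look_at >= '0' && look_at <= '9');
}

// true if the first "len" characters of "look_at" are all alphanumeric.
// the caller must ensure that "len" characters are actually available.
inline bool is_alphanumeric(const char *look_at, int len)
{
  for (int i = 0; i < len; i++)
    if (!is_alphanumeric(look_at[i])) return false;
  return true;
}

// std::string stands in for basis::astring here (an assumption).
inline bool is_alphanumeric(const std::string &look_at, int len)
{
  return is_alphanumeric(look_at.c_str(), len);
}

}  // namespace sketch
```

Note that "len" limits the check, so a prefix can pass even when the rest of the string would not.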
bool textual::parser_bits::is_eol (char to_check) [static]
returns true if "to_check" is part of an end-of-line sequence.
this returns true for both the '\r' and '\n' characters.
Definition at line 68 of file parser_bits.cpp.
Referenced by textual::string_manipulation::carriage_returns_to_spaces(), textual::string_manipulation::split_lines(), and white_space().
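Because is_eol() accepts both CR and LF, a single loop over trailing characters can strip either kind of line ending. The sketch below assumes std::string in place of basis::astring; `strip_eol` and the `sketch` namespace are hypothetical helpers, not part of the library.

```cpp
#include <cassert>
#include <string>

namespace sketch {  // hypothetical namespace for illustration only

// true for carriage return ('\r') and line feed ('\n'), the characters that
// make up both the LF and CRLF line endings.
inline bool is_eol(char to_check)
{
  return to_check == '\r' || to_check == '\n';
}

// hypothetical helper: returns "line" without any trailing CR/LF characters,
// so "\n" and "\r\n" endings are both removed in one pass.
inline std::string strip_eol(std::string line)
{
  while (!line.empty() && is_eol(line.back())) line.pop_back();
  return line;
}

}  // namespace sketch
```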
bool textual::parser_bits::is_hexadecimal (const basis::astring &look_at, int len) [static]
returns true if the string "look_at" is all hexadecimal characters.
Definition at line 118 of file parser_bits.cpp.
References is_hexadecimal(), and basis::astring::observe().
bool textual::parser_bits::is_hexadecimal (const char *look_at, int len) [static]
returns true if "look_at" is all hexadecimal characters.
Definition at line 111 of file parser_bits.cpp.
References is_hexadecimal().
bool textual::parser_bits::is_hexadecimal (char look_at) [static]
returns true if "look_at" is one of the hexadecimal characters.
This includes a to f in either case and 0 to 9.
Definition at line 104 of file parser_bits.cpp.
References basis::range_check().
Referenced by is_hexadecimal().
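A minimal sketch of the documented hexadecimal test follows, covering the single-character and pointer overloads (the string overload would forward to the pointer one, as with is_alphanumeric()). The `sketch` namespace is hypothetical, and this is an illustration of the documented rule rather than the library's code.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace for illustration only

// true for '0'..'9' and for 'a'..'f' in either case, as documented.
inline bool is_hexadecimal(char look_at)
{
  return (look_at >= '0' && look_at <= '9')
      || (look_at >= 'a' && look_at <= 'f')
      || (look_at >= 'A' && look_at <= 'F');
}

// true if the first "len" characters of "look_at" are all hex digits.
inline bool is_hexadecimal(const char *look_at, int len)
{
  for (int i = 0; i < len; i++)
    if (!is_hexadecimal(look_at[i])) return false;
  return true;
}

}  // namespace sketch
```

Under this rule a C-style "0x" prefix fails the check because 'x' is not a hex digit, so a caller would skip the prefix before testing the digits.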
bool textual::parser_bits::is_identifier (const basis::astring &look_at, int len) [static]
like the const char * version of is_identifier(), but operates on a string.
Definition at line 154 of file parser_bits.cpp.
References is_identifier(), and basis::astring::observe().
bool textual::parser_bits::is_identifier (const char *look_at, int len) [static]
returns true if "look_at" is composed of valid identifier characters.
additionally, identifiers cannot start with a digit.
Definition at line 146 of file parser_bits.cpp.
References is_identifier(), and is_numeric().
bool textual::parser_bits::is_identifier (char look_at) [static]
returns true if "look_at" is a valid identifier character.
this just allows alphanumeric characters and underscore.
Definition at line 138 of file parser_bits.cpp.
References basis::range_check().
Referenced by is_identifier(), and substitute_env_vars().
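The identifier rules combine a per-character test (alphanumeric or underscore) with a check on the first character. The sketch below illustrates those two documented rules; the `sketch` namespace is hypothetical, and rejecting an empty range is an assumption the documentation does not address.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace for illustration only

// an identifier character is a letter, a digit, or an underscore.
inline bool is_identifier(char look_at)
{
  return (look_at >= 'a' && look_at <= 'z')
      || (look_at >= 'A' && look_at <= 'Z')
      || (look_at >= '0' && look_at <= '9')
      || look_at == '_';
}

// every character must be an identifier character, and the first one may
// not be a digit. assumption: an empty range (len <= 0) is rejected.
inline bool is_identifier(const char *look_at, int len)
{
  if (len <= 0) return false;
  if (look_at[0] >= '0' && look_at[0] <= '9') return false;
  for (int i = 0; i < len; i++)
    if (!is_identifier(look_at[i])) return false;
  return true;
}

}  // namespace sketch
```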
bool textual::parser_bits::is_numeric (const basis::astring &look_at, int len) [static]
returns true if the "look_at" string has only valid numerical chars.
Definition at line 171 of file parser_bits.cpp.
References is_numeric(), and basis::astring::observe().
bool textual::parser_bits::is_numeric (const char *look_at, int len) [static]
returns true if "look_at" is all valid numerical characters.
this also allows the '-' character for negative numbers (but only as the first character when the char* or astring versions are used). floating point numbers and exponential notation are not supported yet.
Definition at line 162 of file parser_bits.cpp.
References is_numeric().
bool textual::parser_bits::is_numeric (char look_at) [static]
returns true if "look_at" is a valid numerical character.
Definition at line 157 of file parser_bits.cpp.
References basis::range_check().
Referenced by is_identifier(), and is_numeric().
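The note about '-' implies a two-level rule: the single-character test accepts '-' as a numerical character, while the multi-character versions allow it only in the first position. Here is a minimal sketch of that reading, in a hypothetical `sketch` namespace. Under this literal reading a lone "-" is accepted; the documentation does not say how the real function treats that case.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace for illustration only

// per the documentation, a single character is numeric if it is a digit or
// the '-' sign.
inline bool is_numeric(char look_at)
{
  return (look_at >= '0' && look_at <= '9') || look_at == '-';
}

// digits throughout, with '-' permitted only as the first character.
// like the documented function, there is no decimal point or exponent support.
inline bool is_numeric(const char *look_at, int len)
{
  for (int i = 0; i < len; i++) {
    if (!is_numeric(look_at[i])) return false;
    if (look_at[i] == '-' && i != 0) return false;
  }
  return true;
}

}  // namespace sketch
```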
bool textual::parser_bits::is_printable_ascii (char to_check) [static]
returns true if "to_check" is a normally visible ASCII character.
this is defined simply as the character code lying within the range 32 to 126, inclusive; that entire range is printable in ASCII. codes below 32 are control characters, and codes above 126 (DEL at 127, then non-ASCII byte values) may display unpredictably. this test is not appropriate for UTF-8 or other Unicode encodings.
Definition at line 62 of file parser_bits.cpp.
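One portability detail is worth spelling out for the 32 to 126 range check: plain char is signed on many platforms, so bytes of 128 and above arrive as negative values. The sketch below, in a hypothetical `sketch` namespace, converts to unsigned char first so the comparison behaves the same whether char is signed or unsigned. It illustrates the documented rule and is not the library's code.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace for illustration only

// printable means a code from 32 (space) through 126 ('~') inclusive.
// converting to unsigned char first makes bytes >= 128 compare as large
// values even on platforms where plain char is signed.
inline bool is_printable_ascii(char to_check)
{
  unsigned char code = static_cast<unsigned char>(to_check);
  return code >= 32 && code <= 126;
}

}  // namespace sketch
```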
parser_bits::line_ending textual::parser_bits::platform_eol () [static]
provides the appropriate ending on the current OS platform.
Definition at line 31 of file parser_bits.cpp.
References CRLF_AT_END, and LF_AT_END.
Referenced by platform_eol_to_chars(), and translate_CR_for_platform().
const char * textual::parser_bits::platform_eol_to_chars () [static]
provides the characters that make up this platform's line ending.
Definition at line 59 of file parser_bits.cpp.
References eol_to_chars(), and platform_eol().
Referenced by textual::xml_generator::add_content(), textual::xml_generator::generate(), and textual::string_manipulation::split_lines().
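The reference lists show platform_eol_to_chars() composed from platform_eol() and eol_to_chars(). A minimal sketch of that composition follows. It assumes the platform is chosen at compile time with the `_WIN32` macro that Windows compilers predefine; how the real library detects the platform is not documented here, and the `sketch` namespace is hypothetical.

```cpp
#include <cassert>
#include <cstring>

namespace sketch {  // hypothetical namespace for illustration only

enum line_ending { LF_AT_END = -15, CRLF_AT_END, NO_ENDING };

inline const char *eol_to_chars(line_ending ending)
{
  if (ending == LF_AT_END) return "\n";
  if (ending == CRLF_AT_END) return "\r\n";
  return "";  // NO_ENDING (assumed to map to an empty string)
}

// assumption: Windows is detected via the predefined _WIN32 macro, and every
// other platform is treated as using LF line endings.
inline line_ending platform_eol()
{
#ifdef _WIN32
  return CRLF_AT_END;
#else
  return LF_AT_END;
#endif
}

// composes the two functions, as the reference lists above indicate.
inline const char *platform_eol_to_chars()
{
  return eol_to_chars(platform_eol());
}

}  // namespace sketch
```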
astring textual::parser_bits::substitute_env_vars (const basis::astring &text, bool leave_unknown = true) [static]
resolves embedded environment variables in "text".
replaces the names of any environment variables in "text" with the variables' values and returns the resulting string. a variable name is marked by a single dollar sign before an alphanumeric identifier (underscores are valid), for example: $PATH.
if the "leave_unknown" flag is true, then any unmatched variables are left in the text with a question mark in place of the dollar sign (so $NOT_SET becomes ?NOT_SET). if it is false, they are replaced with nothing at all.
Definition at line 174 of file parser_bits.cpp.
References basis::astring::find(), basis::astring::insert(), is_identifier(), basis::astring::length(), basis::negative(), basis::astring::substring(), basis::astring::t(), and basis::astring::zap().
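The substitution rules above can be sketched with std::getenv, which returns a null pointer for unset variables. This is a minimal illustration using std::string in place of basis::astring, placed in a hypothetical `sketch` namespace. One behavior is an assumption: a '$' that is not followed by an identifier character is copied through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>

namespace sketch {  // hypothetical namespace for illustration only

inline bool is_identifier(char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
      || (c >= '0' && c <= '9') || c == '_';
}

// replaces each $NAME with the value of environment variable NAME. unknown
// variables become ?NAME when leave_unknown is true and vanish otherwise.
// assumption: a '$' not followed by an identifier character is kept as-is.
inline std::string substitute_env_vars(const std::string &text, bool leave_unknown = true)
{
  std::string result;
  std::size_t i = 0;
  while (i < text.size()) {
    if (text[i] != '$') { result += text[i++]; continue; }
    std::size_t end = i + 1;  // scan the identifier after the dollar sign
    while (end < text.size() && is_identifier(text[end])) end++;
    if (end == i + 1) { result += text[i++]; continue; }  // lone '$'
    std::string name = text.substr(i + 1, end - i - 1);
    if (const char *value = std::getenv(name.c_str()))
      result += value;
    else if (leave_unknown)
      result += "?" + name;
    i = end;
  }
  return result;
}

}  // namespace sketch
```

For example, if $HOME is set, "$HOME/notes" expands to the home directory followed by "/notes", and a misspelled "$HOEM/notes" becomes "?HOEM/notes" by default. This makes the typo visible in the output instead of silently producing "/notes".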
void textual::parser_bits::translate_CR_for_platform (basis::astring &to_translate) [static]
flips embedded EOL characters for this platform's needs.
runs through the string "to_translate" and changes any CR or CRLF combinations into the EOL (end-of-line) sequence that is appropriate for this operating system.
Definition at line 74 of file parser_bits.cpp.
References CRLF_AT_END, basis::astring::end(), basis::astring::insert(), platform_eol(), and basis::astring::zap().
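Here is a minimal sketch of line-ending normalization in the spirit of translate_CR_for_platform(). Several parts of it are assumptions. The hypothetical `translate_line_endings` takes the target ending as a parameter, which makes both platforms testable. It returns a new string instead of modifying its argument in place. It also converts a lone LF, a case the documentation does not explicitly mention.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace for illustration only

// hypothetical helper: returns "text" with every line ending ("\r\n", a lone
// "\r", or a lone "\n") rewritten as "eol". assumption: a lone LF is also
// converted, which the documented function may or may not do.
inline std::string translate_line_endings(const std::string &text, const std::string &eol)
{
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); i++) {
    if (text[i] == '\r') {
      // swallow the LF of a CRLF pair so it is not converted twice.
      if (i + 1 < text.size() && text[i + 1] == '\n') i++;
      out += eol;
    } else if (text[i] == '\n') {
      out += eol;
    } else {
      out += text[i];
    }
  }
  return out;
}

}  // namespace sketch
```

Consuming a CRLF pair as one unit matters: a character-by-character substitution would otherwise turn "\r\n" into two line endings.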
bool textual::parser_bits::white_space (char to_check) [static]
returns true if the character "to_check" is considered a white space.
this set includes tab ('\t'), space (' '), carriage return ('\r'), and line feed ('\n').
Definition at line 71 of file parser_bits.cpp.
References is_eol(), and white_space_no_cr().
Referenced by textual::list_parsing::parse_csv_line(), and textual::string_manipulation::split_lines().
bool textual::parser_bits::white_space_no_cr (char to_check) [static]
reports if "to_check" is white space but not a carriage return or line feed.
returns true if the character "to_check" is considered white space but is not part of an end-of-line combination (both '\r' and '\n' are disallowed). the allowed set includes only tab ('\t') and space (' ').
Definition at line 65 of file parser_bits.cpp.
Referenced by white_space().
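The reference lists show white_space() built from white_space_no_cr() and is_eol(). The sketch below reproduces that split in a hypothetical `sketch` namespace. It adds a hypothetical `leading_indent` helper to show why the narrower test is useful: it measures indentation without counting a blank line's line ending as indentation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical namespace for illustration only

inline bool is_eol(char c) { return c == '\r' || c == '\n'; }

// tab and space only; CR and LF are excluded.
inline bool white_space_no_cr(char to_check)
{
  return to_check == ' ' || to_check == '\t';
}

// the full set: tab, space, CR and LF, built from the two narrower tests.
inline bool white_space(char to_check)
{
  return white_space_no_cr(to_check) || is_eol(to_check);
}

// hypothetical helper: counts the blanks at the start of "line", stopping at
// any line-ending character.
inline std::size_t leading_indent(const std::string &line)
{
  std::size_t n = 0;
  while (n < line.size() && white_space_no_cr(line[n])) n++;
  return n;
}

}  // namespace sketch
```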