UCommon
ucommon::UString Class Reference

A copy-on-write utf8 string class that operates by reference count.

#include <unicode.h>

Protected Member Functions

void add (const unicode_t unicode)
 Add (append) unicode to a utf8 encoded string. More...
 
ucs4_t at (int position) const
 Return unicode character found at a specific codepoint in the string. More...
 
unsigned ccount (ucs4_t character) const
 Count occurrences of a unicode character in string. More...
 
UString copy (size_t offset, size_t size) const
 Convenience method for substring extraction. More...
 
size_t count (void) const
 Count codepoints in current string. More...
 
void cut (size_t offset, size_t size=0)
 Cut (remove) text from string using codepoint offsets. More...
 
const char * find (ucs4_t character, size_t start=0) const
 Find first occurrence of character in string. More...
 
UString get (size_t codepoint, size_t size=0) const
 Get a new string object as a substring of the current object. More...
 
size_t get (unicode_t unicode, size_t size) const
 Extract a unicode byte sequence from utf8 object. More...
 
UString left (size_t size) const
 Convenience method for left of string. More...
 
size_t operator() (unicode_t unicode, size_t size) const
 Extract a unicode byte sequence from utf8 object. More...
 
UString operator() (int codepoint, size_t size) const
 Get a new substring through object expression. More...
 
const char * operator() (int offset) const
 Reference a string in the object by codepoint offset. More...
 
ucs4_t operator[] (int position) const
 Reference a unicode character in string object by array offset. More...
 
void paste (size_t offset, const char *text, size_t size=0)
 Insert (paste) text into string using codepoint offsets. More...
 
const char * rfind (ucs4_t character, size_t end=npos) const
 Find last occurrence of character in string. More...
 
UString right (size_t offset) const
 Convenience method for right of string. More...
 
void set (const unicode_t unicode)
 Set a utf8 encoded string based on unicode data. More...
 
 UString ()
 Create a new empty utf8 aware string object.
 
 UString (size_t size)
 Create an empty string with a buffer pre-allocated to a specified size. More...
 
 UString (const unicode_t text)
 Create a utf8 aware string for a null terminated unicode string. More...
 
 UString (const char *text, size_t size)
 Create a string from null terminated text up to a maximum specified size. More...
 
 UString (const unicode_t *text, const unicode_t *end)
 Create a string for a substring. More...
 
 UString (const UString &existing)
 Construct a copy of a string object. More...
 
virtual ~UString ()
 Destroy string. More...
 
- Protected Member Functions inherited from ucommon::String
virtual cstring * c_copy (void) const
 Return cstring to use in copy constructors. More...
 
virtual void cow (size_t size=0)
 Copy on write operation for cstring. More...
 
cstring * create (size_t size) const
 Factory create a cstring object of specified size. More...
 
bool equal (const char *string) const
 Test if two string values are equal. More...
 
size_t getStringSize (void) const
 
virtual void release (void)
 Decrease retention of our reference counted cstring. More...
 
virtual void retain (void)
 Increase retention of our reference counted cstring. More...
 
- Protected Member Functions inherited from ucommon::utf8
 utf8 (const utf8 &copy)
 

Additional Inherited Members

- Public Types inherited from ucommon::String
enum  { SENSITIVE = 0x00, INSENSITIVE = 0x01 }
 This is an internal class which contains the actual string data along with some control fields. More...
 
- Public Member Functions inherited from ucommon::String
int scanf (const char *format,...)
 Scan input items from a string object.
 
int vscanf (const char *format, va_list args)
 Scan input items from a string object.
 
void add (const char *text)
 Append null terminated text to our string buffer. More...
 
void add (char character)
 Append a single character to our string buffer. More...
 
char at (int position) const
 Return character found at a specific position in the string. More...
 
const char * begin (void) const
 Get pointer to first character in string for iteration. More...
 
char * c_mem ()
 
const char * c_str (void) const
 Get character text buffer of string object. More...
 
size_t ccount (const char *list) const
 Count number of occurrences of characters in string. More...
 
void chop (const char *list)
 Chop trailing characters from the string. More...
 
void chop (size_t count=1)
 Chop trailing characters from text. More...
 
const char * chr (char character) const
 Find pointer in string where specified character appears. More...
 
void clear (size_t offset)
 Clear a field of a filled string with filler. More...
 
void clear (void)
 Clear string by setting to empty.
 
virtual int compare (const char *string) const
 Compare the values of two strings.
 
String copy (size_t offset, size_t size) const
 Convenience method for substring extraction. More...
 
size_t count (void) const
 Count all characters in the string (strlen). More...
 
void cut (size_t offset, size_t size=0)
 Cut (remove) text from string. More...
 
char * data (void)
 Get memory text buffer of string object. More...
 
const char * end (void) const
 Get pointer to last character in string for iteration. More...
 
void erase (void)
 Erase string memory.
 
void fill (size_t size, char fill)
 
const char * find (const char *list, size_t offset=0) const
 Find a character in the string. More...
 
bool full (void) const
 Test if the string's allocated space is all used up. More...
 
String get (size_t offset, size_t size=0) const
 Get a new string object as a substring of the current object. More...
 
String left (size_t size) const
 Convenience method for left of string. More...
 
size_t len (void) const
 Get length of string. More...
 
void lower (void)
 Convert string to lower case.
 
size_t offset (const char *pointer) const
 Find offset of a pointer into our string buffer. More...
 
String & operator& (const char *text)
 Concatenate null terminated text to our object. More...
 
const char * operator * () const
 Reference raw text buffer by pointer operator. More...
 
bool operator *= (const char *substring)
 
bool operator *= (regex &expr)
 
String & operator*= (size_t number)
 Delete a specified number of characters from start of string. More...
 
 operator bool () const
 Test if string has data. More...
 
 operator const char * () const
 Casting reference to raw text string. More...
 
bool operator! () const
 Test if string is empty. More...
 
bool operator!= (const char *text) const
 Compare our object with null terminated text. More...
 
String & operator% (short &value)
 Parse short integer value from a string.
 
String & operator% (unsigned short &value)
 Parse short integer value from a string.
 
String & operator% (long &value)
 Parse long integer value from a string.
 
String & operator% (unsigned long &value)
 Parse long integer value from a string.
 
String & operator% (double &value)
 Parse double value from a string.
 
String & operator% (const char *text)
 Parse text from a string in a scan expression.
 
String & operator&= (const char *text)
 
String operator() (int offset, size_t size) const
 Get a new substring through object expression. More...
 
const char * operator() (int offset) const
 Reference a string in the object by relative offset. More...
 
const String operator+ (const char *text) const
 Concatenate null terminated text to our object. More...
 
String & operator++ (void)
 Delete first character from string.
 
String & operator+= (const char *text)
 Concatenate text to an existing string object.
 
String & operator+= (size_t number)
 Delete a specified number of characters from start of string.
 
String & operator-- (void)
 Delete last character from string.
 
String & operator-= (size_t number)
 Delete a specified number of characters from end of string.
 
bool operator< (const char *text) const
 Compare our object with null terminated text. More...
 
String & operator<< (const char *text)
 
String & operator<< (char code)
 
bool operator<= (const char *text) const
 Compare our object with null terminated text. More...
 
String & operator= (const String &object)
 Assign our string with the cstring of another object.
 
String & operator= (const char *text)
 Assign text to our existing buffer.
 
bool operator== (const char *text) const
 Compare our object with null terminated text. More...
 
bool operator> (const char *text) const
 Compare our object with null terminated text. More...
 
bool operator>= (const char *text) const
 Compare our object with null terminated text. More...
 
const char operator[] (int offset) const
 Reference a single character in string object by array offset. More...
 
String & operator^= (const String &object)
 Create new cow instance and assign value from another string object.
 
String & operator^= (const char *text)
 Create new cow instance and assign value from null terminated text.
 
String & operator| (const char *text)
 Concatenate null terminated text to our object.
 
String & operator|= (const char *text)
 Concatenate text to an existing string object.
 
void paste (size_t offset, const char *text, size_t size=0)
 Insert (paste) text into string. More...
 
size_t printf (const char *format,...)
 Print items into a string object. More...
 
const char * rchr (char character) const
 Find pointer in string where specified character last appears. More...
 
unsigned replace (const char *string, const char *text=NULL, unsigned flags=0)
 
unsigned replace (regex &expr, const char *text=NULL, unsigned flags=0)
 
virtual bool resize (size_t size)
 Resize and re-allocate string memory. More...
 
const char * rfind (const char *list, size_t offset=npos) const
 Find last occurrence of character in the string. More...
 
String right (size_t offset) const
 Convenience method for right of string. More...
 
void rset (const char *text, char overflow, size_t offset, size_t size=0)
 Set a text field within our string object offset from the end of buffer. More...
 
const char * rskip (const char *list, size_t offset=npos) const
 Skip trailing characters in the string. More...
 
void rsplit (const char *pointer)
 Split the string by a pointer position. More...
 
void rsplit (size_t offset)
 Split the string at a specific offset. More...
 
const char * search (const char *string, unsigned instance=0, unsigned flags=0) const
 Search for a substring in the string. More...
 
const char * search (regex &expr, unsigned instance=0, unsigned flags=0) const
 
void set (const char *text)
 Set string object to text of a null terminated string. More...
 
void set (size_t offset, const char *text, size_t size=0)
 Set a portion of the string object at a specified offset to a text string. More...
 
void set (const char *text, char overflow, size_t offset, size_t size=0)
 Set a text field within our string object. More...
 
size_t size (void) const
 Get the size of currently allocated space for string. More...
 
const char * skip (const char *list, size_t offset=0) const
 Skip lead characters in the string. More...
 
void split (const char *pointer)
 Split the string by a pointer position. More...
 
void split (size_t offset)
 Split the string at a specific offset. More...
 
 String ()
 Create a new empty string object.
 
 String (size_t size)
 Create an empty string with a buffer pre-allocated to a specified size. More...
 
 String (size_t size, const char *format,...)
 Create a string by printf-like formatting into a pre-allocated space of a specified size.
 
 String (const char *text)
 Create a string from null terminated text. More...
 
 String (const char *text, size_t size)
 Create a string from null terminated text up to a maximum specified size. More...
 
 String (const char *text, const char *end)
 Create a string for a substring. More...
 
 String (const String &existing)
 Construct a copy of a string object. More...
 
void strip (const char *list)
 Strip lead and trailing characters from the string. More...
 
double tod (char **pointer=NULL)
 Convert string to a double value. More...
 
char * token (char **last, const char *list, const char *quote=NULL, const char *end=NULL)
 A thread-safe token parsing routine for string objects.
 
long tol (char **pointer=NULL)
 Convert string to a long value. More...
 
void trim (const char *list)
 Trim lead characters from the string. More...
 
void trim (size_t count=1)
 Trim lead characters from text. More...
 
bool unquote (const char *quote)
 Unquote a quoted string. More...
 
void upper (void)
 Convert string to upper case.
 
size_t vprintf (const char *format, va_list args)
 Print items into a string object. More...
 
virtual ~String ()
 Destroy string. More...
 
- Public Member Functions inherited from ucommon::ObjectProtocol
ObjectProtocol * copy (void)
 Retain (increase retention of) object when copying.
 
void operator++ (void)
 Increase retention operator.
 
void operator-- (void)
 Decrease retention operator.
 
virtual ~ObjectProtocol ()
 Required virtual destructor.
 
- Static Public Member Functions inherited from ucommon::String
static char * add (char *buffer, size_t size, const char *text)
 Safely append a null terminated string into an existing string in memory. More...
 
static char * add (char *buffer, size_t size, const char *text, size_t max)
 Safely append a null terminated string into an existing string in memory. More...
 
static String b64 (const uint8_t *binary, size_t size)
 Standard radix 64 string encoding. More...
 
static size_t b64count (const char *str, bool ws=false)
 
static size_t b64decode (uint8_t *binary, const char *string, size_t size, bool ws=false)
 Standard radix 64 decoding. More...
 
static size_t b64encode (char *string, const uint8_t *binary, size_t size, size_t width=0)
 Standard radix 64 encoding. More...
 
static size_t b64size (size_t size)
 
static unsigned ccount (const char *text, const char *list)
 Count instances of characters in a list in a text buffer. More...
 
static bool check (const char *string, size_t maximum, size_t minimum=0)
 Check if string is valid and in specific constraints. More...
 
static char * chop (char *text, const char *list)
 Strip trailing characters from the text string. More...
 
static int collate (const char *text1, const char *text2)
 
static int compare (const char *text1, const char *text2)
 Safe string collation function. More...
 
static int compare (const char *text1, const char *text2, size_t size)
 Deprecated string comparison function.
 
static char * copy (const char *text, size_t offset, size_t len)
 
static size_t count (const char *text)
 Safe version of strlen function. More...
 
static uint16_t crc16 (uint8_t *binary, size_t size)
 ccitt 16 bit crc for binary data. More...
 
static uint32_t crc24 (uint8_t *binary, size_t size)
 24 bit crc as used in openpgp. More...
 
static void cut (char *text, size_t offset, size_t len)
 
static char * dup (const char *text)
 Duplicate null terminated text into the heap. More...
 
static bool eq_case (const char *text1, const char *text2)
 Simple case insensitive equal test for strings. More...
 
static bool eq_case (const char *text1, const char *text2, size_t size)
 Simple case insensitive equal test for strings. More...
 
static bool equal (const char *text1, const char *text2)
 Simple equal test for strings. More...
 
static bool equal (const char *text1, const char *text2, size_t size)
 Simple equal test for strings. More...
 
static void erase (char *text)
 Erase string memory. More...
 
static char * fill (char *text, size_t size, char character)
 Fill a section of memory with a fixed text character. More...
 
static const char * find (const char *text, const char *key, const char *optional)
 Find position of substring within a string. More...
 
static char * find (char *text, const char *list)
 Find the first occurrence of a character in a text buffer. More...
 
static void fix (String &object)
 Fix and reset string object filler. More...
 
static String hex (const uint8_t *binary, size_t size)
 Convert binary data buffer into hex string. More...
 
static size_t hex2bin (const char *string, uint8_t *binary, size_t maxsize, bool wsflag=false)
 
static size_t hexcount (const char *str, bool ws=false)
 
static size_t hexdump (const uint8_t *binary, char *string, const char *format)
 Dump hex data to a string buffer. More...
 
static size_t hexpack (uint8_t *binary, const char *string, const char *format)
 Pack hex data from a string buffer. More...
 
static size_t hexsize (const char *format)
 
static const char * ifind (const char *text, const char *key, const char *optional)
 Find position of case insensitive substring within a string. More...
 
static char * left (const char *text, size_t size)
 Duplicate null terminated text of specific size to heap. More...
 
static void lower (char *text)
 Convert null terminated text to lower case. More...
 
static void paste (char *text, size_t max, size_t offset, const char *data, size_t len=0)
 
static const char * pos (const char *text, ssize_t offset)
 Compute position in string. More...
 
static char * rfind (char *text, const char *list)
 Find the last occurrence of a character in a text buffer. More...
 
static char * right (const char *text, size_t size)
 
static char * rset (char *buffer, size_t size, const char *text)
 Set a field in a null terminated string relative to the end of text. More...
 
static char * rskip (char *text, const char *list)
 Skip before trailing characters in a null terminated string. More...
 
static size_t seek (char *text, const char *list)
 Offset until next occurrence of character in a text or length. More...
 
static char * set (char *buffer, size_t size, const char *text)
 Safely set a null terminated string buffer in memory. More...
 
static char * set (char *buffer, size_t size, const char *text, size_t max)
 Safely set a null terminated string buffer in memory. More...
 
static char * skip (char *text, const char *list)
 Skip after lead characters in a null terminated string. More...
 
static char * strip (char *text, const char *list)
 Skip lead and remove trailing characters from a text string. More...
 
static void swap (String &object1, String &object2)
 Swap the cstring references between two strings. More...
 
static double tod (const char *text, char **pointer=NULL)
 Convert text to a double value. More...
 
static char * token (char *text, char **last, const char *list, const char *quote=NULL, const char *end=NULL)
 A thread-safe token parsing routine for null terminated strings. More...
 
static long tol (const char *text, char **pointer=NULL)
 Convert text to a long value. More...
 
static char * trim (char *text, const char *list)
 Return start of string after characters to trim from beginning. More...
 
static char * unquote (char *text, const char *quote)
 Unquote a quoted null terminated string. More...
 
static void upper (char *text)
 Convert null terminated text to upper case. More...
 
- Static Public Member Functions inherited from ucommon::utf8
static unsigned ccount (const char *string, ucs4_t character)
 Count occurrences of a unicode character in string. More...
 
static size_t chars (const unicode_t string)
 Number of chars required to encode a given wchar string.
 
static size_t chars (ucs4_t character)
 Number of chars required to encode a given unicode character.
 
static ucs4_t codepoint (const char *encoded)
 Convert a utf8 encoded codepoint to a ucs4 character value. More...
 
static size_t count (const char *string)
 Count utf8 encoded ucs4 codepoints in string.
 
static const char * find (const char *string, ucs4_t character, size_t start=0)
 Find first occurrence of character in string.
 
static ucs4_t get (const char *cp)
 Get a unicode character from a character protocol. More...
 
static char * offset (char *string, ssize_t position)
 Get codepoint offset in a string. More...
 
static size_t pack (unicode_t unicode, const char *cp, size_t len)
 Convert a utf8 string into a unicode data buffer. More...
 
static void put (ucs4_t character, char *buf)
 Push a unicode character to a character protocol. More...
 
static const char * rfind (const char *string, ucs4_t character, size_t end=(size_t) -1l)
 Find last occurrence of character in string. More...
 
static unsigned size (const char *codepoint)
 Compute character size of utf8 string codepoint. More...
 
static ucs4_t * udup (const char *string)
 Duplicate a utf8 string into a ucs4_t string.
 
static ucs2_t * wdup (const char *string)
 Duplicate a utf8 string into a ucs2_t representation.
 
- Static Public Attributes inherited from ucommon::String
static const char eos = '\0'
 
static const size_t npos = ((size_t)-1)
 
- Static Public Attributes inherited from ucommon::utf8
static const char * nil
 A convenient NULL pointer value.
 
static const unsigned ucsize
 Size of "unicode_t" character codes, may not be ucs4_t size.
 
- Protected Attributes inherited from ucommon::String
cstring * str
 cstring instance our object references. More...
 

Detailed Description

A copy-on-write utf8 string class that operates by reference count.

This is derived from the classic uCommon String class by adding operations that are utf8 encoding aware.
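
As a rough illustration of what "copy-on-write by reference count" means, the sketch below uses only the standard library. The type name CowText and its members are hypothetical and are not part of UCommon; the real class manages its own cstring object rather than a std::shared_ptr.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical illustration type; not the UCommon implementation.
class CowText {
public:
    explicit CowText(const char *text)
        : data_(std::make_shared<std::string>(text)) {}

    // Copying shares the buffer: only the reference count changes.
    CowText(const CowText &) = default;
    CowText &operator=(const CowText &) = default;

    // A writer detaches first, so other holders keep the old contents.
    void append(const char *text) {
        assert(text != nullptr);
        if (data_.use_count() > 1)
            data_ = std::make_shared<std::string>(*data_);  // private copy
        data_->append(text);
    }

    const std::string &str() const { return *data_; }
    long refs() const { return data_.use_count(); }
    bool shares_with(const CowText &other) const { return data_ == other.data_; }

private:
    std::shared_ptr<std::string> data_;
};
```

Copies stay cheap until one of them is modified; the first write pays for the duplicate buffer.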

Author
David Sugar <dyfet@gnutelephony.org>

Definition at line 203 of file unicode.h.

Constructor & Destructor Documentation

◆ UString() [1/5]

ucommon::UString::UString ( size_t  size)
protected

Create an empty string with a buffer pre-allocated to a specified size.

Parameters
size  of buffer to allocate.

◆ UString() [2/5]

ucommon::UString::UString ( const unicode_t  text)
protected

Create a utf8 aware string for a null terminated unicode string.

Parameters
text  of ucs4 encoded data.
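
Building a utf8 string from unicode data comes down to encoding each code point as one to four bytes. The sketch below shows that step with the standard library only. It assumes 32-bit code points in a zero-terminated array (the library's unicode_t may be a different width), and the names append_utf8 and from_ucs4 are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append one code point to out as UTF-8 (1 to 4 bytes).
void append_utf8(std::string &out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode a zero-terminated array of code points; a null pointer yields "".
std::string from_ucs4(const std::uint32_t *text) {
    std::string out;
    for (; text && *text; ++text)
        append_utf8(out, *text);
    return out;
}
```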

◆ UString() [3/5]

ucommon::UString::UString ( const char *  text,
size_t  size 
)
protected

Create a string from null terminated text up to a maximum specified size.

Parameters
text  to use for string.
size  limit of new string.

◆ UString() [4/5]

ucommon::UString::UString ( const unicode_t *  text,
const unicode_t *  end 
)
protected

Create a string for a substring.

The end of the substring is a pointer within the substring itself.

Parameters
text  to use for string.
end  of text in substring.

◆ UString() [5/5]

ucommon::UString::UString ( const UString &  existing)
protected

Construct a copy of a string object.

Our copy inherits the same reference counted instance of cstring as in the original.

Parameters
existing  string to copy from.

◆ ~UString()

virtual ucommon::UString::~UString ( )
protected virtual

Destroy string.

De-reference cstring. If last reference to cstring, then also remove cstring from heap.

Member Function Documentation

◆ add()

void ucommon::UString::add ( const unicode_t  unicode)
protected

Add (append) unicode to a utf8 encoded string.

Parameters
unicode  text to add.

◆ at()

ucs4_t ucommon::UString::at ( int  position) const
protected

Return unicode character found at a specific codepoint in the string.

Parameters
position  of codepoint in string, negative values computed from end.
Returns
character code at specified position in string.
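
The sketch below illustrates codepoint indexing with negative positions counted from the end, using only the standard library. It assumes well-formed UTF-8. The names decode_at and code_point_at are hypothetical, and returning 0 for an out-of-range position is an assumption made for the sketch, not documented library behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode the code point whose lead byte is s[i]; assumes well-formed UTF-8.
static std::uint32_t decode_at(const std::string &s, std::size_t i) {
    unsigned char lead = static_cast<unsigned char>(s[i]);
    int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    assert(i + extra < s.size());
    std::uint32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
    for (int k = 1; k <= extra; ++k)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
    return cp;
}

// Code point at a codepoint position; negative positions count from the end.
std::uint32_t code_point_at(const std::string &s, int position) {
    std::vector<std::size_t> starts;  // byte offset of every lead byte
    for (std::size_t i = 0; i < s.size(); ++i)
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            starts.push_back(i);
    long count = static_cast<long>(starts.size());
    long index = position < 0 ? count + position : position;
    if (index < 0 || index >= count)
        return 0;  // assumption: 0 signals "out of range" in this sketch
    return decode_at(s, starts[static_cast<std::size_t>(index)]);
}
```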

◆ ccount()

unsigned ucommon::UString::ccount ( ucs4_t  character) const
protected

Count occurrences of a unicode character in string.

Parameters
character  code to search for.
Returns
count of occurrences.

◆ copy()

UString ucommon::UString::copy ( size_t  offset,
size_t  size 
) const
inline protected

Convenience method for substring extraction.

Parameters
offset  into string.
size  of string to return.
Returns
string object holding substring.

Definition at line 329 of file unicode.h.

◆ count()

size_t ucommon::UString::count ( void  ) const
inline protected

Count codepoints in current string.

Returns
count of codepoints.

Definition at line 370 of file unicode.h.
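
Counting codepoints in utf8 text can be done without decoding: every byte that is not a continuation byte (bit pattern 10xxxxxx) starts a new codepoint. A minimal standard-library sketch follows; the function name count_code_points is hypothetical and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>

// Count code points in null-terminated UTF-8 text by counting lead bytes.
std::size_t count_code_points(const char *text) {
    std::size_t n = 0;
    if (!text)
        return 0;
    for (; *text; ++text)
        if ((static_cast<unsigned char>(*text) & 0xC0) != 0x80)
            ++n;  // not a continuation byte, so a new code point starts here
    return n;
}
```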

◆ cut()

void ucommon::UString::cut ( size_t  offset,
size_t  size = 0 
)
protected

Cut (remove) text from string using codepoint offsets.

Parameters
offset  to start of text field to remove.
size  of text field to remove or 0 to remove to end of string.
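
Because offsets are in codepoints rather than bytes, a cut first has to translate both ends of the range into byte offsets. The sketch below shows one way to do that on a std::string; the names byte_offset and cut_code_points are hypothetical, and clamping an out-of-range offset to the end of the string is an assumption.

```cpp
#include <cstddef>
#include <string>

// Byte offset of the code point at cp_index, or s.size() if past the end.
static std::size_t byte_offset(const std::string &s, std::size_t cp_index) {
    std::size_t i = 0;
    while (i < s.size() && cp_index > 0) {
        ++i;  // step past the lead byte
        while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
            ++i;  // and past its continuation bytes
        --cp_index;
    }
    return i;
}

// Remove `size` code points starting at `offset`; size 0 removes to the end.
void cut_code_points(std::string &s, std::size_t offset, std::size_t size = 0) {
    std::size_t first = byte_offset(s, offset);
    std::size_t last = size == 0 ? s.size() : byte_offset(s, offset + size);
    s.erase(first, last - first);
}
```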

◆ find()

const char* ucommon::UString::find ( ucs4_t  character,
size_t  start = 0 
) const
protected

Find first occurrence of character in string.

Parameters
character  code to search for.
start  offset in string in codepoints.
Returns
pointer to first instance or NULL if not found.
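
A forward search of this kind can be sketched by walking the text one codepoint at a time and comparing decoded values. The helper names next_code_point and find_code_point are hypothetical, and the code assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <cstdint>

// Decode one code point and advance p past it; assumes well-formed UTF-8.
static std::uint32_t next_code_point(const char *&p) {
    unsigned char lead = static_cast<unsigned char>(*p++);
    int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    std::uint32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
    while (extra-- > 0 && *p)
        cp = (cp << 6) | (static_cast<unsigned char>(*p++) & 0x3F);
    return cp;
}

// Pointer to the first match at codepoint index >= start, or nullptr.
const char *find_code_point(const char *text, std::uint32_t target,
                            std::size_t start = 0) {
    if (!text)
        return nullptr;
    for (std::size_t index = 0; *text; ++index) {
        const char *here = text;  // remember where this code point begins
        if (next_code_point(text) == target && index >= start)
            return here;
    }
    return nullptr;
}
```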

◆ get() [1/2]

UString ucommon::UString::get ( size_t  codepoint,
size_t  size = 0 
) const
protected

Get a new string object as a substring of the current object.

Parameters
codepoint  offset of substring.
size  of substring in codepoints or 0 if to end.
Returns
string object holding substring.
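
Substring extraction by codepoint follows the same pattern as other codepoint-addressed operations: map the codepoint range to a byte range, then copy. The sketch below uses std::string; the names cp_offset and sub_code_points are hypothetical, and an offset past the end is assumed to yield an empty string.

```cpp
#include <cstddef>
#include <string>

// Byte offset of the code point at cp_index, or s.size() if past the end.
static std::size_t cp_offset(const std::string &s, std::size_t cp_index) {
    std::size_t i = 0;
    while (i < s.size() && cp_index > 0) {
        ++i;
        while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
            ++i;
        --cp_index;
    }
    return i;
}

// Copy `size` code points starting at `codepoint`; size 0 copies to the end.
std::string sub_code_points(const std::string &s, std::size_t codepoint,
                            std::size_t size = 0) {
    std::size_t first = cp_offset(s, codepoint);
    std::size_t last = size == 0 ? s.size() : cp_offset(s, codepoint + size);
    return s.substr(first, last - first);
}
```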

◆ get() [2/2]

size_t ucommon::UString::get ( unicode_t  unicode,
size_t  size 
) const
protected

Extract a unicode byte sequence from utf8 object.

Parameters
unicode  data buffer.
size  of data buffer.
Returns
codepoints copied.
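
Extracting unicode data from a utf8 string means decoding each byte sequence into a caller-supplied buffer. In the sketch below, the buffer holds 32-bit code points and `size` is taken to be its capacity in elements including a terminating zero. Both points are assumptions, as is the name unpack_code_points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 into out (capacity `size` elements, zero-terminated).
// Returns the number of code points copied, not counting the terminator.
std::size_t unpack_code_points(const std::string &s, std::uint32_t *out,
                               std::size_t size) {
    assert(out != nullptr || size == 0);
    if (size == 0)
        return 0;
    std::size_t n = 0, i = 0;
    while (i < s.size() && n + 1 < size) {
        unsigned char lead = static_cast<unsigned char>(s[i++]);
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        std::uint32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
        while (extra-- > 0 && i < s.size())
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        out[n++] = cp;
    }
    out[n] = 0;  // room is guaranteed because the loop stops at size - 1
    return n;
}
```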

◆ left()

UString ucommon::UString::left ( size_t  size) const
inline protected

Convenience method for left of string.

Parameters
size  of substring to gather in codepoints.
Returns
string object holding substring.

Definition at line 310 of file unicode.h.

◆ operator()() [1/3]

size_t ucommon::UString::operator() ( unicode_t  unicode,
size_t  size 
) const
inline protected

Extract a unicode byte sequence from utf8 object.

Parameters
unicode  data buffer.
size  of data buffer.
Returns
codepoints copied.

Definition at line 293 of file unicode.h.

◆ operator()() [2/3]

UString ucommon::UString::operator() ( int  codepoint,
size_t  size 
) const
protected

Get a new substring through object expression.

Parameters
codepoint  offset of substring.
size  of substring or 0 if to end.
Returns
string object holding substring.

◆ operator()() [3/3]

const char* ucommon::UString::operator() ( int  offset) const
protected

Reference a string in the object by codepoint offset.

Positive offsets are from the start of the string, negative from the end.

Parameters
offset  to string position.
Returns
pointer to string data or NULL if invalid offset.

◆ operator[]()

ucs4_t ucommon::UString::operator[] ( int  position) const
inline protected

Reference a unicode character in string object by array offset.

Parameters
position  of codepoint offset to character.
Returns
character value at offset.

Definition at line 362 of file unicode.h.

◆ paste()

void ucommon::UString::paste ( size_t  offset,
const char *  text,
size_t  size = 0 
)
protected

Insert (paste) text into string using codepoint offsets.

Parameters
offset  to start paste.
text  to paste.
size  of text to paste.
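
An insert at a codepoint offset can be sketched the same way as a cut: convert the offset to a byte position, then splice. The names cp_to_byte and paste_at are hypothetical. Treating `size` as a codepoint count, with 0 meaning "the whole text", is an assumption the reference above does not confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte offset of the code point at cp_index, or s.size() if past the end.
static std::size_t cp_to_byte(const std::string &s, std::size_t cp_index) {
    std::size_t i = 0;
    while (i < s.size() && cp_index > 0) {
        ++i;
        while (i < s.size() && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80)
            ++i;
        --cp_index;
    }
    return i;
}

// Insert text before the code point at `offset` (appends if past the end).
void paste_at(std::string &s, std::size_t offset, const char *text,
              std::size_t size = 0) {
    assert(text != nullptr);
    std::string piece(text);
    if (size != 0)
        piece.erase(cp_to_byte(piece, size));  // keep the first `size` code points
    s.insert(cp_to_byte(s, offset), piece);
}
```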

◆ rfind()

const char* ucommon::UString::rfind ( ucs4_t  character,
size_t  end = npos 
) const
protected

Find last occurrence of character in string.

Parameters
character  code to search for.
end  offset to start from in codepoints.
Returns
pointer to last instance or NULL if not found.
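
A reverse search over utf8 text is awkward to do by stepping backwards, so one simple approach is to scan forward and remember the latest match. The sketch below does that. The names step_code_point and rfind_code_point are hypothetical, and it assumes `end` is the highest codepoint index to consider, with the default meaning "no limit".

```cpp
#include <cstddef>
#include <cstdint>

// Decode one code point and advance p past it; assumes well-formed UTF-8.
static std::uint32_t step_code_point(const char *&p) {
    unsigned char lead = static_cast<unsigned char>(*p++);
    int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
    std::uint32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
    while (extra-- > 0 && *p)
        cp = (cp << 6) | (static_cast<unsigned char>(*p++) & 0x3F);
    return cp;
}

// Pointer to the last match at codepoint index <= end, or nullptr.
const char *rfind_code_point(const char *text, std::uint32_t target,
                             std::size_t end = static_cast<std::size_t>(-1)) {
    const char *last = nullptr;
    if (!text)
        return nullptr;
    for (std::size_t index = 0; *text && index <= end; ++index) {
        const char *here = text;
        if (step_code_point(text) == target)
            last = here;  // keep overwriting so the final match wins
    }
    return last;
}
```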

◆ right()

UString ucommon::UString::right ( size_t  offset) const
inline protected

Convenience method for right of string.

Parameters
offset  of substring from right in codepoints.
Returns
string object holding substring.

Definition at line 319 of file unicode.h.

◆ set()

void ucommon::UString::set ( const unicode_t  unicode)
protected

Set a utf8 encoded string based on unicode data.

Parameters
unicode  text to set.

The documentation for this class was generated from the following file: