D.10.7 Additional functions
- Function: unsigned * utf8_wc_strdup (const unsigned *s)
Returns a pointer to a new wide character string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3).
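The duplication described above can be sketched as follows. This is only an illustration, not the library's code: the names sketch_wc_strlen and sketch_wc_strdup are hypothetical, and the sketch assumes that a wide character string is an array of unsigned code points terminated by a zero element.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of elements before the terminating 0.
   Assumes wide strings are 0-terminated arrays of unsigned. */
static size_t sketch_wc_strlen(const unsigned *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical stand-in for a wide-string duplicate function.
   Returns a malloc'ed copy of s, or NULL if malloc fails. */
unsigned *sketch_wc_strdup(const unsigned *s)
{
    size_t n = sketch_wc_strlen(s) + 1;      /* include the terminator */
    unsigned *copy = malloc(n * sizeof(copy[0]));

    if (copy != NULL)
        memcpy(copy, s, n * sizeof(copy[0]));
    return copy;
}
```

The caller owns the returned buffer and releases it with free(3).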
- Function: unsigned * utf8_wc_quote (const unsigned *s)
Quotes occurrences of backslash and double-quote in s by prefixing each of them with a backslash. The return value is allocated using malloc(3).
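A minimal sketch of this quoting rule on wide character strings is shown below. The function name sketch_wc_quote is hypothetical, the input is assumed to be terminated by a zero element, and the code is not taken from the library.

```c
#include <stdlib.h>

/* Hypothetical sketch: return a malloc'ed copy of s in which every
   backslash and double-quote is preceded by a backslash.
   Assumes s is a 0-terminated array of code points.
   Returns NULL if malloc fails. */
unsigned *sketch_wc_quote(const unsigned *s)
{
    size_t len, extra = 0, i, j;
    unsigned *out;

    /* First pass: measure the input and count characters to quote. */
    for (len = 0; s[len] != 0; len++)
        if (s[len] == '\\' || s[len] == '"')
            extra++;

    out = malloc((len + extra + 1) * sizeof(out[0]));
    if (out == NULL)
        return NULL;

    /* Second pass: copy, inserting a backslash where needed. */
    for (i = 0, j = 0; i < len; i++) {
        if (s[i] == '\\' || s[i] == '"')
            out[j++] = '\\';
        out[j++] = s[i];
    }
    out[j] = 0;
    return out;
}
```

Counting first keeps the allocation exact, so no reallocation is needed while copying.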
- Function: int utf8_quote (const char *str, char **sptr)
Quotes occurrences of backslash and double-quote in str by prefixing each of them with a backslash. On success, stores the result (allocated with malloc(3)) in the location pointed to by sptr and returns 0. On error, returns -1 and sets errno to one of the following:
- ENOMEM
Not enough memory to allocate the return buffer.
- EILSEQ
An invalid wide character is encountered.
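The return convention described above (0 on success with the result stored through the pointer argument, -1 with errno set on failure) can be illustrated with the following sketch. The names sketch_seq_len and sketch_quote are hypothetical. The validity check is an assumption made for the example: it only verifies lead and continuation bytes and does not reject overlong or surrogate encodings, so it may be stricter or looser than the library's own check.

```c
#include <errno.h>
#include <stdlib.h>

/* Length of a UTF-8 sequence given its lead byte, or 0 if the byte
   cannot start a sequence. */
static size_t sketch_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Hypothetical sketch: quote backslash and double-quote in str.
   Returns 0 and stores a malloc'ed result in *sptr on success.
   Returns -1 with errno set to EILSEQ or ENOMEM on failure. */
int sketch_quote(const char *str, char **sptr)
{
    const unsigned char *p = (const unsigned char *) str;
    size_t len = 0, extra = 0, i, j, k;
    char *out;

    /* Pass 1: structural validation and counting. */
    while (p[len] != 0) {
        size_t n = sketch_seq_len(p[len]);
        if (n == 0) {
            errno = EILSEQ;
            return -1;
        }
        /* A terminating 0 fails this test, so we never read past it. */
        for (k = 1; k < n; k++) {
            if ((p[len + k] & 0xC0) != 0x80) {
                errno = EILSEQ;
                return -1;
            }
        }
        if (p[len] == '\\' || p[len] == '"')
            extra++;
        len += n;
    }

    out = malloc(len + extra + 1);
    if (out == NULL) {
        errno = ENOMEM;
        return -1;
    }

    /* Pass 2: byte-wise copy. Both quoted characters are ASCII and
       never occur inside a multibyte UTF-8 sequence. */
    for (i = 0, j = 0; i < len; i++) {
        if (str[i] == '\\' || str[i] == '"')
            out[j++] = '\\';
        out[j++] = str[i];
    }
    out[j] = '\0';
    *sptr = out;
    return 0;
}
```

A caller would test the return value first and consult errno only when it is -1.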
- Function: size_t utf8_wc_hash_string (const unsigned *ws, size_t n)
Computes a hash code of ws for a symbol table of n buckets.
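The manual does not specify the hash algorithm. The sketch below shows one common way to map a wide character string to a bucket index in the range 0 to n-1. It uses a simple multiply-by-31 hash, which is an assumption made purely for illustration, and the name sketch_wc_hash is hypothetical.

```c
#include <stddef.h>

/* Hypothetical sketch: hash a 0-terminated array of code points into
   one of n buckets. The multiplier 31 is an arbitrary illustrative
   choice, not the library's algorithm. Returns 0 when n is 0 to
   avoid a division by zero. */
size_t sketch_wc_hash(const unsigned *ws, size_t n)
{
    size_t h = 0;

    if (n == 0)
        return 0;
    for (; *ws != 0; ws++)
        h = h * 31 + *ws;   /* unsigned overflow wraps, which is fine */
    return h % n;
}
```

The result can be used directly as an index into an array of n bucket heads.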
- Function: int dico_levenshtein_distance (const char *a, const char *b, int flags)
Computes the Levenshtein distance between UTF-8 strings a and b. The flags argument is either 0 or a bitwise OR of one or more of the flags below:
0
Default - compute Levenshtein distance, treating both arguments literally.
DICO_LEV_NORM
Treat runs of one or more whitespace characters as a single space character (ASCII 32).
DICO_LEV_DAMERAU
Compute Damerau-Levenshtein distance. This distance takes into account transpositions.
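To illustrate the difference between the default mode and the transposition-aware mode, here is a minimal sketch. It is not the library's implementation: it compares bytes rather than decoded UTF-8 characters, it does not implement whitespace normalization, the transposition variant is the restricted form known as optimal string alignment, and the names sketch_lev and SKETCH_LEV_DAMERAU (with its value) are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_LEV_DAMERAU 0x1  /* hypothetical flag value */

static int sketch_min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Hypothetical sketch: byte-wise edit distance between a and b.
   With SKETCH_LEV_DAMERAU set, a swap of two adjacent bytes counts
   as one edit. Returns -1 if memory cannot be allocated. */
int sketch_lev(const char *a, const char *b, int flags)
{
    size_t la = strlen(a), lb = strlen(b), cols = lb + 1, i, j;
    int *d = malloc((la + 1) * cols * sizeof(d[0]));
    int result;

    if (d == NULL)
        return -1;

    /* Distance from the empty prefix is the prefix length. */
    for (i = 0; i <= la; i++)
        d[i * cols] = (int) i;
    for (j = 0; j <= lb; j++)
        d[j] = (int) j;

    for (i = 1; i <= la; i++) {
        for (j = 1; j <= lb; j++) {
            int cost = a[i - 1] != b[j - 1];
            int v = sketch_min3(d[(i - 1) * cols + j] + 1,        /* deletion */
                                d[i * cols + j - 1] + 1,          /* insertion */
                                d[(i - 1) * cols + j - 1] + cost); /* substitution */

            if ((flags & SKETCH_LEV_DAMERAU) && i > 1 && j > 1
                && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                int t = d[(i - 2) * cols + j - 2] + 1;            /* transposition */
                if (t < v)
                    v = t;
            }
            d[i * cols + j] = v;
        }
    }

    result = d[la * cols + lb];
    free(d);
    return result;
}
```

In this sketch the strings "abcd" and "acbd" are at distance 2 in the default mode and at distance 1 when transpositions are counted.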
- Function: int dico_soundex (const char *word, char code[DICO_SOUNDEX_SIZE])
Computes the Soundex code for the given word. The code is stored in code. Returns 0 on success, -1 if word is not a valid UTF-8 string.
- Define: DICO_SOUNDEX_SIZE
This macro definition expands to the size of the Soundex code buffer, including the terminating zero.
Note that dico_soundex silently ignores all characters except Latin letters.
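For readers unfamiliar with Soundex, the following sketch implements the classic American Soundex rules (one letter followed by three digits). It is an illustration only: the names sketch_soundex and SKETCH_SOUNDEX_SIZE are hypothetical, the buffer size of 5 is an assumption rather than the documented value of DICO_SOUNDEX_SIZE, the sketch does not validate UTF-8, and returning -1 for a word without letters is a choice made for this example.

```c
#include <ctype.h>
#include <stddef.h>

#define SKETCH_SOUNDEX_SIZE 5   /* assumed: letter + 3 digits + zero */

/* Soundex digit for an uppercase letter, or 0 for letters that
   carry no code (vowels, H, W, Y). */
static char sketch_digit(int c)
{
    switch (c) {
    case 'B': case 'F': case 'P': case 'V':
        return '1';
    case 'C': case 'G': case 'J': case 'K':
    case 'Q': case 'S': case 'X': case 'Z':
        return '2';
    case 'D': case 'T':
        return '3';
    case 'L':
        return '4';
    case 'M': case 'N':
        return '5';
    case 'R':
        return '6';
    default:
        return 0;
    }
}

/* Hypothetical sketch of classic Soundex. Characters other than
   ASCII Latin letters are skipped. Returns 0 on success, -1 if the
   word contains no letters at all. */
int sketch_soundex(const char *word, char code[SKETCH_SOUNDEX_SIZE])
{
    const unsigned char *p;
    size_t n = 0;
    char prev = 0;

    for (p = (const unsigned char *) word; *p != 0; p++) {
        int c;
        char d;

        if (!((*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z')))
            continue;
        c = toupper(*p);
        d = sketch_digit(c);

        if (n == 0) {
            code[n++] = (char) c;   /* first letter is kept as is */
            prev = d;
        } else if (c == 'H' || c == 'W') {
            continue;               /* H and W do not separate equal codes */
        } else if (d == 0) {
            prev = 0;               /* a vowel separates equal codes */
        } else {
            if (d != prev && n < SKETCH_SOUNDEX_SIZE - 1)
                code[n++] = d;
            prev = d;
        }
    }

    if (n == 0)
        return -1;
    while (n < SKETCH_SOUNDEX_SIZE - 1)
        code[n++] = '0';            /* pad short codes with zeros */
    code[n] = '\0';
    return 0;
}
```

With these rules, "Robert" and "Rupert" both produce R163, and punctuation inside a word does not change the result.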