String Class
Objects of String class contain arrays of ASCII characters.
The value of a String
object is similar to the C language
organization of strings as a NUL terminated array of char
values.
In most cases, String objects can be used like a collection of Character objects. The overloaded operators ++, --, +, and - all work similarly to the operators in List or AssociativeArray objects.
Some of String class’s methods add semantics to operators, like the += method, which behaves differently depending on whether its argument is another String object or an Integer object.
    myString = "Hello, ";
    myString += "world!";    /* The resulting value is "Hello, world!" */

    myString = "Hello, ";
    myString += 3;           /* The resulting value is "lo, " */
The main exception to this is the map method, which doesn’t allow incrementing self within an argument block. This is because String objects don’t use Key objects internally to order a String object’s individual Character objects.
If it’s necessary to treat a String object as a collection, the asList method will organize the receiver String into a List of Character objects.
Conversely, Array and List classes contain the asString method, which translates an Array or List into a String object.
In addition, methods like matchRegex, =~, and !~ can accept as arguments strings that contain regular expression metacharacters and use them to perform regular expression matches on the receiver String. See Pattern Matching.
value
The value is a pointer to the character string.
*
(void
)
When used as a prefix operator, overloads C’s ‘*’ dereference
operator and returns the first element of the receiver, a
Character
object.
=
(char *
s)
Set the value of the receiver object to s.
==
(char *
s)
Return TRUE
if s and the receiver are identical,
FALSE
otherwise.
=~
(char *
pattern)
Returns a Boolean
value of true
if the receiver contains
the regular expression pattern, false otherwise.
See Pattern Matching.
!~
(char *
pattern)
Returns a Boolean value of true if the receiver does not contain the argument, pattern, which may contain regular expression metacharacters, false otherwise.
See Pattern Matching.
!=
(char *
s)
Return TRUE if s and the receiver are not identical, FALSE otherwise.
+
(String
s)
+
(Integer
i)
If the argument is a String
, concatenate the receiver and
s and return the new String.
If the argument is an
Integer
, return a reference to the receiver plus i.
++
(void
)
Increment the value of the receiver as a char *
. This method
uses __ctalkIncStringRef () to handle the pointer math.
In other words, this method effectively sets the receiver
String's
value from, for example, ‘Hello, world!’ to
‘ello, world!’. If the receiver is incremented to the end of its
contents, then its value is NULL
.
+=
(String
s)
+=
(Integer
i)
If the argument is an Integer
, increment the reference to the
receiver by that amount. If the argument is a String
or any
other class, concatenate the argument to the receiver and return the
receiver, formatting it as a string first if necessary.
-
(Integer
i)
Return a reference to the receiver String
minus i.
If the reference is before the start of the string, return NULL.
That means the method is only effective after a call to ++
or a similar method.
    String new str;

    str = "Hello, world!";
    str += 1;
    printf ("%s\n", str);   /* Prints, "ello, world!" */
    --str;
    printf ("%s\n", str);   /* Prints, "Hello, world!" */
--
(void
)
Decrement the value of the receiver as a char *
. The effect is
the converse of ++
, above. The method doesn’t decrement the
reference so that it points before the beginning of the String
object’s contents. That means, like -
above, the method only
returns a pointer to somewhere in the receiver’s value after a
previous call to ++
or a similar method. For example,
    String new str;

    str = "Hello, world!";
    ++str;
    printf ("%s\n", str);   /* Prints, "ello, world!" */
    --str;
    printf ("%s\n", str);   /* Prints, "Hello, world!" */
-=
(Integer
i)
If the argument is an Integer, decrement the reference to the receiver’s value by that amount. Like the other methods that decrement the reference to the receiver’s value, this method is effective only after the program has incremented the reference past the start of the string.
asExpanded
(void
)
Return the expanded directory path for a directory glob pattern contained in the receiver.
asInteger
(void
)
Return an Integer
object with the value of the receiver.
asList
(List
newList)
Store each character of the receiver String
as
Character
object members of newList.
at
(int
index)
Return the character at index. The first character of the string is at index 0. If index is greater than the length of the string, return ‘NULL’.
atPut
(int
n, char
c)
Replace the n’th character of the receiver with c.
Has no effect and returns NULL
if n is greater than the length
of the receiver.
The atPut method interprets the following character sequences (shown with their ASCII values):
    Sequence   ASCII Value
    \0         0
    \a         7
    \b         8
    \n         10
    \e         27
    \f         12
    \r         13
    \t         9
    \v         11
The ‘\e’ escape sequence is an extension to the C language standard.
The method returns the receiver (with the new value) if successful.
Note that the method does not convert its argument; that is, if c isn’t a Character object, then the results are probably not going to be what you want. For example, suppose you try to store an Integer in a String, like this:
    myInt = 1;
    myString atPut 0, myInt + '0';
This doesn’t work as intended, because adding ASCII ‘'0'’ doesn’t convert myInt to a Character object. You still need to use the asCharacter method from Magnitude class to create a Character object, as in this example.
    myInt = 1;
    myString atPut 0, (myInt + '0') asCharacter;
The parentheses in the second argument are necessary; otherwise, asCharacter would use ‘'0'’ as its receiver, because asCharacter, which is a method message, has a higher precedence than ‘+’. Instead, asCharacter's receiver should be the value of ‘myInt + '0'’, so we enclose the first part of the expression in parentheses so that it gets evaluated first.
callStackTrace
(void
)
Print a call stack trace.
charPos
(char
c)
Return an Integer
with the position of c in the receiver.
Returns an Integer
between 0 (the first character) and the
receiver’s length, minus one (the last character). If the receiver
does not contain c, returns -1.
charPosR
(char
c)
Return an Integer with the position of the last occurrence of c in the receiver. Returns an Integer between 0 (the first character) and the receiver’s length, minus one (the last character). If the receiver does not contain c, returns -1.
chomp
(void
)
Removes a trailing newline character (‘\n’) if the receiver contains one. Named after Perl’s very useful string trimming function.
consoleReadLine
(String
promptStr)
Print the promptStr on the terminal and wait for the user to
enter a line of text. If Ctalk is built with the GNU readline
libraries, adds readline’s standard line editing and command history
facilities. In that case, Ctalk also defines the
HAVE_GNU_READLINE
preprocessor definition to ‘1’. You can
build Ctalk with or without readline; see the options to
./configure
for further information.
Here is a sample program that shows how to use consoleReadLine.
    int main (int argc, char **argv) {
      String new s;
      String new promptStr;

      if (argc > 1)
        promptStr = argv[1];
      else
        promptStr = "Prompt ";

      printf ("Readline test.  Type ^C or, \"quit,\" to exit.\n");
    #if HAVE_GNU_READLINE
      printf ("Ctalk built with GNU Readline Support.\n");
    #else
      printf ("Ctalk built without GNU Readline Support.\n");
    #endif

      while (1) {
        s consoleReadLine promptStr;
        printf ("You typed (or recalled), \"%s.\"\n", s);
        /*
         *  Matches both, "quit," and, "quit\n."
         */
        if (s match "quit")
          break;
      }
    }
contains
(String
pattern)
contains
(String
pattern, Integer
starting_offset)
Returns a Boolean value of True if the receiver string contains an exact match of the text in pattern, False otherwise. With a second argument, starting_offset, an Integer, the method begins its search from that character position in the receiver string.
envVarExists
(char *
envVarName)
Test for the presence of an environment variable. Return
TRUE
if the variable exists, FALSE
otherwise.
eval
(void
)
Evaluate the content of the receiver String’s value as if
it were an argument to eval
.
getEnv
(char *
envVarName)
Return the value of environment variable envVarName as the value of the receiver, or (null). Note that this method generates an internal exception if the environment variable does not exist. To test for the presence of an environment variable without generating an exception, see envVarExists, above.
getRS
(void
)
Returns a Character
with the current record separator.
The record separator determines whether the regular expression metacharacters ‘^’ and ‘$’ recognize line endings. The default value of the record separator is a newline ‘\n’ character, which means that a ‘^’ character will match an expression at the start of a string, or starting at the beginning of a text line. Likewise, a ‘$’ metacharacter matches both the end of a line and the end of the string.
To match only at the beginning and end of the string, set the record separator to a NUL character (‘\0’). See Pattern Matching.
isXLFD
(void
)
Returns a Boolean value of True if the receiver is an XLFD font descriptor, False otherwise. For more information about font selection, refer to the X11Font class (see X11Font) and the X11FreeTypeFont class (see X11FreeTypeFont).
length
(void
)
Return an object of class Integer
with the length of the
receiver in characters.
map
(OBJECT *(*
method)()
)
Execute method, an instance method of class String,
for
each character of the receiver object. For example,
    String instanceMethod printSpaceChar (void) {
      printf (" %c", self);   /* Here, for each call to the printSpaceChar
                                 method, "self" is each of myString's
                                 successive characters. */
    }

    int main () {
      String new myString;

      myString = "Hello, world!";
      myString map printSpaceChar;
      printf ("\n");
    }
The argument to map
can also be a code block:
    int main () {
      String new myString;

      myString = "Hello, world!";
      myString map {
        printf (" %c", self);
      }
      printf ("\n");
    }
match
(char *
pattern)
Returns TRUE if pattern matches the receiver String regardless of case, FALSE otherwise. Both match and matchCase, below, are being superseded by matchRegex and quickSearch, also below.
matchAt
(Integer
idx)
Returns the text of the idx’th parenthesized match
resulting from a previous call to matchRegex
, =~
,
or !~
. See Pattern Matching.
matchCase
(char *
pattern)
Returns TRUE if pattern matches the receiver case-sensitively, FALSE otherwise. Like match, above, matchCase is being superseded by matchRegex and quickSearch, below.
matchIndexAt
(Integer
idx)
Returns the character position in the receiver String
of the
idx’th parenthesized match resulting from a previous call to
matchRegex
, =~
, or !~
. See Pattern Matching.
matchLength
(void
)
Returns the length of a regular expression match from the
previous call to the matchRegex
method, below.
matchRegex
(String
pattern, Array
offsets)
Searches the receiver, a String
object, for all occurrences of
pattern. The matchRegex
method places the positions
of the matches in the offsets array, and returns an
Integer
that contains the number of matches. See Pattern Matching.
The quickSearch
method, below, matches exact text
only, but it uses a much faster search algorithm.
nMatches
(void
)
Returns an Integer with the number of matches from the last call to the matchRegex method.
printMatchToks (Integer
yesNo)
If the argument is non-zero, print the tokens of regular expression patterns and the matching text after each regular expression match. This can be useful when debugging regular expressions. See DebugPattern.
printOn
(char *
fmt, ...)
Format and print the method’s arguments to the receiver.
quickSearch
(String
pattern, Array
offsets)
Searches the receiver, a String
object, for all occurrences of
pattern. The quickSearch
method places the positions
of the matches in the offsets array, and returns an
Integer
that contains the number of matches.
Unlike matchRegex
, above, quickSearch
matches exact
text only, but it uses a much faster search algorithm.
readFormat
(char *
fmt, ...)
Scan the receiver into the arguments, using fmt.
search
(String
pattern, Array
offsets)
This method is a synonym for matchRegex
, above, and is here for
backward compatibility.
setRS
(char
record_separator_char
)
Sets the current application’s record separator character, which determines how regular expression metacharacters match line endings, among other uses. See RecordSeparator. See Pattern Matching.
split
(char
delimiter, char **
resultArray)
Split the receiver at each occurrence of delimiter, and save the
result in resultArray. The delimiter argument can be
either a Character
object or a String object. If
delimiter is a String,
it uses Ctalk’s pattern matching
library to match the delimiter string. See Pattern Matching.
However, the pattern matching library only records the length of the last match, so if you use a pattern like ‘" *"’ then the results may be inaccurate if all of the delimiters are not the same length.
subString
(int
index, int
length)
Return the substring of the receiver of length characters beginning at index. String indexes start at 0. If index + length is greater than the length of the receiver, return the substring from index to the end of the receiver.
sysErrnoStr
(void
)
Sets the receiver’s value to the text message of the last system error (the value of errno(3)).
tokenize
(List
tokens)
Splits the receiver String
at each whitespace character
or characters (spaces, horizontal and vertical tabs, or newlines)
and pushes each non-whitespace set of characters (words, numbers,
and miscellaneous punctuation) onto the List
given as the
argument. The method uses ispunct(3) to separate punctuation,
except for ‘_’ characters, which are used in labels.
Note that this method can generate lists with hundreds or even
thousands of tokens, so you need to take care with large (or
even medium sized) input Strings
as receivers.
tokenizeLine
(List
tokens)
Similar to tokenize, above. This method also treats newline characters as tokens, which makes it easier to parse input that relies on newlines (for example, C++ style comments, preprocessor directives, and some types of text files).
vPrintOn
(String
calling_methods_fmt_arg)
This function formats the variable arguments of its calling method
on the receiver String
object.
The argument is the format argument of the calling method. When
vPrintOn
is called, it uses the argument as the start of the
caller’s variable argument list.
Here is an example of vPrintOn's
use.
Object instanceMethod myPrint (String fmt, ...) { String new s; s vPrintOn fmt; return s; } int main () { Object new obj; Integer new i; String new str; i = 5; str = obj myPrint "Hello, world no. %d", i; printf ("%s\n", str); }
writeFormat
(char *
fmt,...)
Write the formatted arguments using fmt to the receiver. Note that Ctalk stores scalar types as formatted strings. See Variable arguments.
String
class defines a number of methods for searching and
matching String
objects. The matchRegex
method
recognizes some basic metacharacters to provide regular expression
search capabilities. The quickSearch
method searches
String
objects for exact text patterns, but it uses a much
faster search algorithm.
The operators, =~
and !~
return true or false depending on
whether the receiver contains the pattern given as the argument. If
the argument contains metacharacters, then Ctalk conducts a regular
expression search; otherwise, it tries to match (or not match, in the
case of !~
) the receiver and the pattern exactly.
If you want more thorough information about the search, the
matchRegex
and quickSearch
methods allow an additional
argument after the text pattern: an Array
object that the
methods use to return the character positions of the matches within
the receiver. After the method is finished searching, the second
argument contains the position of the first character wherever the text
pattern matched text in the receiver. The last offset is ‘-1’,
indicating that there are no further matches. The methods also return
an Integer
object that contains the number of matches.
Here is an example from LibrarySearch
class that contains
the additional ‘offsets’ argument.
    if ((inputLine match KEYPAT) &&
        (inputLine matchRegex (pattern, offsets) != 0)) {
      ...
    }
Searches can provide even more information than this, however. Pattern strings may contain backreferences, which save the text and position of any of the receiver string’s matched text that the program needs. The sections just below describe backreferences in detail.
All of these methods (except quickSearch
) recognize a few
regular expression metacharacters. They are:
^
Matches text at the beginning of the receiver String's text.
$
Matches text at the end of the receiver String's text, or the end of a line (that is, the character before a ‘\n’ or ‘\r’ newline character).
*
Matches zero or more occurrences of the character or expression it follows.
+
Matches one or more occurrences of the character or expression it follows.
?
Matches zero or one occurrence of the character or expression it follows.
\
Escapes the next character so it is interpreted literally; e.g., the sequence ‘\*’ is interpreted as a literal asterisk. Because Ctalk’s lexical analysis also performs the same task, if you want a backslash to appear in a pattern, you need to type ‘\\’; for example,
    myPat = "\\*";   /* The '\\' tells Ctalk's lexer that we really want
                        a '\' to appear in the pattern string, so it will
                        still be there when we use myPat as a regular
                        expression. */
However, Ctalk also recognizes patterns, which only need to be evaluated by the regular expression parser. Patterns do not get checked immediately for things like balanced quotes and ASCII escape sequences; instead, they get evaluated by the regular expression parser when the program actually tries to perform some pattern matching. Otherwise, patterns are identical to Strings.
Expressed as a pattern, myPat
in the example above would look
like this.
myPat = /\*/;
Pattern strings are described in their own section, below. See Pattern Strings.
( )
Begin and end a match reference (i.e., a backreference). Matched text between ‘(’ and ‘)’ is saved, along with its position in the receiver String, and can be retrieved with subsequent calls to the matchAt and matchIndexAt methods. The match information is saved until the program performs another pattern match.
In patterns, these escape sequences match characters of different types. The escape sequences have the following meanings.
    Character Class   Matches
    ---------------   -------
    \W                'Word' Characters (A-Z, a-z)
    \d                Decimal Digits (0-9)
    \w                White Space (space, \t, \n, \f, \v)
    \p                Punctuation (Any other character.)
    \l                'Label' Characters (A-Z, a-z, 0-9, and _)
    \x                Hexadecimal Digits (0-9, a-f, A-F, x, and X)
The following program contains a pattern that looks for alphabetic characters, punctuation, and whitespace.
    int main (int argc, char **argv) {
      String new str;

      str = "Hello, world!";

      if (str =~ /e(\W*\p\w*\W)/) {
        printf ("match - %s\n", str matchAt 0);
      }
    }
When run, the expression,
str =~ /e(\W*\p\w*\W)/
Produces the following output.
match - llo, w
|
Matches either of the expressions on each side of the ‘|’. The expressions may be either a character expression, or a set of characters enclosed in parentheses. Here are some examples of alternate patterns.
    a|b
    a*|b*
    a+|b+
    \W+|\d+
    (ab)|(cd)
When matching alternate expressions, using ‘*’ in the expressions can produce unexpected results because a ‘*’ can provide a zero-length match, and the ‘|’ metacharacter is most useful when there is some text to be matched.
If one or both expressions are enclosed in parentheses, then
the expression that matches is treated as a backreference, and
the program can retrieve the match information with the matchAt
and matchIndexAt
methods.
The following example shows how to use some of the matching features in an actual program. This program saves the first non-label character (either a space or parenthesis) of a function declaration, and its position, so we can retrieve the function name and display it separately.
    int main () {
      String new text, pattern, fn_name;
      List new fn_list;

      fn_list = "strlen ()", "strcat(char *)", "strncpy (char *)",
        "stat (char *, struct stat *)";

      /* Match the first non-label character: either a space or a
         parenthesis.  The double backslashes cause the content of
         'pattern' (after the normal lexical analysis for the string)
         to be,

           "( *)\("

         So the regular expression parser can check for a backslashed
         opening parenthesis (i.e., a literal '(', not another
         backreference delimiter). */
      pattern = "( *)\\(";

      fn_list map {
        if (self =~ pattern) {
          printf ("Matched text: \"%s\" at index: %d\n",
                  self matchAt 0, self matchIndexAt 0);
          fn_name = self subString 0, self matchIndexAt 0;
          printf ("Function name: %s\n", fn_name);
        }
      }
      return 0;
    }
When run, the program should produce results like this.
    Matched text: " " at index: 6
    Function name: strlen
    Matched text: "" at index: 6
    Function name: strcat
    Matched text: " " at index: 7
    Function name: strncpy
    Matched text: " " at index: 4
    Function name: stat
Note that the first backreference is numbered ‘0’, in the expression ‘self matchAt 0’. If there were another set of (unescaped) parentheses in pattern, then its text would be referred to as ‘self matchAt 1’.
You should also note that the second function match saved an empty string, because the backreferenced pattern resulted in a zero-length match: a ‘*’ metacharacter matches zero or more occurrences of the character that precedes it.
The program could also use the charPos
method to look for the
‘ ’ and/or ‘(’ characters, but using a regular expression
gives us information about which non-label character appears first
more efficiently.
Here’s another example. The pattern contains only one set of parentheses, but Ctalk saves a match reference every time the pattern matches characters in the target string.
    int main () {
      String new string, pattern;
      Array new offsets;
      Integer new nMatches, i;

      pattern = "(l*o)";
      string = "Hello, world! Hello, world, Hello, world!";

      nMatches = string matchRegex pattern, offsets;

      printf ("nMatches: %d\n", nMatches);
      offsets map {
        printf ("%d\n", self);
      }
      for (i = 0; i < nMatches; ++i) {
        printf ("%s\n", string matchAt i);
      }
    }
When run, the program produces output like this.
    nMatches: 6
    2
    8
    16
    22
    30
    36
    -1
    llo
    o
    llo
    o
    llo
    o
The character classes match anywhere they find text in a target string, including control characters like ‘\n’ and ‘\f’, regardless of the record separator character. For a brief example, refer to the section, The Record Separator Character, below.
This example matches one of two patterns joined by a ‘|’ metacharacter.
    int main () {
      String new s, pat;
      Array new matches;
      Integer new n_matches, n_th_match;

      pat = "-(mo)|(ho)use";
      s = "-mouse-house-";

      n_matches = s matchRegex pat, matches;

      for (n_th_match = 0; n_th_match < n_matches; ++n_th_match) {
        printf ("Match %d. Matched %s at character index %ld.\n",
                n_th_match, s matchAt n_th_match, s matchIndexAt n_th_match);
      }

      matches delete;
    }
When run, the program should produce output like this.
    Match 0. Matched mo at character index 0.
    Match 1. Matched ho at character index 6.
You should note that if a pattern in a backreference results in a zero length match, then that backreference contains a zero length string. While not incorrect, it can produce confusing results when examining matched text. The following program shows one way to indicate a zero-length backreference. It prints the string ‘(null)’ whenever a backreference contains a zero-length string.
    int main () {
      String new s;
      String new pat;
      Integer new n_matches;
      Array new offsets;
      Integer new i;

      s = "1.mobile 2mobile mobile";
      pat = "(\\d\\p)?m";

      n_matches = s matchRegex pat, offsets;

      for (i = 0; i < n_matches; ++i) {
        printf ("%Ld\n", offsets at i);
      }

      for (i = 0; i < n_matches; ++i) {
        if ((s matchAt i) length == 0) {
          printf ("%d: %s\n", s matchIndexAt i, "(null)");
        } else {
          printf ("%d: %s\n", s matchIndexAt i, s matchAt i);
        }
      }
    }
When run, the program should produce output that looks like this.
    0
    10
    17
    0: 1.
    17: (null)
    22: (null)
When writing a regular expression, it’s necessary to take into account all of the processing that String objects encounter when they are evaluated, before they reach the Ctalk library’s regular expression parser. To help facilitate lexical analysis and parsing, Ctalk also provides pattern strings, which allow Ctalk to defer the evaluation of a pattern until the regular expression parser actually performs the text matching.
Ctalk also provides operators that provide shorthand methods to match patterns with text, the =~ and !~ operators.
Pattern constants at this time may only follow the =~ and !~
operators, but you can use the matchAt
and matchIndexAt
,
and nMatches
methods to retrieve the match information. You
must, as with Strings
that are used as patterns, enclose the
pattern in ‘(’ and ‘)’ metacharacters in order to create
a backreference.
Here is a simple string matching program that matches text against a pattern constant.
    int main () {
      String new s;
      Integer new n_offsets;
      Integer new i;

      s = "Hello?";

      if (s =~ /(o\?)/) {
        printf ("match\n");
        i = 0;
        n_offsets = s nMatches;
        while (i < n_offsets) {
          printf ("%d: %s\n", s matchIndexAt i, s matchAt i);
          ++i;
        }
      }
    }
The most obvious example of how a pattern provides an advantage for text matching is when writing backslash escapes. To make a backslash appear in a pattern string, you need to write at least two backslashes in order for a backslash to appear when it’s needed to escape the following character. If you want to match an escaped backslash, then you need to write at least four backslashes.
    String       Pattern
    "\\*"        /\*/       # Matches a literal '*'.
    "\\\\*"      /\\*/      # Matches the expression '\*'.
To create a pattern, you delimit the characters of the pattern with slashes (‘//’) instead of double quotes. Other delimiters can signify patterns also if the pattern starts with a ‘m’ character, followed by the delimiter character, which must be non-alphanumeric.
    String       Pattern    Alternate Pattern
    "\\*"        /\*/       m|\*|
    "\\\\*"      /\\*/      m|\\*|
There is no single rule that governs how often String
objects
are evaluated when a program runs. So writing patterns helps take
some of the work out of testing an application’s pattern matching
routines.
Ctalk allows you to view the parsed pattern tokens, and the
text that each token matches. Token printing is enabled using the
printMatchToks
method, like this.
myString printMatchToks TRUE;
When token printing is enabled, then Ctalk’s pattern matching routines print the tokens of the pattern and the text that each token matches after every pattern match attempt.
If we have a program like the following:
    int main () {
      String new s;

      s printMatchToks TRUE;

      s = "192.168.0.1";

      if (s =~ /\d+\.(\d+)\.\d+\.\d+/) {
        printf ("match!\n");
      }
    }
Then, when this program is run with token printing enabled, the output should look similar to this.
    joeuser@myhost:~$ ./mypatprogram
    PATTERN: /\d+\.(\d+)\.\d+\.\d+/ TEXT: "192.168.0.1"
    TOK: d+ (character class) MATCH: "192"
    TOK: . (literal character) MATCH: "."
    TOK: ( (backreference start) MATCH: ""
    TOK: d+ (character class) MATCH: "168"
    TOK: ) (backreference end) MATCH: ""
    TOK: . (literal character) MATCH: "."
    TOK: d+ (character class) MATCH: "0"
    TOK: . (literal character) MATCH: "."
    TOK: d+ (character class) MATCH: "1"
    match!
    joeuser@myhost:~$
The processed token text is followed by any attributes that the regular expression parser finds (for example, a pattern like ‘\d+’ becomes the token ‘d+’ with the attribute of a character class identifier, and the ‘(’ and ‘)’ characters have backreference attributes). Then, finally, the library prints the text that matches each token.
Successful matches have text matched by each token in the pattern (except for zero-length metacharacters like ‘(’, ‘)’, ‘^’, or ‘$’).
Unsuccessful matches, however, may display text that matches where you don’t expect it. That’s because the regular expression parser scans along the entire length of the text, trying to match the first pattern token, then the second pattern token, and so on.
Although this doesn’t always pinpoint the exact place that a match first failed, it can provide a roadmap to help build a complex pattern from simpler, perhaps single-metachar patterns, which shows what the regular expression parser is doing internally.
Ctalk uses a record separator character to determine how the metacharacters ‘^’ and ‘$’ match line endings, among other uses.
The default record separator character is a newline (‘\n’). In this case a ‘^’ metacharacter in an expression matches the beginning of a string as well as the character(s) immediately following a newline. Similarly, a ‘$’ metacharacter anchors a match to the characters at the end of a line and at the end of a string.
Setting the record separator character to NUL (‘\0’) causes ‘^’ and ‘$’ to match only the beginning and the end of a string.
Here is an example that prints the string indexes of matches with the default newline record separator and with a NUL record separator character.
When the record separator is ‘'\n'’, the ‘$’ metacharacter in our pattern matches the text immediately before a ‘\n’ character, as well as the text at the end of the string.
    int main () {
      String new s;
      Integer new n_indexes;
      Array new match_indexes;
      String new pattern;

      printf ("\tMatch Indexes\n");

      /* Begin with the default record separator ('\n'). */
      s = "Hello, world!\nHello, wo\nHello, wo";
      pattern = "wo$";

      n_indexes = s matchRegex pattern, match_indexes;

      printf ("With newline record separator:\n");
      match_indexes map {
        printf ("%d\n", self);
      }

      s setRS '\0';          /* Set the record separator to NUL ('\0'). */

      match_indexes delete;  /* Remember to start with an empty Array again. */

      n_indexes = s matchRegex pattern, match_indexes;

      printf ("With NUL record separator:\n");
      match_indexes map {
        printf ("%d\n", self);
      }
    }
When run, the program should produce output like this.
            Match Indexes
    With newline record separator:
    21
    31
    -1
    With NUL record separator:
    31
    -1
Likewise, a ‘^’ metacharacter matches text immediately after the ‘\n’ record separator, or at the beginning of a string.
It’s also possible, though, to match newlines (and other ASCII escape characters) in patterns, either with a character class match, or by adding the escape sequence to the pattern. To do that, the program should use a double backslash with the ASCII escape sequence, as with the newline escape sequence in this example.
    int main () {
      String new s;

      s = "Hello,\nworld!";

      if (s =~ /(\W\p\\n)/)
        printf ("%s\n", s matchAt 0);
    }