Metacharacters - modifiers
Modifiers are another kind of metacharacter; they change how a regular expression behaves. Their use is easiest to explain in terms of the line separators described above.
Modifiers make line separators inside a string more useful. Switching on modifier /m treats the string as a multi-line buffer, so that ^ can match after any line separator within the string and $ can match before any line separator. The . metacharacter normally does not match a line separator; modifier /s lets it match embedded line separators as well.
Another modifier worth explaining is /x, which tells the regular expression engine to ignore white space that is neither backslashed nor within a character class. You can use this to break up a regular expression and make it more legible. The '#' metacharacter, which introduces a comment, can be used within the pattern to improve readability further. To illustrate:
(
(abc) # comment 1
| # You can use spaces to format regex, as Regular Expression ignores them
(efg) # comment 2
)
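The effect of /m, /s and /x can be tried out with a short sketch. It uses Python's re module as a stand-in, on the assumption that its re.M, re.S and re.X flags correspond to the modifiers described here; other engines spell the modifiers differently, and the sample text is invented for illustration.

```python
import re

text = "first line\nsecond line"

# Without re.M, ^ matches only at the very start of the string.
single = re.findall(r"^\w+", text)
# With re.M (the /m modifier), ^ also matches after each line separator.
multi = re.findall(r"^\w+", text, re.M)

# Without re.S, '.' stops at the newline; with re.S (the /s modifier) it crosses it.
no_dotall = re.match(r".+", text).group()
dotall = re.match(r".+", text, re.S).group()

# re.X (the /x modifier) ignores unescaped white space and '#' comments.
pattern = re.compile(r"""
    (
      (abc)   # comment 1
      |       # white space here is ignored
      (efg)   # comment 2
    )
""", re.X)
verbose_hits = [m.group() for m in pattern.finditer("abc efg xyz")]

print(single)        # ['first']
print(multi)         # ['first', 'second']
print(verbose_hits)  # ['abc', 'efg']
```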
This is also why, under /x, a real white space or '#' character outside a character class must be backslash-escaped or encoded with an octal or hex escape. Inside a character class, such characters are unaffected by /x.
As a memory aid, the modifiers can be arranged into the word "gismox", which covers the six most frequently used modifiers. The following table lists them:
Modifier | Meaning
g | A non-standard modifier that controls greediness (it is not the global-match /g of Perl). It is On by default. Switching it Off puts all operators into non-greedy mode, so + works as +?, * as *? and so on.
i | Does case-insensitive pattern matching.
s | Treats the string as a single line. That is, it changes . to match any character whatsoever, even a line separator, which it normally would not match.
m | Treats the string as multiple lines. That is, it changes ^ and $ from matching only at the very start or end of the string to matching at the start or end of any line within the string.
o | Compiles the pattern only once.
x | Extends the pattern's legibility by permitting white space and comments.
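A few of these behaviours can be sketched in Python's re module, used here only as an assumed stand-in. Python has no greediness modifier, so the sketch compares the greedy and non-greedy operators directly, and it shows why a literal space or '#' needs escaping under re.X. The sample strings are made up.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy (the default): '+' takes as much as possible.
greedy = re.findall(r"<.+>", html)
# Non-greedy: '+?' takes as little as possible.
lazy = re.findall(r"<.+?>", html)

# Case-insensitive matching (the /i modifier).
ci = re.findall(r"regex", "Regex REGEX regex", re.I)

target = "item # 7"
# Under re.X the unescaped spaces are dropped and '#' starts a comment,
# so this pattern is effectively just "item".
unescaped = re.search(r"item # 7", target, re.X).group()
# Backslash escapes keep the literal space and '#'.
escaped = re.search(r"item\ \#\ 7", target, re.X).group()
# Hex escapes work as well: \x20 is a space, \x23 is '#'.
hexed = re.search(r"item\x20\x23\x207", target, re.X).group()

print(greedy)     # ['<b>one</b><b>two</b>']
print(lazy)       # ['<b>', '</b>', '<b>', '</b>']
print(unescaped)  # item
print(escaped)    # item # 7
```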
Metacharacters - escape sequences
In regular expressions, special characters such as newline, tab, alarm, carriage return and form feed may be specified with escape sequences, using the same syntax as C and Perl. For example, \n matches a newline, \t matches a tab, and \xnn matches the character whose ASCII value is nn, where nn is a string of hexadecimal digits. If you need a wide (Unicode) character code, use \x{nnnn}, where nnnn is one or more hexadecimal digits. The following table lists the escape sequences:
Escape | Meaning
\xnn | Char with hex code nn
\x{nnnn} | Char with hex code nnnn (one byte for plain text and two bytes for Unicode)
\t | Tab (HT/TAB), same as \x09
\n | New line (NL), same as \x0a
\r | Carriage return (CR), same as \x0d
\f | Form feed (FF), same as \x0c
\a | Alarm (bell) (BEL), same as \x07
\e | Escape (ESC), same as \x1b
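The equivalences in the table can be checked with a small sketch in Python's re module, which is an assumption of this example rather than the engine the table describes. Python's re does not accept \e or \x{nnnn}, so the sketch uses \x1b for Escape and the \uNNNN form for a wide character.

```python
import re

# Named escape -> character code, as listed in the table.
codes = {r"\t": 0x09, r"\n": 0x0A, r"\r": 0x0D, r"\f": 0x0C, r"\a": 0x07}

results = {}
for named, code in codes.items():
    hexed = r"\x%02x" % code          # e.g. \x09 for the tab
    ch = chr(code)
    # Both spellings should match the same single character.
    results[named] = bool(re.fullmatch(named, ch)) and bool(re.fullmatch(hexed, ch))

# Escape (ESC) written as a hex escape, followed by a literal '['.
esc = re.search(r"\x1b\[", "\x1b[31m") is not None

# A wide (Unicode) character by code point, here U+0416.
wide = re.fullmatch(r"\u0416", "\u0416") is not None

print(results)
print(esc, wide)
```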
Metacharacters - character classes
In regular expressions, you can define a character class by enclosing a list of characters in square brackets [], which matches any one character from the list. A ^ placed immediately after the opening bracket negates the class, so that it matches any character not in the list. For example, suppose you want to find fan, fin, fen and fun. A single expression, f[aeiu]n, finds all four. Placing a ^ before "aeiu" inside the brackets reverses the result: f[^aeiu]n finds 'fon', 'frn', 'fbn' and so on, but not 'fen', 'fan', 'fin' or 'fun'.
Within a list, the dash ('-') specifies a range. For example, [d-x] means all characters from 'd' to 'x', inclusive. To make the dash itself a member of the class, put it at the start or the end of the list; no backslash ('\') is needed there. [dx-] matches 'd', 'x' and '-', because a trailing dash cannot define a range, and [-dx] likewise matches 'd', 'x' and '-'. Alternatively, escape the dash with a backslash, as in [d\-x].
Examples of the dash ('-') are listed in the following table, including the ones just explained:
Class | Meaning
[-dx] | Matches 'd', 'x' and '-'
[dx-] | Matches 'd', 'x' and '-'
[d\-x] | Matches 'd', 'x' and '-'
[d-x] | Matches any lowercase letter from 'd' to 'x', inclusive
[\n-\x0D] | Matches any of #10, #11, #12, #13
[\d-t] | Matches any digit, '-' or 't'
[]-a] | Matches any char from ']' to 'a'
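The class examples can be exercised with a brief sketch in Python's re module, again as an assumed stand-in for the engine described. The [\d-t] row is left out because Python rejects a range whose endpoint is a class shorthand; the word list and sample strings are invented.

```python
import re

words = ["fan", "fen", "fin", "fon", "fun", "frn"]
# A listed class matches one character from the list.
listed = [w for w in words if re.fullmatch(r"f[aeiu]n", w)]
# A leading ^ negates the class.
negated = [w for w in words if re.fullmatch(r"f[^aeiu]n", w)]

sample = "a-d-x-z"
# A dash at the start, at the end, or escaped is a literal member.
dash_first = re.findall(r"[-dx]", sample)
dash_last = re.findall(r"[dx-]", sample)
dash_escaped = re.findall(r"[d\-x]", sample)

# A dash between two characters defines a range.
as_range = re.findall(r"[d-x]", "abcdwxyz-")
# A range over control characters: codes 10 through 13.
control = re.findall(r"[\n-\x0D]", "a\n\x0b\x0c\rb")

print(listed)    # ['fan', 'fen', 'fin', 'fun']
print(negated)   # ['fon', 'frn']
print(as_range)  # ['d', 'w', 'x']
```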
Metacharacters - predefined classes
Predefined classes consist of metacharacters that group characters into alphanumeric, non-alphanumeric, numeric, non-numeric, space, non-space and so on. The following table lists the predefined classes; examples of their use follow the table.
Class | Meaning
\w | An alphanumeric character, including '_'
\W | A non-alphanumeric character
\d | A numeric character
\D | A non-numeric character
\s | Any white space character, same as [ \t\n\r\f]
\S | A non-space character
The metacharacters \w, \d and \s can also be used within custom character classes. The following table gives some examples of the predefined classes:
tentat\dve | Matches strings like 'tentat1ve' and 'tentat6ve', but not 'tentative' or 'tentatuve'.
tentat[\w\s]ve | Matches strings like 'tentative', 'tentat ve' and 'tentatuve', and also 'tentat5ve' (since \w includes digits), but not 'tentat=ve'.
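The two patterns can be compared side by side in a short sketch, using Python's re module as an assumed stand-in and an invented candidate list. Note that \w covers digits as well as letters and '_', so 'tentat5ve' matches the second pattern.

```python
import re

candidates = ["tentat1ve", "tentat6ve", "tentative", "tentatuve",
              "tentat ve", "tentat5ve", "tentat=ve"]

# \d accepts only a digit in the gap.
digit_only = [s for s in candidates if re.fullmatch(r"tentat\dve", s)]
# [\w\s] accepts a letter, digit, underscore or white space in the gap.
word_or_space = [s for s in candidates if re.fullmatch(r"tentat[\w\s]ve", s)]

print(digit_only)     # ['tentat1ve', 'tentat6ve', 'tentat5ve']
print(word_or_space)  # everything except 'tentat=ve'
```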
Metacharacters - word boundaries
A word boundary (\b) is a spot between two characters that has a \w (an alphanumeric character) on one side and a \W (a non-alphanumeric character) on the other, in either order. While \b matches a word boundary, \B matches any position that is not a word boundary.
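As a minimal sketch of \b and \B, assuming Python's re module and a made-up sample string, the following records the positions at which "cat" is found as a whole word and as the tail of a longer word.

```python
import re

text = "cat concat cat's category"

# \bcat\b: "cat" with a word boundary on both sides.
whole = [m.start() for m in re.finditer(r"\bcat\b", text)]
# \Bcat: "cat" that does not start at a word boundary.
inside = [m.start() for m in re.finditer(r"\Bcat", text)]

print(whole)   # [0, 11]
print(inside)  # [7]
```

The apostrophe in "cat's" is a non-alphanumeric character, so a boundary falls right after the 't' there, while "category" is skipped because 'e' follows "cat" directly.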