What Is a Regular Expression
• A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs.
Why Use Regular Expressions
• Regular expressions provide a powerful, flexible, and efficient method for processing text.
• The extensive pattern-matching notation of regular expressions enables you to quickly parse large amounts of text to find specific character patterns.
• They let you validate text to ensure that it matches a predefined pattern (such as an e-mail address), and extract, edit, replace, or delete text substrings.
• They let you add the extracted strings to a collection, for example to generate a report.
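A minimal sketch of these uses follows. It assumes Java's java.util.regex as a stand-in for the .NET Regex class this post describes; the syntax used here behaves the same way in both. The class WhyRegex, its helper methods, and the e-mail pattern are hypothetical, and the e-mail pattern is deliberately simplified rather than a complete address validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhyRegex {
    // Hypothetical, deliberately simplified e-mail pattern (not a full address validator).
    private static final Pattern EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Validation: matches() succeeds only if the WHOLE string fits the pattern.
    public static boolean isValidEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    // Extraction: collect every run of digits into a list, e.g. to feed a report.
    public static List<String> extractNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Replacement: collapse each run of white space into a single space.
    public static String normalizeSpaces(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("user@example.com")); // true
        System.out.println(isValidEmail("not an address"));   // false
        System.out.println(extractNumbers("Order 12 shipped 3 items, 450 left")); // [12, 3, 450]
        System.out.println(normalizeSpaces("too    many   spaces")); // too many spaces
    }
}
```

The list returned by extractNumbers plays the role of the "collection" mentioned above.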
How Regex Works
• The centrepiece of text processing with regular expressions is the regular expression engine, which is represented by the System.Text.RegularExpressions.Regex class in the .NET Framework.
• At a minimum, processing text using regular expressions requires two items of information:
– The regular expression pattern to identify in the text.
– The text to parse for the regular expression pattern.
• With these two items, the methods of the Regex class let you:
– Determine whether the regular expression pattern occurs in the input text by calling the IsMatch method.
– Retrieve one or all occurrences of text that match the regular expression pattern by calling the Match or Matches method.
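The three operations can be sketched as follows, assuming Java's java.util.regex in place of .NET's Regex. The helper names isMatch, firstMatch, and allMatches are hypothetical analogues of IsMatch, Match, and Matches, not a real API; returning null when nothing matches is a choice of this sketch (.NET instead returns a Match object whose Success property is false).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBasics {
    // Analogue of .NET Regex.IsMatch: does the pattern occur anywhere in the input?
    public static boolean isMatch(String input, String pattern) {
        return Pattern.compile(pattern).matcher(input).find();
    }

    // Analogue of .NET Regex.Match: the first occurrence, or null if there is none.
    public static String firstMatch(String input, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Analogue of .NET Regex.Matches: every non-overlapping occurrence, left to right.
    public static List<String> allMatches(String input, String pattern) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(input);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "cat bat rat";  // the text to parse
        String pattern = "[bcr]at";   // the pattern to identify
        System.out.println(isMatch(text, pattern));    // true
        System.out.println(firstMatch(text, pattern)); // cat
        System.out.println(allMatches(text, pattern)); // [cat, bat, rat]
        System.out.println(isMatch("dog", pattern));   // false
    }
}
```

Java's find() searches anywhere in the input, which corresponds to IsMatch; Java's matches() is stricter and requires the entire input to match.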
Regex Sections
Character Escapes
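• The backslash (\) in a regular expression indicates one of the following:
– The character that follows it is a special character, as shown in the table below. For example, \b is an anchor that indicates that a regular expression match should begin on a word boundary, and \t represents a tab.
– A character that would otherwise be interpreted as a language construct should be interpreted literally. For example, { begins the definition of a quantifier, but \{ matches a literal brace.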
Escaped Character | Pattern | Description | Example
\a | \a | Matches a bell character, \u0007. | "\u0007" in "Error!" + '\u0007'
\b | [\b]{3,} | In a character class, matches a backspace, \u0008. | "\b\b\b\b" in "\b\b\b\b"
\t | (\w+)\t | Matches a tab, \u0009. | "item1\t", "item2\t" in "item1\titem2\t"
\r | \r\n(\w+) | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | "\r\nThese" in "\r\nThese are\ntwo lines."
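A small sketch that reproduces most of the table rows, assuming Java's java.util.regex, which also accepts \a, \t, \r, and \n as escapes. The backspace row is left out of this sketch, and the class EscapeDemo and its findAll helper are hypothetical. The last line also shows a backslash turning a metacharacter ({) into a literal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {
    // Returns every match of the pattern in the input, in order.
    public static List<String> findAll(String pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // \a matches the bell character U+0007 appended to "Error!".
        System.out.println(findAll("\\a", "Error!\u0007").size()); // 1
        // \t: a word followed by a tab, found twice.
        System.out.println(findAll("(\\w+)\\t", "item1\titem2\t")
                .equals(Arrays.asList("item1\t", "item2\t")));     // true
        // \r\n: only the first line break is a CR+LF pair, so there is one match.
        System.out.println(findAll("\\r\\n(\\w+)", "\r\nThese are\ntwo lines.")
                .equals(Arrays.asList("\r\nThese")));               // true
        // \{ : the backslash makes the quantifier brace a literal character.
        System.out.println(findAll("\\{", "a{b}"));                 // [{]
    }
}
```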
Character Classes :-
• A character class defines a set of characters, any one of which can occur in an input string for a match to succeed. The regular expression language in the .NET Framework supports the following character classes:
– Positive character groups. A character in the input string must match one of a specified set of characters.
– Negative character groups. A character in the input string must not match one of a specified set of characters.
– Any character. The . (dot or period) character in a regular expression is a wildcard character that matches any character except \n.
– A word character. A character in the input string can belong to any of the Unicode categories that are appropriate for characters in words.
– A non-word character. A character in the input string can belong to any Unicode category that is not a word character.
– A white-space character. A character in the input string can be any Unicode separator character, as well as any one of a number of control characters.
– A non-white-space character. A character in the input string can be any character that is not a white-space character.
– A decimal digit. A character in the input string can be any of a number of characters classified as Unicode decimal digits.
– A non-decimal digit. A character in the input string can be anything other than a Unicode decimal digit.
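A sketch of several of these classes, again assuming Java's java.util.regex as a stand-in for .NET. Two differences are worth knowing: by default Java's \d, \w, and \s are ASCII-only, whereas .NET's are Unicode-aware (the Pattern.UNICODE_CHARACTER_CLASS flag brings Java closer to the .NET behaviour), and Java's dot excludes other line terminators such as \r as well as \n. The class ClassCategories and its fullMatch helper are hypothetical.

```java
import java.util.regex.Pattern;

public class ClassCategories {
    // True if the entire input matches the regex compiled with the given flags.
    public static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Positive and negative character groups.
        System.out.println(fullMatch("[aeiou]", 0, "e"));  // true
        System.out.println(fullMatch("[^aeiou]", 0, "e")); // false
        // The dot matches any character except a newline.
        System.out.println(fullMatch("a.c", 0, "abc"));    // true
        System.out.println(fullMatch("a.c", 0, "a\nc"));   // false
        // ARABIC-INDIC DIGIT THREE is a Unicode decimal digit (category Nd).
        String arabicThree = "\u0663";
        // Java's \d is ASCII-only by default...
        System.out.println(fullMatch("\\d", 0, arabicThree)); // false
        // ...and Unicode-aware with UNICODE_CHARACTER_CLASS, closer to .NET's default.
        System.out.println(fullMatch("\\d", Pattern.UNICODE_CHARACTER_CLASS, arabicThree)); // true
        // White space versus non-white space.
        System.out.println(fullMatch("\\s", 0, "\t")); // true
        System.out.println(fullMatch("\\S", 0, "\t")); // false
    }
}
```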
Character Class Patterns :-
Character Class | Pattern | Example
[character_group] | [ae] | "a" in "gray"
[^character_group] | [^aei] | "r", "g", "n" in "reign"
[first-last] | [A-Z] | "A", "B" in "AB123"
. | a.e | "ave" in "nave"
\p{name} | \p{Lu} | "C", "L" in "City Lights"
\P{name} | \P{Lu} | "i", "t", "y" in "City"
\w | \w | "I", "D", "A" in "ID A"
\W | \W | " ", "." in "ID A1.3"
\s | \w\s | "D " in "ID A1.3"
\S | \s\S | " _" in "int __ctr"
\d | \d | "4" in "4 = IV"
\D | \D | " ", "=", " ", "I", "V" in "4 = IV"
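To check the table, a minimal sketch runs each Pattern/Example pair through Java's java.util.regex (assumed here as a stand-in for .NET; for these ASCII and \p{Lu} examples the two engines should give the same matches). The class CharClassTable and its findAll helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassTable {
    // Returns every match of the pattern in the input, in order.
    public static List<String> findAll(String pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // {pattern, input} pairs taken from the table rows.
        String[][] rows = {
            {"[ae]", "gray"},
            {"[^aei]", "reign"},
            {"[A-Z]", "AB123"},
            {"a.e", "nave"},
            {"\\p{Lu}", "City Lights"},
            {"\\P{Lu}", "City"},
            {"\\w", "ID A"},
            {"\\W", "ID A1.3"},
            {"\\w\\s", "ID A1.3"},
            {"\\s\\S", "int __ctr"},
            {"\\d", "4 = IV"},
            {"\\D", "4 = IV"},
        };
        for (String[] row : rows) {
            System.out.println(row[0] + " in \"" + row[1] + "\" -> " + findAll(row[0], row[1]));
        }
    }
}
```

Each printed list should correspond to the Example column of the same row.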
Anchors:
• Anchors specify a position in the string where a match must occur.
• When you use an anchor in your search expression, it looks for a match in the specified position only. For example, ^ specifies that the match must start at the beginning of the string (or, in multiline mode, at the beginning of a line).
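A sketch of how ^ and $ depend on multiline mode, assuming Java's Pattern.MULTILINE flag as the counterpart of .NET's RegexOptions.Multiline. The class CaretDemo and its findAll helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaretDemo {
    // Returns every match of the compiled pattern in the input, in order.
    public static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\nthree";
        // Without MULTILINE, ^ anchors only at the start of the whole string.
        System.out.println(findAll(Pattern.compile("^\\w+"), text));                    // [one]
        // With MULTILINE, ^ also anchors right after each line break.
        System.out.println(findAll(Pattern.compile("^\\w+", Pattern.MULTILINE), text)); // [one, two, three]
        // $ behaves the same way at the other end of each line.
        System.out.println(findAll(Pattern.compile("\\w+$"), text));                    // [three]
        System.out.println(findAll(Pattern.compile("\\w+$", Pattern.MULTILINE), text)); // [one, two, three]
    }
}
```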
Anchor | Description
^ | By default, the match must start at the beginning of the string; in multiline mode, it must start at the beginning of a line.
$ | By default, the match must occur at the end of the string or before \n at the end of the string; in multiline mode, it must occur at the end of a line or before \n at the end of a line.
\A | The match must occur at the beginning of the string only (no multiline support).
\Z | The match must occur at the end of the string, or before \n at the end of the string.
\z | The match must occur at the end of the string only.
\G | The match must start at the position where the previous match ended (or, if there was no previous match, where matching started).
\b | The match must occur on a word boundary, that is, between a word (\w) character and a non-word (\W) character.
\B | The match must not occur on a word boundary.
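A sketch exercising the remaining anchors, assuming Java's java.util.regex, whose \A, \Z, \z, \G, \b, and \B are documented with meanings close to those in the table. The class AnchorDemo and its helpers findAll and occurs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // Returns every match of regex (compiled with flags) in the input, in order.
    public static List<String> findAll(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // True if regex occurs anywhere in the input.
    public static boolean occurs(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \A ignores MULTILINE: only the word at the very start is found.
        System.out.println(findAll("\\A\\w+", Pattern.MULTILINE, "one\ntwo")); // [one]
        // \Z may match just before a final newline; \z only at the very end.
        System.out.println(occurs("end\\Z", "end\n")); // true
        System.out.println(occurs("end\\z", "end\n")); // false
        // \G: each match must start where the previous one ended, so matching stops at "a".
        System.out.println(findAll("\\G\\d", 0, "123a45")); // [1, 2, 3]
        // \b: "are" must begin a word.
        System.out.println(findAll("\\bare\\w*\\b", 0, "area bare arena mare"));  // [area, arena]
        // \B: "end" must NOT begin a word.
        System.out.println(findAll("\\Bend\\w*\\b", 0, "end sends endure lender")); // [ends, ender]
    }
}
```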