What Is a Regular Expression

• A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs.

Why Regular Expressions

• Regular expressions provide a powerful, flexible, and efficient method for processing text.

• The extensive pattern-matching notation of regular expressions enables you to quickly parse large amounts of text to find specific character patterns.

• You can validate text to ensure that it matches a predefined pattern (such as an e-mail address), and you can extract, edit, replace, or delete text substrings.

• You can add the extracted strings to a collection in order to generate a report.
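The validation and extraction uses above can be sketched in C# roughly as follows. This is a minimal illustration, not a production validator: the class name TextReport, its method names, and the deliberately simplified e-mail pattern are assumptions made for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TextReport
{
    // Simplified e-mail pattern (an assumption for illustration only);
    // it does not accept every valid address.
    private const string EmailPattern = @"[\w.+-]+@[\w-]+(\.[\w-]+)+";

    // Validation: the whole input must match, so the pattern is wrapped in ^ and $.
    public static bool LooksLikeEmail(string input)
    {
        return Regex.IsMatch(input, "^" + EmailPattern + "$");
    }

    // Extraction: collect every address found in a larger text,
    // for example to feed a report.
    public static List<string> ExtractEmails(string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, EmailPattern))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

With these definitions, TextReport.ExtractEmails("Contact ann@example.com or bob@test.org today.") should return a list holding the two addresses in the order they appear.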

How Regex Works

• The centerpiece of text processing with regular expressions is the regular expression engine, which is represented by the System.Text.RegularExpressions.Regex object in the .NET Framework.

• At a minimum, processing text using regular expressions requires two items of information:

– The regular expression pattern to identify in the text.

– The text to parse for the regular expression pattern.

• The methods of the Regex class let you perform the following operations:

– Determine whether the regular expression pattern occurs in the input text by calling the IsMatch method.

– Retrieve one or all occurrences of text that matches the regular expression pattern by calling the Match or Matches method.

– Replace text that matches the regular expression pattern by calling the Replace method.
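As a rough sketch of the four operations listed above, the following C# class calls the static IsMatch, Match, Matches, and Replace methods of the Regex class. The class name RegexBasics, its method names, and the sample pattern \d+ are hypothetical choices made for this example.

```csharp
using System.Text.RegularExpressions;

public static class RegexBasics
{
    // Hypothetical sample pattern: one or more decimal digits.
    private const string Pattern = @"\d+";

    // IsMatch: does the pattern occur anywhere in the input?
    public static bool ContainsNumber(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    // Match: retrieve the first occurrence (empty string if there is none).
    public static string FirstNumber(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : string.Empty;
    }

    // Matches: retrieve all occurrences; here only their count is returned.
    public static int CountNumbers(string input)
    {
        return Regex.Matches(input, Pattern).Count;
    }

    // Replace: substitute every occurrence with a placeholder.
    public static string MaskNumbers(string input)
    {
        return Regex.Replace(input, Pattern, "#");
    }
}
```

For the input "Order 12 ships in 3 days", FirstNumber should return "12", CountNumbers should return 2, and MaskNumbers should return "Order # ships in # days".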

Regex Sections

• Character escapes

• Character classes

• Anchors

• Grouping constructs

• Quantifiers

• Backreference constructs

• Alternation constructs

• Substitutions

• Regular expression options

• Miscellaneous constructs

Character Escapes

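• The backslash (\) in a regular expression indicates one of the following:

– The character that follows it is a special character, as shown in the table below. For example, \b is an anchor that indicates that a regular expression match should begin on a word boundary, and \t represents a tab.

– A character that would otherwise be interpreted as a language construct should be interpreted literally. For example, \. matches a literal period instead of any character.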

Escaped Character	Pattern	Description	Example
\a	\a	Matches a bell character, \u0007.	"\u0007" in "Error!" + '\u0007'
\b	[\b]{3,}	In a character class, matches a backspace, \u0008.	"\b\b\b\b" in "\b\b\b\b"
\t	(\w+)\t	Matches a tab, \u0009.	"item1\t", "item2\t" in "item1\titem2\t"
\r	\r\n(\w+)	Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.)	"\r\nThese" in "\r\nThese are\ntwo lines."
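Two of the patterns from the table can be tried out with a sketch like the one below. The class name EscapeDemo and its method names are hypothetical. The patterns are written as C# verbatim strings (@"...") so that the backslashes reach the regular expression engine unchanged.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // (\w+)\t : a word followed by a tab; group 1 holds the word without the tab.
    public static List<string> TabTerminatedItems(string input)
    {
        var items = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(\w+)\t"))
        {
            items.Add(m.Groups[1].Value);
        }
        return items;
    }

    // \r\n(\w+) : a carriage return and line feed followed by a word;
    // returns the captured word, or an empty string if nothing matches.
    public static string FirstWordAfterCrLf(string input)
    {
        Match m = Regex.Match(input, @"\r\n(\w+)");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Calling EscapeDemo.TabTerminatedItems("item1\titem2\t") should yield "item1" and "item2", and EscapeDemo.FirstWordAfterCrLf("\r\nThese are\ntwo lines.") should yield "These". The second call does not pick up "two", because that word follows a bare \n rather than \r\n.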

Character Classes

• A character class defines a set of characters, any one of which can occur in an input string for a match to succeed. The regular expression language in the .NET Framework supports the following character classes:

– Positive character groups. A character in the input string must match one of a specified set of characters.

– Negative character groups. A character in the input string must not match one of a specified set of characters.

– Any character. The . (dot or period) character in a regular expression is a wildcard character that matches any character except \n.

– A word character. A character in the input string can belong to any of the Unicode categories that are appropriate for characters in words.

– A non-word character. A character in the input string can belong to any Unicode category that is not a word character.

– A white-space character. A character in the input string can be any Unicode separator character, as well as any one of a number of control characters.

– A non-white-space character. A character in the input string can be any character that is not a white-space character.

– A decimal digit. A character in the input string can be any of a number of characters classified as Unicode decimal digits.

– A non-decimal digit. A character in the input string can be anything other than a Unicode decimal digit.

Character Class Patterns

Character Class	Pattern	Matches
[ character_group ]	[ae]	"a" in "gray"
[^ character_group ]	[^aei]	"r", "g", "n" in "reign"
[ first - last ]	[A-Z]	"A", "B" in "AB123"
.	a.e	"ave" in "nave"
\p{ name }	\p{Lu}	"C", "L" in "City Lights"
\P{ name }	\P{Lu}	"i", "t", "y" in "City"
\w	\w	"I", "D", "A" in "ID A"
\W	\W	" ", "." in "ID A1.3"
\s	\w\s	"D " in "ID A1.3"
\S	\s\S	" _" in "int __ctr"
\d	\d	"4" in "4 = IV"
\D	\D	" ", "=", " ", "I", "V" in "4 = IV"
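The rows of this table can be checked with a small helper such as the one below. It is only a sketch: the class name CharClassDemo and the method Find are hypothetical, and the '|' separator is an arbitrary choice to make the individual matches visible.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Returns every match of pattern in input, joined with '|'.
    // Cast<Match>() is used because MatchCollection is a non-generic
    // collection in the .NET Framework.
    public static string Find(string input, string pattern)
    {
        return string.Join("|",
            Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value));
    }
}
```

For example, CharClassDemo.Find("reign", "[^aei]") should return "r|g|n", CharClassDemo.Find("City Lights", @"\p{Lu}") should return "C|L", and CharClassDemo.Find("nave", "a.e") should return "ave".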

Anchors

• Anchors specify a position in the string where a match must occur.

• When you use an anchor in your search expression, it looks for a match in the specified position only. For example, ^ specifies that the match must start at the beginning of a line or string.

Anchor	Description
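^	The match must occur at the beginning of the string, or at the beginning of a line in multiline mode.
$	The match must occur at the end of the string or before \n at the end of the string; in multiline mode, at the end of a line or before \n at the end of the line.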
\A	The match must occur at the beginning of the string only (no multiline support).
\Z	The match must occur at the end of the string, or before \n at the end of the string.
\z	The match must occur at the end of the string only.
\G	The match must start at the position where the previous match ended.
\b	The match must occur on a word boundary.
\B	The match must not occur on a word boundary.
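A few of these anchors are illustrated in the sketch below, including the effect of RegexOptions.Multiline on ^ and the difference between \Z and \z. The class name AnchorDemo and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // ^ matches only at the start of the string by default, and at the
    // start of every line when RegexOptions.Multiline is set.
    public static int CountLineStarts(string input, bool multiline)
    {
        RegexOptions options = multiline ? RegexOptions.Multiline : RegexOptions.None;
        return Regex.Matches(input, @"^\w+", options).Count;
    }

    // \b on both sides restricts the match to a whole word.
    // Regex.Escape makes sure the word itself is taken literally.
    public static bool ContainsWord(string input, string word)
    {
        return Regex.IsMatch(input, @"\b" + Regex.Escape(word) + @"\b");
    }

    // \Z allows a single trailing \n after the match.
    public static bool EndsWithDigits(string input)
    {
        return Regex.IsMatch(input, @"\d+\Z");
    }

    // \z requires the match to end at the very end of the string.
    public static bool EndsWithDigitsStrict(string input)
    {
        return Regex.IsMatch(input, @"\d+\z");
    }
}
```

With the input "one\ntwo\nthree", CountLineStarts should return 1 without the multiline option and 3 with it. For "abc123\n", EndsWithDigits should return true while EndsWithDigitsStrict should return false.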

. Net Discussion

Tuesday, October 14, 2014

What is Regular expression-Why Regular Expression-How Regex Works
