Tuesday, October 14, 2014

What Is a Regular Expression, Why Use Regular Expressions, and How Regex Works

What is a Regular Expression

          A regular expression is a pattern that the regular expression engine attempts to match in input text. A pattern consists of one or more character literals, operators, or constructs.

Why Regular Expressions

          Regular expressions provide a powerful, flexible, and efficient method for processing text.
          The extensive pattern-matching notation of regular expressions enables you to:
      Quickly parse large amounts of text to find specific character patterns.
      Validate text to ensure that it matches a predefined pattern (such as an e-mail address).
      Extract, edit, replace, or delete text substrings.
      Add the extracted strings to a collection in order to generate a report.

How Regex Works

          The centerpiece of text processing with regular expressions is the regular expression engine, which is represented by the System.Text.RegularExpressions.Regex object in the .NET Framework.
          At a minimum, processing text using regular expressions requires two items of information:
      The regular expression pattern to identify in the text.
      The text to parse for the regular expression pattern.
          The methods of the Regex class let you perform the following operations:
      Determine whether the regular expression pattern occurs in the input text by calling the IsMatch method.
      Retrieve one or all occurrences of text that matches the regular expression pattern by calling the Match or Matches method.
      Replace text that matches the regular expression pattern by calling the Replace method.
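
The operations listed above can be sketched in C# as follows. This is a minimal illustration, not a definitive implementation: the class name RegexBasics, its helper methods, the sample pattern \d+ and the sample text are assumptions made up for the example. Only Regex.IsMatch, Regex.Match, Regex.Matches and Regex.Replace come from the Regex class itself.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names below are illustrative only.
public static class RegexBasics
{
    // Assumed sample pattern: one or more decimal digits.
    public const string Pattern = @"\d+";

    // IsMatch: does the pattern occur anywhere in the input?
    public static bool ContainsNumber(string input) => Regex.IsMatch(input, Pattern);

    // Match: the first occurrence (empty string when there is none).
    public static string FirstNumber(string input) => Regex.Match(input, Pattern).Value;

    // Matches: every occurrence, collected into a list.
    public static List<string> AllNumbers(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            results.Add(m.Value);
        return results;
    }

    // Replace: substitute every occurrence with a fixed string.
    public static string MaskNumbers(string input) => Regex.Replace(input, Pattern, "#");

    public static void Main()
    {
        string text = "Order 66 shipped 3 items";
        Console.WriteLine(ContainsNumber(text));                // True
        Console.WriteLine(FirstNumber(text));                   // 66
        Console.WriteLine(string.Join(",", AllNumbers(text)));  // 66,3
        Console.WriteLine(MaskNumbers(text));                   // Order # shipped # items
    }
}
```

The static Regex methods are used here for brevity. When the same pattern is applied many times, creating one Regex instance and reusing it is usually the better choice.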

Regex Sections

       Character escapes
       Character classes
       Anchors
       Grouping constructs
       Quantifiers
       Backreference constructs
       Alternation constructs
       Substitutions
       Regular expression options
       Miscellaneous constructs

Character Escapes

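          The backslash (\) in a regular expression indicates one of the following:
      The character that follows it is a special character, as shown in the table below. For example, \b is an anchor that indicates that a regular expression match should begin on a word boundary, and \t represents a tab.
      The character that follows it should be interpreted literally rather than as a language construct. For example, \{ matches a literal brace instead of beginning a quantifier.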
| Escaped character | Pattern | Description | Example |
| --- | --- | --- | --- |
| \a | \a | Matches a bell character, \u0007. | "\u0007" in "Error!" + '\u0007' |
| \b | [\b]{3,} | In a character class, matches a backspace, \u0008. | "\b\b\b\b" in "\b\b\b\b" |
| \t | (\w+)\t | Matches a tab, \u0009. | "item1\t", "item2\t" in "item1\titem2\t" |
| \r | \r\n(\w+) | Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.) | "\r\nThese" in "\r\nThese are\ntwo lines." |

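The escape examples above can be tried with a short C# sketch. The class name EscapeDemo and the helper FindAll are hypothetical names introduced for this illustration; the patterns and inputs are the ones from the table. Note that the patterns are written as verbatim strings (@"..."), so the backslash reaches the regex engine unchanged, while the inputs are ordinary C# strings in which \t, \b, \r and \n are C# escapes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class EscapeDemo
{
    // Returns the text of every match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Value);
        return results;
    }

    public static void Main()
    {
        // \t in the pattern matches the tab characters in the input.
        List<string> items = FindAll("item1\titem2\t", @"(\w+)\t");
        Console.WriteLine(items.Count);                               // 2

        // Inside a character class, \b is a backspace, not a word boundary.
        Console.WriteLine(Regex.IsMatch("\b\b\b\b", @"[\b]{3,}"));    // True

        // \r\n followed by a captured word.
        Match m = Regex.Match("\r\nThese are\ntwo lines.", @"\r\n(\w+)");
        Console.WriteLine(m.Groups[1].Value);                         // These
    }
}
```
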
Character Classes

          A character class defines a set of characters, any one of which can occur in an input string for a match to succeed. The regular expression language in the .NET Framework supports the following character classes:
      Positive character groups. A character in the input string must match one of a specified set of characters.
      Negative character groups. A character in the input string must not match one of a specified set of characters.
      Any character. The . (dot or period) character in a regular expression is a wildcard character that matches any character except \n.
      A word character. A character in the input string can belong to any of the Unicode categories that are appropriate for characters in words.
      A non-word character. A character in the input string can belong to any Unicode category that is not a word character.
      A white-space character. A character in the input string can be any Unicode separator character, as well as any one of a number of control characters.
      A non-white-space character. A character in the input string can be any character that is not a white-space character.
      A decimal digit. A character in the input string can be any of a number of characters classified as Unicode decimal digits.
      A non-decimal digit. A character in the input string can be anything other than a Unicode decimal digit.

Character Class Regex Patterns

| Character class | Pattern | Example |
| --- | --- | --- |
| [ character_group ] | [ae] | "a" in "gray" |
| [^ character_group ] | [^aei] | "r", "g", "n" in "reign" |
| [ first - last ] | [A-Z] | "A", "B" in "AB123" |
| . | a.e | "ave" in "nave" |
| \p{ name } | \p{Lu} | "C", "L" in "City Lights" |
| \P{ name } | \P{Lu} | "i", "t", "y" in "City" |
| \w | \w | "I", "D", "A" in "ID A" |
| \W | \W | " ", "." in "ID A1.3" |
| \s | \w\s | "D " in "ID A1.3" |
| \S | \s\S | " _" in "int __ctr" |
| \d | \d | "4" in "4 = IV" |
| \D | \D | " ", "=", " ", "I", "V" in "4 = IV" |

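A few of the character class patterns above can be checked with the following C# sketch. The class CharClassDemo and the helper MatchesOf are hypothetical names used only for this illustration; MatchesOf joins all matches with a | separator so the result is easy to read.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class CharClassDemo
{
    // Joins the text of every match with '|' so results are easy to compare.
    public static string MatchesOf(string input, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            values.Add(m.Value);
        return string.Join("|", values);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesOf("reign", "[^aei]"));         // r|g|n
        Console.WriteLine(MatchesOf("AB123", "[A-Z]"));          // A|B
        Console.WriteLine(MatchesOf("nave", "a.e"));             // ave
        Console.WriteLine(MatchesOf("City Lights", @"\p{Lu}"));  // C|L
        Console.WriteLine(MatchesOf("City", @"\P{Lu}"));         // i|t|y
        Console.WriteLine(MatchesOf("4 = IV", @"\d"));           // 4
    }
}
```

Matching is case-sensitive by default, which is why a.e finds "ave" in "nave" while a pattern written as a.E would not.
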
Anchors

          Anchors specify a position in the string where a match must occur.
          When you use an anchor in your search expression, it looks for a match in the specified position only. For example, ^ specifies that the match must start at the beginning of a line or string.

| Anchor | Description |
| --- | --- |
| ^ | The match must occur at the beginning of the string; in multiline mode, it can also occur at the beginning of a line. |
| $ | The match must occur at the end of the string, or before \n at the end of the string; in multiline mode, it can also occur at the end of a line. |
| \A | The match must occur at the beginning of the string only (no multiline support). |
| \Z | The match must occur at the end of the string, or before \n at the end of the string. |
| \z | The match must occur at the end of the string only. |
| \G | The match must start at the position where the previous match ended. |
| \b | The match must occur on a word boundary. |
| \B | The match must not occur on a word boundary. |
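
The behavior of several anchors can be sketched in C# as shown here. The class AnchorDemo, the helper Count and the sample inputs are assumptions invented for this illustration; RegexOptions.Multiline is the .NET option that switches ^ and $ to line-based matching.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and sample inputs are illustrative only.
public static class AnchorDemo
{
    // Number of matches of pattern in input under the given options.
    public static int Count(string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string lines = "12 apples\n34 pears";

        // ^ matches only at the start of the string by default...
        Console.WriteLine(Count(lines, @"^\d+"));                          // 1
        // ...and at the start of every line in multiline mode.
        Console.WriteLine(Count(lines, @"^\d+", RegexOptions.Multiline));  // 2

        // \Z tolerates a trailing \n; \z does not.
        Console.WriteLine(Regex.IsMatch("end\n", @"end\Z"));               // True
        Console.WriteLine(Regex.IsMatch("end\n", @"end\z"));               // False

        // \b requires a word boundary on both sides of "cat".
        Console.WriteLine(Count("cat concat cats", @"\bcat\b"));           // 1

        // \G forces each match to start where the previous one ended,
        // so matching stops at the first non-digit.
        Console.WriteLine(Count("112 3", @"\G\d"));                        // 3
    }
}
```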


