C# regular expressions


Release date: 2023-09-01 | Update date: 2023-10-13 | Editor: admin

A regular expression is a pattern that matches input text.

The .NET Framework provides a regular expression engine that supports this kind of matching.

A pattern consists of one or more character literals, operators, or constructs.

If you don’t already understand regular expressions, you can read our regular expressions tutorial.

Defining regular expressions

The characters, operators, and constructs used to define regular expressions fall into the following categories:

  • Character escapes

  • Character classes

  • Anchors

  • Grouping constructs

  • Quantifiers

  • Backreference constructs

  • Alternation constructs

  • Substitutions

  • Miscellaneous constructs

Character escapes

The backslash character (\) in a regular expression indicates that the character that follows it either is a special character or should be interpreted literally.

The following table lists the escape characters:

Escape character

Description

Pattern

Match

\a

Matches the bell (alarm) character, \u0007.

\a

“\u0007” in “Warning!” + ‘\u0007’

\b

In a character class, matches the backspace character, \u0008.

[\b]{3,}

“\b\b\b\b” in “\b\b\b\b”

\t

Matches a tab, \u0009.

(\w+)\t

“Name\t” and “Addr\t” in “Name\tAddr\t”

\r

Matches a carriage return, \u000D. (\r is not equivalent to the newline character, \n.)

\r\n(\w+)

“\r\nHello” in “\r\nHello\nWorld.”

\v

Matches a vertical tab, \u000B.

[\v]{2,}

“\v\v\v” in “\v\v\v”

\f

Matches a form feed, \u000C.

[\f]{2,}

“\f\f\f” in “\f\f\f”

\n

Matches a newline, \u000A.

\r\n(\w+)

“\r\nHello” in “\r\nHello\nWorld.”

\e

Matches an escape, \u001B.

\e

“\x001B” in “\x001B”

\nnn

Specifies a character in octal notation (nnn consists of two or three digits).

\w\040\w

“a b” and “c d” in “a bc d”

\xnn

Specifies a character in hexadecimal notation (nn consists of exactly two digits).

\w\x20\w

“a b” and “c d” in “a bc d”

\cX \cx

Matches the ASCII control character specified by X or x, where X or x is the letter of the control character.

\cC

“\x0003” in “\x0003” (Ctrl-C)

\unnnn

Matches a Unicode character in hexadecimal notation (exactly four digits, represented by nnnn).

\w\u0020\w

“a b” and “c d” in “a bc d”

\

When followed by a character that is not recognized as an escaped character, matches that character. For example, \* is the same as \x2A.

\d+[\+-x\*]\d+

“2+2” and “3*9” in “(2+2) * 3*9”
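
To see how escapes from the table behave in C# code, here is a minimal sketch. The class name EscapeDemo, the FindAll helper, and the sample strings are assumptions made for illustration. The patterns are written as verbatim strings (@"...") so that every backslash reaches the regular expression engine unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: returns every match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Value);
        return results;
    }

    public static void Main()
    {
        // \x20 is a space written in hexadecimal notation.
        foreach (string s in FindAll("a bc d", @"\w\x20\w"))
            Console.WriteLine("[{0}]", s);   // [a b] then [c d]

        // \. matches only a literal period; an unescaped . matches any character.
        Console.WriteLine(FindAll("3.14 3x14", @"3\.14").Count);  // 1
        Console.WriteLine(FindAll("3.14 3x14", @"3.14").Count);   // 2
    }
}
```

Without the @ prefix, each backslash would have to be doubled in the C# string literal, for example "\\w\\x20\\w".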

Character classes

A character class matches any one of a set of characters.

The following table lists the character classes:

Character class

Description

Pattern

Match

[character_group]

Matches any single character in character_group. By default, matches are case sensitive.

[mn]

“m” in “mat”, “m” and “n” in “moon”

[^character_group]

Negation: matches any single character that is not in character_group. By default, characters in character_group are case-sensitive.

[^aei]

“v” and “l” in “avail”

[first-last]

Character range: matches any single character in the range from first to last.

[b-d]

[b-d]irds matches “birds”, “cirds” and “dirds”

.

Wildcard: matches any single character except \n. To match a literal period character (. or \u002E), you must precede it with the escape character (\.).

a.e

“ave” in “have” and “ate” in “mate”

\p{name}

Matches any single character in the Unicode general category or named block specified by name.

\p{Lu}

“C” and “L” in “City Lights”

\P{name}

Matches any single character that is not in the Unicode general category or named block specified by name.

\P{Lu}

“i”, “t” and “y” in “City”

\w

Matches any word character.

\w

“R”, “o”, “o”, “m” and “1” in “Room#1”

\W

Matches any non-word character.

\W

“#” in “Room#1”

\s

Matches any white-space character.

\w\s

“D ” in “ID A1.3”

\S

Matches any non-white-space character.

\s\S

“ _” in “int __ctr”

\d

Matches any decimal digit.

\d

“4” in “4 = IV”

\D

Matches any character other than a decimal digit.

\D

“ ”, “=”, “ ”, “I” and “V” in “4 = IV”
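
The following sketch runs a few of the character classes above against the sample inputs from the table. The class name CharClassDemo and the Join helper are assumptions for illustration; the helper simply collects all matches into one comma-separated string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: joins all matches of pattern in input with ", ".
    public static string Join(string input, string pattern,
                              RegexOptions options = RegexOptions.None)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern, options))
            values.Add(m.Value);
        return string.Join(", ", values);
    }

    public static void Main()
    {
        Console.WriteLine(Join("City Lights", @"\p{Lu}"));    // C, L
        Console.WriteLine(Join("avail", "[^aei]"));           // v, l
        Console.WriteLine(Join("Room#1", @"\W"));             // #

        // A range is case-sensitive unless RegexOptions.IgnoreCase is set.
        Console.WriteLine(Join("birds Birds", "[b-d]irds"));  // birds
        Console.WriteLine(Join("birds Birds", "[b-d]irds",
                               RegexOptions.IgnoreCase));     // birds, Birds
    }
}
```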

Anchors

Anchors, or atomic zero-width assertions, cause a match to succeed or fail depending on the current position in the string, but they do not cause the engine to advance through the string or consume characters.

The following table lists the anchors:

Assertion

Description

Pattern

Match

^

The match must start at the beginning of a string or line.

^\d{3}

“567” in “567-777-”

$

The match must occur at the end of the string or before \n at the end of the line or string.

-\d{4}$

“-2012” in “8-12-2012”

\A

The match must occur at the start of the string.

\A\w{4}

“Code” in “Code-007-”

\Z

The match must occur at the end of the string or before \n at the end of the string.

-\d{3}\Z

“-007” in “Bond-901-007”

\z

The match must occur at the end of the string.

-\d{3}\z

“-333” in “-901-333”

\G

The match must occur at the point where the previous match ended.

\G\(\d\)

“(1)”, “(3)” and “(5)” in “(1)(3)(5)[7](9)”

\b

Matches a word boundary, that is, a position between a \w (word) character and a \W (non-word) character.

er\b

Matches “er” in “never”, but not “er” in “verb”.

\B

Matches a position that is not a word boundary.

er\B

Matches “er” in “verb”, but not “er” in “never”.
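
The differences between the anchors are easiest to see side by side. The sketch below is only an illustration: the class name AnchorDemo, the Count helper, and the inputs are assumptions. It compares ^ with and without RegexOptions.Multiline, \Z with \z, and shows how \G ties each match to the end of the previous one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: counts the matches of pattern in input.
    public static int Count(string input, string pattern,
                            RegexOptions options = RegexOptions.None)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string lines = "one\ntwo\nthree";

        // By default ^ matches only at the start of the string.
        Console.WriteLine(Count(lines, @"^\w+"));                          // 1
        // In multiline mode ^ also matches at the start of every line.
        Console.WriteLine(Count(lines, @"^\w+", RegexOptions.Multiline));  // 3

        // \Z tolerates one trailing newline; \z does not.
        Console.WriteLine(Regex.IsMatch("end\n", @"end\Z"));  // True
        Console.WriteLine(Regex.IsMatch("end\n", @"end\z"));  // False

        // \G stops the scan at "[7]", so "(9)" is never reached.
        Console.WriteLine(Count("(1)(3)(5)[7](9)", @"\G\(\d\)"));  // 3
    }
}
```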

Grouping constructs

Grouping constructs delineate the subexpressions of a regular expression and are typically used to capture substrings of the input string.

This section can be difficult to follow. Our articles on regular expression alternation, lookahead assertions, and lookbehind assertions may help.

The following table lists the grouping constructs:

Grouping construct

Description

Pattern

Match

(subexpression)

Captures the matched subexpression and assigns it an ordinal number, starting at 1 (group 0 always holds the entire match).

(\w)\1

“ee” in “deep”

(?<name>subexpression)

Captures the matched subexpression into a named group.

(?<double>\w)\k<double>

“ee” in “deep”

(?<name1-name2>subexpression)

Defines a balancing group definition.

(((?'Open'\()[^\(\)]*)+((?'Close-Open'\))[^\(\)]*)+)*(?(Open)(?!))$

“((1-3)*(3-1))” in “3+2^((1-3)*(3-1))”

(?:subexpression)

Defines a noncapturing group.

Write(?:Line)

“WriteLine” in “Console.WriteLine()”

(?imnsx-imnsx:subexpression)

Applies or disables the specified options within subexpression.

A\d{2}(?i:\w+)\b

“A12xl” and “A12XL” in “A12xl A12XL a12xl”

(?=subexpression)

Zero-width positive lookahead assertion.

\w+(?=\.)

“is”, “ran” and “out” in “He is. The dog ran. The sun is out.”

(?!subexpression)

Zero-width negative lookahead assertion.

\b(?!un)\w+\b

“sure” and “used” in “unsure sure unity used”

(?<=subexpression)

Zero-width positive lookbehind assertion.

(?<=19)\d{2}\b

“99”, “50” and “05” in “1851 1999 1950 1905 2003”

(?<!subexpression)

Zero-width negative lookbehind assertion.

(?<!wo)man\b

“man” in “Hi woman Hi man”

(?>subexpression)

Nonbacktracking (also known as “atomic” or “greedy”) subexpression.

[13579](?>A+B+)

“1ABB”, “3ABB” and “5AB” in “1ABB 3ABBC 5AB 5AC”

Example

using System;
using System.Text.RegularExpressions;
public class Example
{
   public static void Main()
   {
      string input = "1851 1999 1950 1905 2003";
      string pattern = @"(?<=19)\d{2}\b";
      foreach (Match match in Regex.Matches(input, pattern))
         Console.WriteLine(match.Value);
   }
}
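
Named groups are often the most practical of these constructs, because the captured parts can be read back by name. The following sketch assumes a hypothetical helper, ReorderDates, and a made-up input string. It extracts dates in year-month-day form and rebuilds them as day/month/year, then uses a lookahead to find a word that is followed by a period.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: finds yyyy-MM-dd dates and rewrites them as dd/MM/yyyy.
    public static List<string> ReorderDates(string input)
    {
        string pattern = @"(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})";
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            // Each named group is read from the Groups collection by its name.
            result.Add(m.Groups["day"].Value + "/" +
                       m.Groups["month"].Value + "/" +
                       m.Groups["year"].Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string d in ReorderDates("Released 2023-09-01, updated 2023-10-13."))
            Console.WriteLine(d);   // 01/09/2023 then 13/10/2023

        // A lookahead checks what follows without including it in the match.
        Match first = Regex.Match("He is. The dog ran.", @"\w+(?=\.)");
        Console.WriteLine(first.Value);  // is
    }
}
```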

Quantifiers

A quantifier specifies how many instances of the previous element (which can be a character, a group, or a character class) must be present in the input string for a match to occur.

The following table lists the quantifiers:

Quantifier

Description

Pattern

Match

*

Matches the previous element zero or more times.

\d*\.\d

“.0”, “19.9”, “219.9”

+

Matches the previous element one or more times.

be+

“bee” in “been” and “be” in “bent”

?

Matches the previous element zero or one time.

rai?n

“ran”, “rain”

{n}

Matches the previous element exactly n times.

,\d{3}

“,043” in “1,043.6”; “,876”, “,543” and “,210” in “9,876,543,210”

{n,}

Matches the previous element at least n times.

\d{2,}

“166”, “29”, “1930”

{n,m}

Matches the previous element at least n times, but no more than m times.

\d{3,5}

“166”, “17668”; “19302” in “193024”

*?

Matches the previous element zero or more times, but as few times as possible.

\d*?\.\d

“.0”, “19.9”, “219.9”

+?

Matches the previous element one or more times, but as few times as possible.

be+?

“be” in “been” and “be” in “bent”

??

Matches the previous element zero or one time, but as few times as possible.

rai??n

“ran”, “rain”

{n}?

Matches the previous element exactly n times.

,\d{3}?

“,043” in “1,043.6”; “,876”, “,543” and “,210” in “9,876,543,210”

{n,}?

Matches the previous element at least n times, but as few times as possible.

\d{2,}?

“16” in “166”; “29”; “19” and “30” in “1930”

{n,m}?

Matches the previous element between n and m times, but as few times as possible.

\d{3,5}?

“166”; “176” in “17668”; “193” and “024” in “193024”
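
The difference between greedy and lazy quantifiers shows up as soon as a pattern can stop in more than one place. The sketch below is an illustration only: the class name QuantifierDemo, the Join helper, and the sample markup are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: joins all matches of pattern in input with ", ".
    public static string Join(string input, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            values.Add(m.Value);
        return string.Join(", ", values);
    }

    public static void Main()
    {
        string html = "<b>bold</b> and <i>italic</i>";

        // Greedy: .+ runs to the last '>' in the input, giving one long match.
        Console.WriteLine(Join(html, "<.+>"));   // <b>bold</b> and <i>italic</i>
        // Lazy: .+? stops at the first '>' it can reach, giving one match per tag.
        Console.WriteLine(Join(html, "<.+?>"));  // <b>, </b>, <i>, </i>

        // The same contrast with a bounded quantifier.
        Console.WriteLine(Join("193024", @"\d{3,5}"));   // 19302
        Console.WriteLine(Join("193024", @"\d{3,5}?"));  // 193, 024
    }
}
```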

Backreference constructs

A backreference allows a previously matched subexpression to be identified later in the same regular expression.

The following table lists the backreference constructs:

Backreference construct

Description

Pattern

Match

\number

Backreference. Matches the value of a numbered subexpression.

(\w)\1

“ee” in “seek”

\k<name>

Named backreference. Matches the value of a named expression.

(?<char>\w)\k<char>

“ee” in “seek”
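
A common use of backreferences is spotting a word that was typed twice in a row. The following is a minimal sketch; the helper name FindRepeatedWords and the sample sentence are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BackreferenceDemo
{
    // Hypothetical helper: returns each word that is immediately repeated.
    public static List<string> FindRepeatedWords(string input)
    {
        // \k<word> must match exactly the text that the group "word" captured.
        string pattern = @"\b(?<word>\w+)\s+\k<word>\b";
        var repeated = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            repeated.Add(m.Groups["word"].Value);
        return repeated;
    }

    public static void Main()
    {
        foreach (string w in FindRepeatedWords("This is is a test test case"))
            Console.WriteLine(w);   // is then test
    }
}
```

The word boundaries matter here: without the leading \b, the “is” at the end of “This” could start a match.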

Alternation constructs

Alternation constructs modify a regular expression to enable either/or matching.

The following table lists the alternation constructs:

Alternation construct

Description

Pattern

Match

|

Matches any element separated by a vertical bar (|) character.

th(e|is|at)

“the” and “this” in “this is the day.”

(?(expression)yes|no)

Matches yes if the regular expression pattern designated by expression matches; otherwise, matches the optional no part. expression is interpreted as a zero-width assertion.

(?(A)A\d{2}\b|\b\d{3}\b)

“A10” and “910” in “A10 C103 910”

(?(name)yes|no)

Matches yes if name, a named or numbered capturing group, has a match; otherwise, matches the optional no part.

(?<quoted>")?(?(quoted).+?"|\S+\s)

“Dogs.jpg ” and “"Yiska playing.jpg"” in “Dogs.jpg "Yiska playing.jpg"”
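
The first two rows of the table can be tried out directly. In this sketch the class name AlternationDemo and the Join helper are assumptions for illustration; the patterns and inputs are the ones from the table.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: joins all matches of pattern in input with ", ".
    public static string Join(string input, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            values.Add(m.Value);
        return string.Join(", ", values);
    }

    public static void Main()
    {
        // Plain alternation: the alternatives are tried from left to right.
        Console.WriteLine(Join("this is the day.", "th(e|is|at)"));  // this, the

        // Conditional: if the next character is "A", match A plus two digits;
        // otherwise match a standalone three-digit number.
        Console.WriteLine(Join("A10 C103 910", @"(?(A)A\d{2}\b|\b\d{3}\b)"));  // A10, 910
    }
}
```

“C103” produces no match because the “no” branch requires a word boundary directly before the three digits.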

Substitutions

Substitutions are regular expression language elements that are used in replacement patterns.

The following table lists the characters used for substitution:

Character

Description

Pattern

Replacement pattern

Input string

Result string

$number

Substitutes the substring matched by group number.

\b(\w+)(\s)(\w+)\b

$3$2$1

“one two”

“two one”

${name}

Substitutes the substring matched by the named group name.

\b(?<word1>\w+)(\s)(?<word2>\w+)\b

${word2} ${word1}

“one two”

“two one”

$$

Substitutes a literal “$”.

\b(\d+)\s?USD

$$$1

“103 USD”

“$103”

$&

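Substitutes a copy of the whole match.

(\$*(\d*(\.+\d+)?){1})

**$&

“$1.30”

“**$1.30**” (the pattern also matches the empty string at the end of the input, so “**” is inserted there as well)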

$`

Substitutes all the text of the input string before the match.

B+

$`

“AABBCC”

“AAAACC”

$'

Substitutes all the text of the input string after the match.

B+

$'

“AABBCC”

“AACCCC”

$+

Substitutes the last group that was captured.

B+(C+)

$+

“AABBCCDD”

“AACCDD”

$_

Substitutes the entire input string.

B+

$_

“AABBCC”

“AAAABBCCCC”
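
Substitutions are passed as the replacement argument of Regex.Replace. The sketch below reproduces several rows of the table; the class name SubstitutionDemo and the SwapWords helper are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubstitutionDemo
{
    // Hypothetical helper: swaps two words by referring to groups by number.
    public static string SwapWords(string input)
    {
        return Regex.Replace(input, @"\b(\w+)(\s)(\w+)\b", "$3$2$1");
    }

    public static void Main()
    {
        Console.WriteLine(SwapWords("one two"));  // two one

        // The same swap with named groups.
        Console.WriteLine(Regex.Replace("one two",
            @"\b(?<word1>\w+)(\s)(?<word2>\w+)\b",
            "${word2} ${word1}"));                // two one

        // $$ inserts a literal dollar sign; $1 is the captured amount.
        Console.WriteLine(Regex.Replace("103 USD", @"\b(\d+)\s?USD", "$$$1"));  // $103

        // $` is the text before the match; $' is the text after it.
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$`"));  // AAAACC
        Console.WriteLine(Regex.Replace("AABBCC", "B+", "$'"));  // AACCCC
    }
}
```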

Miscellaneous constructs

The following table lists the miscellaneous constructs:

Construct

Description

Example

(?imnsx-imnsx)

Sets or disables options such as case insensitivity in the middle of a pattern.

\bA(?i)b\w+\b matches “ABA” and “Able” in “ABA Able Act”

(?#comment)

Inline comment. The comment ends at the first closing parenthesis.

\bA(?#matches words that begin with A)\w+\b

# [to end of line]

X-mode comment. The comment starts at an unescaped # and continues to the end of the line.

(?x)\bA\w+\b#matches words that begin with A
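
The following sketch shows an inline option and an x-mode comment at work on the input from the table. The class name InlineOptionDemo and the Join helper are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InlineOptionDemo
{
    // Hypothetical helper: joins all matches of pattern in input with ", ".
    public static string Join(string input, string pattern)
    {
        var values = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            values.Add(m.Value);
        return string.Join(", ", values);
    }

    public static void Main()
    {
        string input = "ABA Able Act";

        // (?i) turns on case-insensitive matching for the rest of the pattern,
        // so the "b" after it matches both "B" and "b".
        Console.WriteLine(Join(input, @"\bA(?i)b\w+\b"));  // ABA, Able

        // (?x) ignores unescaped white space and allows # comments.
        string commented = @"(?x) \b A \w+ \b   # words that begin with A";
        Console.WriteLine(Join(input, commented));         // ABA, Able, Act
    }
}
```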

Regex class

The Regex class represents a regular expression.

The following table lists some commonly used methods of the Regex class:

No.

Method and description

1

public bool IsMatch(string input): indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string.

2

public bool IsMatch(string input, int startat): indicates whether the regular expression specified in the Regex constructor finds a match in the specified input string, beginning at the specified starting position in the string.

3

public static bool IsMatch(string input, string pattern): indicates whether the specified regular expression finds a match in the specified input string.

4

public MatchCollection Matches(string input): searches the specified input string for all occurrences of the regular expression.

5

public string Replace(string input, string replacement): replaces all substrings of the specified input string that match the regular expression pattern with the specified replacement string.

6

public string[] Split(string input): splits the input string into an array of substrings at the positions defined by the regular expression pattern specified in the Regex constructor.

For a complete list of the methods and properties of the Regex class, refer to Microsoft’s C# documentation.
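
As a quick tour of the methods that the numbered examples do not use (IsMatch and Split), here is a minimal sketch. The class name RegexClassDemo, the SplitList helper, and the inputs are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexClassDemo
{
    // Hypothetical helper: splits a comma-separated list and drops
    // the white space around each comma.
    public static string[] SplitList(string input)
    {
        var separator = new Regex(@"\s*,\s*");
        return separator.Split(input);
    }

    public static void Main()
    {
        var separator = new Regex(@"\s*,\s*");

        // Instance IsMatch, with and without a starting position.
        Console.WriteLine(separator.IsMatch("a,b"));     // True
        Console.WriteLine(separator.IsMatch("a,b", 2));  // False (no comma from index 2 on)

        // Static IsMatch takes the pattern as an argument.
        Console.WriteLine(Regex.IsMatch("abc", @"^\w+$"));  // True

        Console.WriteLine(separator.Replace("a , b", ";"));                  // a;b
        Console.WriteLine(string.Join("/", SplitList("red , green,blue")));  // red/green/blue
    }
}
```

Creating one Regex instance and reusing it, as above, is convenient when the same pattern is applied to many inputs; the static methods are handy for one-off checks.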

Example 1

The following example matches a word that starts with ‘S’:

Example

using System;
using System.Text.RegularExpressions;
namespace RegExApplication
{
   class Program
   {
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args)
      {
         string str = "A Thousand Splendid Suns";
         Console.WriteLine("Matching words that start with 'S': ");
         showMatch(str, @"\bS\S*");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following results:

Matching words that start with 'S':
The Expression: \bS\S*
Splendid
Suns

Example 2

The following example matches a word that starts with ‘m’ and ends with ‘e’:

Example

using System;
using System.Text.RegularExpressions;
namespace RegExApplication
{
   class Program
   {
      private static void showMatch(string text, string expr)
      {
         Console.WriteLine("The Expression: " + expr);
         MatchCollection mc = Regex.Matches(text, expr);
         foreach (Match m in mc)
         {
            Console.WriteLine(m);
         }
      }
      static void Main(string[] args)
      {
         string str = "make maze and manage to measure it";
         Console.WriteLine("Matching words start with 'm' and ends with 'e':");
         showMatch(str, @"\bm\S*e\b");
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following results:

Matching words start with 'm' and ends with 'e':
The Expression: \bm\S*e\b
make
maze
manage
measure

Example 3

The following example replaces each run of white space with a single space:

Example

using System;
using System.Text.RegularExpressions;
namespace RegExApplication
{
   class Program
   {
      static void Main(string[] args)
      {
         string input = "Hello   World   ";
         string pattern = "\\s+";   // one or more white-space characters
         string replacement = " ";
         Regex rgx = new Regex(pattern);
         string result = rgx.Replace(input, replacement);
         Console.WriteLine("Original String: {0}", input);
         Console.WriteLine("Replacement String: {0}", result);
         Console.ReadKey();
      }
   }
}

When the above code is compiled and executed, it produces the following results:

Original String: Hello   World
Replacement String: Hello World
