A regular expression defines a search pattern for strings. This pattern may match one or several times or not at all for a given string. The abbreviation for regular expression is regex. Regular expressions can be used to search, edit, and manipulate text.
Regex meta characters are special symbols that carry a particular meaning and define the pattern in a regular expression. These characters allow us to create flexible and powerful search patterns for text processing.
1. List of Regex Meta Characters
Here’s a list of common regex meta-characters and their meanings:
Meta Character | Description |
---|---|
. | Matches any single character except a newline. |
^ | Matches the start of a string. |
$ | Matches the end of a string. |
* | Matches zero or more repetitions of the preceding element. |
+ | Matches one or more repetitions of the preceding element. |
? | Matches zero or one repetition of the preceding element (makes it optional). |
{n} | Matches exactly n repetitions of the preceding element. |
{n,} | Matches n or more repetitions of the preceding element. |
{n,m} | Matches between n and m repetitions of the preceding element. |
\ | Escapes a meta-character, allowing it to be treated as a literal character. |
[] | Matches any single character within the brackets. |
[^] | Matches any single character not within the brackets. |
() | Groups multiple tokens together and captures the matched text. |
\d | Matches any digit, equivalent to [0-9]. |
\D | Matches any non-digit character, equivalent to [^0-9]. |
\w | Matches any word character (alphanumeric + underscore), equivalent to [a-zA-Z0-9_]. |
\W | Matches any non-word character, equivalent to [^a-zA-Z0-9_]. |
\s | Matches any whitespace character (spaces, tabs, line breaks). |
\S | Matches any non-whitespace character. |
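The shorthand classes and the grouping parentheses from the table can be tried out with a small sketch like the one below. The class name `ShorthandClassesDemo`, the helper `findAll`, and the sample input are made up for illustration; only `Pattern` and `Matcher` come from `java.util.regex`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShorthandClassesDemo {

    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Order 66 shipped to bay_7";

        System.out.println(findAll("\\d+", input));       // [66, 7]
        System.out.println(findAll("\\w+", input));       // [Order, 66, shipped, to, bay_7]
        System.out.println(findAll("\\s", input).size()); // 4

        // Parentheses capture parts of the match so they can be read back by index.
        Matcher matcher = Pattern.compile("(\\w+)_(\\d)").matcher(input);
        if (matcher.find()) {
            System.out.println(matcher.group(1)); // bay
            System.out.println(matcher.group(2)); // 7
        }
    }
}
```

Note that `\w` includes the underscore, which is why `bay_7` comes back as a single word.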
2. Regex Meta Characters Example
Let us see a few examples of using the meta characters in regular expressions and matching them.
2.1. Dot (.) Meta Character
The dot meta-character matches any single character except for a newline (\n). It is useful for matching a pattern where the character at a given position can be anything.
Pattern pattern = Pattern.compile(".at");
Matcher matcher = pattern.matcher("cat bat rat sat mat");
while (matcher.find()) {
    System.out.println(matcher.group()); // cat, bat, rat, sat, mat
}
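The newline exception is easy to overlook, so here is a minimal sketch of it. The class name `DotNewlineDemo` and the helper `found` are illustrative; `Pattern.DOTALL` is the standard flag that lets the dot match line terminators as well.

```java
import java.util.regex.Pattern;

class DotNewlineDemo {

    // Hypothetical helper: true if the regex, compiled with the given flags,
    // is found anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found("a.b", 0, "a-b"));               // true
        System.out.println(found("a.b", 0, "a\nb"));              // false: the dot does not match \n
        System.out.println(found("a.b", Pattern.DOTALL, "a\nb")); // true: DOTALL lifts the restriction
    }
}
```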
2.2. Caret (^) and Dollar ($) Meta Characters
The caret (^) matches the start of a string, and the dollar sign ($) matches the end of a string. They anchor the pattern to the beginning or the end of the string, respectively.
// Matches "Hello" only if it is at the start of the string
Pattern pattern = Pattern.compile("^Hello");
Matcher matcher = pattern.matcher("Hello world");
System.out.println(matcher.find()); // true
// Matches "world" only if it is at the end of the string
pattern = Pattern.compile("world$");
matcher = pattern.matcher("Hello world");
System.out.println(matcher.find()); // true
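When the input spans several lines, the anchors by default still refer to the whole input rather than to each line. The sketch below contrasts the default behavior with `Pattern.MULTILINE`; the class name `AnchorDemo`, the helper `countMatches`, and the sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {

    // Hypothetical helper: counts how many times the regex is found in the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Hello world\nHello Java";

        // Default mode: ^ and $ anchor to the whole input.
        System.out.println(countMatches("^Hello", 0, input)); // 1
        System.out.println(countMatches("world$", 0, input)); // 0

        // MULTILINE mode: ^ and $ also match at line boundaries.
        System.out.println(countMatches("^Hello", Pattern.MULTILINE, input)); // 2
        System.out.println(countMatches("world$", Pattern.MULTILINE, input)); // 1

        // Using both anchors requires the pattern to span the entire string.
        System.out.println(countMatches("^Hello world$", 0, "Hello world"));  // 1
        System.out.println(countMatches("^Hello world$", 0, "Hello world!")); // 0
    }
}
```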
2.3. Asterisk (*), Plus (+), and Question Mark (?) Meta Characters
*: Matches zero or more repetitions of the preceding element.
+: Matches one or more repetitions of the preceding element.
?: Matches zero or one repetition of the preceding element (makes it optional).
// Matches zero or more "a"s: "", "a", "aa", "aaa", etc.
Pattern pattern = Pattern.compile("a*");
Matcher matcher = pattern.matcher("aaab");
while (matcher.find()) {
    // Prints "aaa", then two empty strings: a* also matches
    // zero characters just before "b" and at the end of the input
    System.out.println(matcher.group());
}
// Matches one or more "a"s: "a", "aa", "aaa", etc.
pattern = Pattern.compile("a+");
matcher = pattern.matcher("aaab");
while (matcher.find()) {
    System.out.println(matcher.group()); // aaa
}
// Matches "a" or "ab"
pattern = Pattern.compile("ab?");
matcher = pattern.matcher("ab");
while (matcher.find()) {
    System.out.println(matcher.group()); // ab
}
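Running the three quantifiers against the same input makes the difference between them easier to see. This is a small sketch; the class name `QuantifierDemo`, the helper `findAll`, and the input string are made up for the comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {

    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "a ab abb";

        // "a" followed by zero or more "b"s
        System.out.println(findAll("ab*", input)); // [a, ab, abb]

        // "a" followed by at least one "b"
        System.out.println(findAll("ab+", input)); // [ab, abb]

        // "a" followed by at most one "b"
        System.out.println(findAll("ab?", input)); // [a, ab, ab]
    }
}
```

In the last case the second "b" of "abb" is left unmatched, because `?` allows at most one repetition.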
2.4. Braces ({}) Meta Characters
Braces specify how many times the preceding element must repeat.
{n}: Exactly n repetitions.
{n,}: At least n repetitions.
{n,m}: Between n and m repetitions (inclusive).
// Matches exactly 3 "a"s
Pattern pattern = Pattern.compile("a{3}");
Matcher matcher = pattern.matcher("aaab");
while (matcher.find()) {
    System.out.println(matcher.group()); // aaa
}
// Matches 2 or more "a"s
pattern = Pattern.compile("a{2,}");
matcher = pattern.matcher("aaaa");
while (matcher.find()) {
    System.out.println(matcher.group()); // aaaa
}
// Matches between 2 and 3 "a"s
pattern = Pattern.compile("a{2,3}");
matcher = pattern.matcher("aaa");
while (matcher.find()) {
    System.out.println(matcher.group()); // aaa
}
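A brace quantifier applies only to the element immediately before it, which is a single character unless parentheses group several tokens. The sketch below illustrates this, along with a simple date-shaped pattern; the class name `BraceQuantifierDemo`, the helper `findAll`, and the inputs are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BraceQuantifierDemo {

    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The quantifier takes as many repetitions as allowed, up to the maximum.
        System.out.println(findAll("a{2,3}", "aaaaa"));   // [aaa, aa]

        // Without a group, {2} applies to "b" only.
        System.out.println(findAll("ab{2}", "ababb"));    // [abb]

        // With a group, {2} applies to the whole "ab" sequence.
        System.out.println(findAll("(ab){2}", "ababab")); // [abab]

        // Pattern.matches requires the entire input to match the pattern.
        System.out.println(Pattern.matches("\\d{4}-\\d{2}-\\d{2}", "2024-01-15")); // true
        System.out.println(Pattern.matches("\\d{4}-\\d{2}-\\d{2}", "2024-1-15"));  // false
    }
}
```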
2.5. Square Brackets ([]) Meta Characters
Square brackets define a character class, which matches any single character within the brackets.
[abc]: Matches any single character ‘a’, ‘b’, or ‘c’.
[^abc]: Matches any single character except ‘a’, ‘b’, or ‘c’.
// Matches "a", "b", or "c"
Pattern pattern = Pattern.compile("[abc]");
Matcher matcher = pattern.matcher("a1b2c3");
while (matcher.find()) {
    System.out.println(matcher.group()); // a, b, c
}
// Matches any character except "a", "b", or "c"
pattern = Pattern.compile("[^abc]");
matcher = pattern.matcher("a1b2c3");
while (matcher.find()) {
    System.out.println(matcher.group()); // 1, 2, 3
}
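Character classes also accept ranges such as [0-9], and most meta characters lose their special meaning inside the brackets. The following is a minimal sketch of both points; the class name `CharacterClassDemo`, the helper `findAll`, and the inputs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassDemo {

    // Hypothetical helper: collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // A range matches any single character between its endpoints.
        System.out.println(findAll("[0-9]", "a1b2c3"));    // [1, 2, 3]

        // Several ranges can be combined in one class.
        System.out.println(findAll("[a-c1-2]", "a1b2c3")); // [a, 1, b, 2, c]

        // Inside brackets, "." and "+" are literal characters.
        System.out.println(findAll("[.+]", "1.5+2"));      // [., +]
    }
}
```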
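3. Escaping Meta Characters with Backslash (\)
A backslash escapes a meta-character so that it is treated as a literal character in the pattern. For example, the regex \. matches a literal dot (‘.’) character. In a Java string literal the backslash itself must be escaped, so the pattern is written as "\\.".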
// Matches the literal dot character
Pattern pattern = Pattern.compile("\\.");
Matcher matcher = pattern.matcher("1.2.3");
while (matcher.find()) {
    System.out.println(matcher.group()); // prints "." twice
}
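Forgetting the escape is a common source of bugs, especially with `String.split`, which treats its argument as a regex. The sketch below compares escaped and unescaped dots and also uses `Pattern.quote`, which turns an arbitrary string into a literal pattern. The class name `EscapeDemo` and the helper `found` are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapeDemo {

    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped, the dot matches any character, so "1x2" is accepted.
        System.out.println(found("1.2", "1x2"));   // true

        // Escaped, the dot matches only a literal ".".
        System.out.println(found("1\\.2", "1x2")); // false
        System.out.println(found("1\\.2", "1.2")); // true

        // Pattern.quote escapes the whole string for literal matching.
        System.out.println(found(Pattern.quote("1.2"), "1x2")); // false

        // split() with an escaped dot separates on literal dots.
        System.out.println(Arrays.toString("1.2.3".split("\\."))); // [1, 2, 3]

        // split(".") treats every character as a separator and yields no parts.
        System.out.println("1.2.3".split(".").length); // 0
    }
}
```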
In this Java regex example, we learned to use meta characters in regular expressions to evaluate text strings.
Happy Learning !!