
RegEx Tutorial

Summary

Welcome to the RegEx Tutorial. This tutorial is meant to help you better understand regular expressions and their different uses. A regular expression, or regex, is a sequence of characters that describes a search pattern. The regex used in this tutorial matches an HTML tag: /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/.

Regex Component

A regular expression is written as a literal, which means the pattern has to be surrounded by slash characters (/). If you take a look at the Matching an HTML tag regex below, you will notice that it is surrounded by slash characters:

/^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/
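To see what this regex accepts, here is a minimal sketch that runs it against a few strings. It assumes the tutorial targets JavaScript, which the slash-delimited literal suggests; the variable names and sample strings are made up for illustration.

```javascript
// The tutorial's regex, written as a JavaScript regex literal.
const htmlTag = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;

// Hypothetical sample inputs, chosen to hit both branches of the pattern.
const samples = [
  "<div>hello</div>",  // opening tag, content, matching closing tag
  "<br />",            // self-closing tag with whitespace before />
  "<div>hello</span>", // closing tag name differs from opening tag name
  "<br/>",             // self-closing tag without whitespace
];

// test() returns true when the whole string fits the pattern.
const results = samples.map((s) => htmlTag.test(s));
console.log(results); // [ true, true, false, false ]
```

The last sample fails because the self-closing branch is written as \s+\/>, which requires at least one whitespace character before the slash.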

Anchors

In the regular expression, take note of the ^ and $ symbols. These are known as anchors. The ^ anchor requires the string to start with the pattern that follows it. That pattern can take one of two forms. The first is an exact string, like ^The: the strings "The" and "The dog" match, but because regex is case sensitive by default, "the" and "the dog" do not. The second is a range of possible matches, expressed using bracket expressions.

The $ anchor requires the string to end with the pattern that precedes it. Similar to the ^ anchor, it can be preceded by an exact string or a range of possible matches. In our Matching an HTML tag regex, ^ is followed by <([a-z]+), so the string must start with < and one or more lowercase letters, and $ is preceded by the group (?:>(.*)<\/\1>|\s+\/>), so the string must end with either a closing tag or />.
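The behavior of the two anchors can be sketched with small patterns. This assumes JavaScript; the patterns and strings are illustrative only.

```javascript
// ^ pins the match to the start of the string.
const startsWithThe = /^The/;
// $ pins the match to the end of the string.
const endsWithDog = /dog$/;

const anchorChecks = {
  exactStart: startsWithThe.test("The dog"),     // true
  wrongCase: startsWithThe.test("the dog"),      // false, regex is case sensitive
  notAtStart: startsWithThe.test("See The dog"), // false, "The" is not at the start
  exactEnd: endsWithDog.test("The dog"),         // true
  notAtEnd: endsWithDog.test("The dog barks"),   // false, "dog" is not at the end
};
console.log(anchorChecks);
```

Using both anchors together, as the HTML tag regex does, means the whole string has to fit the pattern rather than just some part of it.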

Quantifiers

Quantifiers are important because they set limits on how many times a part of your regex may repeat. They specify the minimum and maximum number of repetitions of the character or group they follow. It's worth noting that quantifiers are greedy by default, meaning they match as many occurrences of a pattern as possible. Our example uses two quantifiers: + matches the preceding pattern one or more times, and * matches the preceding pattern zero or more times.
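A short sketch of the difference between + and *, and of greedy matching, assuming JavaScript. The patterns and inputs are invented for the example.

```javascript
// + needs at least one occurrence; * also accepts none.
const onePlus = /^a+$/;
const zeroPlus = /^a*$/;

const plusOnEmpty = onePlus.test("");  // false, "+" requires one or more "a"
const starOnEmpty = zeroPlus.test(""); // true, "*" allows zero "a"
const plusOnMany = onePlus.test("aaa"); // true

// Greedy matching: .* grabs as much as it can, so the capture runs
// up to the LAST </b> in the string, not the first one.
const greedy = "<b>one</b><b>two</b>".match(/<b>(.*)<\/b>/);
console.log(greedy[1]); // one</b><b>two
```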

Grouping Constructs

Grouping constructs split a regex into sections so that different parts of a string can be checked against different requirements. The standard grouping construct is a pair of parentheses (). Each section inside parentheses is called a subexpression. Our example has three capturing groups, ([a-z]+), ([^<]+), and (.*), each of which remembers the text it matched, plus one non-capturing group, written (?:...), which groups without remembering.
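To see what each capturing group of the HTML tag regex picks up, here is a minimal sketch assuming JavaScript. The input string is a made-up example.

```javascript
const htmlTag = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;

// match() returns an array: index 0 is the whole match,
// indexes 1..3 hold the text captured by each (...) group.
const m = '<p class="x">hi</p>'.match(htmlTag);

console.log(m[1]); // p            (tag name, from ([a-z]+))
console.log(m[2]); //  class="x"   (attributes with the leading space, from ([^<]+))
console.log(m[3]); // hi           (content between the tags, from (.*))

// The (?:...) group is non-capturing, so it adds no entry:
console.log(m.length); // 4
```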

Bracket Expressions

Values inside square brackets signify a set of characters we want to match. While they are called bracket expressions, they are also known as positive character groups, because they list the characters we want to include. It is common practice to use a hyphen between two alphanumeric characters to signify a range of possible characters. For example, in our Matching an HTML tag regex you will notice the bracket expression [a-z]. This means the string may contain any lowercase letter from a to z. It's important to note that this applies only to lowercase letters: [a-z] is equivalent to [abcdefghijklmnopqrstuvwxyz] written out. Our regex also contains [^<]; a ^ placed first inside the brackets negates the set, so [^<] matches any character except <.
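The bracket expressions from the HTML tag regex can be tried in isolation. This is a small sketch assuming JavaScript, with illustrative inputs.

```javascript
// [a-z] accepts lowercase letters only.
const lowercaseOnly = /^[a-z]+$/;
const lower = lowercaseOnly.test("div");     // true
const upper = lowercaseOnly.test("DIV");     // false, uppercase is outside a-z
const withDigit = lowercaseOnly.test("h1");  // false, digits are outside a-z

// [^<] is a negated set: any character except "<".
const noAngle = /^[^<]+$/;
const attrs = noAngle.test(' class="x"');    // true
const hasAngle = noAngle.test("a<b");        // false

console.log(lower, upper, withDigit, attrs, hasAngle);
```

One consequence is that the tutorial's regex does not match tags such as <h1>, because the tag name is limited to [a-z].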

Character Classes

In regex, a character class defines a set of characters, any one of which can occur in an input string to fulfill a match. In our Matching an HTML tag regex we see two character classes. The . matches any character except a newline. The \s matches a single whitespace character, including tabs and line breaks.
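A brief sketch of the two character classes, assuming JavaScript. The test strings are made up.

```javascript
// "." matches any single character except a newline.
const dotOnLetter = /^.$/.test("a");   // true
const dotOnNewline = /^.$/.test("\n"); // false

// "\s" matches one whitespace character: space, tab, line break, and so on.
const spaceOnTab = /^\s$/.test("\t");     // true
const spaceOnNewline = /^\s$/.test("\n"); // true
const spaceOnLetter = /^\s$/.test("a");   // false

// Effect on the tutorial's regex: (.*) cannot cross a line break,
// so content that spans two lines is not matched.
const htmlTag = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;
const multiLine = htmlTag.test("<p>line one\nline two</p>"); // false
console.log(multiLine);
```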

The OR Operator

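The OR operator, written |, matches either the pattern on its left or the pattern on its right. It is typically used inside a grouping construct. For example, in our regex we could replace [a-z] with the OR operator and write (a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z). Our regex itself uses | inside (?:>(.*)<\/\1>|\s+\/>) to accept either content followed by a closing tag or a self-closing />.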
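The equivalence between a bracket expression and a chain of OR alternatives can be checked directly. This sketch assumes JavaScript and uses a shortened range, a to c, to keep it readable.

```javascript
// Two ways of saying "exactly one of a, b, or c".
const bracket = /^[a-c]$/;
const alternation = /^(a|b|c)$/;

// Hypothetical inputs: three that should match and two that should not.
const inputs = ["a", "b", "c", "d", "ab"];

// Both patterns should give the same answer for every input.
const agree = inputs.every((s) => bracket.test(s) === alternation.test(s));
console.log(agree); // true

// Alternation also works on longer patterns, not just single characters.
const pet = /^(cat|dog)$/;
console.log(pet.test("dog"), pet.test("bird")); // true false
```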

Flags

Flags are placed at the end of a regex, after the second slash, and they define additional functionality or limits for the regex. The Matching an HTML tag regex doesn't include flags, but it is worth noting that there are several optional flags, which can be used alone or combined in any order. The three most common are g, which stands for global search, i, which stands for case-insensitive search, and m, which stands for multi-line search.
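The effect of the three common flags can be sketched as follows, assuming JavaScript. The variant of the HTML tag regex with an i flag is a hypothetical modification for illustration, not part of the tutorial's regex.

```javascript
// i: case-insensitive. Hypothetical variant of the tutorial's regex.
const strict = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/;
const anyCase = /^<([a-z]+)([^<]+)*(?:>(.*)<\/\1>|\s+\/>)$/i;
const strictUpper = strict.test("<DIV>hello</DIV>");   // false
const anyCaseUpper = anyCase.test("<DIV>hello</DIV>"); // true

// g: global. match() returns every occurrence instead of only the first.
const allMatches = "a A a".match(/a/gi); // [ 'a', 'A', 'a' ]

// m: multi-line. ^ and $ also match at the start and end of each line.
const withM = /^dog$/m.test("cat\ndog");   // true
const withoutM = /^dog$/.test("cat\ndog"); // false

console.log(strictUpper, anyCaseUpper, allMatches, withM, withoutM);
```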

Character Escapes

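Character escapes are denoted by the backslash \. A character escape makes a character that would otherwise have a special meaning be interpreted literally. In our example, \/ appears in <\/\1> and \s+\/> so that the slash is treated as a literal / instead of the end of the regex literal. The backslash also introduces \s, which is a character class, and \1, which is a backreference that matches the same text as the first capturing group.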
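The difference between an escaped and an unescaped character, and the role of \1, can be sketched like this. It assumes JavaScript; the patterns are illustrative.

```javascript
// \/ is a literal slash; without the backslash it would end the regex literal.
const hasSlash = /\//.test("a/b"); // true

// \. is a literal dot, while an unescaped . matches any character.
const literalDotHit = /^a\.b$/.test("a.b");  // true
const literalDotMiss = /^a\.b$/.test("axb"); // false
const anyCharHit = /^a.b$/.test("axb");      // true

// \1 is a backreference: it must repeat whatever group 1 captured.
const sameTwice = /^(a|b)\1$/;
const repeated = sameTwice.test("aa");  // true
const different = sameTwice.test("ab"); // false

console.log(hasSlash, literalDotHit, literalDotMiss, anyCharHit, repeated, different);
```

In the HTML tag regex, \1 is what forces the closing tag name to equal the opening tag name.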