RegEx Reference (todo)
Based on modern regex-directed engines with features like lazy quatifiers and backreferences.
Useful
^(?=.*?\bbubble\b)(?=.*?\bgum\b).*
^(?!.*bubble).*
^(?!(?:foo|bar)$)\w+$
Matching
[cols="^1,4,3"] |=== |RegEx |Matching|Example
|[abc]|A single character: a, b or c|
|[^abc]|Any single character but a, b, or c|
|[a-z]|Any single character in the range a-z|
|[A-Z] | Uppercase letters from A to Z|
|[a-zA-Z]|Any single character in the range a-z or A-Z|
|[0-9] | Digits from 0 to 9 |
|^|Start of line|
|$|End of line|
|\A|Start of string|
|\z|End of string|
|\b | Word boundary|
|\B | Not word boundary|
|< | Start of word|
|> | End of word|
|. | anything |
|\s|Any whitespace character|
|\S|Any non-whitespace character|
|\d|Any digit|
|\D|Any non-digit|
|\w|Any word character (letter, number, underscore)|
|\W|Any non-word character|
|a | a character |
|ab | ab string |
|a|b | a or b |
|a?|Zero or one of a|
|a*|Zero or more of a|
|a+|One or more of a|
|a{3}|Exactly 3 of a|
|a{3,}|3 or more of a|
|a{3,6}|Between 3 and 6 of a|
|i | ignore case |
|\d | one digit 0-9 | ie: foo_\d\d = file_55
|\D | non-digit |
|\w | one word,letters, digits or _, |ie: \wfoo- = foo-, foo-bar
|\W | non-word|
|\s | whitespace |ie: foo\sbarr = foo bar
|\S | non-whitespace |
|\x | Hexadecimal digit|
|\O | Octal digit|
|===
Flags
Regex may have flags that affect the search. For example there are 6 of them in JavaScript:
[cols="^1,8"] |=== |Flag | Explanation
|i | With this flag the search is case-insensitive: no difference between A and a.
|g |With this flag the search looks for all matches, without it - only the first match is returned.
|m |Multiline mode.
|s |Enables "dotall" mode, that allows a dot . to match newline character \n.
|u |Enables full Unicode support. The flag enables correct processing of surrogate pairs.
|y |“Sticky” mode: searching at the exact position in the text. |===
Character
[cols="^1,4,3"]
|===
|\t | tab|
|\r | return|
|\n | new line|
|\v | Vertical tab|
|\f | Form feed|
|\xxx | Octal character xxx|
|\xhh | Hex character hh|
|\ | escapes regex special character, |ie: \.\*\+\? \$\^\/\\ = .*\|+? $^/\
|===
Quantifer
[cols="^1,2,4"]
|===
|+ | once or more, |ie: \w-\d+ = img-1, img-23, img-1001
|? | once or none, |ie: \w-\d? = img-1, img-2, img-
|* | zero or more, |ie: \w*-\d = imgimg-1, imgimg-2
|{3} | three times, |ie: \w{3}-\d = imgimgimg-1
|{2,} | two or more times, |ie: \w-\d{2,} = img-23, img-1001
|[123] | matches class, |ie: [1-3] = 1,2,3
|q[^x] | negates matches, |ie: q[^x] = question but not iraq
|===
Anchor
[cols="^1,2,4"]
|===
|^ | beginning of line, not in [] |ie: ^app = appMain.js, appAuth.\|js
|$ | end of line, |ie: .*?Auth.js$ = appAuth.js
|\A | beginning of string|
|\Z | end of string|
|===
Logic
[cols="^1,2,4"]
|===
|() | capturing group, |ie: A(nt\|pple) = pple in Apple
|(?) | non-capuring group|
|===
Modifier
[cols="^1,4,4"]
|===
|(?i) | case-insensitive |ie: (?i)Monday = Monday, monday, monDAY
|(?m) | multiple lines so ^ and $ match in multiple places, |ie: (?m)1\r\n\^2$\r\n\^3$ = 1\n2\n3
|(?x) | whitespace mode|
|===
Lookaround
[cols="^1,2,4"]
|===
|(?= | positive lookahead |ie: (?=\d{10})\d{5} = 01234 in 0123456789
|(?<= | positive lookbehind |ie: (?<=\d)cat = cat in 1cat
|(?! | negative lookahead |ie: (?!theatre)the\w+ = theme
|(?<! | negative lookbehind |ie: \w{3}(?<!mon)ster = Munster
|===
Replacement
[cols="^1,2,4"] |=== $foo - inserts foo $& - inserts entire match $` - inserts preceding string $' - inserts following string $Y - insert Y'th captured group |===
REGULAR EXPRESSIONS YOU SHOULD KNOW:
[cols="2,6"] |=== |Maching | RegEx
|Username| /^[a-z0-9_-]{3,16}$/ |Password| /^[a-z0-9_-]{6,18}$/ |Hex Value| /^#?([a-f0-9]{6}|[a-f0-9]{3})$/ |Slug| /^[a-z0-9-]+$/ |Email| /^([a-z0-9_.-]+)@([\da-z.-]+).([a-z.]{2,6})$/ |URL| /^(https?://)?([\da-z.-]+).([a-z.]{2,6})([/\w .-])/?$/ |IP Address| /^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?).){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$/ |HTML Tag| /^<([a-z]+)([^<]+)(?:>(.)</\1>|\s+/>)$/ |===
Example & Breakdown
.find html hexidecimal colors like #fff, #aa0099, #CCCCCC [source,bash]
^#(?:[0-9a-fA-F]{3}){1,2}$
[cols="1,4"] |=== |^|anchor for start of string |#|the literal # |(|start of group ^|?:|indicate a non-capturing group that doesn't generate backreferences ^| [0-9a-fA-F]|hexadecimal digit ^| {3}|three times |)|end of group |{1,2}|repeat either once or twice |$|anchor for end of string |===
Examples
[cols="4,3"]
|===
|sep[ae]r[ae]te | find separate even if it's misspelled seperate
|<div\b[^>]>(.?) | match open and closing html div tag
|\b[A-Z0-9._%+-]+@[A-Z0-9.-]+.[A-Z]{2,4}\b. | match valid email
|(?(?=condition)(then1|then2|then3)|(else1|else2|else3)). | if/then/else conditionals
|===
.example [source,bash]
[a-f0-9]{3}
- Matches any string consisting of lowercase characters between
a-fand digits between0-9 - At 3 total characters {3}.
.example [source,bash]
#?
- Another Quantifier ?,
- matches between 0 and 1 characters of the preceeding expression boundaries #
.example [source,bash]
[a-f0-9]{6}|[a-f0-9]{3}
- Matches any string consisting of lowercase characters between
a-fand digits between0-9At 6 total characters{6}
OR +
- Matching any string consisting of lowercase characters between
a-fand digits between0-9At 3 total characters{3}.
.example [source,bash]
([a-f0-9]{6}|[a-f0-9]{3})
This specifies within the brackets that we are looking for and extracting strings
consisting of a-z and 0-9 criteria and plating them as Capture Groups.
.example [source,bash]
(/^([a-z0-9_.-]+)@([\da-z.-]+).([a-z.]{2,6})$/)
- In the above pattern,
/^([a-z0-9_\.-]+)says that email must begin with alpha-numeric characters with only lower case, it may also include ., _, -. -
@meaning there must be @ sign after first chars. - The pattern
[\da-z\.-]+says that after @ char there must be lowercase alpahabet and may also contain ., -. -
\.indicates that there should be a period to separate domain and sub-domain names in the email. - The pattern
[a-z\.]{2,6})$/says that the email should end with 2 to 6 lowercase chars. Here 2 indicates the minimum number and 6 indicates the maximum number of characters.
TODO https://gist.github.com/bashworthj/95eee63656342321c248463e1d052970
References:
http://www.regular-expressions.info/quickstart.html
http://www.rexegg.com/regex-quickstart.html
http://en.wikibooks.org/wiki/Regular_Expressions/syntax/posix_basic_regular_expression
http://blog.codinghorror.com/excluding-matches-with-regular-expressions/