Hold on tight: this is going to be a fast-paced but to-the-point tutorial.

Ready? Ok let's do it....
Code: Select all
/[\w\s]+\d{1,3}\t\W/
Let's start with metacharacters. Those \d, \s etc. are what we refer to as "metacharacters". A metacharacter is a character (or escape sequence) that stands for a particular group of real characters (with some exceptions, see below).
The metacharacters and what they stand for:
Code: Select all
Character Matching
. (dot) ANY single character at all
\w Any single alphanumeric character (a-z, A-Z, 0-9) or underscore
\d Any single digit (0-9)
\s Any single whitespace character
<<Uppercase negates the metacharacter>>
\W Any single non-word character (anything except a letter, digit or underscore)
\S Any single non-whitespace character
\D Any single non-digit
<<Something else to note>>
[x-y] Any single character in the range x to y (e.g. [A-Z])
[abc123] Any single character from a, b, c, 1, 2 or 3
[a-z125-9] Any single character from a to z, 1, 2 or 5 to 9
<<Negate these with caret "^" at the VERY start of the bracket>>
[^abc0-9] Any single character EXCEPT a, b, c and 0 to 9
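If you want to try a few of these out, here's a quick sketch. The sample strings and the array keys are just made up for illustration. preg_match() gives back 1 when the pattern is found and 0 when it isn't:

```php
<?php
// Each entry runs one pattern against one sample string.
$results = [
    'has_digit'        => preg_match('/\d/', 'abc5'),          // a digit exists
    'has_non_digit'    => preg_match('/\D/', '12345'),         // no non-digit anywhere
    'word_space_word'  => preg_match('/^\w\s\w$/', 'a b'),     // \w, then \s, then \w
    'negated_class'    => preg_match('/[^abc0-9]/', 'abc123'), // every character is excluded
    'mixed_class'      => preg_match('/[a-z125-9]/', '4'),     // 4 is in neither range
];

print_r($results);
```

Only 'has_digit' and 'word_space_word' should come back as 1 here.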
You'll see some other metacharacters which don't actually match anything that's really there. They match invisible boundaries so we call them "zero-width assertions".
Code: Select all
Assertion Matching
^ (caret) The start of the string
$ The end of the string
\b A word boundary (the point between a \w character and a \W character, or between a \w character and the start or end of the string)
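A small sketch of the assertions in action (variable names and test strings are my own, purely for illustration):

```php
<?php
// ^ and $ pin the pattern to the start and end of the string.
$onlyDigits  = preg_match('/^\d+$/', '12345');   // digits from start to end
$withLetters = preg_match('/^\d+$/', '123abc');  // letters get in the way of $

// \b stops "cat" from matching inside a longer word.
$wholeWord  = preg_match('/\bcat\b/', 'the cat sat');
$insideWord = preg_match('/\bcat\b/', 'concatenate');

var_dump($onlyDigits, $withLetters, $wholeWord, $insideWord);
```

Notice that the assertions themselves don't consume any characters, they only say where the match is allowed to sit.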
Next, we can specify how many times a character should occur. We could do this to match a string of four digits:
Code: Select all
/\d\d\d\d/
But it's shorter to use a quantifier:
Code: Select all
/\d{4}/
Code: Select all
Quantifier Meaning
+ One or more times
* Zero or more times
? Zero or one time
{n,m} Between n and m times
{n,} n or more times
{n} Exactly n times
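Here's a rough sketch of each quantifier at work. The patterns and strings are invented for the demo:

```php
<?php
$results = [
    'exactly_four' => preg_match('/^\d{4}$/', '2024'),    // {4}: four digits
    'only_three'   => preg_match('/^\d{4}$/', '202'),     // one digit short
    'two_to_three' => preg_match('/^\d{2,3}$/', '1234'),  // {2,3}: four is too many
    'optional_u'   => preg_match('/^colou?r$/', 'color'), // ?: the "u" may be absent
    'star_zero'    => preg_match('/^ab*c$/', 'ac'),       // *: zero "b"s is fine
    'plus_zero'    => preg_match('/^ab+c$/', 'ac'),       // +: needs at least one "b"
];

print_r($results);
```

The ^ and $ are there so the quantifier has to account for the whole string; without them /\d{4}/ would also be happy with a longer run of digits.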
To delimit a regex we start and end it with the EXACT same character. You can use most non-alphanumeric characters (but not a backslash or whitespace); the two usual choices are:
Code: Select all
/pattern/
#pattern#
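Why bother with a different delimiter? If the delimiter appears inside the pattern it has to be escaped. A quick sketch (the URL is only an example value):

```php
<?php
$url = 'http://www.foo.bar/index.html';

// With "/" as the delimiter every literal slash needs a backslash.
$withSlashes = preg_match('/^http:\/\/www\./', $url);

// With "#" as the delimiter the slashes can be written as they are.
$withHashes = preg_match('#^http://www\.#', $url);

var_dump($withSlashes, $withHashes);
```

Both calls test the same thing; the second is just easier on the eyes.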
Code: Select all
$string = "Hello, I'm d11wtq and I'm 22 years old!";
if (preg_match("/\w+\W I'm \w\d{2}wtq and I'm \d+ years old\W/", $string)) {
    echo "d11wtq is 22";
} else {
    echo "d11wtq didn't tell me his age";
}
"\w+" matches an alphanumeric or underscore character one or more times
Hello
"\W" matches any single non-alphanumeric character
Hello,
" I'm " is just plain old string
Hello, I'm
"\w" is any single alphanumeric character
Hello, I'm d
"\d{2}" is two digits
Hello, I'm d11
"wtq and I'm " is just plain old string again
Hello, I'm d11wtq and I'm
"\d+" is one or more digits
Hello, I'm d11wtq and I'm 22
" years old" is plain old string
Hello, I'm d11wtq and I'm 22 years old
"\W" is any single non-alphanumeric character
Hello, I'm d11wtq and I'm 22 years old!
If you understand that then let's move on to some "modifiers". If not, then read it again, and if you still don't get it, read it again.
Note: When starting out in regex don't try and jump in with both feet. Match a tiny part of the string, then test it. Then add some more to your regex to match more of the string and test again. Repeat until the regex works.
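Here's one way that step-by-step approach might look, reusing the "d11wtq" string from earlier. The $steps array and the loop are just my own sketch of the idea:

```php
<?php
$string = "Hello, I'm d11wtq and I'm 22 years old!";

// Grow the pattern a piece at a time and look at what it matches so far.
$steps = [
    "/\w+/",
    "/\w+\W I'm /",
    "/\w+\W I'm \w\d{2}wtq/",
    "/\w+\W I'm \w\d{2}wtq and I'm \d+ years old\W/",
];

$matched = [];
foreach ($steps as $pattern) {
    if (preg_match($pattern, $string, $m)) {
        $matched[] = $m[0];           // $m[0] is the text matched so far
        echo $pattern, ' => "', $m[0], "\"\n";
    } else {
        echo $pattern, " => no match, fix this step before adding more\n";
    }
}
```

If a step suddenly stops matching you know the mistake is in the piece you just added.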
Regex modifiers:
Code: Select all
/^pattern$/mis
Code: Select all
Modifier Effect
i Case insensitive
s Dot-all mode (the dot "." also matches newline characters)
g Global search (not valid in PHP [use preg_match_all()] but handy if you're using JS or Perl). Tells the regex to keep looking after it's matched once
m Multi-line mode (^ and $ now match start and end of LINE not start and end of STRING)
Modifiers go on the right hand side of the closing delimiter.
Quick example:
Code: Select all
$string = "Hello World!";
if (preg_match('/^[a-z]/i', $string)) {
    echo "Starts with a letter";
} else {
    echo "Doesn't start with a letter";
}
"[a-z]" means match a lowercase a to z
Nothing matched - BUT
The "i" modifier makes the regex case insensitive - SO
H is all that is matched but this means it returns true anyway.
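The "m" and "s" modifiers can be tried the same way. A small sketch with a two-line string (the text and the array keys are made up for the demo):

```php
<?php
$text = "first line\nSecond line";

$results = [
    // ^ means start of STRING here, and the string starts with "first".
    'no_modifiers'  => preg_match('/^second/', $text),
    'i_only'        => preg_match('/^second/i', $text),
    // With "m", ^ also matches at the start of the second line.
    'm_and_i'       => preg_match('/^second/mi', $text),
    // Without "s" the dot refuses to match the newline.
    'dot_without_s' => preg_match('/line.Second/', $text),
    'dot_with_s'    => preg_match('/line.Second/s', $text),
];

print_r($results);
```

Only 'm_and_i' and 'dot_with_s' should come back as 1.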
There are some things you should remember when working with regular expressions.
1. Escape special characters with a backslash
2. Remember to use quantifiers to match multiple times
3. Remember to match a dot "." you need to escape it "\." because dot "." is a metacharacter itself
4. Regex are case sensitive by default
5. "*" and "+" are what we call "greedy" (Read the follow up to this tutorial to learn more)
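A few of these points in code, as a rough sketch (the strings are invented). preg_quote() is PHP's built-in helper for escaping a literal string before dropping it into a pattern:

```php
<?php
// Point 3: an unescaped dot matches ANY character, an escaped one only a dot.
$unescapedDot = preg_match('/^a.c$/', 'abc');
$escapedDot   = preg_match('/^a\.c$/', 'abc');
$literalDot   = preg_match('/^a\.c$/', 'a.c');

// Point 1: preg_quote() adds the backslashes for you.
$quoted = preg_quote('1.5+2', '/');

// Point 5: "+" is greedy, so it grabs as much as it can.
preg_match('/<.+>/', '<b>bold</b>', $m);
$greedy = $m[0];

var_dump($unescapedDot, $escapedDot, $literalDot, $quoted, $greedy);
```

The greedy match runs all the way to the last ">" rather than stopping at the first one.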
Next... Parentheses have more than one use in regex. They:
a) Group characters together
b) Extract the characters they surround into memory (to match a parenthesis itself you must escape it "\(" )
Something useful:
Code: Select all
//Check string represents a URL
$string = "http://www.foo.bar/";
if (preg_match("#^\w+://(www\.)?\w+\.\w+#i", $string)) {
    echo "String is a URL";
} else {
    echo "String isn't a URL";
}
A vertical bar character "|" is used to mean OR.
Code: Select all
$string = "abcdefg123456";
//abcdefgh23456 OR abcdefg123456
if (preg_match("/abcdefg(h|1)23456/", $string)) {
    //True
} else {
    //False
}
Sometimes you'll need to match part(s) of a string and extract them to use elsewhere. You do this using parentheses. Indexing starts at 1 and goes up by one for each opening parenthesis, counting from left to right. With nested parentheses the order follows this pattern:
Code: Select all
( 1 ( 2 ) ( 3 ( 4 ) ) ( 5 ( 6 ( 7 ) ( 8 ) ) ) ) ( 9 )
The best way to refer to an extracted part of a string is by the dollar sign "$" followed by the index of the part you extracted. (e.g. "$4" ).
However, that said, PHP handles things slightly differently with the preg_match() function. Indexing starts at zero (the entire matched text) and then from 1 as expected for the extracted parts. preg_match() also requires a third parameter to do this so that it can dump "$1", "$2", "$3" etc. into an array.
Code: Select all
$string = "There's a number in here 123456 somewhere but I don't know what it is!";
preg_match("/[a-z\s]+(\d+)[a-z\s]+/i", $string, $matches); //s a number in here 123456 somewhere but I don
echo "The number in the string is " . $matches[1]; //The number in the string is 123456
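To see the numbering of nested parentheses for yourself, here's a small sketch with a date-like string (pattern and string are my own example):

```php
<?php
// Groups are numbered by their opening parenthesis, left to right:
// ( 1 ( 2 ) ( 3 ) ) ( 4 )
preg_match('/((\d+)-(\d+))-(\d+)/', '2024-05-17', $m);

print_r($m);
/*
$m[0] is the whole match, $m[1] the outer group,
$m[2] and $m[3] the groups nested inside it, $m[4] the last group.
*/
```

So the outer group comes before the groups inside it, because its opening parenthesis comes first.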
preg_match() - I guess I have that one covered. Tests if the pattern is matched in the string. Returns 1 if matched, 0 if not (and FALSE if an error occurred). If the optional third parameter is given the function extracts the parenthesis-enclosed parts of the pattern into the given array.
preg_match_all() - Same as preg_match() except that the regex doesn't stop when a match is found... it continues to find as many matches as exist in the string. The extracted array is a multi-dimensional array where all occurrences of $1 are placed in $array[1] and all occurrences of $2 in $array[2] etc...
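A quick sketch of preg_match_all() with its default ordering (the "name=number" string is invented for the example):

```php
<?php
$string = 'apple=1, banana=22, cherry=333';

// The return value is the number of full matches found.
$count = preg_match_all('/(\w+)=(\d+)/', $string, $matches);

echo "Found $count matches\n";
print_r($matches[1]); // everything captured by the first parentheses
print_r($matches[2]); // everything captured by the second parentheses
```

$matches[0] holds the full matches ("apple=1" and so on), just like index zero does with preg_match().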
preg_replace() - Like str_replace() except it takes regex patterns as arguments:
Code: Select all
$string = "This is foo and that is bar";
$new_string = preg_replace('/f(\w+)/', "g$1", $string); //This is goo and that is bar
preg_split() - Like explode() except the delimiter is a regex pattern:
Code: Select all
$string = "lots of *@><& symbols &^% in this £! string";
$parts = preg_split('/[^\s\w]+/', $string);
print_r($parts);
/*
Array (
[0] => lots of
[1] => symbols
[2] => in this
[3] => string
)
*/
ereg_replace() - Like preg_replace() without the advantages of Perl-style patterns and slightly slower (use preg_replace() instead; the ereg functions were deprecated in PHP 5.3 and removed in PHP 7).
I guess that covers all the basics of using regex, but believe me there's a lot more to learn once you have this under your belt.
[I'll follow this crash course up with an advanced regex tutorial given some time to write it]
Good luck and happy regex'ing!
