Help in Regular Expression

RuchiSaini
Forum Newbie
Posts: 5
Joined: Sat May 16, 2015 3:49 am

Help in Regular Expression

Post by RuchiSaini »

Hi,

Can you please advise me on the following regular expressions?

1) (green|blue)?+.+
I understand that it will match either green or blue, which is optional, but the +.+
changes its meaning, and it accepts colors that are neither green nor blue.
I don't understand how it affects the match.

2) "^([\"']?)\\d\\d:\\d\\d\\1,([\"']?)[A-Z]\\w+\\2,.*$"
It accepts
10:23,Added,Queue,7432e01
10:53,"Removed","Queue","7432e01"
10:23,Added,,queue 2,7432e01
I believe backreferences are used here, so they should only match the
value in the capturing group. Why are these two values (Added, Queue) accepted?
Also, in the third line, a value is missing, and that line is still accepted.
Please guide.

Thanks
Ruchi
requinix
Spammer :|
Posts: 6617
Joined: Wed Oct 15, 2008 2:35 am
Location: WA, USA

Re: Help in Regular Expression

Post by requinix »

1) Adding a + after a quantifier makes it possessive, which changes how the repetition works: the engine will not backtrack into it when trying to match the rest of the regex.

Code: Select all

(green|blue)?+
That will match "green" or "blue" optionally, but if it does match and the rest of the regex cannot match, then the entire regex will fail. For example,

Code: Select all

var_dump(preg_match('/(green|blue)?+green/', 'green')); // int(0)
Normally the regex would match "green" with the first part and then fail to match the second "green". It would then backtrack so that the first part matches nothing (since it was optional) and successfully match the second "green". With the extra +, the engine is not allowed to backtrack into the first part, so the regex fails instead.

The .+ works normally: there must be one or more characters. All together the regex is... well, it's unnecessarily complicated. There are three cases of input:
a) The string contains "greenX", which will match with $0=greenX and $1=green
b) Same with "blueX": $0=blueX, $1=blue
c) If the string contains neither, it will still be matched because of the .+ (as long as the string is not empty)

The extra + is mostly a performance optimization, so don't worry about it too much. In fact, adding it is more likely to break a regex than help it, because backtracking plays a significant role in how regexes are generally used.

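2) Your regex only really checks the "10:23,Added," part. The final .* matches the rest of the line, whatever it contains. Try removing it (and the $ with it). Also note that \1 and \2 only refer back to what each ([\"']?) captured, which is an optional quote character. So they make sure a field's closing quote matches its opening one. They do not restrict the field values themselves.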
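To see what the pattern actually checks, here is a sketch that drops `.*$` and prints the matched prefix and both groups for the three sample lines. It assumes Python's `re` instead of the original environment, and writes the pattern as a Python raw string rather than the escaped literal from the question:

```python
import re

# The question's pattern without the trailing .*$, as a Python raw string.
PATTERN = re.compile(r"""^(["']?)\d\d:\d\d\1,(["']?)[A-Z]\w+\2,""")

LINES = [
    "10:23,Added,Queue,7432e01",
    '10:53,"Removed","Queue","7432e01"',
    "10:23,Added,,queue 2,7432e01",
]

for line in LINES:
    m = PATTERN.match(line)
    # group(0) is the only part of the line the pattern really checks;
    # groups 1 and 2 hold just the optional quote that \1 and \2 repeat.
    print(repr(m.group(0)), repr(m.group(1)), repr(m.group(2)))
```

Every line matches on its first two fields alone. "Queue", the empty field and everything after them are never examined, which is why removing .* makes the behavior easier to see.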

Re: Help in Regular Expression

Post by RuchiSaini »

2) I got it... thanks :)

1) For the regex (green|blue)?+.+
with the string value "green", the match fails,
and with the string value "red", the match passes.
So actually I didn't get your answer, or maybe I have not posted my question properly.

Thank you so much
Ruchi

Re: Help in Regular Expression

Post by requinix »

There are two parts:
1. (green|blue)?+
2. .+

"green" matches the first part, but since nothing comes after it, the second part cannot match. Because of the ?+, the engine will not backtrack to undo the first match (which was optional) so that the second part can match instead. If you change it to just ?, it would backtrack and the string would match. This assumes the whole string has to match, for example with ^ and $ anchors. An unanchored search would retry from the next character and match "reen".
"red" does not match the optional first part but does match the second part.

Re: Help in Regular Expression

Post by RuchiSaini »

Okay, got it... thank you so much :)

Re: Help in Regular Expression

Post by RuchiSaini »

Hi,

You said that backtracking plays a significant role in how regexes are generally used.

I am not familiar with this... can you point me to any link/site that is a good reference for understanding it?

Thanks

Re: Help in Regular Expression

Post by requinix »

Not really: backtracking is one of those things most people rely on without realizing it. That's why possessive quantifiers (?+, *+, ++) and their nicer relative, once-only subpatterns (?>...), can easily break someone's regex.

If you want to learn, regular-expressions.info is a good place to start. Perl's perlre documentation is another place to get more technical information, and of course there's PHP's own PCRE documentation.

Re: Help in Regular Expression

Post by RuchiSaini »

Okay... I will go through these sites... thanks