PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 

All times are UTC - 5 hours




Post new topic Reply to topic  [ 16 posts ]  Go to page 1, 2  Next
Author Message
Posted: Mon Mar 05, 2012 12:10 pm
Forum Newbie

Joined: Mon Mar 05, 2012 12:08 pm
Posts: 8
Hi
i'm trying to build a regular expression that will look for a specific combination of strings but must not include a specific combination, and i'm having problems :(
what i need is to find all strings that:
1. contain the phrase "*contain*love*" (meaning, the sentence "contain love" is good, the sentence "contain big love" is good, the sentence "must contain great love" is good, the sentence "might contain very big love indeed" is good, etc.) - this part is easy and i don't have an issue with it.
2. DO NOT contain the exact combination "love child" (meaning, the sentence "might contain very big love sick child" is ok, but the sentence "might contain very big love child today" should not match) - this is where i'm stumped

can anyone help ?


Posted: Mon Mar 05, 2012 1:20 pm
Forum Commoner

Joined: Thu Dec 15, 2011 2:40 pm
Posts: 85
Location: Nelson, NZ
Hi Shambuv,

This will do it:
Syntax:
(?!.*?love child)contain.*?love

It uses a lookaround (a negative lookahead) to ensure that "love child" does not appear anywhere from the start of the match onward.
Please let me know if you have any questions. :)

For the record, you could also place the negative lookahead after love:
Syntax:
contain.*?love(?!.*?child)


But although this would forbid "contain little love child", it would still allow "contain love child love".
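
To see how the two placements differ, here is a minimal PHP sketch that runs both patterns through preg_match(). The helper name accepts() and the sample list are assumptions made for illustration; the last sample is an extra case, not from the question, showing that the first lookahead only scans forward from where "contain" begins.

```php
<?php
// Illustrative helper (not from the thread): true when the pattern matches.
function accepts(string $pattern, string $sentence): bool
{
    return preg_match($pattern, $sentence) === 1;
}

$lookaheadFirst = '/(?!.*?love child)contain.*?love/';
$lookaheadAfter = '/contain.*?love(?!.*?child)/';

$samples = [
    'must contain great love',
    'might contain very big love sick child',
    'might contain very big love child today',
    'contain love child love',
    'love child, but contain love', // "love child" sits before the match start
];

foreach ($samples as $sentence) {
    printf(
        "%-40s first=%s after=%s\n",
        $sentence,
        accepts($lookaheadFirst, $sentence) ? 'match' : 'no',
        accepts($lookaheadAfter, $sentence) ? 'match' : 'no'
    );
}
```

With these samples, the second pattern also rejects "might contain very big love sick child", because (?!.*?child) forbids any later "child" and not only one directly after "love". The first pattern still accepts the last sample, since its lookahead starts at "contain" and never sees the earlier "love child".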


Posted: Wed Mar 07, 2012 9:03 am
Forum Newbie

Joined: Mon Mar 05, 2012 12:08 pm
Posts: 8
Works like a charm !
Many many thanks !!


Posted: Wed Mar 07, 2012 9:22 am
Forum Newbie

Joined: Mon Mar 05, 2012 12:08 pm
Posts: 8
OK, now i need to add another small twist to this:

1. contain the phrase "*contain*love*" (meaning, the sentence "contain love" is good, the sentence "contain big love" is good, the sentence "must contain great love" is good, the sentence "might contain very big love indeed" is good, etc.)
2. DO NOT contain the exact combination "love child" (meaning, the sentence "might contain very big love sick child" is ok, but the sentence "might contain very big love child today" should not match)
3. the word "might" cannot appear before "contain" - so "may contain love" is ok, "may contain crazy love tonight" is ok, but "might contain love" should not match, "might contain crazy love" should not match, "might contain very big love child today" should not match

Help?


Posted: Wed Mar 07, 2012 1:38 pm
Forum Commoner

Joined: Thu Dec 15, 2011 2:40 pm
Posts: 85
Location: Nelson, NZ
Hi Shanbuv,

Delighted that it works for you.
For the twist you are asking for, it's the same idea: we add a negative lookbehind.
You can add a number of lookaheads and lookbehinds to specify what a string must look like; that's a common technique for password validation.

Syntax:
(?!.*?love child)(?<!might\s)contain.*?love


Please let me know if this is what you need. :)
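
A quick way to try the lookbehind is a small PHP check like the sketch below. The accepts() helper and the $expected table are assumptions for illustration; the sentences come from the question, except the last one, which is an added edge case with two spaces.

```php
<?php
// Illustrative helper (not from the thread): true when the pattern matches.
function accepts(string $pattern, string $sentence): bool
{
    return preg_match($pattern, $sentence) === 1;
}

$pattern = '/(?!.*?love child)(?<!might\s)contain.*?love/';

// sentence => whether the pattern is expected to match it
$expected = [
    'may contain love'                        => true,
    'may contain crazy love tonight'          => true,
    'might contain love'                      => false,
    'might contain crazy love'                => false,
    'might contain very big love child today' => false,
    // Two spaces: the lookbehind only inspects the six characters right
    // before "contain", which are "ight  " here, so this one slips through.
    'might  contain love'                     => true,
];

foreach ($expected as $sentence => $shouldMatch) {
    $ok = accepts($pattern, $sentence) === $shouldMatch;
    printf("%-42s %s\n", $sentence, $ok ? 'as expected' : 'MISMATCH');
}
```

The lookbehind is kept at a fixed width (might plus one whitespace character), which is why the two-space variant is not caught.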


Posted: Thu Mar 08, 2012 10:46 am
Forum Newbie

Joined: Mon Mar 05, 2012 12:08 pm
Posts: 8
awesome ! works perfect
will run this against all the sentences to make sure everything is covered
many thanks


Posted: Thu Mar 08, 2012 2:34 pm
Forum Commoner

Joined: Thu Dec 15, 2011 2:40 pm
Posts: 85
Location: Nelson, NZ
You're welcome, shanbuv, please don't hesitate to ask again.
Wishing you a fun day.


Posted: Sun Mar 11, 2012 4:51 am
Forum Newbie

Joined: Mon Mar 05, 2012 12:08 pm
Posts: 8
OK, found one "hole"...

the following sentence:
"contains bliss and might contain love rumors"

is identified by:
(?!.*?love child)(?<!might\s)contain.*?love

which is not what i wanted :-(

the word "might" cannot appear before "contain" - so "may contain love" is ok, "may contain crazy love tonight" is ok, but "might contain love" should not match, "might contain crazy love" should not match, "might contain very big love child today" should not match

i guess this is because the first contains is matched with the last love ?

how do i go about solving this ?
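
One way to check that guess is to print what the pattern actually matched. A minimal sketch, assuming PHP's preg_match() with the PREG_OFFSET_CAPTURE flag (the variable names are only illustrative):

```php
<?php
$pattern = '/(?!.*?love child)(?<!might\s)contain.*?love/';
$subject = 'contains bliss and might contain love rumors';

// PREG_OFFSET_CAPTURE makes $m[0] a [matched text, byte offset] pair.
if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE) === 1) {
    [$text, $offset] = $m[0];
    echo "matched at offset $offset: $text\n";
    // prints: matched at offset 0: contains bliss and might contain love
}
```

The match starts at the "contain" inside "contains" (offset 0), where nothing precedes it for the lookbehind to object to, and the lazy .*? then runs on until it reaches "love".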


Posted: Sun Mar 11, 2012 2:17 pm
Forum Commoner

Joined: Thu Dec 15, 2011 2:40 pm
Posts: 85
Location: Nelson, NZ
Hi Shanbuv,

Let's see what happens if we change it to:

Syntax:
^(?!(?>.*?love) child)(?!(?>.*?might)\scontain)(?>.*?contain)(?>.*?love)


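- For now, read it without paying attention to the four "?>" in the expression. I added these atomic groups because the expression is getting heavy with dot-stars, and they help it fail faster when it needs to fail. One side effect to keep in mind: an atomic group commits to the first occurrence it finds, so the two lookaheads only test the first "love" and the first "might" in the string.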
- The expression now has three rules:
1. Cannot contain "love child"
2. Cannot contain "might contain"
3. Must contain "contain .... love"

This matches and rejects everything as you have specified so far. But note that it will also reject:
might contain bliss and does contain love

Please confirm that this is what you intend, otherwise we'll tweak it again.
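
If the one-line version is hard to read, the same expression can be laid out one rule per line with PCRE's extended (x) modifier. The sketch below is only an illustration: the $annotated layout, the accepts() helper and the sample list are assumptions, and the check simply confirms that both spellings behave the same on a few sentences from this thread.

```php
<?php
// Illustrative helper (not from the thread): true when the pattern matches.
function accepts(string $pattern, string $sentence): bool
{
    return preg_match($pattern, $sentence) === 1;
}

// The pattern exactly as posted above.
$posted = '/^(?!(?>.*?love) child)(?!(?>.*?might)\scontain)(?>.*?contain)(?>.*?love)/';

// Same pattern in extended mode. Plain whitespace is ignored there, so the
// literal space before "child" is written as [ ].
$annotated = '/^
    (?! (?>.*?love) [ ]child )     # rule 1: no "love child"
    (?! (?>.*?might) \s contain )  # rule 2: no "might contain"
    (?>.*?contain) (?>.*?love)     # rule 3: "contain ... love"
/x';

$samples = [
    'may contain crazy love tonight',
    'might contain love',
    'contains bliss and might contain love rumors',
    'might contain bliss and does contain love',
];

foreach ($samples as $sentence) {
    printf(
        "%-46s posted=%s annotated=%s\n",
        $sentence,
        accepts($posted, $sentence) ? 'match' : 'no',
        accepts($annotated, $sentence) ? 'match' : 'no'
    );
}
```

Only the first sample matches; the other three are rejected by rule 2.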


Posted: Mon Mar 12, 2012 7:15 am
Forum Newbie

Joined: Mon Mar 05, 2012 12:08 pm
Posts: 8
Hi,
First, thanks for the efforts, my head is spinning just trying to understand what you're generating...

the sentence "contains bliss and might contain love rumors" is handled properly, but "might contain bliss and does contain love" is not matched - upon reading the rules you specified, i see the problem.

the rules i want are slightly different:
1. Must contain "contain .... love"
2. Cannot contain "love child"
3. Cannot contain "*might contain*love" - meaning "contains bliss and might contain love rumors" should not match, but "might contain bliss and contain love" should match (i don't care if "might" appears, just not before the "contain*love" section)
some more examples
"might contains bliss, affection or anything else but might contain love rumors" should not be matched
"might contains bliss, affection or anything else but contain love naturally" should be matched
"might contains bliss, affection or anything else but contain love child forever" should not be matched

many thanks


Posted: Mon Mar 12, 2012 3:19 pm
Forum Commoner

Joined: Thu Dec 15, 2011 2:40 pm
Posts: 85
Location: Nelson, NZ
Hi Shanbuv,

I am interpreting this:
Quote:
3. Cannot contain "*might contain*love"


as:
Cannot contain "might contain[space]love"

because you later say
i don't care if "might" appears, just not before the "contain*love" section

If so, we just add space-love to the "might contain" negative lookahead in our previous regex:
Syntax:
^(?!(?>.*?love) child)(?!(?>.*?might)\scontain\slove)(?>.*?contain)(?>.*?love)


Let me know if that works for you.
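
Here is a hedged PHP sketch for running that pattern against the examples from the previous post. The accepts() helper is illustrative, and $relaxed is a hypothetical variant (not from this thread) that drops the atomic groups so the lookaheads can consider every "might" and every "love", not just the first.

```php
<?php
// Illustrative helper (not from the thread): true when the pattern matches.
function accepts(string $pattern, string $sentence): bool
{
    return preg_match($pattern, $sentence) === 1;
}

// The pattern as posted above.
$posted = '/^(?!(?>.*?love) child)(?!(?>.*?might)\scontain\slove)(?>.*?contain)(?>.*?love)/';

// Hypothetical variant without atomic groups, for comparison only.
$relaxed = '/^(?!.*?love child)(?!.*?might\scontain\slove).*?contain.*?love/';

$samples = [
    'might contain bliss and contain love',
    'contains bliss and might contain love rumors',
    'might contains bliss, affection or anything else but contain love naturally',
    'might contains bliss, affection or anything else but contain love child forever',
    'might contains bliss, affection or anything else but might contain love rumors',
];

foreach ($samples as $sentence) {
    printf(
        "posted=%-5s relaxed=%-5s %s\n",
        accepts($posted, $sentence) ? 'match' : 'no',
        accepts($relaxed, $sentence) ? 'match' : 'no',
        $sentence
    );
}
```

The two patterns agree on the first four samples. On the last one the posted pattern still matches: its atomic group stops at the first "might" (in "might contains"), so the later "might contain love" is never tested, while the relaxed variant rejects it.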


Posted: Tue Mar 13, 2012 10:25 am
Forum Newbie

Joined: Mon Mar 05, 2012 12:08 pm
Posts: 8
Hi
when i said, cannot contain "*might contain*love" , i meant cannot contain "might[space]contain[anything]love" , but in a non greedy way

for example
"might contains bliss, affection or anything else but might contain love rumors" should not be matched, since "might contain love" appears
"might contains bliss, affection or anything else but might contain big big love" should not be matched, since "might contain[anything]love" appears

the problematic example
"might contains bliss, affection or anything else but contain big big love" SHOULD match since the second "contain" part is ok.

the rule here, and i hope i explain correctly:
cannot contain "*might[space]contain[anything]love" unless there's another "contain" in the [anything] part, in which case, if other rules apply (contains[anything]love and does not have "love child") then it should match (=if you find another "contain" in the [anything], start checking again...)

do i make sense?
thanks
S.


Posted: Tue Mar 13, 2012 4:00 pm
Forum Commoner

Joined: Thu Dec 15, 2011 2:40 pm
Posts: 85
Location: Nelson, NZ
Syntax:
^(?!(?>.*?love) child)(?!(?>.*?might)\scontain(?:.(?!(?<!might )contain))+?love)(?>.*?contain)(?>.*?love)


Have fun with that.
It made sense to me a second ago, but don't ask me to explain it as there's a triple negative.
:)


Posted: Tue Mar 13, 2012 4:05 pm
Forum Commoner

Joined: Thu Dec 15, 2011 2:40 pm
Posts: 85
Location: Nelson, NZ
By the way, the triple negative is a sign that the "say what you DON'T want" approach has maxed out on this regex.
At this stage, to refine the regex for readability, I might switch to a "say what you DO want" approach: match (expression without "might contain") OR (expression with "might contain" in a way that is acceptable).
In the meantime, the expression as it is should work. Let me know if you need further help on it. :)

Technically, though, the expression above is quite interesting (for someone learning regex) because it showcases the use of a lookaround within a lookaround (specifically, a negative lookbehind within a negative lookahead within a negative lookahead).
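
For anyone who wants to try the nested-lookaround expression, here is a small PHP sketch that runs it against the sentences from the last few posts. The accepts() helper and the $expected table are assumptions for illustration; the expected values follow the rules as stated in the thread.

```php
<?php
// Illustrative helper (not from the thread): true when the pattern matches.
function accepts(string $pattern, string $sentence): bool
{
    return preg_match($pattern, $sentence) === 1;
}

$pattern = '/^(?!(?>.*?love) child)'
    . '(?!(?>.*?might)\scontain(?:.(?!(?<!might )contain))+?love)'
    . '(?>.*?contain)(?>.*?love)/';

// sentence => whether it should match according to the posts above
$expected = [
    'might contains bliss, affection or anything else but might contain love rumors'  => false,
    'might contains bliss, affection or anything else but might contain big big love' => false,
    'might contains bliss, affection or anything else but contain big big love'       => true,
    'might contains bliss, affection or anything else but contain love child forever' => false,
    'might contain bliss and contain love'                                            => true,
    'contains bliss and might contain love rumors'                                    => false,
];

foreach ($expected as $sentence => $shouldMatch) {
    $ok = accepts($pattern, $sentence) === $shouldMatch;
    printf("%-11s %s\n", $ok ? 'as expected' : 'MISMATCH', $sentence);
}
```

The inner (?:.(?!(?<!might )contain))+? part walks forward one character at a time and refuses to step onto a "contain" that is not preceded by "might ", which is what lets a later plain "contain" restart the check.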


Posted: Thu Mar 15, 2012 6:56 am
Forum Newbie

Joined: Mon Mar 05, 2012 12:08 pm
Posts: 8
Good god.... how in god's name did you manage to come up with this ?
will test this and let you know
10x again
S

