help with regular expression problem

Postby shanbuv » Mon Mar 05, 2012 12:10 pm

Hi
i'm trying to build a regular expression that looks for a specific combination of strings but must not include another specific combination, and i'm having problems :(
what i need is to find all strings that:
1. contain the phrase "*contain*love*" (meaning the sentence "contain love" is good, "contain big love" is good, "must contain great love" is good, "might contain very big love indeed" is good, etc.) - this part is easy and i don't have an issue with it.
2. DO NOT contain the exact combination "love child" (meaning the sentence "might contain very big love sick child" is ok, but the sentence "might contain very big love child today" should not match) - this is where i'm stumped

can anyone help ?
shanbuv
Forum Newbie
 
Posts: 8
Joined: Mon Mar 05, 2012 12:08 pm

Re: help with regular expression problem

Postby ragax » Mon Mar 05, 2012 1:20 pm

Hi Shanbuv,

This will do it:
(?!.*?love child)contain.*?love

It uses a lookaround (negative lookahead) to ensure "love child" is not present.
Please let me know if you have any questions. :)
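
A quick way to check this pattern against the sentences from the question is a small script. This is only a sketch: the thread never says which regex engine is in use, so Python's re module and the helper name matches are assumptions made for illustration.

```python
import re

# Pattern copied from the post above.
PATTERN = re.compile(r"(?!.*?love child)contain.*?love")

def matches(sentence: str) -> bool:
    """Hypothetical helper: True if the pattern is found anywhere in the sentence."""
    return PATTERN.search(sentence) is not None

for sentence in [
    "contain love",
    "must contain great love",
    "might contain very big love sick child",
    "might contain very big love child today",
    "love child, but contain love",
]:
    print(matches(sentence), "|", sentence)
```

The first three sentences match and the fourth does not. The last one also matches: the pattern is not anchored, so the lookahead only inspects text to the right of the point where the match starts, and a "love child" that comes before "contain" goes unnoticed.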

For the record, you could also place the negative lookahead after love:
contain.*?love(?!.*?child)


But although this would forbid "contain little love child", it would still allow "contain love child love".
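
The difference between the two placements can be seen side by side. A minimal sketch, again assuming Python's re module; the names BEFORE and AFTER are made up for this comparison.

```python
import re

BEFORE = re.compile(r"(?!.*?love child)contain.*?love")  # lookahead in front
AFTER = re.compile(r"contain.*?love(?!.*?child)")        # lookahead after "love"

for sentence in [
    "contain little love child",
    "contain love child love",
    "contain love sick child",
]:
    before = BEFORE.search(sentence) is not None
    after = AFTER.search(sentence) is not None
    print(before, after, "|", sentence)
```

Both variants reject the first sentence. Only the second variant lets "contain love child love" through, because the lazy dot-star can move on to the second "love", which has no "child" after it. The second variant also rejects "contain love sick child", since its lookahead forbids any later "child" rather than the exact phrase "love child".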
ragax
Forum Commoner
 
Posts: 85
Joined: Thu Dec 15, 2011 2:40 pm
Location: Nelson, NZ

Re: help with regular expression problem

Postby shanbuv » Wed Mar 07, 2012 9:03 am

Works like a charm !
Many many thanks !!

Postby shanbuv » Wed Mar 07, 2012 9:22 am

OK, now i need to add another small twist to this:

1. contain the phrase "*contain*love*" (meaning, the sentence "contain love" is good, the sentence "contain big love" is good, the sentence "must contain great love" is good, the sentence "might contain very big love indeed" is good, etc.)
2. DO NOT contain the exact combination "love child" (meaning, the sentence "might contain very big love sick child" is ok, but the sentence "might contain very big love child today" should not match)
3. the word "might" cannot appear before "contain" - so "may contain love" is ok, "may contain crazy love tonight" is ok, but "might contain love" should not match, "might contain crazy love" should not match, "might contain very big love child today" should not match

Help?

Postby ragax » Wed Mar 07, 2012 1:38 pm

Hi Shanbuv,

Delighted that it works for you.
For the twist you are asking for, it's the same idea: we add a negative lookbehind.
You can stack several lookaheads and lookbehinds to specify what a string must look like; that's a common technique for password validation.

(?!.*?love child)(?<!might\s)contain.*?love


Please let me know if this is what you need. :)
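
To try the lookbehind version on the new examples, something like the following could be used. It is a sketch under the assumption that Python's re module behaves like the engine in question for this pattern (the lookbehind is fixed-width, which Python requires); the helper name matches_v2 is hypothetical.

```python
import re

# Pattern copied from the post above.
PATTERN_V2 = re.compile(r"(?!.*?love child)(?<!might\s)contain.*?love")

def matches_v2(sentence: str) -> bool:
    """Hypothetical helper: True if the pattern is found anywhere in the sentence."""
    return PATTERN_V2.search(sentence) is not None

for sentence in [
    "may contain love",
    "may contain crazy love tonight",
    "might contain love",
    "might contain crazy love",
    "might contain very big love child today",
]:
    print(matches_v2(sentence), "|", sentence)
```

The two "may" sentences match; the three "might" sentences do not.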

Postby shanbuv » Thu Mar 08, 2012 10:46 am

awesome ! works perfect
will run this against all the sentences to make sure everything is covered
many thanks

Postby ragax » Thu Mar 08, 2012 2:34 pm

You're welcome, shanbuv, please don't hesitate to ask again.
Wishing you a fun day.

Postby shanbuv » Sun Mar 11, 2012 4:51 am

OK, found one "hole"...

the following sentence:
"contains bliss and might contain love rumors"

is identified by:
(?!.*?love child)(?<!might\s)contain.*?love

which is not what i wanted :-(

the word "might" cannot appear before "contain" - so "may contain love" is ok, "may contain crazy love tonight" is ok, but "might contain love" should not match, "might contain crazy love" should not match, "might contain very big love child today" should not match

i guess this is because the first contains is matched with the last love ?

how do i go about solving this ?
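
Printing the matched span shows where the match begins. A minimal sketch, assuming Python's re module stands in for the engine used here:

```python
import re

PATTERN_V2 = re.compile(r"(?!.*?love child)(?<!might\s)contain.*?love")
sentence = "contains bliss and might contain love rumors"

m = PATTERN_V2.search(sentence)
if m:
    # start offset of the match and the text it covers
    print(m.start(), repr(m.group()))
```

The match starts at offset 0, at the "contain" inside "contains". The lookbehind is only checked at the spot where "contain" is being matched, and nothing precedes that first "contain", so it passes; the dot-star then runs across "might contain" until it reaches "love".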

Postby ragax » Sun Mar 11, 2012 2:17 pm

Hi Shanbuv,

Let's see what happens if we change it to:

^(?!(?>.*?love) child)(?!(?>.*?might)\scontain)(?>.*?contain)(?>.*?love)


- For now, read it without paying attention to the four "?>" in the expression. I added these atomic groups because the expression is getting heavy with dot-stars, and they help it fail faster when it needs to fail.
- The expression now has three rules:
1. Cannot contain "love child"
2. Cannot contain "might contain"
3. Must contain "contain .... love"

This matches and rejects everything the way you have specified so far. But note that it will also reject:
"might contain bliss and does contain love"

Please confirm that this is what you intend, otherwise we'll tweak it again.
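
The three rules can be exercised with a short script. This is a sketch, not the posted regex verbatim: the atomic groups are left out, as the post suggests for reading it, partly because Python's re module only accepts "(?>...)" from version 3.11 on. One thing to keep in mind is that an atomic group is not purely a speed-up: "(?>.*?might)" commits to the first "might" and never retries a later one, so the two forms can disagree on sentences with more than one "might". The helper name ok_v3 is hypothetical.

```python
import re

RULES_V3 = re.compile(
    r"^(?!.*?love child)"      # rule 1: cannot contain "love child"
    r"(?!.*?might\scontain)"   # rule 2: cannot contain "might contain"
    r".*?contain.*?love"       # rule 3: must contain "contain ... love"
)

def ok_v3(sentence: str) -> bool:
    """Hypothetical helper: True if the sentence passes all three rules."""
    return RULES_V3.search(sentence) is not None

for sentence in [
    "may contain love",
    "contains bliss and might contain love rumors",
    "might contain bliss and does contain love",
    "might contain very big love child today",
]:
    print(ok_v3(sentence), "|", sentence)
```

Only the first sentence passes. Because the pattern is now anchored with "^", both negative lookaheads scan the whole sentence instead of only the part after the match start.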

Postby shanbuv » Mon Mar 12, 2012 7:15 am

Hi,
First, thanks for the efforts, my head is spinning just trying to understand what you're generating...

the sentence "contains bliss and might contain love rumors" is handled properly, but "contains bliss and might contain love rumors" is not matched - upon reading the rules you specified, i see the problem.

the rules i want are slightly different:
1. Must contain "contain .... love"
2. Cannot contain "love child"
3. Cannot contain "*might contain*love" - meaning "contains bliss and might contain love rumors" should not match, but "might contain bliss and contain love" should match (i don't care if "might" appears, just not before the "contain*love" section)
some more examples
"might contains bliss, affection or anything else but might contain love rumors" should not be matched
"might contains bliss, affection or anything else but contain love naturally" should be matched
"might contains bliss, affection or anything else but contain love child forever" should not be matched

many thanks

Postby ragax » Mon Mar 12, 2012 3:19 pm

Hi Shanbuv,

I am interpreting this:
3. Cannot contain "*might contain*love"


as:
Cannot contain "might contain[space]love"

because you later say
i don't care if "might" appears, just not before the "contain*love" section

If so, we just add space-love to the "might contain" negative lookahead in our previous regex:
^(?!(?>.*?love) child)(?!(?>.*?might)\scontain\slove)(?>.*?contain)(?>.*?love)


Let me know if that works for you.
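
Here is one way to run this interpretation against the examples given so far. It is a sketch assuming Python's re module, with the atomic groups dropped (an assumption made for portability, so it is a variant of the posted regex rather than a copy); ok_v4 is a hypothetical helper name.

```python
import re

RULES_V4 = re.compile(
    r"^(?!.*?love child)"            # cannot contain "love child"
    r"(?!.*?might\scontain\slove)"   # cannot contain "might contain[space]love"
    r".*?contain.*?love"             # must contain "contain ... love"
)

def ok_v4(sentence: str) -> bool:
    """Hypothetical helper: True if the sentence passes all three rules."""
    return RULES_V4.search(sentence) is not None

for sentence in [
    "might contain bliss and contain love",
    "might contains bliss, affection or anything else but might contain love rumors",
    "might contains bliss, affection or anything else but contain love naturally",
    "might contains bliss, affection or anything else but contain love child forever",
    "might contain big big love",
]:
    print(ok_v4(sentence), "|", sentence)
```

The first and third sentences pass, the second and fourth are rejected. The last sentence passes as well, because under this reading only "might contain" followed directly by a space and "love" is forbidden.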

Postby shanbuv » Tue Mar 13, 2012 10:25 am

Hi
when i said, cannot contain "*might contain*love" , i meant cannot contain "might[space]contain[anything]love" , but in a non greedy way

for example
"might contains bliss, affection or anything else but might contain love rumors" should not be matched, since "might contain love" appears
"might contains bliss, affection or anything else but might contain big big love" should not be matched, since "might contain[anything]love" appears

the problematic example
"might contains bliss, affection or anything else but contain big big love" SHOULD match since the second "contain" part is ok.

the rule here, and i hope i explain correctly:
cannot contain "*might[space]contain[anything]love" unless there's another "contain" in the [anything] part, in which case, if other rules apply (contains[anything]love and does not have "love child") then it should match (=if you find another "contain" in the [anything], start checking again...)

do i make sense?
thanks
S.

Postby ragax » Tue Mar 13, 2012 4:00 pm

^(?!(?>.*?love) child)(?!(?>.*?might)\scontain(?:.(?!(?<!might )contain))+?love)(?>.*?contain)(?>.*?love)


Have fun with that.
It made sense to me a second ago, but don't ask me to explain it as there's a triple negative.
:)
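
For anyone who wants to test it, the expression can be split into its three parts and run over the sentences from this thread. This is a sketch assuming Python's re module; the atomic groups are dropped (an assumption for portability, so this is a variant and not the posted regex verbatim), and ok_final is a hypothetical helper name.

```python
import re

FINAL = re.compile(
    # no "love child" anywhere
    r"^(?!.*?love child)"
    # no "might contain ... love" unless a "contain" that is not preceded
    # by "might " shows up in between
    r"(?!.*?might\scontain(?:.(?!(?<!might )contain))+?love)"
    # and there must be a "contain ... love"
    r".*?contain.*?love"
)

def ok_final(sentence: str) -> bool:
    """Hypothetical helper: True if the sentence passes all the rules."""
    return FINAL.search(sentence) is not None

prefix = "might contains bliss, affection or anything else but "
for sentence in [
    prefix + "might contain love rumors",
    prefix + "might contain big big love",
    prefix + "contain big big love",
    prefix + "contain love child forever",
    "might contain bliss and contain love",
]:
    print(ok_final(sentence), "|", sentence)
```

The third and fifth sentences pass; the others are rejected. In the middle part, every character that is stepped over is followed by a check that a fresh "contain" does not start right after it, which is what stops the forbidden stretch from reaching across a second "contain".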

Postby ragax » Tue Mar 13, 2012 4:05 pm

By the way, the triple negative is a sign that the "say what you DON'T want" approach has maxed out on this regex.
At this stage, to refine the regex for readability, I might switch to a "say what you DO want" approach: match (expression without "might contain") OR (expression with "might contain" in a way that is acceptable).
In the meantime, the expression as it is should work. Let me know if you need further help on it. :)
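
If readability matters more than having a single expression, a different route is to walk the sentence token by token. To be clear, this is a swapped-in technique and not the regex alternation described above: a sketch in Python, with the function name accepts and the token regex being assumptions made for illustration.

```python
import re

# The only pieces of the sentence the rules care about.
TOKEN = re.compile(r"might\scontain|contain|love")

def accepts(sentence: str) -> bool:
    """Hypothetical helper: apply the thread's rules with one left-to-right scan."""
    if "love child" in sentence:
        return False                  # "love child" is never allowed
    last_contain = None               # kind of the most recent "contain" seen
    found = False
    for m in TOKEN.finditer(sentence):
        token = m.group()
        if token == "love":
            if last_contain == "might":
                return False          # "might contain ... love", no fresh "contain" between
            if last_contain == "plain":
                found = True          # a good "contain ... love"
        elif token == "contain":
            last_contain = "plain"
        else:
            last_contain = "might"
    return found

print(accepts("might contain bliss and contain love"))
print(accepts("contains bliss and might contain love rumors"))
```

Each "love" is judged by the nearest "contain" before it, so a later plain "contain" resets the check, which is the "start checking again" behaviour asked for earlier in the thread.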

Technically, though, the expression above is quite interesting (for someone learning regex) because it showcases the use of a lookaround within a lookaround (specifically, a negative lookbehind within a negative lookahead within a negative lookahead).

Postby shanbuv » Thu Mar 15, 2012 6:56 am

Good god.... how in god's name did you manage to come up with this ?
will test this and let you know
10x again
S