PHP Developers Network

A community of PHP developers offering assistance, advice, discussion, and friendship.
 
It is currently Fri Sep 21, 2018 6:39 am

All times are UTC - 5 hours




Post new topic Reply to topic  [ 31 posts ]  Go to page Previous  1, 2, 3  Next
Author Message
 Post subject:
PostPosted: Tue Nov 14, 2006 10:58 am 
Offline
Forum Contributor
User avatar

Joined: Mon Nov 13, 2006 5:19 am
Posts: 137
Location: Argentina and Italy
I use this regexp. It's a bit (a bit? :) ) permissive, but it has worked fine so far.
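function validate_email($email_string) {
    // eregi() was removed in PHP 7.0; preg_match() with the /i flag is the
    // case-insensitive equivalent. Single quotes keep the backslashes literal.
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';

    if (preg_match($pattern, trim($email_string))) {
        return true;
    }
    else {
        echo "my error message";
        return false;
    }
}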

What do you think about it?
Is it necessary to implement a stricter one?
In which cases would you recommend improving it?
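
To see where this pattern draws the line, here is a minimal sketch that runs it over a few sample addresses. It assumes preg_match() with the /i flag as a stand-in for eregi() (which no longer exists in PHP 7 and later); the sample addresses are made up for illustration.

```php
<?php
// Same pattern as in the post, wrapped in PCRE delimiters with the
// case-insensitive flag.
$pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';

// Illustrative samples, all of which are plausible real-world addresses.
$samples = [
    'user@example.com',
    'user+tag@example.com',   // "+" is not in the local-part character class
    "o'brien@example.com",    // neither is the apostrophe
    'user@example.museum',    // top-level domain longer than four letters
];

foreach ($samples as $address) {
    $verdict = preg_match($pattern, $address) === 1 ? 'accepted' : 'rejected';
    echo str_pad($address, 24), $verdict, "\n";
}
```

As far as I can tell, only the first sample is accepted. The pattern is permissive about which domains exist, but fairly strict about syntax the RFC allows, so those are the cases where I would look at improving it.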


 Post subject:
PostPosted: Tue Nov 14, 2006 3:48 pm 
Offline
Neighborhood Spidermoddy
User avatar

Joined: Mon Mar 29, 2004 4:24 pm
Posts: 31559
Location: Bothell, Washington, USA
If you're going to use regex, you may as well use the fully standards compliant one.


 Post subject:
PostPosted: Tue Nov 14, 2006 5:42 pm 
Offline
DevNet Master
User avatar

Joined: Mon Oct 25, 2004 9:29 pm
Posts: 3698
Location: New Jersey, US
It's fun to see people being pedantic about web standards, as it's a problem I run into all the time when trying to write high-quality code.

I have little doubt that the code supplied correctly verifies RFC 822 addresses: while a bunch of unit tests would be nice, I trust you guys to write the correct stuff.

Whether this code is practical to use in a real-world setting, however, is another matter. I think this is the point redmonkey brought up, without realizing that the discussion, to this point, was purely theoretical.

I think it would be useful to also discuss how practical it would be to use such a monster regex. There were several things brought up already, and also some more topics:

1. Whether or not such complex processing is required for a validation process that will end up being further checked through, say, a verification email (on a similar tack, can you get away with no processing if you send out validation emails?)
2. What would a good, practical and concise email regex be that would adhere to both the RFC and real world usage of email addresses?
3. Under what circumstances would such strict checking be merited? Before you sent an email to the address? For inclusion in a mailto link?
4. How would one go about making the regex faster without sacrificing RFC-compliance? Also, could you make a regex that parses the email into its component parts so that you could do more fine-grained filtering?
5. Why doesn't PHP have a native RFC-compliant email validation function? (okay, maybe filter, but I don't know if it's standards compliant)
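
On the second half of question 4, a regex with named groups can at least split an address into its parts so each can be filtered separately. A minimal sketch, assuming a hypothetical helper name split_email; it only splits on the last @ and makes no claim about RFC 822 validity.

```php
<?php
// Hypothetical helper: splits an address into local part and domain.
// This is a structural split only; it does not validate either part.
function split_email(string $email): ?array
{
    // The greedy .+ pushes the split to the last "@", so a quoted local
    // part that itself contains "@" stays in one piece.
    if (preg_match('/^(?<local>.+)@(?<domain>[^@]+)$/', $email, $m) !== 1) {
        return null;
    }

    return ['local' => $m['local'], 'domain' => $m['domain']];
}

// Usage: each part can now go through its own, simpler check.
$parts = split_email('user@example.com');
if ($parts !== null) {
    echo $parts['local'], ' at ', $parts['domain'], "\n";
}
```

Splitting first means the domain check and the local-part check can each use a much smaller pattern than one combined regex.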


 Post subject:
PostPosted: Tue Nov 14, 2006 6:15 pm 
Offline
DevNet Master
User avatar

Joined: Tue Nov 02, 2004 6:43 am
Posts: 2704
Location: Ireland
There are any number of reasons.

1. I really want to validate the email syntax without requiring a verification mail (newsletter signup, temp notices, etc.)
2. To avoid the common, misconceived regexes that are not RFC-compliant and yet are also not permissive enough to allow everything the RFC syntax permits.
3. The Zend Framework refuses to implement anything else...;)
4. Is the regex that costly for infrequent signups or email submissions?

On ext/filter, the source code implements an RFC822 compliant regex (note the one in this topic is a regex builder - not the actual string regex) which is already in PEAR. See:

http://cvs.php.net/viewvc.cgi/pear/HTML ... iew=markup

See also logical_filters.c in the filter source:
void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
{
    /* From http://cvs.php.net/co.php/pear/HTML_Qui ... .php?r=1.4 */
    const char regexp[] = "/^((\\\"[^\\\"\\f\\n\\r\\t\\v\\b]+\\\")|([\\w\\!\\#\\$\\%\\&\\'\\*\\+\\-\\~\\/\\^\\`\\|\\{\\}]"
        "+(\\.[\\w\\!\\#\\$\\%\\&\\'\\*\\+\\-\\~\\/\\^\\`\\|\\{\\}]+)*))@((\\[(((25[0-5])|(2[0-4][0-9])"
        "|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])"
        "|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))\\])|(((25[0-5])|(2[0-4][0-9])"
        "|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])"
        "|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))|((([A-Za-z0-9\\-])+\\.)+[A-Za-z\\-]+))$/";

    pcre       *re = NULL;
    pcre_extra *pcre_extra = NULL;
    int preg_options = 0;
    int         ovector[150]; /* Needs to be a multiple of 3 */
    int         matches;


    re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
    if (!re) {
        RETURN_VALIDATION_FAILED
    }
    matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

    /* 0 means that the vector is too small to hold all the captured substring offsets */
    if (matches < 0) {
        RETURN_VALIDATION_FAILED
    }

}
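
From PHP code, this C function is what runs when you ask the filter extension to validate an email, via filter_var() with FILTER_VALIDATE_EMAIL. A minimal sketch; the wrapper name is_valid_email is my own, and the regex inside the extension has changed between PHP versions, so results for edge cases may differ from the source quoted here.

```php
<?php
// Hypothetical wrapper around the filter extension's email validator.
function is_valid_email(string $email): bool
{
    // filter_var() returns the filtered string on success and false on
    // failure, so compare against false explicitly.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email('user@example.com'));
var_dump(is_valid_email('not-an-email'));
```

The PHP manual describes this filter as following the RFC 822 addr-spec syntax except that comments, whitespace folding and dotless domain names are not supported, so it is close to, but not the same as, full RFC compliance.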


 Post subject:
PostPosted: Tue Nov 14, 2006 8:49 pm 
Offline
DevNet Master
User avatar

Joined: Mon Oct 25, 2004 9:29 pm
Posts: 3698
Location: New Jersey, US
Quote:
I really want to validate the email syntax without requiring a verification mail (newsletter signup, temp notices, etc.)


But how do you prove that the email address belongs to the person?

Quote:
To avoid the common, misconceived regexes that are not RFC-compliant and yet are also not permissive enough to allow everything the RFC syntax permits.


Good reason. I still have not found such a regex. But once again: would it hurt not to validate at all?

Quote:
The Zend Framework refuses to implement anything else...


Haha. They should at least provide a faster alternative, but that's cool. Do you have a mailing list discussion you can point me to so I can delve further?

Quote:
Is the regex that costly for infrequent signups or email submissions?


Not so much for those procedures, but if you're filtering a document of HTML with potentially many mailtos, each regex call is precious.

Quote:
See also logical_filters.c in the filter source:


When in doubt, check the source. But the regex seems very compact for an email validation regex. In contrast, once fully assembled, $mailbox is 7420 chars long; the other is only 604.


 Post subject:
PostPosted: Wed Nov 15, 2006 5:35 am 
Offline
DevNet Master
User avatar

Joined: Tue Nov 02, 2004 6:43 am
Posts: 2704
Location: Ireland
No mailing list discussion really. It was decided long ago for Zend_Filter; then there was the odd request from users (RFC822 is usually requested), to which the developers replied that they could only accept a solution from someone who wrote an original version and signed a CLA.

Quote:
But how do you prove that the email address belongs to the person?


That's a different question :). The regex only validates syntax/format against RFC822; whether the address exists or not is beyond its scope.

Quote:
But once again: would it hurt not to validate at all?


Depends on the circumstances. If the email is being used as temporary identification (no/rare emailing of user) then it's probably useful to do so. If you have a long mailing list, then it's also useful to remove invalid addresses before starting a mass mail. Within reason, if the user must be sent a message, and they must validate from a link/address in that email message, then the point is probably moot - it's not strictly necessary then.
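
For the mailing-list case, the clean-up pass could look something like the sketch below. It uses filter_var() as a stand-in for whichever syntax check you settle on (it is not the RFC822 regex discussed in this thread), and the function name is hypothetical.

```php
<?php
// Hypothetical helper: returns only the syntactically valid addresses,
// trimmed and re-indexed from zero.
function remove_invalid_addresses(array $addresses): array
{
    // Normalise surrounding whitespace first so " a@b.com " is not
    // rejected for a reason the subscriber cannot see.
    $trimmed = array_map('trim', $addresses);

    $valid = array_filter($trimmed, function ($address) {
        return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
    });

    // array_filter() preserves keys; array_values() closes the gaps.
    return array_values($valid);
}

$list = ['a@example.com', 'broken', ' b@example.org ', 'a@@example.com'];
print_r(remove_invalid_addresses($list));
```

This only removes addresses that could never be delivered because of their syntax; bounces from well-formed but non-existent mailboxes still have to be handled after sending.
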

Another point is to provide user feedback - what if they submit an invalid email by accident?

Is it necessary? Depends. Is it useful? Definitely. Usage is optional.

Quote:
When in doubt, check the source. But the regex seems very compact for an email validation regex.


So true ;).

I might dig for this myself, but either it's a fantastic Regex or it's extra permissive. RFC822 is pretty complex. Once you get over the basics, there's a ton of detail. I won't rule it out as being a far more optimised regex until it's proven either way.


 Post subject:
PostPosted: Wed Nov 15, 2006 4:46 pm 
Offline
DevNet Master
User avatar

Joined: Mon Oct 25, 2004 9:29 pm
Posts: 3698
Location: New Jersey, US
Quote:
That's a different question :). The regex only validates syntax/format against RFC822; whether the address exists or not is beyond its scope.


But it's a question that comes along often enough to merit consideration. Once again: theory versus practice.

Quote:
Depends on the circumstances. If the email is being used as temporary identification (no/rare emailing of user) then it's probably useful to do so. If you have a long mailing list, then it's also useful to remove invalid addresses before starting a mass mail. Within reason, if the user must be sent a message, and they must validate from a link/address in that email message, then the point is probably moot - it's not strictly necessary then.


Hmm... that's a very interesting assertion, and it's in line with how I've noticed other programs such as browsers and email clients behave. Firefox has no qualms about sending mailto:@@@ to your mail client.

If this is true, I do not have to validate the contents inside a mailto: link.

Quote:
Another point is to provide user feedback - what if they submit an invalid email by accident?


This is usually addressed by requiring the user to enter the email twice.
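
A minimal sketch of that double-entry check, assuming a hypothetical helper name and that both form fields arrive as plain strings:

```php
<?php
// Hypothetical helper: true only when both entries are non-empty and
// identical after trimming surrounding whitespace.
function emails_match(string $first, string $second): bool
{
    $first = trim($first);
    $second = trim($second);

    // Exact comparison: the local part of an address is technically
    // case-sensitive, so no case folding is done here.
    return $first !== '' && $first === $second;
}

var_dump(emails_match('user@example.com', ' user@example.com '));
```

It catches typos made in one field but not the other; it says nothing about whether the address is well-formed.
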

Quote:
I might dig for this myself, but either it's a fantastic Regex or it's extra permissive. RFC822 is pretty complex. Once you get over the basics, there's a ton of detail. I won't rule it out as being a far more optimised regex until it's proven either way.


Would love to see the results.


 Post subject:
PostPosted: Wed Nov 15, 2006 10:12 pm 
Offline
Forum Contributor
User avatar

Joined: Fri Oct 06, 2006 8:12 pm
Posts: 294
Ambush Commander wrote:
1. Whether or not such complex processing is required for a validation process that will end up being further checked through, say, a verification email (on a similar tack, can you get away with no processing if you send out validation emails?)

Whether it is required depends on the situation, I would imagine.

Earlier, the comments implied that regex checking ensures that the email has a valid format, not a valid destination, right? So, by checking the format first, you would avoid sending emails to syntactically invalid addresses. That could mean fewer attacks against the mail server, including a denial of service where bots sign up thousands of fake emails.

Ambush Commander wrote:
2. What would a good, practical and concise email regex be that would adhere to both the RFC and real world usage of email addresses?

If it allows more than the RFC, then it is too loose. If it doesn't allow enough, then it is too strict. In either case, it's not RFC-compliant, I thought.

Ambush Commander wrote:
4. How would one go about making the regex faster without sacrificing RFC-compliance? Also, could you make a regex that parses the email into its component parts so that you could do more fine-grained filtering?

Have you tested the regex and found it unacceptably slow? What's the timing?
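
For anyone who wants to measure it, a rough timing harness could look like the sketch below. The function name and iteration count are my own choices, and the pattern is the short one from earlier in this thread; swap in the assembled RFC822 pattern to compare.

```php
<?php
// Hypothetical helper: average wall-clock seconds per preg_match() call.
function time_regex(string $pattern, string $subject, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        preg_match($pattern, $subject);
    }

    return (microtime(true) - $start) / $iterations;
}

$pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
$seconds = time_regex($pattern, 'user@example.com');
printf("%.3f microseconds per match\n", $seconds * 1e6);
```

PHP caches compiled patterns between calls, so this mostly measures matching rather than compilation. Numbers will vary by machine and PHP version, which is why none are quoted here.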


 Post subject:
PostPosted: Fri Jul 20, 2007 10:18 am 
Offline
Forum Newbie

Joined: Fri Jul 20, 2007 10:17 am
Posts: 1
Well, that kind of email validation is one way to do it, but there's a great post here on how to check an email address's actual existence. It's pretty kickass, check it out: http://www.static-chaos.net/viewtutorials/Godly_Email_Validation


 Post subject:
PostPosted: Fri Jul 20, 2007 11:11 am 
Offline
DevNet Master
User avatar

Joined: Mon Oct 25, 2004 9:29 pm
Posts: 3698
Location: New Jersey, US
I have seen that technique before. As far as I can remember, some hosts don't support this method because it means that emails can be "discovered".


 Post subject:
PostPosted: Tue Sep 25, 2007 2:09 am 
Offline
Forum Commoner

Joined: Tue Jul 03, 2007 8:18 pm
Posts: 55
Can I have validation like this with PHP:

* Format: "address@domain.xxx" or "domain.co.uk", ...
* Forbidden characters: ?, !, *, ...
* Valid domain: looser@red.mond is not valid
* Valid user: Verify that the user and mailbox really exist


 Post subject:
PostPosted: Tue Sep 25, 2007 2:18 am 
Offline
Forum Contributor

Joined: Tue Sep 11, 2007 4:19 am
Posts: 104
Woah, old thread.

I've found that roll-your-own email validation is tricky. You don't want to reject someone's valid email address because your code is wrong.

However, there are plenty of places where you want to prevent someone from say, submitting a comma separated email address list, or including headers that will allow them to send spam through you.

I have found that PEAR::Validate is the safest bet. It will reject spammer attempts to mass mail through you, but still allow every valid email address I've run into.
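
If you do end up rolling part of this yourself, the two abuse cases mentioned here can be screened before any syntax check. This is a minimal sketch with a hypothetical function name; it is not PEAR::Validate's API, and it is deliberately conservative (it would also reject a quoted local part that contains a comma).

```php
<?php
// Hypothetical helper: accepts only a single address with no characters
// that could smuggle in extra recipients or mail headers.
function is_single_safe_address(string $input): bool
{
    // CR or LF would let a submitter append extra headers such as "Bcc:".
    if (preg_match('/[\r\n]/', $input) === 1) {
        return false;
    }

    // A comma or semicolon suggests a recipient list, not one address.
    if (strpbrk($input, ',;') !== false) {
        return false;
    }

    // Stand-in syntax check; replace with the validator of your choice.
    return filter_var($input, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_single_safe_address("user@example.com\r\nBcc: other@example.org"));
```

Doing the cheap rejections first keeps the spam-related checks independent of whichever syntax validator you trust.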


 Post subject:
PostPosted: Wed Sep 26, 2007 12:08 am 
Offline
Forum Contributor
User avatar

Joined: Fri Oct 06, 2006 8:12 pm
Posts: 294
cade wrote:
Can I have validation like this with PHP:

You can choose to write your validation code to allow/disallow whatever you choose. However, that doesn't make it 'valid' or 'invalid' email - it just means it doesn't pass your criteria.


cade wrote:
* Format: "address@domain.xxx" or "domain.co.uk", ...

Most will allow those.

cade wrote:
* Forbidden characters: ?, !, *, ...

Easy enough

cade wrote:
* Valid domain: looser@red.mond is not valid

That's not true. It is a valid email address, especially if you are on the mond domain. Many small businesses do something similar for internal mail.
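
If you still want a domain check for public-facing forms, a DNS lookup is the usual approach, with the caveat just mentioned that internal domains will fail it. A sketch, assuming PHP's checkdnsrr() and a hypothetical helper name; the second parameter exists only so the lookup can be swapped out in tests.

```php
<?php
// Hypothetical helper: true when public DNS suggests the domain can
// receive mail. Internal-only domains will report false.
function domain_accepts_mail(string $domain, ?callable $lookup = null): bool
{
    // Default to the real DNS lookup; tests can pass a stub instead.
    $lookup = $lookup ?? 'checkdnsrr';

    // An MX record means the domain accepts mail; when there is no MX
    // record, mail servers fall back to the domain's address record.
    return $lookup($domain, 'MX') || $lookup($domain, 'A');
}

// Usage with a stub resolver, so no network access is needed.
$stub = function (string $host, string $type): bool {
    return $host === 'example.com' && $type === 'MX';
};
var_dump(domain_accepts_mail('example.com', $stub));
```

A failed lookup is better treated as a warning to the user than as a hard rejection, for exactly the reason given in the reply.
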

cade wrote:
* Valid user: Verify that the user and mailbox really exist

That you cannot reliably do. It's not PHP's shortcoming - it's a protective feature of most mail servers to prevent spam searches.


 Post subject:
PostPosted: Wed Oct 10, 2007 12:11 am 
Offline
Forum Commoner

Joined: Tue Jul 03, 2007 8:18 pm
Posts: 55
But I have seen this work somewhere... I just don't know what engine they use to validate the existence of the mail user.


PostPosted: Wed Apr 09, 2008 4:29 am 
Offline
Forum Newbie

Joined: Wed Apr 09, 2008 4:24 am
Posts: 1
I use this to validate ',' (comma) separated email addresses. This regular expression is not yet performance-tuned, but should be good for validating emails.

Expression: ^(\w+(.|_)\w+@\w+\.\w+)(,(\w+(.|_)\w+@\w+\.\w+)|\S)+$
Don't forget to add escape sequences to suit your environment.

-Chandan Benjaram
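
A note on the expression above: as far as I can tell, the unescaped . matches any character, and the |\S alternative lets almost anything through after the first address. An alternative that avoids one large pattern is to split on commas and check each piece. A minimal sketch, with a hypothetical function name and filter_var() as a stand-in validator:

```php
<?php
// Hypothetical helper: returns the entries of a comma-separated list
// that fail validation; an empty array means the whole list passed.
function invalid_addresses_in_list(string $list): array
{
    $invalid = [];
    foreach (explode(',', $list) as $address) {
        $address = trim($address);
        if (filter_var($address, FILTER_VALIDATE_EMAIL) === false) {
            $invalid[] = $address;
        }
    }

    return $invalid;
}

print_r(invalid_addresses_in_list('a@example.com, broken, b@example.org'));
```

Returning the failing entries rather than a single true/false makes it possible to tell the user which address to fix.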

