Regex To Extract 2nd Level Domains From All TLDs?

UniqueIdeaMan
Forum Contributor
Posts: 197
Joined: Wed Jan 18, 2017 3:43 pm

Post by UniqueIdeaMan »

Good Day Folks!

1. Is the following regex OK for extracting the 2nd level domain together with its top level domain?
[^.]*\.[^.]{2,3}(?:\.[^.]{2,3})?$

2. How do I write PHP code that uses this regex?
Any sample code is welcome.
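
A minimal sketch of how the pattern above could be applied from PHP with preg_match(). The helper name matchDomain() and the sample host names are assumptions made for illustration; the pattern itself is copied unchanged from the question. The sketch assumes the input is a bare host name with no scheme or path.

```php
<?php
// Hypothetical helper (not a built-in): applies the pattern from the question
// to a bare host name and returns the matched text, or null when nothing matches.
function matchDomain(string $host): ?string
{
    // Single quotes keep the backslashes and the "$" literal for the regex engine.
    $pattern = '/[^.]*\.[^.]{2,3}(?:\.[^.]{2,3})?$/';

    if (preg_match($pattern, $host, $matches) === 1) {
        return $matches[0]; // index 0 holds the full match
    }
    return null;
}

var_dump(matchDomain('subdomain.domain.com')); // string "domain.com"
var_dump(matchDomain('www.domain.co.uk'));     // string "domain.co.uk"
var_dump(matchDomain('domain.info'));          // NULL: the TLD has four characters
var_dump(matchDomain('sub.abc.com'));          // string "sub.abc.com": "abc" is read as part of the suffix
```

The last two calls suggest where the pattern falls short: it allows only 2 or 3 characters per suffix label, so longer TLDs do not match, and a short domain name can be mistaken for part of a two-label suffix.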
Re: Regex To Extract 2nd Level Domains From All TLDs?

Post by UniqueIdeaMan »

Guys,

I'm a complete beginner at regex, so suggestions for beginner-friendly tutorials are welcome too!

Anyway, as you know, web pages contain all kinds of internal and external links. No matter what a link looks like, its domain should be extracted. Imagine I'm running a web crawler: it would encounter an endless variety of links, some with just a domain, some with a subdomain, and so on.
E.g.:

http://domain.com
http://subdomain.domain.com

www.domain.com
http://www.domain.com

domain.com/dir
subdomain.domain.com/dir

domain.com/dir/sub-dir
subdomain.domain.com/dir/sub-dir


Note: No matter how many subdomain levels (3rd level, 4th level, etc.) or how many directories and sub-directories a link contains, the 2nd level domain should be extracted along with its TLD.
From the examples above, the script should extract "domain.com" from every one of the links.
I need an example of the PHP code too, alongside the regex.
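
One possible approach to the requirement above, sketched under stated assumptions: isolate the host with parse_url(), then keep the last two labels, or three when the host ends in a known two-label suffix. The function name extractDomain() and the tiny suffix list are hypothetical placeholders; covering all TLDs reliably would need a complete source such as the Public Suffix List, which this sketch does not load.

```php
<?php
// Hypothetical helper (not a built-in). $multiPartSuffixes is a tiny
// illustrative sample, not a complete list of two-label suffixes.
function extractDomain(string $link, array $multiPartSuffixes = ['co.uk', 'com.au']): ?string
{
    $link = trim($link);

    // parse_url() only reports a host when a scheme is present, so prepend
    // one for links such as "www.domain.com" or "domain.com/dir".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        $link = 'http://' . $link;
    }

    $host = parse_url($link, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // malformed link or no host component
    }

    $labels = explode('.', strtolower($host));

    // Keep three labels when the last two form a listed suffix, otherwise two.
    $lastTwo = implode('.', array_slice($labels, -2));
    $keep = in_array($lastTwo, $multiPartSuffixes, true) ? 3 : 2;

    if (count($labels) < $keep) {
        return null; // e.g. "localhost", or a bare suffix such as "co.uk"
    }
    return implode('.', array_slice($labels, -$keep));
}

// Sample links taken from the post above.
$links = [
    'http://domain.com',
    'http://subdomain.domain.com',
    'www.domain.com',
    'domain.com/dir',
    'subdomain.domain.com/dir/sub-dir',
];

foreach ($links as $link) {
    echo $link, ' => ', extractDomain($link), PHP_EOL; // each line ends in "domain.com"
}
```

Splitting the host on dots avoids guessing TLD lengths in a regex, so the accuracy of the result depends only on how complete the suffix list is.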