extracting numbers from file title and references

Any questions involving matching text strings to patterns - the pattern is called a "regular expression."


Postby DrPL » Sun Feb 26, 2012 6:00 am

Hi,
I am not sure exactly where to put this, as it is mainly a regex question but crosses over slightly into Perl syntax.
Hopefully all will become clear.

At the moment, I have a directory full of files called
chapter1.txt, chapter2.txt and so on. Within each of these files are references encased in square brackets which I am trying
to link to external files. The format of the link is c1f1.html for chapter 1 reference 1, c3f5.html for chapter 3 reference 5.

So, in chapter 1,
[1]
becomes <a href="c1f1.html">[1]</a> and so on.

I have come up with the code below:

opendir (DIR, "/home/paul/work/") or die "$!";
my @files = grep {/chapter*txt/}  readdir DIR;
foreach my $file (@files)
{
   open(FH,"/home/paul/work/$file") or die "$!";

   my ($chapnumber) = ($file =~/chapter(\d+).txt/);
       
   while (<FH>)
   {
        $dummyvar = ~s/\[(\d+\)]/<a href=\"c.$chapnumber.f.$1\.html\">\[$1\]<\/a>/g;
   }
   close(FH);
}


- but it falls over when it gets to the regex containing the angle brackets (the line starting $dummyvar = ...).
As far as I can see I'm extracting the chapter number from the title correctly, and the regex for replacing within
the file looks OK.
Can someone please suggest what might be wrong?

Many thanks

Paul
DrPL
Forum Commoner
 
Posts: 26
Joined: Wed Oct 07, 2009 4:22 pm

Re: extracting numbers from file title and references

Postby abareplace » Sun Feb 26, 2012 6:40 am

You should not escape ) in the regular expression:

\[(\d+)]


If I remember Perl syntax correctly, the dot before html and the brackets [] in the replacement should not be escaped either.
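
A minimal sketch of the corrected pattern, assuming a made-up sample line ($line is hypothetical and not from the original files). With \) the parenthesis becomes a literal character, so the group opened by ( is never closed and Perl refuses to compile the regex, which would explain why the script falls over on that line. With ) left unescaped, the digits are captured:

```perl
use strict;
use warnings;

# Hypothetical sample line; not taken from the original files.
my $line = "See [1] and [27] for details.";

# \[ matches a literal "[", (\d+) captures the digits,
# and ] outside a character class is already a literal "]".
my @refs = $line =~ /\[(\d+)]/g;

print join(",", @refs), "\n";   # prints 1,27
```

Escaping the closing bracket as \] is harmless too; it is only the ( and ) of the capture group that must stay unescaped.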
abareplace
Forum Newbie
 
Posts: 9
Joined: Fri Jan 06, 2012 2:43 am


Postby DrPL » Sun Feb 26, 2012 7:02 am

Looks like I got the backslash in the wrong place. It should have been


$dummyvar = ~s/\[(\d+)\]/<a href=\"c.$chapnumber.f.$1\.html\">\[$1\]<\/a>/g;


Postby DrPL » Sun Feb 26, 2012 7:04 am

abareplace wrote: You should not escape ) in the regular expression:

\[(\d+)]

If I remember Perl syntax correctly, the dot before html and the brackets [] in the replacement should not be escaped either.


I think I need the escape; otherwise the dot would be treated as a concatenation operator (?). I need it to be a literal punctuation character, as in "blahblah.html".

Postby abareplace » Sun Feb 26, 2012 9:23 am

It's inside the string, so there is no concatenation operator. The variables are interpolated. You need the following code:

~s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;

See http://ideone.com/moBAf
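
To illustrate the interpolation point, here is a small sketch under assumed inputs ($chapnumber, $text and $linked are hypothetical names and values chosen for the example). The replacement half of s/// behaves like a double-quoted string, so variables are expanded and a dot is just a dot:

```perl
use strict;
use warnings;

my $chapnumber = 3;                    # hypothetical chapter number
my $text       = "as shown in [5]";    # hypothetical sample text

# Copy $text into $linked, then substitute on the copy.
# In the replacement, "." is literal and $chapnumber / $1 are interpolated.
(my $linked = $text) =~ s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;

print "$linked\n";
# prints: as shown in <a href="c3f5.html">[5]</a>
```

The only character that still needs a backslash in the replacement is the / in </a>, because / is the delimiter of the s/// operator itself.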

Postby DrPL » Sun Feb 26, 2012 10:20 am

abareplace wrote: It's inside the string, so there is no concatenation operator. The variables are interpolated. You need the following code:

~s/\[(\d+)]/<a href="c${chapnumber}f$1.html">[$1]<\/a>/g;

See http://ideone.com/moBAf


Thanks, very interesting. Is this why in your code the ] isn't escaped? I'm also a bit confused about why "chapnumber" is in curly brackets to separate it from the "$", but the grouped $1 isn't.

Postby DrPL » Sun Feb 26, 2012 12:40 pm

I made a mistake; rather than the link being of the form <a href="blah.html"> it should have been <a href="#blah">.
I've modified my code, and included a few print statements to confirm that the chapter numbers are being stripped out;
and they are, but the replacement regex is still not working.


#!/usr/bin/perl

@files = <*>;

foreach $file (@files)
{
   open(FH,"/home/paul/kp/$file") or die "cannot open file";

   print $file . "\n";

   my ($chapnumber) = ($file =~/chapter(\d+).txt/);

        print $chapnumber . "\n";
       
   while (<FH>)
   {
        $dummyvar = ~s/\[(\d+)\]/<a href="#c${chapnumber}f$1">[$1]<\/a>/g;

   }
   close(FH);
}
closedir(DIR);

 

Postby abareplace » Sun Feb 26, 2012 7:25 pm

${chapnumber} is in curly brackets to separate it from the f that follows. Without the brackets, Perl would read the name as $chapnumberf.
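
A short sketch of the difference, using hypothetical values ($chapnumber, $ref and $with_braces are made up for the example). It also shows why $1 gets away without braces: the character after it cannot be part of its name.

```perl
use strict;
use warnings;

my $chapnumber = 1;   # hypothetical value
my $ref        = 4;   # hypothetical value

# The braces end the variable name before the literal "f".
my $with_braces = "c${chapnumber}f$ref";
print "$with_braces\n";          # prints c1f4

# Without braces Perl would look for a variable named $chapnumberf,
# which does not exist, so "use strict" rejects it at compile time:
# my $broken = "c$chapnumberf$ref";

# $1 needs no braces here because "." cannot extend the name.
"[7]" =~ /\[(\d+)]/ and print "f$1.html\n";   # prints f7.html
```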

AFAIK, $dummyvar is not needed.
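
For what it's worth, the space in "= ~s" matters: Perl reads it as a plain assignment followed by the unary bitwise-complement operator ~, not as the binding operator =~. The substitution still runs on $_, and $dummyvar just receives a meaningless number. A sketch with a hypothetical sample line ($count is an illustrative name):

```perl
use strict;
use warnings;

$_ = "see [2] and [3]";          # hypothetical sample line

# With no =~ binding, s/// works on $_ and returns the number
# of substitutions it made.
my $count = s/\[(\d+)]/<$1>/g;

print "$_\n";       # prints see <2> and <3>
print "$count\n";   # prints 2
```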

The replacement regex is working (as you can see from the program at ideone.com), but you don't write the result anywhere. The file is opened in read-only mode, so the substitution only changes the copy of each line held in $_, and that copy is never printed.
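
One way to write the result out, as a rough sketch rather than the definitive fix. Assumptions: a temporary directory stands in for /home/paul/kp/, the sample file contents are invented, and the ".linked" output suffix is an arbitrary choice. The file filter is also spelled out as a full regex, because in a regex * is a quantifier rather than a wildcard, so /chapter*txt/ from the first post would not match chapter1.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Assumption: a throwaway directory stands in for /home/paul/kp/.
my $dir = tempdir(CLEANUP => 1);

# Hypothetical sample input so the sketch runs on its own.
open(my $sample, '>', "$dir/chapter3.txt") or die "cannot create sample: $!";
print $sample "First claim [1].\nSecond claim [12].\n";
close($sample);

opendir(my $dh, $dir) or die "cannot open $dir: $!";
my @files = grep { /^chapter\d+\.txt$/ } readdir $dh;
closedir($dh);

foreach my $file (@files) {
    my ($chapnumber) = $file =~ /^chapter(\d+)\.txt$/;

    # Read from the original and write to a separate file.
    open(my $in,  '<', "$dir/$file")        or die "cannot open $file: $!";
    open(my $out, '>', "$dir/$file.linked") or die "cannot write $file.linked: $!";

    while (my $line = <$in>) {
        $line =~ s/\[(\d+)\]/<a href="#c${chapnumber}f$1">[$1]<\/a>/g;
        print $out $line;    # the step the original loop was missing
    }

    close($in);
    close($out) or die "cannot close $file.linked: $!";
}

# Show the result for the sample chapter.
open(my $check, '<', "$dir/chapter3.txt.linked") or die "cannot read result: $!";
my $result = do { local $/; <$check> };
close($check);
print $result;
```

Writing to a second file keeps the originals intact while you check the links; once the output looks right, the new file can replace the old one.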