vankuik.nl: Perl

Help

Help isn't far away:

  $ perldoc -f keyword

Parameters

Parameters (other word: arguments) to a script are stored in the @ARGS array.

 if(scalar(@ARGV != 1)) {
     print "Usage: $0 [MyParameter1]\n\n";
     exit 1;
 }
 $param_1 = $ARGV[0];

Arrays

An array of arrays

Perl doesn't know two- or multidimensional arrays. What's done is to create an array with references to arrays. References to arrays are easiest made by using square brackets -- that way, you don't have to create a new variable each time.

 sub create_2d_arr()
 {
     my @table;
     $table[0] = [1,2,3];
     $table[1] = [4,5,6];
     return @table;
 }

 # First, create an array with references to arrays

 my @table = create_2d_arr();

 # Second, obtain the first subarray from the main array and print it

 my $ref = $table[0];
 my ($one, $two) = @$ref;
 print "Content: $one $two\n";

 # Third, obtain the second subarray from the main array and print it

 $ref = $table[1];
 ($one, $two) = @$ref;
 print "Content: $one $two\n";

Chopping off

If you have an array with 1024, you can chop it down to, say. 256 elements by assigning the last index (not the size!) like this:

  my @xdata;
  # Fill array...
  $#xdata1 = (255);     # The array is cut down to 256 elements

To get all elements from the fifth to the fifteenth index:

  @piece = @xdata[5..15];

Reading a file

There are a gazillion ways to read a text file in Perl, but this one is mine:

 my $filename = "log_temptest-23-05-2007-17-07.csv";
 my $handle;

 PrivoxyWindowOpen($handle, $filename)
     or die "Couldn't open file: $!";

 my $line;
 while(defined($line = <$handle>)) {
     print $line;  # or do whatever you've got to do
 }
 close($handle) or die "Couldn't close: $!";

Concatenating

To concatenate, use push:

  @arr1 = (1, 2, 3);
  @arr2 = (4, 5, 6);
  push(@arr1, @arr2);
  print "@arr1\n";

Output:

  1 2 3 4 5 6

Hashes

There is no operator to append hashes, like push does with arrays. To add a second hash to the first:

    my %conf = read_config_file($config_file);
    my %conf2 = read_config_file($rc_file);

    # Add conf2 to conf
    foreach my $key (keys %conf2) {
        $conf{"$key"} = $conf2{"$key"};
    }

Catching errors

A nice way to do error catching is using eval blocks and die(). Suppose we have function foo() which calls bar(). Error handling can be implemented as follows:

  sub bar()
  {
    # Do something
    # Oh no, this goes wrong!
    die("Sorry, it didn't work out");
  }

  sub foo()
  {
    eval {
      bar();
    };        # Mind the semicolon!!

    if($@) {
      print "bar() error: $@\n";
    }

This also makes it easy to add error handling to existing Perl modules. Just add eval{}; around any calls and read out $@.

If you want to have code that executes no matter what, use two eval blocks:

  eval {
    connect() or die "Couldn't connect";
    do_stuff1();
    do_stuff2();
  };

  # If an error occurred, save it
  if($@) {
    $error = $@;
  }

  eval {
    disconnect() or die "Couldn't disconnect";
  };

  # Save errors occurred within the eval block
  if ($@) {
    $error = $@;
  }

  if($error) {
    print "Error: $error\n";
  }

CGI

Check out the NMS project for a number of Perl CGI scripts.

Feedback

To give feedback without wasting screen real estate, use \r. This returns to the start of the line instead of a \n new line.

  for($i=0; $i<100; $i++)
    printf("%d percent done", $i);

Testing for a regexp

To see whether a string starts with "foo":

  $input = "foo-a-licious";
  if ($input =~ /^foo/)
    print "yes, it matches\n";
  else
    print "no, this is bar-a-licous\n";

Changing a string with a regexp

To change a string with a quick regexp (sometimes I hate Perl):

  $bar="jizz";
  ($bar = $bar) =~ s/i/a/;
  print "$bar\n";

Output:

  jazz

An alternative for the above example:

  $_ = "jizz";
  s/i/a/;
  $bar = $_;
  print "$bar\n";

Stripping cruft from a string

To remove all other characters from a string, use tr as follows. We assume that the variable $string contains garbage and a couple of numbers. To strip off everything that's NOT a number:

  $string =~ tr/[^0-9]//cd;

Searching through a string

You have a string $y which you want to search for a particular pattern. The result must be stored in variable $result. Example: search for a number and everything that follows.

  $y="abc123def";
  $y =~ /\d.*/;              # Match the first 0...9 and everything after that
  $result = $&;

Why the flying duck the pattern matching result is saved in $& instead of $_ is utterly beyond me, but hey, welcome to Perl.

Splitting a line

Splitting a line with column headers:

  $line="     col1 col2   col3    col4    ";     # Example column headers
  ($line=$line) =~ s/^ +//;                      # Remove any leading whitespace
  @cols=split/ +/, $line;
  foreach $col(@cols) {                          # Print column headers with brackets
      print "[$col]\n";                          # so we can spot superfluous spaces
  }

Catch interrupt

A Perl script that catches CTRL-C/interrupt roughly looks like the following:

  $SIG{'INT'} = 'catchsig2';
  
  #
  # do something
  #
  exit(0);

  sub catchsig2
  {
    # Clean up open files, sockets, etc.
    exit(1);
  }

However, since an interrupt by definition breaks off whatever you're doing, you might prefer another less intrusive way, see also: controlling scripts.

Controlling scripts

To control a script from another script, you can use the IPC::ShareLite package. For Debian, it's available as a package. In this setup, you'll have two scripts: one that does the real work and one which is very small and just controls the worker script.

First the controller script, which is called with a parameter like run, stop or pause:

 use IPC::ShareLite;
 my $control = IPC::ShareLite->new(
     -key     => 'myproject',
     -create  => 'yes',
     -destroy => 'yes'
 ) or die ("Couldn't create control variable in shared memory: $!");
 $control->store("run");

 while($control->fetch() ne "stop") {
      # Do your work here
 }

Now the worker script:

 use IPC::ShareLite;

 my @VALID_CMD = ('run', 'stop', 'pause');

 my $control = IPC::ShareLite->new(
     -key     => 'myproject',
     -create  => 'no',
     -destroy => 'no'
 ) or die ("Worker script hasn't been started: $!");

 if(scalar(@ARGV == 0)) {
     printf("Current status: %s\n", $control->fetch());
 } elsif(scalar(@ARGV == 1) && grep($ARGV[0], @VALID_CMD)) {
     $stop_flag->store($ARGV[0]);
     printf("Status set to: %s\n", $control->fetch());
 } else {
     print "Usage: $0 [" . join('|', @VALID_CMD) . "]\n\n";
     exit 1;
 }

Read a file completely into a variable

This can be done by resetting the special variable $/ to undef. This variable contains the line separator.

  open(FD, "<file.txt")
    or die ("Can't open file: $!\n");
  local $/ = undef;
  my $buf = <FD>;
  close(FD);

Change to the current directory

To let your script change to the directory it runs in, place the following lines of code at the start of the script:

  BEGIN
  {
    chdir(dirname($0));
  }

Using constants in Perl

Perl has constants, but they're a bit of a hack. Don't use them if you're using mod_perl.

Actually, you don't want to use them in any other case, too -- instead use global variables and capitalize them. However, if you insist, read on.

To use constants in and outside a module:

  package mymodule;
  use constant ONE => 1;

  ...
  print "1+1=" . ONE + ONE;
  ...

Outside of the module, do:

  use mymodule;

  print "1+1=" . mymodule::ONE + mymodule::ONE;

If you want to refer to the constants without the package prefix, adjust package "mymodule" as follows:

  package mymodule;

  use base 'Exporter';
  
  our @EXPORT = qw(ONE TWO);

  use constant ONE => 1;
  use constant TWO => 2;

Be careful, since this "pollutes" the script that does a "use mymodule".

Writing to stderr

Very useful when writing CGI scripts! This'll end up in Apache's error_log.

  print STDERR "Test error message\n";

Alternatively, concatenate package, filename or line number in:

  print STDERR __FILE__ . ": " . __LINE__ . " [" . __PACKAGE__ . "] Oh Noooos!\n";

Passing strings by reference

  sub changestr
  {
      my $test = shift(@_);
      $$test = "Mein Herz brennt";
  }

  my $str = "Rammstein";
  changestr(\$str);
  print ("$str\n");

Passing arrays by reference

Sometimes you want to change multiple arrays in a subroutine. An example of this:

  sub arraypassref {
      my($tmp, $tmp2) = @_;
      push(@$tmp,  "d");
      push(@$tmp2, "h");
  }
  
  # First initialize our test arrays
  my @inarr  = ("a", "b", "c");
  my @inarr2 = ("e", "f", "g");
  
  # Pass them by reference to the subroutine
  arraypassref(\@inarr, \@inarr2);
  
  # Test whether the "d" was added
  my $elem;
  foreach $elem (@inarr) {
      print "$elem\n";
  }
  
  # Test whether the "h" was added
  foreach $elem (@inarr2) {
      print "$elem\n";
  }

Unittesting

Unit testing in Perl is dead simple and almost every module on CPAN uses it. Create a subdirectory "tests" and create the test files in there. It's customary that they end with the extension ".t". The test could look like the following:

  #!/usr/bin/perl -w

  use Test::More tests => 2; # Increase the number of tests here
  # or:
  # use Test::More "no_plan"

  use_ok('mymodule');        # The module we're going to test, can it be used?
  ok(1 == 1, "Test OK");     # 1st parameter is expression to be tested, 2nd is the message

Since the .t isn't recognized by vim as Perl code, create a subdirectory ~/.vim/ftdetect with a file called perltest.vim. Put the following line in this file:

  au BufRead,BufNewFile *.t set ft=perl

Multiline variable

  my $str = <<EOF;
  line 1
  line 2
  line
  3
  EOF
  print $str;

Show all environment variables

Use this script to show all environment variables in a Perl CGI script: testenv.pl

A simple menu

Sometimes you need to repeatedly fire off a piece of script. A simple menu is implemented below. Type a, b or some other option and let the user press enter.

    while(1) {
        # Code to print out menu goes here
        $c = <STDIN>;
        chop $c;
        if($c eq 'a') {
            print "You pressed the first letter of the alphabet\n\r";
        } elsif($c eq 'q') {
            print "You want to quit\n\r";
            exit 0;
        } else {
            print "Invalid option!";
            sleep 1;
        }
    }

If you can install the package Term::Screen, then the following menu only needs a keypress (not followed by enter):

    require Term::Screen;

    $scr = new Term::Screen;
    unless ($scr) { die " Something's wrong \n"; }
    $scr->clrscr();

    while(1) {
        menu;
        $c = $scr->getch();      # doesn't need Enter key 
        if($c eq 'a') {
            print "You pressed the first letter of the alphabet\n\r";
        } elsif($c eq 'q') {
            print "You want to quit\n\r";
            exit 0;
        } else {
            print "Invalid option!";
            sleep 1;
            $scr->clrscr();
        }
    }

Installing Perl modules in a non-root account

To install perl modules in a non-root account, do:

  $ mkdir -p $HOME/.cpan/CPAN

Now put a file named Config.pm in the new directory, with the following contents:

 $CPAN::Config = {
   'build_cache' => q[5],
   'build_dir' => q[HOMEDIRFIX/.cpan/build],
   'cache_metadata' => q[1],
   'cpan_home' => q[HOMEDIRFIX/.cpan],
   'dontload_hash' => {  },
   'ftp' => q[/usr/bin/ftp],
   'ftp_proxy' => q[],
   'getcwd' => q[cwd],
   'gzip' => q[/usr/bin/gzip],
   'histfile' => q[HOMEDIRFIX/.cpan/histfile],
   'histsize' => q[100],
   'http_proxy' => q[],
   'inactivity_timeout' => q[0],
   'index_expire' => q[1],
   'inhibit_startup_message' => q[0],
   'keep_source_where' => q[HOMEDIRFIX/.cpan/sources],
   'lynx' => q[ ],
   'make' => q[/usr/bin/make],
   'make_arg' => q[],
   'make_install_arg' => q[],
   'makepl_arg' => q[PREFIX=~/ SITELIBEXP=~/lib/perl5
        LIB=~/lib/perl5 INSTALLMAN1DIR=~/share/man/man1
        INSTALLMAN3DIR=~/share/man/man3
        INSTALLSITEMAN1DIR=~/share/man/man1
        INSTALLSITEMAN3DIR=~/share/man/man3],
   'ncftp' => q[ ],
   'ncftpget' => q[ ],
   'no_proxy' => q[],
   'pager' => q[less],
   'prerequisites_policy' => q[ask],
   'proxy_user' => q[],
   'scan_cache' => q[atstart],
   'shell' => q[/bin/sh],
   'tar' => q[/usr/bin/tar],
   'term_is_latin' => q[0],
   'unzip' => q[/usr/bin/unzip],
   'urllist' => [q[http://cpan.llarian.net/],
        q[ftp://cpan.nas.nasa.gov/pub/perl/CPAN/],
        q[ftp://cpan.pair.com/pub/CPAN/],
        q[ftp://ftp.duke.edu/pub/perl/],
        q[ftp://ftp.cs.colorado.edu/pub/perl/CPAN/],
        q[ftp://ftp.sunsite.utk.edu/pub/CPAN/],
        q[http://www.perl.com/CPAN/]],
   'wait_list' => [q[wait://ls6.informatik.uni-dortmund.de:1404]],
   'wget' => q[/usr/bin/wget],
 };
 1;
 __END__

Make it known to all Perl programs where your modules reside:

  $ export PERL5LIB="$HOME/lib/perl5"

Start the CPAN shell and install away!

  $ perl -MCPAN -e shell
  cpan> install Net::OpenID::Consumer

If you want to install a module manually, unpack the archive, go into the new directory and type:

  $ perl Makefile.pl <<contents of makepl_arg line>>
  $ make
  $ make test
  $ make install

Measure performance

For quick measurement of the time spent in a particlar piece of code, do something like this:

  use Time::HiRes qw(time);

  my $t1 = time;
  # Your code here
  printf("Time (seconds) taken: %.3f\n", (time - $t1));

Threading

An example of threading

    $| = 1;

    # Routine that prints a dot every second, for 10 sec.
    sub dots
    {
        for(my $i = 0; $i < 10; $i++) {
            print ".";
            sleep 1;
        }
    }

    # Start dot-printing thread
    my $thr = threads->new(\&dots);
    sleep 5;

    # After 5 seconds, wait for the 10-second thread to finish
    print "Joining";
    $thr->join();
    print "\n";

Decoding base64-encoded files

Sometimes, a client sends me e-mails with attachments from an Apple. Strangely enough, neither Thunderbird nor GMail handles them correctly. You'll get a file called 'noname', which appears to be a textfile with some MIME headers. Strip off the MIME headers and footer manually, then save the file. Then run the following snippet:

  perl -MMIME::Base64 -ne 'print decode_base64($_)' < noname > the_new_filename

In-place editing

If you want to substitute strings in a file, in other words, edit files in-place, then Perl is your friend. And a much better friend than sed, at that:

  perl -p -i -e 's/Click Here/Click There/' your_file_name.txt

It's so easy. Mnemonic: the options form the word pie.

CPAN tricks

To have CPAN questions continue with the default answer, set the following variable:

 $ export PERL_MM_USE_DEFAULT=1

Another way is to add (or reset) the following line to /etc/perl/CPAN/Config.pm (or your $HOME/.cpan/CPAN/MyConfig.pm) file:

 'prerequisites_policy' => q[follow],    # Instead of 'ask'

To run CPAN with sudo, use the -H flag. If you don't do this, then CPAN will write in your home directory its .cpan subdirectory -- but as root. This will cause all sorts of problems. Example:

 $ sudo -H cpan

Parsing hexadecimal strings

If you have a string containing a hexadecimal number, you can just use the built-in function hex():

  $ perl -e 'printf("The number is: %d\n", hex("FF"))';
  The number is: 255
  $

Flipping a bit

To flip a bit, use the ^ operator.

 my $x = 0;
 $x = $x ^ 2;
 print "$x\n";     # Will print 2
 $x = $x ^ 8;
 print "$x\n";     # Will print 10

Reading an XML file

The easiest way to read an XML file in Perl is to use an XPath expression.

 use XML::XPath;
 use XML::XPath::XMLParser;

 my $config_file = "blah.xml"
 my $xp = XML::XPath->new(filename => $config_file);
 my $nodeset = $xp->find('//daemon/log_dir');
 foreach my $node ($nodeset->get_nodelist) {
     $log_dir = $node->string_value;
 }

Fragment of the example XML file:

 <egse version="1.0">
     <daemon>
         <name>FEE1 daemon</name>
         <apid>3</apid>
         <apid_hk>1536</apid_hk>
         <housekeeping>aggregated</housekeeping>
         <update_period>60000</update_period>
         <log_dir>log</log_dir> 
     </daemon>
 </egse>

On Debian and Ubuntu, the Perl packages will be installed with:

 $ sudo apt-get install libxml-sax-perl libxml-sax-expat-perl \
     libxml-simple-perl libxml-xpath-perl

Parsing command-line parameters

The getopt package is a great way to parse command-line parameters without the fuss. The example below shows a script that expects to be called with a configuration file as follows:

 $ thescript.pl --config /etc/configfile.xml

The code is as follows:

 use Getopt::Long qw(:config pass_through);

 my $opt_config_path;

 GetOptions('config=s'     => \$opt_config_path
     ) or die("Invalid command line option");

 die("Missing parameter --config") if !defined $opt_config_path;

Some example parameter definitions: config=s for a string, userlimit=i for an integer, and someflag! for a single flag parameter (i.e. a parameter that can be passed in and of itself without a value, or not).

Dumping a variable in hex

Sometimes, you want to print the contents of a variable in a hexadecimal format. The shortest way is as follows:

  foreach (unpack("C*", $data))
      printf("%02X", $_);

This can even be entered on the prompt in the Perl debugger:

  DB<7> foreach (unpack("C*", $data_header)) {printf("%02X ", $_); }
  00 21 A8 00 00 AF 9F 00 00 21 A8 00 00 A8 CD 00 00 21 A8 00 00 A2 05 00 
  DB<8>

Get timestamp in ISO 8601 date format

We all know that a CSV file is the best thing a man can get! Here's a way to get the current date and time, including milliseconds, in the ISO date format that Excel knows and loves when importing CSV files.

  use Time::HiRes qw(gettimeofday);

  my ($time_sec, $time_msec) = gettimeofday();

  my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = 
      localtime($time_sec);

  my $iso_date = sprintf("%04d-%02d-%02d %02d:%02d:%02d.%03d",
      $year+1900, $mon+1, $mday, $hour, $min, $sec, $time_msec/1000);

Running the above will yield the current date/time in the format YYYY-MM-DD HH:MM:SS followed by the milliseconds:

  2011-01-10 16:35:10.364