Perl note: July 2007

Tuesday, July 31, 2007

File::Glob - Perl extension for BSD glob routine

NAME
File::Glob - Perl extension for BSD glob routine
SUPPORTED PLATFORMS
Linux
Solaris
Windows
SYNOPSIS
use File::Glob ':glob';
@list = bsd_glob('*.[ch]');
$homedir = bsd_glob('~gnat', GLOB_TILDE | GLOB_ERR);
if (GLOB_ERROR) {
    # an error occurred reading $homedir
}
## override the core glob (CORE::glob() does this automatically
## by default anyway, since v5.6.0)
use File::Glob ':globally';
my @sources = <*.{c,h,y}>;
## override the core glob, forcing case sensitivity
use File::Glob qw(:globally :case);
my @sources = <*.{c,h,y}>;
## override the core glob, forcing case insensitivity
use File::Glob qw(:globally :nocase);
my @sources = <*.{c,h,y}>;
DESCRIPTION
File::Glob::bsd_glob() implements the FreeBSD glob(3) routine, which is a superset of the POSIX glob() (described in IEEE Std 1003.2 ``POSIX.2''). bsd_glob() takes a mandatory pattern argument, and an optional flags argument, and returns a list of filenames matching the pattern, with interpretation of the pattern modified by the flags variable.
Since v5.6.0, Perl's CORE::glob() is implemented in terms of bsd_glob(). Note that the two do not share the same prototype: CORE::glob() accepts only a single argument. For historical reasons, CORE::glob() also splits its argument on whitespace and treats the pieces as multiple patterns, whereas bsd_glob() treats the whole string as one pattern.
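A quick way to see the whitespace difference is to hand both functions the same string. This sketch uses literal names with no wildcard characters, so no matching files are needed. With the default flags, which include GLOB_NOMAGIC, a pattern containing no `*`, `?` or `[` is handed back unchanged. The names foo and bar are arbitrary.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# CORE::glob() splits its argument on whitespace: two patterns, 'foo' and 'bar'.
my @core = glob('foo bar');

# bsd_glob() treats the whole string as a single pattern, 'foo bar'.
my @bsd = bsd_glob('foo bar');

printf "glob: %d result(s), bsd_glob: %d result(s)\n", scalar @core, scalar @bsd;
```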
The POSIX defined flags for bsd_glob() are:
GLOB_ERR
Force bsd_glob() to return an error when it encounters a directory it cannot open or read. Ordinarily bsd_glob() continues to find matches.
GLOB_MARK
Each pathname that is a directory that matches the pattern has a slash appended.
GLOB_NOCASE
By default, file names are assumed to be case sensitive; this flag makes bsd_glob() treat case differences as not significant.
GLOB_NOCHECK
If the pattern does not match any pathname, then bsd_glob() returns a list consisting of only the pattern. If GLOB_QUOTE is set, its effect is present in the pattern returned.
GLOB_NOSORT
By default, the pathnames are sorted in ascending ASCII order; this flag prevents that sorting (speeding up bsd_glob()).
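Here is a small sketch of two of these flags, run against a scratch directory created with File::Temp. The file names are invented for the example. Passing a flags argument replaces bsd_glob()'s default flags, so only the flags named here are in effect.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_MARK GLOB_NOCHECK GLOB_ERROR);
use File::Temp qw(tempdir);

# Scratch directory holding two plain files and one subdirectory.
my $dir = tempdir(CLEANUP => 1);
for my $name ('a.c', 'b.h') {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}
mkdir "$dir/sub" or die "cannot create sub: $!";

# GLOB_MARK appends a slash to every matching directory.
# GLOB_NOSORT is not set, so the results come back in ASCII order.
my @marked = bsd_glob("$dir/*", GLOB_MARK);
die "bsd_glob failed: $!" if GLOB_ERROR;

# Strip the directory prefix to make the output readable.
my @names = map { substr($_, length($dir) + 1) } @marked;
print "@names\n";    # a.c b.h sub/

# GLOB_NOCHECK: when nothing matches, the pattern itself is returned.
my @none = bsd_glob("$dir/*.zzz", GLOB_NOCHECK);
print "$none[0]\n";
```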
The FreeBSD extensions to the POSIX standard are the following flags:
GLOB_BRACE
Pre-process the string to expand {pat,pat,...} strings like csh(1). The pattern '{}' is left unexpanded for historical reasons (and csh(1) does the same thing to ease typing of find(1) patterns).
GLOB_NOMAGIC
Same as GLOB_NOCHECK but it only returns the pattern if it does not contain any of the special characters ``*'', ``?'' or ``[''. NOMAGIC is provided to simplify implementing the historic csh(1) globbing behaviour and should probably not be used anywhere else.
GLOB_QUOTE
Use the backslash ('\') character for quoting: every occurrence of a backslash followed by a character in the pattern is replaced by that character, avoiding any special interpretation of the character. (But see below for exceptions on DOSISH systems).
GLOB_TILDE
Expand patterns that start with '~' to user name home directories.
GLOB_CSH
For convenience, GLOB_CSH is a synonym for GLOB_BRACE | GLOB_NOMAGIC | GLOB_QUOTE | GLOB_TILDE.
The POSIX provided GLOB_APPEND, GLOB_DOOFFS, and the FreeBSD extensions GLOB_ALTDIRFUNC, and GLOB_MAGCHAR flags have not been implemented in the Perl version because they involve more complex interaction with the underlying C structures.
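The GLOB_QUOTE extension matters when a file name really contains a glob metacharacter. This sketch assumes a Unix-like file system, because Windows does not allow `*` in file names. It creates one file literally named `star*.txt` and one ordinary file that the wildcard would also match. Both names are invented for the example.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_QUOTE);
use File::Temp qw(tempdir);

# Two files: one whose name really contains '*', one that merely matches it.
my $dir = tempdir(CLEANUP => 1);
for my $name ('star*.txt', 'starfish.txt') {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

# Unquoted, '*' is a wildcard and both files match.
my @wild = bsd_glob("$dir/star*.txt", GLOB_QUOTE);

# With GLOB_QUOTE, a backslash makes the '*' literal: only 'star*.txt' matches.
my @literal = bsd_glob("$dir/star\\*.txt", GLOB_QUOTE);

printf "%d wildcard match(es), %d literal match(es)\n", scalar @wild, scalar @literal;
```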
DIAGNOSTICS
bsd_glob() returns a list of matching paths, possibly zero length. If an error occurred, &File::Glob::GLOB_ERROR will be non-zero and $! will be set. &File::Glob::GLOB_ERROR is guaranteed to be zero if no error occurred, or one of the following values otherwise:
GLOB_NOSPACE
An attempt to allocate memory failed.
GLOB_ABEND
The glob was stopped because an error was encountered.
If bsd_glob() has found some matching paths but is then interrupted by an error, it returns the list of filenames found so far and sets &File::Glob::GLOB_ERROR.
Note that bsd_glob() deviates from POSIX and FreeBSD glob(3) behaviour by not treating ENOENT and ENOTDIR as errors: bsd_glob() continues processing despite them, unless the GLOB_ERR flag is set.
Be aware that all filenames returned from File::Glob are tainted.
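The sketch below shows the error-checking pattern described above. It globs inside a subdirectory that does not exist, which makes opening that directory fail with ENOENT. Because GLOB_ERR is not set, this is not reported as an error: the call simply returns no matches and GLOB_ERROR stays zero. The directory name `missing` is arbitrary.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob GLOB_ERROR);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# 'missing' does not exist, so opening it fails with ENOENT.
# Without GLOB_ERR that is not treated as an error: we just get no matches.
my @files = bsd_glob("$dir/missing/*");
if (GLOB_ERROR()) {
    warn "bsd_glob failed: $!\n";
}
printf "%d match(es), GLOB_ERROR = %d\n", scalar @files, GLOB_ERROR();
```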
NOTES
If you want to use multiple patterns, e.g. bsd_glob "a* b*", you should probably throw them in a set as in bsd_glob "{a*,b*}". This is because the argument to bsd_glob() isn't subjected to parsing by the C shell. Remember that you can use a backslash to escape things.
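Here is the brace-set advice in action, using a scratch directory with three invented file names. The whitespace-separated string is taken as a single pattern, and it matches nothing. The brace set expands into two patterns; GLOB_BRACE is part of bsd_glob()'s default flags.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(apple banana cherry)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

# One pattern that happens to contain a space: no file matches it.
my @spaced = bsd_glob("$dir/a* $dir/b*");

# A brace set expands into two patterns, "$dir/a*" and "$dir/b*".
my @braced = bsd_glob("$dir/{a*,b*}");

print scalar(@spaced), " vs ", scalar(@braced), "\n";
```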
On DOSISH systems, backslash is a valid directory separator character. In this case, use of backslash as a quoting character (via GLOB_QUOTE) interferes with the use of backslash as a directory separator. The best (simplest, most portable) solution is to use forward slashes for directory separators, and backslashes for quoting. However, this does not match ``normal practice'' on these systems. As a concession to user expectation, therefore, backslashes (under GLOB_QUOTE) only quote the glob metacharacters '[', ']', '{', '}', '-', '~', and backslash itself. All other backslashes are passed through unchanged.
Win32 users should use the real slash. If you really want to use backslashes, consider using Sarathy's File::DosGlob, which comes with the standard Perl distribution.

know perl better

Sorting is a commonly needed operation in all kinds of programs. Luckily for us perl programmers, perl provides a very simple yet extremely powerful mechanism to accomplish any sort you might think of. This article teaches the novice programmer how to sort lists of things. It also shows more experienced folks some techniques and ideas that may be new to them if they are migrating from a different language.
Moving straight to the meat of the matter, we'll start by talking about comparison. Obviously, to put a list of things in order, you first have to define that order. Order is defined by how things compare to each other. If I give you two items from the list, can you tell me which one is bigger / better / nicer / sexier ... [insert your favourite adjective here] than the other? Or tell me that they are of equal order? Well, that's just about it! If you give me a list of items and promise that you can answer this question for any pair of them, I can make a sorted list of them. All I have to do is take all possible pairs, ask you "how do these two compare?", and arrange them accordingly. There are smarter ways to do it that minimise the number of comparisons needed, but we don't have to worry about that here: perl's sort operator performs that task for us, and we can trust it to use an efficient method.
Since the issue at hand is comparison, I assume you are familiar with all (or at least most) of perl's comparison operators. Here is a list of them:
Numbers   Strings
<         lt
>         gt
<=        le
>=        ge
==        eq
<=>       cmp
The first five rows should be familiar; they work just as they do in math. But what are the <=> and cmp operators? The expression $a <=> $b (or $a cmp $b for strings) returns 1, 0 or -1 when $a is, respectively, greater than, equal to, or less than $b (see the table below).
Relation of $a and $b    Value returned by $a <=> $b
$a greater than $b        1
$a equal to $b            0
$a less than $b          -1
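A short check of the table, including the classic gotcha: cmp compares character by character, so the string "10" comes before "9", while <=> compares the numbers.

```perl
use strict;
use warnings;

# <=> compares numerically and yields 1, 0 or -1.
my @num = (5 <=> 3, 3 <=> 3, 2 <=> 3);
print "@num\n";                          # 1 0 -1

# cmp compares strings character by character, so "10" sorts before "9".
my $as_strings = ("10" cmp "9");         # -1
my $as_numbers = ("10" <=> "9");         # 1
print "$as_strings $as_numbers\n";
```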
Does that ring a bell? Come to think of it, the <=> and cmp operators provide exactly the answer to the question we asked earlier, when we talked about sorting by a comparison criterion. If we already have an operator that answers "how do two items compare?", all we need is a function that takes a list of items and performs the necessary comparisons to arrive at a sorted list. And guess what? That's exactly what perl's sort operator does. So, if you have an unsorted list @not_sorted and want to create a sorted list @sorted, you just say:
@sorted = sort { $a <=> $b } @not_sorted;          # numerical sort
or
@sorted = sort { $a cmp $b } @not_sorted;          # ASCII-betical sort
or, better,
@sorted = sort { lc($a) cmp lc($b) } @not_sorted;  # alphabetical (case-insensitive) sort
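To see how the three sorts differ, here they are applied to small sample lists invented for the example. Numbers sorted as strings come out in character order. Uppercase letters sort before all lowercase ones in plain ASCII order, which is what lc() evens out.

```perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);
my @by_number = sort { $a <=> $b } @numbers;   # 1 9 10 100
my @by_string = sort { $a cmp $b } @numbers;   # 1 10 100 9

my @words = qw(banana Cherry apple);
my @ascii = sort { $a cmp $b } @words;          # Cherry apple banana
my @alpha = sort { lc($a) cmp lc($b) } @words;  # apple banana Cherry

print "@by_number\n@by_string\n@ascii\n@alpha\n";
```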
Looking at the sort function, we notice that it works exactly as we described it in words earlier in this article. Perl needs just two things: the list of items to sort, and a subroutine (here, the block in braces) that answers the question "how do these two compare?" for any pair of items in the list. Perl puts the two items it wants to compare into the special variables $a and $b. Your block is responsible for returning a value that reflects their relationship, as in the table above.
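The comparison does not have to be an inline block. sort also accepts the name of a subroutine, and perl sets $a and $b before each call just as it does for a block. The sub name by_length is only an illustration. The `or` falls back to a second comparison whenever the first one returns 0, a common way to break ties.

```perl
use strict;
use warnings;

# A named comparison routine: perl sets $a and $b before each call.
# Shorter words first; words of equal length fall back to ASCII order.
sub by_length {
    length($a) <=> length($b) or $a cmp $b;
}

my @words  = qw(pear fig apple kiwi);
my @sorted = sort by_length @words;
print "@sorted\n";    # fig kiwi pear apple
```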
In these simple examples we used perl's built-in comparison operators, which handle numerical and alphabetical sorting (not really alphabetical; strictly speaking it is ASCII order). Of course, you can write your own comparison code to sort by any ordering you like. Before you start coding your own functions, take a look at the following examples:
Get a list of hash keys sorted by value:
@sorted = sort { $hash{$a} cmp $hash{$b} } keys %hash;
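When the hash values are numbers, <=> is the right operator. Swapping $a and $b gives a descending order. This sketch uses a made-up %score hash and breaks ties between equal scores by key name, so the output is deterministic.

```perl
use strict;
use warnings;

# Made-up scores; keys sorted by value, highest first.
# Ties (equal scores) fall back to sorting the keys alphabetically.
my %score = (alice => 90, bob => 72, carol => 90);
my @ranked = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;
print "@ranked\n";    # alice carol bob
```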
Get a reverse sort of a list:
@sorted = sort { $b cmp $a } @list;
The same result can also be had with:
@sorted = reverse sort { $a cmp $b } @list;
Get an alphabetical sort of words, but make 'aardvark' always come last. (Now, why you would want to do that is another question...)
@sorted = sort {
    if    ($a eq $b)         { return 0; }   # identical words
    elsif ($a eq 'aardvark') { return 1; }   # $a goes after $b
    elsif ($b eq 'aardvark') { return -1; }  # $a goes before $b
    else                     { return $a cmp $b; }
} @words;

Monday, July 30, 2007

Dynamic Pages with mod_perl

Basic MySQL Connection/Queries
use strict;
use warnings;
use DBI;

my $conn = DBI->connect("dbi:mysql:database=;host=;user=;password=")
    or die("Couldn't connect to database. " . $DBI::errstr);
my $sql = qq{

};

my $stmt = $conn->prepare($sql);
$stmt->execute();

# fetchall_hashref() expects the name of the key column as its argument.
my $dataRef = $stmt->fetchall_hashref();
if (($dataRef->{''}->{''}) =~ /enum\((.*)\)/)
{
    # . . .
}
$conn->disconnect();
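The body of the `enum` test above is left open. Suppose the value being matched is a MySQL column type string such as `enum('draft','published')`; that is an assumption, since the query and hash keys are elided. The allowed values could then be pulled out along these lines. The helper name enum_values and the sample values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a MySQL column type such as
# "enum('draft','published')" into the list ('draft', 'published').
# Assumes simple values with no embedded quotes or commas.
sub enum_values {
    my ($type) = @_;
    return unless $type =~ /^enum\((.*)\)$/i;
    my $list = $1;
    my @values = split /,/, $list;
    s/^'(.*)'$/$1/ for @values;   # strip the surrounding single quotes
    return @values;
}

my @states = enum_values("enum('draft','published','archived')");
print join('|', @states), "\n";   # draft|published|archived
```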

Scanning for Status Codes

Script I.1.5: Scanning for Status Codes
Example Usage
find_status.pl -t10 200 ~www/logs/access_log
TOP 10 URLS/HOSTS WITH STATUS CODE 200:
REQUESTS  URL/HOST
--------  --------
    1845  /www/wilogo.gif
    1597  /cgi-bin/contig/sts_by_name?database=release
    1582  /WWW/faqs/www-security-faq.html
    1263  /icons/caution.xbm
     930  /
     886  /ftp/pub/software/WWW/cgi_docs.html
     773  /cgi-bin/contig/phys_map
     713  /icons/dna.gif
     686  /WWW/pics/small_awlogo.gif
Some Useful Status Codes
Code  Message         Description
200   OK              The URL was found. Its contents follow.
301   Moved           The URL has permanently moved to a new location.
302   Found           The URL can be temporarily found at a new location.
304   Not Modified    The URL has not been modified since the indicated date.
400   Bad Request     Syntax error in the request.
401   Unauthorized    Used in authorization schemes.
403   Forbidden       This URL is forbidden, and authorization won't help.
404   Not Found       It isn't here.
500   Internal Error  The server encountered an unexpected error.
Source Code
#!/usr/local/bin/perl
# File: find_status.pl
# Getopt::Std stands in for the old getopts.pl library, which later Perl releases no longer ship.
use Getopt::Std;
getopts('L:t:h') || die <<USAGE;
Usage: find_status.pl [-Lth] <code1> <code2> <code3> ... <log_file>
Scan Web server log files and list a summary of URLs whose
requests had one of the indicated status codes.
Options:
    -L <domain>   Ignore local hosts matching this domain
    -t <integer>  Print top integer URLS/HOSTS [10]
    -h            Sort by host rather than URL
USAGE
if ($opt_L) {
    $opt_L =~ s/\./\\./g;               # escape dots for use in a regex
    $IGNORE = "(^[^.]+|$opt_L)\$";      # bare host names, or hosts in the local domain
}
$TOP = $opt_t || 10;

# Leading numeric arguments are status codes; the rest are log files.
while (@ARGV) {
    last unless $ARGV[0] =~ /^\d+$/;
    $CODES{shift @ARGV}++;
}

while (<>) {
    ($host,$rfc931,$user,$date,$request,$URL,$status,$bytes) =
        /^(\S+) (\S+) (\S+) \[([^]]+)\] "(\w+) (\S+).*" (\d+) (\S+)/;
    next unless $CODES{$status};
    next if $IGNORE && $host =~ /$IGNORE/io;
    $info = $opt_h ? $host : $URL;
    $found{$status}->{$info}++;
}

foreach $status (sort { $a <=> $b } keys %CODES) {
    $info = $found{$status};
    $count = $TOP;
    # Most-requested URLs (or hosts) first.
    foreach $i (sort { $info->{$b} <=> $info->{$a} } keys %{$info}) {
        write;
        last unless --$count;
    }
    $- = 0;    # force a new top-of-report
}

format STDOUT_TOP=
TOP @## URLS/HOSTS WITH STATUS CODE @##:
$TOP, $status
REQUESTS URL/HOST
-------- --------
.
format STDOUT=
@##### @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$info->{$i},$i
.
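The core of find_status.pl is its log-line regex plus a hash of counters sorted by value. This sketch runs that same pattern over a few in-memory Common Log Format lines instead of real log files. The sample lines are made up for illustration, and only status 200 is counted.

```perl
use strict;
use warnings;

# Made-up Common Log Format lines for illustration.
my @log = (
    '10.0.0.1 - - [31/Jul/2007:10:00:00 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [31/Jul/2007:10:00:05 -0700] "GET /missing.gif HTTP/1.0" 404 209',
    '10.0.0.3 - - [31/Jul/2007:10:00:09 -0700] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [31/Jul/2007:10:00:12 -0700] "GET /logo.gif HTTP/1.0" 200 2326',
);

my %wanted = (200 => 1);   # status codes to report, like the script's %CODES
my %count;                 # URL => number of requests

for (@log) {
    # Same pattern as find_status.pl: host, ident, user, date, method, URL, status, bytes.
    my ($host, $ident, $user, $date, $method, $url, $status, $bytes) =
        /^(\S+) (\S+) (\S+) \[([^]]+)\] "(\w+) (\S+).*" (\d+) (\S+)/
        or next;
    next unless $wanted{$status};
    $count{$url}++;
}

# Most-requested first; ties broken by URL so the order is stable.
for my $url (sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count) {
    printf "%8d  %s\n", $count{$url}, $url;
}
```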

$_

Basic Rotation Script
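#!/usr/local/bin/perl
# Rotate the Apache logs (access_log -> access_log.1 -> ... -> access_log.5)
# and then send httpd a HUP so it reopens fresh log files.
$LOGPATH='/usr/local/apache/logs';
@LOGNAMES=('access_log','error_log','referer_log','agent_log');
$PIDFILE = 'httpd.pid';
$MAXCYCLE = 4;
chdir $LOGPATH or die "Can't chdir to $LOGPATH: $!";  # Change to the log directory
foreach $filename (@LOGNAMES) {
    # Work from the oldest copy down; whatever sits in slot $MAXCYCLE+1 is overwritten.
    for (my $s=$MAXCYCLE; $s >= 0; $s-- ) {
        $oldname = $s ? "$filename.$s" : $filename;
        $newname = join(".",$filename,$s+1);
        rename $oldname,$newname if -e $oldname;
    }
}
# The PID file lives in $LOGPATH; strip the trailing newline before signalling.
chomp(my $pid = `cat $PIDFILE`);
kill 'HUP', $pid;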
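The renaming loop can be tried out safely in a scratch directory before pointing it at real logs. This sketch wraps the loop in a hypothetical rotate_log() sub. It leaves out the kill, since there is no server to signal here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: shift $name -> $name.1 -> ... -> $name.($max+1) inside $dir.
# Works from the oldest copy down; an existing $name.($max+1) is overwritten.
sub rotate_log {
    my ($dir, $name, $max) = @_;
    for (my $s = $max; $s >= 0; $s--) {
        my $old = $s ? "$dir/$name.$s" : "$dir/$name";
        my $new = "$dir/$name." . ($s + 1);
        rename $old, $new or die "rename $old: $!" if -e $old;
    }
}

# Usage: a current log and one older copy, each holding its own name.
my $dir = tempdir(CLEANUP => 1);
for my $f ('access_log', 'access_log.1') {
    open my $fh, '>', "$dir/$f" or die "cannot create $f: $!";
    print $fh "$f\n";
    close $fh;
}
rotate_log($dir, 'access_log', 4);
# access_log is gone; its contents are now in access_log.1,
# and the old access_log.1 has become access_log.2.
```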
