Microwave Biscuit

Perl script to parse a log

Posted in Linux, ubuntu by microwavebiscuit on February 22, 2007

At work I do quite a bit of database stuff. One of the databases I use puts out an error file for unmatched items. The problem is that a single unmatched item could appear in 200, 2,000 or a million records, and sorting through the file to figure out the real issues can be a challenge.

Here’s an example of the error log:

\\ Member foo Not Found In Database
"US$" "Working Scenario" "FY06" "PC" "foo" "Jan" 100

\\ Member bar Not Found In Database
"US$" "Working Scenario" "FY06" "PC" "bar" "Jan" 100

\\ Member foo bar Not Found In Database
"US$" "Working Scenario" "FY06" "PC" "foo bar" "Jan" 100

\\ Member bar foo Not Found In Database
"US$" "Working Scenario" "FY06" "PC" "bar foo" "Jan" 100

So, the file is organized like this: a line beginning with \\ holds the message, the next line holds the entire record, and then there is a blank line. In the above example I have only 4 error records and each one contains a unique item. In reality I could have 500 or more records for “foo”, which would produce 500 messages in this log.
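
The layout described above can be recognized line by line with a single regular expression. This is only a minimal sketch, assuming the message wording is exactly as in the sample; member_from_line is a hypothetical helper name and is not part of the script in this post.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the member name from a
# "\\ Member ... Not Found In Database" message line, or undef for
# any other line (record rows and blank lines).
sub member_from_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # tolerate both Windows and Unix line endings
    return $line =~ /^\\\\\s*Member\s+(.*?)\s+Not Found In Database\s*$/
        ? $1
        : undef;
}

# One message block in the format of the sample log
my $sample = <<'END';
\\ Member foo bar Not Found In Database
"US$" "Working Scenario" "FY06" "PC" "foo bar" "Jan" 100

END

for my $line (split /^/, $sample) {
    my $member = member_from_line($line);
    print "unmatched item: $member\n" if defined $member;
}
```

Run against the sample block, this prints "unmatched item: foo bar" once; the record row and the blank line are skipped.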

Here’s the Perl script that I wrote to deal with it so that I get a unique, sorted list of items that aren’t found. Note: many of the items in this script are “hard coded”, and if I were to make this a more general script I would accept the file name at runtime, etc. I also know that I’m probably using too many temp files here and could probably go from the input file to an output file without the intermediary temp file. However, when I did it like that I got weird results. It works perfectly for my task as is, so I’m good with it.

# dataerrs.pl
use warnings;
use strict;

my %seen;
my @uniq;
my @list;
my $item;

# Find Errors in Error Log
# "or" instead of "||" so that die really fires when open fails
open(FILE, '<', 'dataload.err') or die "Can't find file: $!";
open(TEMPFL, '>', '/home/me/dataerrs.txt') or die "Can't open output file: $!";
while (<FILE>) {
    s/\r?\n$//;                            # strip Windows or Unix new line
    if (/^\\\\/) {                         # row needs to begin with \\
        s/^\\\\\s*//;                      # get rid of \\ and the space after it
        s/Member //;                       # get rid of "Member"
        s/\s*Not Found In Database\s*$//;  # get rid of "Not Found In Database"
        print TEMPFL "$_\n";               # All that should be left is item not found - write it to a temp file
    }
}
close FILE;
close TEMPFL;

open(INFILE, '<', '/home/me/dataerrs.txt') or die "Can't find file: $!";          # open temp file
open(OUTFILE, '>', '/home/me/finalerrs.txt') or die "Can't open output file: $!"; # open final output file
@list = <INFILE>;                          # read the temp file, one item per line
close INFILE;
%seen = ();
foreach $item (@list) {
    push(@uniq, $item) unless $seen{$item}++;  # keep only the first occurrence of each item
}
my @sorted = sort { lc($a) cmp lc($b) } @uniq; # case-insensitive sort
print OUTFILE @sorted;                     # print unique and sorted records to the outfile
close OUTFILE;

In this example, the result would be a file “finalerrs.txt” with the following in it:

bar
bar foo
foo
foo bar

I know that this particular script is only really suited for the log file that I am dealing with, but portions of it and the concepts are certainly reusable.
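
A more general variant, taking the file name at runtime and skipping the temp file, might look like the sketch below. It is an illustration under stated assumptions rather than the script used for the real task: count_unmatched is a hypothetical helper name, the message wording is assumed to match the sample log exactly, and the per-item count is an extra that the original script does not produce.

```perl
use strict;
use warnings;

# Hypothetical helper: reads the error log from a filehandle and returns a
# hash reference mapping each unmatched member name to its message count.
sub count_unmatched {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # handle Windows or Unix line endings
        next unless $line =~ /^\\\\\s*Member\s+(.*?)\s+Not Found In Database\s*$/;
        $count{$1}++;            # the hash both de-duplicates and counts
    }
    return \%count;
}

# Command-line part: runs only when a file name is given as an argument
if (@ARGV) {
    my $file = shift @ARGV;
    open(my $in, '<', $file) or die "Can't open $file: $!";
    my $count = count_unmatched($in);
    close $in;

    # Case-insensitive sort, as in the script above
    for my $member (sort { lc($a) cmp lc($b) } keys %$count) {
        print "$member\t$count->{$member}\n";
    }
}
```

Saved under a name such as dataerrs2.pl (an assumed file name), it could be run as "perl dataerrs2.pl dataload.err > finalerrs.txt". Keeping the items in a hash means each one is stored once however many records mention it, which is what makes the temp file unnecessary.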

3 Responses

  1. Darshit said, on September 8, 2007 at 1:25 am


    That’s very good logic you have developed.
    Can you help me out with my Perl script problem?

    I have to parse a text file in which there are many parameters I need to capture. As I am new to Perl I am not able to do it. I need your help urgently.

    Thanks in advance.


  2. praveenkumar nayak said, on September 15, 2008 at 12:51 am

    You can use sed and awk for that; they will do what is needed.


  3. florian said, on October 31, 2009 at 11:20 am

    nice script. if you need inter-line context sed and awk won’t work.

    but what about:
    grep -v "^\\\\" | awk '{print $5}' | sed s/([^"])/$1/

