In the end, this is the easy way.
There are plenty of CPAN modules for parsing Apache logs (access_log):
Module | CPAN page |
---|---|
Regexp::Log::Common | Barbie / Regexp-Log-Common - search.cpan.org |
Parse::AccessLogEntry | Marc Slagle / Parse-AccessLogEntry - search.cpan.org |
Apache::ParseLog | Akira Hangai / Apache-ParseLog - search.cpan.org |
For something as simple as access_log, though, a single regex does the job. Example:
```perl
my ($host, $ident, $user, $time, $request, $status, $bytes, $referer, $agent) =
    $line =~ /^([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "(.*?)" ([^ ]*) ([^ ]*) "(.*?)" "(.*?)"/o;
# http://pmakino.jp/tdiary/20070907.html
```
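The sketch below shows what the one-liner above actually captures. It wraps the same regex in a sub and runs it against the Combined Log Format example line from the Apache mod_log_config documentation. `parse_combined` and the hash keys are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-line regex above. Returns a hash
# reference keyed by field name, or nothing when the line does not match.
sub parse_combined {
    my ($line) = @_;
    my @f = $line =~ /^([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "(.*?)" ([^ ]*) ([^ ]*) "(.*?)" "(.*?)"/
        or return;
    my %entry;
    @entry{qw(host ident user time request status bytes referer agent)} = @f;
    return \%entry;
}

# Combined Log Format example line from the mod_log_config documentation.
my $e = parse_combined(q{127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"});
print "$e->{host} $e->{status} $e->{request}\n";   # 127.0.0.1 200 GET /apache_pb.gif HTTP/1.0
```

Note that `(.*?)` for the request keeps the whole first line ("GET /apache_pb.gif HTTP/1.0") in one field. The sample script below splits it into method, resource and protocol instead.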
Sample script:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Combined log format plus a trailing numeric field (captured as $time);
# refer to http://httpd.apache.org/docs/2.2/mod/mod_log_config.html
my $REGEXP = qr/^([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^ ]*)(?: *([^ ]*) *([^ ]*))?" ([^ ]*) ([^ ]*) "(.*?)" "(.*?)" (\d+)/;

main(@ARGV);

sub main {
    my ($log_file) = @_;
    open(my $fh, '<', $log_file) or die "can't open $log_file $!";
    my $summary = {};
    while (my $line = <$fh>) {
        chomp($line);
        # parse log
        my ($host, $ident, $user, $datetime, $method, $resource, $proto,
            $status, $bytes, $referer, $agent, $time) = $line =~ $REGEXP;
        next unless defined $host;    # skip lines that do not match
        #### something TODO
    }
    close($fh) or die "can't close $log_file $!";
    print Dumper($summary);
}
```
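The sample script leaves `$summary` empty at `#### something TODO`. Below is one possible way to fill it in. It counts requests per status code and sums the trailing `$time` field per resource. Whether that field is `%T` or `%D` depends on your LogFormat; this sketch just adds the numbers up.

To stay self-contained, it takes a list of lines instead of opening a file. The sub `summarize`, the hash layout and the sample lines are all made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Same pattern as the sample script: combined log format plus a trailing
# numeric field (captured as $time; %T or %D depending on your LogFormat).
my $REGEXP = qr/^([^ ]*) ([^ ]*) ([^ ]*) \[([^]]*)\] "([^ ]*)(?: *([^ ]*) *([^ ]*))?" ([^ ]*) ([^ ]*) "(.*?)" "(.*?)" (\d+)/;

# Hypothetical aggregation for the "something TODO" part:
# request count per status code, plus count and summed $time per resource.
sub summarize {
    my (@lines) = @_;
    my $summary = { status => {}, resource => {} };
    for my $line (@lines) {
        my ($host, $ident, $user, $datetime, $method, $resource, $proto,
            $status, $bytes, $referer, $agent, $time) = $line =~ $REGEXP;
        next unless defined $host;    # skip lines that do not match
        $summary->{status}{$status}++;
        my $r = $summary->{resource}{ $resource // '-' } ||= { count => 0, total_time => 0 };
        $r->{count}++;
        $r->{total_time} += $time;
    }
    return $summary;
}

# Made-up sample lines (the last one does not match and is skipped).
my @sample = (
    q{127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)" 1200},
    q{127.0.0.1 - - [10/Oct/2000:13:55:40 -0700] "GET /apache_pb.gif HTTP/1.0" 304 - "-" "Mozilla/4.08 [en] (Win98; I ;Nav)" 800},
    q{127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 209 "-" "Mozilla/4.08 [en] (Win98; I ;Nav)" 300},
    q{not an access_log line},
);

local $Data::Dumper::Sortkeys = 1;
print Dumper(summarize(@sample));
```

Dividing `total_time` by `count` gives a rough per-resource average, which is usually the first thing worth looking at.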
The log format specification for access_log is documented here:
http://httpd.apache.org/docs/2.2/mod/mod_log_config.html
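The directives in that specification map quite directly onto the regex. The sketch below makes the mapping explicit by building a pattern from a LogFormat string, but only for a handful of directives.

`logformat_to_regex` and `%PATTERN` are hypothetical and do not come from any CPAN module. The sketch also assumes the format string is already unescaped, with plain `"` instead of the `\"` you write inside httpd.conf.

```perl
use strict;
use warnings;

# Hypothetical helper (not a CPAN API): regex fragments for a few
# LogFormat directives from the mod_log_config documentation.
my %PATTERN = (
    '%h'  => '([^ ]*)',        # remote host
    '%l'  => '([^ ]*)',        # remote logname (from identd), usually "-"
    '%u'  => '([^ ]*)',        # remote user
    '%t'  => '\[([^]]*)\]',    # time the request was received, in brackets
    '%r'  => '(.*?)',          # first line of request
    '%>s' => '([^ ]*)',        # final status
    '%b'  => '([^ ]*)',        # response size in bytes, "-" when 0
    '%D'  => '(\d+)',          # time taken to serve the request, microseconds
);

# Build a regex from an unescaped LogFormat string. Literal text is quoted,
# every %{...}i request header becomes (.*?), any other directive that is
# not in %PATTERN dies.
sub logformat_to_regex {
    my ($format) = @_;
    my $re = '';
    for my $tok (split /(%>?s|%\{[^}]*\}[a-zA-Z]|%[a-zA-Z])/, $format) {
        next if $tok eq '';
        if ($tok =~ /^%\{[^}]*\}i$/) {
            $re .= '(.*?)';
        }
        elsif (exists $PATTERN{$tok}) {
            $re .= $PATTERN{$tok};
        }
        elsif ($tok =~ /^%/) {
            die "unsupported directive: $tok\n";
        }
        else {
            $re .= quotemeta $tok;
        }
    }
    return qr/^$re/;
}

# The "combined" format from the documentation, applied to its example line.
my $combined = logformat_to_regex('%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"');
my @fields = q{127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"} =~ $combined;
print join(' | ', @fields), "\n";
```

For the combined format this produces essentially the same pattern as the hand-written one-liner. Appending `%D` to the format yields the extra `(\d+)` used in the sample script.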
For parsing the timestamp, I also use DateTime::Format::HTTP a lot:
http://search.cpan.org/perldoc?DateTime%3A%3AFormat%3A%3AHTTP
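If installing DateTime::Format::HTTP is not an option, the `[%t]` field can also be converted with the core module Time::Local. The sketch below is a swapped-in, dependency-free alternative, not DateTime::Format::HTTP's API.

`clf_to_epoch` is a hypothetical name. The sketch assumes the field always looks like `10/Oct/2000:13:55:36 -0700`, with English month abbreviations and a numeric UTC offset.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviations as they appear in %t, e.g. "10/Oct/2000:13:55:36 -0700".
my %MON = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4, Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: convert the [%t] field to epoch seconds (UTC).
# Returns nothing when the string does not have the expected shape.
sub clf_to_epoch {
    my ($str) = @_;
    my ($d, $mon, $y, $h, $mi, $s, $sign, $oh, $om) =
        $str =~ m{^(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d) ([+-])(\d\d)(\d\d)$}
        or return;
    return unless exists $MON{$mon};
    my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
    # The clock fields are local to the logged offset; subtract it to get UTC.
    return timegm($s, $mi, $h, $d, $MON{$mon}, $y) - $offset;
}

print clf_to_epoch('10/Oct/2000:13:55:36 -0700'), "\n";   # 971211336
```

The datetime captured by the sample script's regex (`$datetime`) can be passed straight in, since the brackets are already stripped by `\[([^]]*)\]`.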