R で、Apacheの処理時間をリクエスト毎にプロットしたいと思ったのだ その1 データ準備編

estis2013/05/26 (日) 11:40 に投稿

Apacheのログフォーマットが、
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D"
の時。

ログの例

192.168.24.51 - - [26/May/2013:10:55:49 +0900] "GET /R/ HTTP/1.1" 200 901 "http://192.168.24.85/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 2063
192.168.24.51 - - [26/May/2013:10:55:49 +0900] "GET /icons/blank.gif HTTP/1.1" 304 - "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 624
192.168.24.51 - - [26/May/2013:10:55:49 +0900] "GET /icons/layout.gif HTTP/1.1" 304 - "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 612
192.168.24.51 - - [26/May/2013:10:55:49 +0900] "GET /icons/back.gif HTTP/1.1" 304 - "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 551
192.168.24.51 - - [26/May/2013:10:55:50 +0900] "GET /R/ HTTP/1.1" 200 901 "http://192.168.24.85/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 1577
192.168.24.51 - - [26/May/2013:10:55:50 +0900] "GET /icons/layout.gif HTTP/1.1" 304 - "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 599
192.168.24.51 - - [26/May/2013:10:55:50 +0900] "GET /icons/blank.gif HTTP/1.1" 304 - "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 553
192.168.24.51 - - [26/May/2013:10:55:50 +0900] "GET /icons/back.gif HTTP/1.1" 304 - "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 555
192.168.24.51 - - [26/May/2013:10:55:52 +0900] "GET /R/Rplots.pdf HTTP/1.1" 304 - "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/536.29.13 (KHTML, like Gecko) Version/6.0.4 Safari/536.29.13" 286
192.168.24.51 - - [26/May/2013:10:56:57 +0900] "GET /R/Rplots.pdf HTTP/1.1" 200 4548 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/535.7 (KHTML, like Gecko) RockMelt/0.16.91.483 Chrome/16.0.912.77 Safari/535.7" 320
192.168.24.51 - - [26/May/2013:10:57:13 +0900] "GET /R/ HTTP/1.1" 200 901 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/535.7 (KHTML, like Gecko) RockMelt/0.16.91.483 Chrome/16.0.912.77 Safari/535.7" 857
192.168.24.51 - - [26/May/2013:10:57:13 +0900] "GET /icons/blank.gif HTTP/1.1" 200 148 "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/535.7 (KHTML, like Gecko) RockMelt/0.16.91.483 Chrome/16.0.912.77 Safari/535.7" 304
192.168.24.51 - - [26/May/2013:10:57:13 +0900] "GET /icons/back.gif HTTP/1.1" 200 216 "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/535.7 (KHTML, like Gecko) RockMelt/0.16.91.483 Chrome/16.0.912.77 Safari/535.7" 245
192.168.24.51 - - [26/May/2013:10:57:13 +0900] "GET /icons/layout.gif HTTP/1.1" 200 276 "http://192.168.24.85/R/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/535.7 (KHTML, like Gecko) RockMelt/0.16.91.483 Chrome/16.0.912.77 Safari/535.7" 205
192.168.24.51 - - [26/May/2013:10:57:13 +0900] "GET /favicon.ico HTTP/1.1" 404 288 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/535.7 (KHTML, like Gecko) RockMelt/0.16.91.483 Chrome/16.0.912.77 Safari/535.7" 269

リクエスト各項目毎の
アクセス時間と処理時間のデータを集計したファイルを作成する。
(ログは、./access_log。リクエスト内容に含まれる「/」は「%」に置き換える。)

awk '{split($4, time, ":");data[$7]=data[$7]"\n"time[2]":"time[3]":"time[4]" "$NF}END{for(i in data){j=i; gsub(/\//, "%", i); print j,data[j] > i".log"} }' access_log
を実行すると

例えば
/R/
10:55:49 2063
10:55:50 1577
10:57:13 857

/R/Rplots.pdf
10:55:52 286
10:56:57 320
と記載のある%R%.log や %R%Rplots.pdf.log ができる。

Comments

Comment

R へのデータ読み込み時にread.table するので
ヘッダーが2カラム分ある方がいろいろ都合よさげなので

awk '{split($4, time, ":");data[$7]=data[$7]"\n"time[2]":"time[3]":"time[4]" "$NF}END{for(i in data){j=i; gsub(/\//, "%", i); print "Time "j,data[j] > i".log"} }' access_log

として、

Time /R/
10:55:49 2063
10:55:50 1577
10:57:13 857

こんな風な内容にすることにした。