Log checking with GREP and AWK

Most of the time when checking logs we're presented with a lot of data we may not necessarily need, or we have to manipulate it in some way or present it to a client in a specific format. All of that can be done with a scripting language, or, for relatively simple and short scripts, with AWK right from the terminal. We'll be using examples with real logs to illustrate practical AWK use cases. First, let's get some logs:

$] grep n3i9g2a7 exlogs.log
133.121.23.11  2017-05-04 15:39:14 n3i9g2a7 cds http://cds.n3i9g2a7.hwcdn.net/ curl/7.51.0 TLB 0 - 81.111.60.15:80 - 0ms 1ms - 0ms 0ms 5395e28849d511ed:
133.121.23.11  2017-05-04 15:41:35 n3i9g2a7 cds http://cds.n3i9g2a7.hwcdn.net/ curl/7.51.0 TLB 0 - 81.111.60.15:80 - 0ms 1ms - 1ms 0ms 056020c9011c7647:
133.121.23.11  2017-05-04 15:41:35 n3i9g2a7 cds http://cds.n3i9g2a7.hwcdn.net/ curl/7.51.0 TLB 0 - 81.111.60.15:80 - 0ms 1ms - 1ms 0ms 0560f8c9011d6947:
133.121.23.11  2017-05-04 15:42:26 n3i9g2a7 cds http://cds.n3i9g2a7.hwcdn.net/ curl/7.51.0 TLB 0 - 81.111.60.15:80 - 0ms 4ms - 2ms 0ms 3ee2bbac6df77d36:

Because AWK addresses data by field, it's useful to first look at the field names so we know the purpose of each column:

$] head -n 1 exlogs.log
#Client IP Date Time CustHash Product URL User-Agent TLB/HTTP Status Code Bytes Sent Referer Selected Edge TLB Code Processing Time Request Time SSL Proxy IP TLB Time IPVS Switch Time Comment
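Since the field names themselves contain spaces (e.g. Client IP), a quick way to map each name to its AWK field number is to run the header line through AWK as well. This sketch assumes the header is tab-separated just like the data rows, which is worth verifying on your own logs; NF is AWK's built-in count of fields on the current line:

$] head -n 1 exlogs.log | awk -F '\t' '{for (i = 1; i <= NF; i++) print i": "$i}'

Under that assumption, this prints 1: #Client IP through 18: Comment, and those are the field numbers we'll use below.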

Let's say we'd like to display only the Client IP field. Note that AWK's field numbering starts at 1, not 0 ($0 refers to the entire line), which is worth keeping in mind when writing AWK commands.

$] zgrep n3i9g2a7 exlogs.log.2017-05-04-1500.gz | awk -F '\t' '{print $1}'
133.121.23.11
133.121.23.11
133.121.23.11
133.121.23.11

We're piping the filtered log lines into AWK. The -F option defines the field delimiter; the default is whitespace, but our log fields are separated by tabs, so we pass '\t'. The '{}' block is the action part, where we define what AWK does with each line of the piped text. The full pattern is 'BEGIN {} {} END {}': the BEGIN block runs once before the first line is read, and the END block runs once after the last line has been processed.

If you check the result of the previous command, we get repeated copies of the same IP, which we don't need. Here we only have four lines, but imagine thousands of results in large-scale logs; we wouldn't want to count them one by one. To check how many times each client IP occurs, we can simply pipe through the uniq command, which collapses repeated lines and, with -c, prefixes each with its count (note that uniq only collapses adjacent duplicates, so unsorted data should be piped through sort first):

$] zgrep n3i9g2a7 exlogs.log.2017-05-04-1500.gz | awk -F '\t' '{print $1}' | uniq -c
      4 133.121.23.11
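To see where BEGIN and END actually run, here's a small sketch that reproduces the uniq -c count entirely inside AWK using an associative array; the BEGIN message is only there to show that the block executes before any input is read:

$] zgrep n3i9g2a7 exlogs.log.2017-05-04-1500.gz | awk -F '\t' '
BEGIN { print "Counting client IPs..." }        # runs once, before the first line
{ count[$1]++ }                                 # main action: tally each Client IP
END { for (ip in count) print count[ip], ip }   # runs once, after the last line
'
Counting client IPs...
4 133.121.23.11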

If we'd like to format the data to display only the Date, Time, and Product URL fields, printing each on its own line, we can do the following:

$] zgrep n3i9g2a7 exlogs.log.2017-05-04-1500.gz | awk -F '\t' '{OFS="\n"; print "Date: "$2, "Time: "$3, "URL: "$6}'
Date: 2017-05-04
Time: 15:39:14
URL: http://cds.n3i9g2a7.hwcdn.net/
Date: 2017-05-04
Time: 15:41:35
URL: http://cds.n3i9g2a7.hwcdn.net/
Date: 2017-05-04
Time: 15:41:35
URL: http://cds.n3i9g2a7.hwcdn.net/
Date: 2017-05-04
Time: 15:42:26
URL: http://cds.n3i9g2a7.hwcdn.net/

The only difference here is the OFS assignment. OFS is AWK's Output Field Separator, the string placed between the items of a comma-separated print list; by setting it to "\n" we've specified that each field should be printed on its own line. We can also do arithmetic on the data, for example, calculate the average Request Time (to do that, we sum up all of the request times and divide by the number of requests):

$] zgrep n3i9g2a7 exlogs.log.2017-05-04-1500.gz | awk -F '\t' '{sum += $14} END{ if (NR > 0)  print sum / NR " ms"}'
1.75 ms

As we've mentioned before, we can use END to specify an action that runs after the last line has been processed. The if statement is a failsafe to ensure we're not dividing by zero. NR is a built-in variable holding the number of records (rows) read so far; in our case that was 4, one row per request. We start our AWK with the usual -F option to specify that our fields are separated by tabs instead of spaces. The per-line action sums up the Request Time values (AWK coerces a string like "1ms" to a number by reading its leading digits, so the ms suffix is ignored), and in END we divide the sum by the number of rows after checking that NR is nonzero. Combining these simple building blocks allows us to filter, transform, aggregate, and display very complex data at a very large scale, as in the sketch below.
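For example, here's a small sketch that combines the pieces above to compute the average Request Time per Client IP with an associative array (our sample data contains only one IP, so it prints a single line):

$] zgrep n3i9g2a7 exlogs.log.2017-05-04-1500.gz | awk -F '\t' '
{ sum[$1] += $14; hits[$1]++ }   # accumulate Request Time and request count per Client IP
END { for (ip in sum) printf "%s: %.2f ms over %d requests\n", ip, sum[ip] / hits[ip], hits[ip] }
'
133.121.23.11: 1.75 ms over 4 requests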
