IIS Logs / Log Parser Studio – Aggregated Hits per Server

Background

Our monitoring team has developed and rolled out scripts for monitoring our web farm.

We are now getting alerts through email.

Quite a lot of emails are coming in, and we wanted to see whether they originate from a single host or from a combination of hosts.

 

Emails

We looked at the emails, and they all come from the same host.

So we will have to engage our Network team and see how the Load Balancer is configured.

Is there a prospect that more traffic is being directed at the failing node?

Network Load Balancer

As we prepared to go to the Network Load Balancer team, we took the opportunity to gather and query the IIS logs as well.

 

TroubleShooting

Log Parser Studio

Query


SELECT
            To_String(date, 'yyyy-MM-dd') as dated
          , sc-status as status
          , sum (
                    case s-ip
                        when '10.0.4.25' then 1
                        else 0
                    end
                ) as S1
          , sum (
                    case s-ip
                        when '10.0.4.26' then 1
                        else 0
                    end
                ) as S2
          , sum (
                    case s-ip
                        when '10.0.4.27' then 1
                        else 0
                    end
                ) as S3
          , sum (
                    case s-ip
                        when '10.0.4.28' then 1
                        else 0
                    end
                ) as S4
          , min(TO_TIMESTAMP(date, time)) as tsRecordedMin
          , max(TO_TIMESTAMP(date, time)) as tsRecordedMax

FROM '[LOGFILEPATH]'

where   (
           (
             TO_TIMESTAMP(date, time)
                     between timestamp('2017/08/02 10:30:00', 'yyyy/MM/dd hh:mm:ss')
                         and timestamp('2017/08/02 17:20:00', 'yyyy/MM/dd hh:mm:ss')
           )
        )

/*
	and  c-ip not in ('10.0.4.141')
*/

group by
         date
       , sc-status

order by
           dated
         , status
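
For anyone who wants to cross-check the pivot outside Log Parser Studio, the same per-server count can be approximated in a few lines of Python. This is only a minimal sketch: it assumes W3C extended log lines that carry a #Fields directive, it does not apply the time-range filter, and the function name hits_per_server and the sample lines are made up for illustration rather than taken from our logs.

```python
from collections import Counter

def hits_per_server(lines):
    """Count hits per (date, sc-status, s-ip) from W3C extended log lines."""
    fields = []
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not fields:
            continue  # other directives, or data seen before any #Fields line
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed rows rather than guess at their layout
        row = dict(zip(fields, values))
        counts[(row["date"], row["sc-status"], row["s-ip"])] += 1
    return counts

# Hypothetical sample lines, for illustration only.
sample = [
    "#Version: 1.0",
    "#Fields: date time s-ip cs-uri-stem sc-status",
    "2017-08-02 10:31:00 10.0.4.25 / 200",
    "2017-08-02 10:31:05 10.0.4.26 / 200",
    "2017-08-02 10:31:09 10.0.4.25 /Account/LogOn 302",
    "2017-08-02 10:31:12 10.0.4.25 / 200",
]

for key, hits in sorted(hits_per_server(sample).items()):
    print(key, hits)
```

Each key corresponds to one cell of the Log Parser result: the date and status identify the row, and the server IP identifies the S1 to S4 column.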



Output

Time Range – 1 ( August 2nd 10:30 AM – 5:20 PM )

Results

Explanation
  1. It is difficult to make the case that traffic is being disproportionately routed to a specific host

Time Range – 2 ( August 8th 5:13 PM – 8:40 PM )

Results

Explanation
  1. In our second time slot, the 4700 records bearing HTTP 200 are right around the average

Summary

At this time, it is likely that the trouble we are seeing with this specific host is not due to outside pressure, but is internal to the host itself.
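
One way to put a number on "is one host getting more than its share" is to compare each server's fraction of the hits against an even split. The sketch below is a rough illustration only: the function names, the 10-point tolerance, and the counts are all assumptions made up for the example, not figures from our results.

```python
def traffic_share(hits_by_server):
    """Return each server's fraction of the total hits."""
    total = sum(hits_by_server.values())
    if total == 0:
        return {server: 0.0 for server in hits_by_server}
    return {server: hits / total for server, hits in hits_by_server.items()}

def skewed_servers(hits_by_server, tolerance=0.10):
    """List servers whose share exceeds an even split by more than `tolerance`."""
    if not hits_by_server:
        return []
    even = 1 / len(hits_by_server)
    shares = traffic_share(hits_by_server)
    return [server for server, share in shares.items() if share - even > tolerance]

# Placeholder counts for the S1..S4 columns; not real measurements.
balanced = {"S1": 1000, "S2": 1010, "S3": 990, "S4": 1000}
lopsided = {"S1": 3000, "S2": 1000, "S3": 1000, "S4": 1000}

print(skewed_servers(balanced))
print(skewed_servers(lopsided))
```

With four servers an even split is 25% each, so a host only gets flagged here once it takes more than 35% of the traffic. The tolerance is a judgment call and should be tuned to how the Load Balancer is expected to distribute requests.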

 

IIS – Review IISLog to track traffic within time period

Issue

We have been receiving a bunch of alerts from our monitoring tool.

They came through email but, being a loud mouth, I asked the monitoring group to please send us a tabulated summary.

 

Alert Report

Image

Explanation

  1. Again, I am left asking what happened for 2 hours on a specific web server.
    • See the second data row
      • How did we stay dark between 6:20 and 8:20 AM?

TroubleShooting

Setup

Collected the IIS logs and pointed Log Parser Studio at them.

 

Query


/*  New Query  */

SELECT TOP 10000 
            TO_TIMESTAMP(date, time) as ts
          , c-ip as ipAddress
          , cs-username as username
          , cs-uri-stem as URL
          , cs-uri-query as query
          , sc-status as status
          , time-taken as timeTaken
          , cs(User-Agent) as userAgent
          , cs(Referer) as referer

FROM '[LOGFILEPATH]'

where  TO_TIMESTAMP(date, time)
             between timestamp('2017/07/30 06:00:00', 'yyyy/MM/dd hh:mm:ss')  
             and timestamp('2017/07/30 12:00:00', 'yyyy/MM/dd hh:mm:ss')


Output

Explanation

  1. On 2017-July-30th between 6 AM and 6:13 AM, we recorded HTTP requests which came in twos
    • The first request was targeted to the home page
      • IIS returned 302
        • Redirection
    • The second request is to the /Account/LogOn page
      • Returned 200
        • 200 is OK
  2. We did not get another request till 8:18 AM
    • Again two HTTP requests
      • The first was 302
        • Re-direct
      • The redirection led us to /Account/Logon
        • Returned 200
        • But it took a lot longer: 18156 ms, or about 18 seconds
          • Need to come back to this after validating the actual measurement
  3. Things returned to normal
    • 8:28 AM, 8:33 AM, 8:38 AM, 8:43 AM, 8:48 AM, 8:53 AM, 8:58 AM, 9:03 AM, 9:08 AM, and 9:13 AM
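
The reading above boils down to looking for gaps between consecutive requests that are much longer than the roughly five-minute cadence seen once things were back to normal. Here is a minimal sketch of that check, assuming the timestamps have already been exported as "yyyy-MM-dd HH:mm:ss" strings. The function name find_gaps, the 10-minute threshold, and the sample times are illustrative assumptions, loosely modeled on the pattern described above rather than copied from the log.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, threshold=timedelta(minutes=10)):
    """Return (previous, next, gap) for consecutive requests further apart than threshold."""
    parsed = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    gaps = []
    # Pair each request with the one that follows it and measure the distance.
    for prev, nxt in zip(parsed, parsed[1:]):
        delta = nxt - prev
        if delta > threshold:
            gaps.append((prev, nxt, delta))
    return gaps

# Illustrative request times; not an extract of the real log.
sample = [
    "2017-07-30 06:08:00",
    "2017-07-30 06:13:00",
    "2017-07-30 08:18:00",
    "2017-07-30 08:28:00",
    "2017-07-30 08:33:00",
]

for prev, nxt, delta in find_gaps(sample):
    print(f"no requests between {prev} and {nxt} ({delta})")
```

The threshold is deliberately looser than the monitoring interval so that a single late probe does not register as an outage.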

Summary

We traced the error back to the monitoring account being locked out during our blind two-hour period.