
  Troubleshooting

1.  Identifying Slow or Leaking Pages

This section describes how to use the global.asax file to identify requested pages that exceed certain thresholds. Here we concentrate on:

  1. Pages taking more than X milliseconds to execute.
  2. Pages leaking more than X Kb of data.

The former is useful if your website has an unusually high CPU load and/or database access, but initial profiling could not reveal any misbehavior on the most obvious pages. By tracing all pages that take more than a specified time to render, we get a good starting point for further investigation. For instance, the culprit could be a Sitemap page whose sitemap.xslt uses a "//" expression in its XPath.

The latter is useful if you experience memory leaks and are looking for individual pages that could be leaking memory.

In both cases, dtSearch or a similar tool can be used to trigger requests to all pages.

Getting Started

Open the global.asax file (included in the source code download at the end of this section) and locate the two flags that control the logging:

bool logmemory = true;
bool logtime = true;
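
The downloadable global.asax contains the complete implementation. As a rough sketch of what the timing and memory logic might look like (the member names, example threshold values, and the use of Sitecore.Diagnostics.Log are illustrative assumptions, not necessarily what the download contains):

<%@ Application Language="C#" %>
<script runat="server">
    bool logmemory = true;
    bool logtime = true;
    int timeThreshold = 1000;    // ms - log pages slower than one second
    int memoryThreshold = 100;   // Kb - log pages growing memory by more than 100 Kb

    void Application_BeginRequest(object sender, EventArgs e)
    {
        if (logtime)
            Context.Items["requestStart"] = DateTime.UtcNow;

        if (logmemory)
        {
            // The downloadable version also flushes the cache at this point so that
            // cached items are not mistaken for leaks (see the note below); that is
            // the line you can comment out to regain performance.
            Context.Items["memoryStart"] = GC.GetTotalMemory(true);
        }
    }

    void Application_EndRequest(object sender, EventArgs e)
    {
        string url = Context.Request.RawUrl;

        if (logtime && Context.Items["requestStart"] != null)
        {
            double ms = (DateTime.UtcNow - (DateTime)Context.Items["requestStart"]).TotalMilliseconds;
            if (ms > timeThreshold)
                Sitecore.Diagnostics.Log.Info("Page exceeded time threshold (ms): " + ms + " ( Url: " + url + ")", this);
        }

        if (logmemory && Context.Items["memoryStart"] != null)
        {
            long kb = (GC.GetTotalMemory(true) - (long)Context.Items["memoryStart"]) / 1024;
            if (kb > memoryThreshold)
                Sitecore.Diagnostics.Log.Info("Page exceeded memory threshold (Kb): " + kb + " ( Url: " + url + ")", this);
        }
    }
</script>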

The Time Threshold

Set logtime = true, and

int timeThreshold = <Your_Preferred_Threshold in Milliseconds>; // ms

Save and close.

You should now start getting entries in the log file like:

3064 14:47:53 INFO  Page exceeded time threshold (ms): 1172,1225 ( Url: /sitecore/content/home/default)

The Memory Threshold

Set logmemory = true, and

int memoryThreshold = <Your_Preferred_Threshold in Kb>; // Kb

Save and close. 

You should now start getting entries in the log file like:

3064 14:47:53 INFO  Page exceeded memory threshold (Kb): 172 ( Url: /sitecore/content/home/default)

Important note: In this mode, the cache is flushed on each request to ensure that only true memory leaks (see the note below) are logged. However, this has a considerable negative effect on your website's performance.

Note: Sitecore uses the ASP.NET cache to store Sitecore items, which causes an increase in memory consumption. This is not a true memory leak, however, as the memory is released when the item is eventually removed from the cache.

You can comment out the lines that flush the cache to improve performance, but be aware that the logged information will then be less precise.

If a leaking page is found, repeatedly refreshing it should lead to a noticeable increase in memory usage.

Download the source code for this article.

2.  Scheduled Publications

The frequency of scheduled publishing can be adjusted, but note that publishing clears the HTML cache and can therefore interfere with performance. Review http://sdn.sitecore.net/SDN5/FAQ/Administration/Scheduled%20tasks.html and adjust the Sitecore.Tasks.PublishAgent, which is disabled (commented out) by default in web.config.
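
For reference, the agent registration in web.config typically looks something like the following; the exact parameters and values vary between Sitecore versions, so treat this as an illustrative sketch and check your own web.config. Enabling it is a matter of removing the surrounding comments and setting a non-zero interval.

<!-- Publishes the source database to the target database on a schedule.
     Commented out (or interval="00:00:00") means the agent is disabled. -->
<agent type="Sitecore.Tasks.PublishAgent" method="Run" interval="01:00:00">
  <param desc="source database">master</param>
  <param desc="target database">web</param>
  <param desc="mode (full or smart or incremental)">smart</param>
  <param desc="languages">en, da</param>
</agent>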

3.  Digging for Gold in IIS Logfiles

This section describes how to use Microsoft's free tool Log Parser 2.2 with IIS logfiles.

Have you ever experienced CPU usage skyrocketing on your server, or at least unusually long response times from Sitecore? Then you have likely ended up with a question like this:

     I wonder what pages on my site (be it on the website or in the client) are consuming all the resources?

Well, elsewhere on SDN5 (see the section above) you can read how global.asax can be used to log all pages taking longer than x milliseconds. However, that doesn't give you the full picture, and you have to wait until enough sample data has been collected. You don't want to wait, you want answers NOW? Well, if the "time-taken" column is included in your IIS logfiles, you can get those answers. If not, hurry up and add it.

So you open your IIS logfile in Excel and do some sorting, filtering and grouping to show these data, right? Yes, if you have fewer than 65,000 entries and enough patience, maybe you do, but it doesn't exactly work like a dream. If you are (hopefully) familiar with SQL, you are likely to whine something like: "Sigh, why can't I just do some SUMs, COUNTs and AVERAGEs in SQL instead of this nightmare Excel stuff?" Now there is a solution. Try, for example:

  1. Download Log Parser 2.2 from http://www.microsoft.com/downloads/details.html?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en
  2. Install it on your web server.
  3. From the log folder of your website, run the following query through Log Parser (the full command line is shown right after this list): SELECT TOP 10 COUNT(*), AVG(time-taken), SUM(time-taken), cs-uri-stem INTO report_sum.txt FROM test.log WHERE EXTRACT_EXTENSION(cs-uri-stem)='aspx' GROUP BY cs-uri-stem ORDER BY SUM(time-taken) DESC
  4. View the top 10 URLs that spent the most time being rendered in the output file report_sum.txt.
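
Assuming Log Parser 2.2 was installed in its default folder and your logfile is named test.log as above, the full command line would look roughly like this:

"c:\program files\log parser 2.2\logparser" "SELECT TOP 10 COUNT(*), AVG(time-taken), SUM(time-taken), cs-uri-stem INTO report_sum.txt FROM test.log WHERE EXTRACT_EXTENSION(cs-uri-stem)='aspx' GROUP BY cs-uri-stem ORDER BY SUM(time-taken) DESC"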

It works like a charm - and fast. When opening report_sum.txt afterwards, hey presto: a list as promised, showing that Default.html (the front page of the website) topped the list with a total of more than 1,100 seconds. It also tells me that the mailing-list page (Postlister.html) averages a high 3 seconds per request.

COUNT(ALL *) AVG(ALL time-taken) SUM(ALL time-taken) cs-uri-stem
------------ ------------------- ------------------- ----------------
976          1139                1111939             /Default.html
100          2959                295996              /Aktuelt/Postlister.html
146          989                 144425              /Job.html
20           4663                93269               /sitecore/.../Upload/Upload2.html
...

I must admit this is nothing less than a Super Cool Tool that allows you to easily identify problem pages in your installation. The ways these SQL statements can be combined are endless, and the tool can even be used with Sitecore 5 logfiles.

Another useful query I would like to show is the following, which gives you statistics per hour from your logfile.

"c:\program files\log parser 2.2\logparser" "SELECT date, QUANTIZE(time,3600) as interval, COUNT(*), AVG(time-taken), SUM(time-taken), DIV(TO_REAL(COUNT(*)), 3600) AS Requests/sec INTO 'report_total_load_per_minute.txt' FROM %LOGFILE% GROUP BY date, interval ORDER BY date, interval" -o:NAT

As can be seen below, this query will, for example, tell you that there were 111K+ requests between 12:00 and 13:00, the average time taken was 144 ms, and there was a whopping 31 requests per second during this period. The total time taken is roughly 16,087 seconds, or approximately 4,400 ms per second. Now, there are only 1,000 ms in a real second, but as these data were taken from a 4-processor server, you have 4,000 CPU milliseconds available per real second.

date       interval COUNT(ALL *) AVG(ALL time-taken) SUM(ALL time-taken) Requests/sec
---------- -------- ------------ ------------------- ------------------- ------------
2006-07-03 12:00:00 111391       144                 16086950            30.941944
2006-07-03 13:00:00 133281       106                 14210042            37.022500
2006-07-03 14:00:00 50872        150                 7678055             14.131111
2006-07-03 15:00:00 29795        198                 5921612             8.276389
2006-07-03 16:00:00 10403        435                 4526177             2.889722

I would like to introduce a key figure called "CPU-Utilization", which is measured in the following way:

CPU-Utilization = 100% * Avg(time-taken) * (requests per second) / (#CPU * 1000)

From the above numbers, the CPU-Utilization would be calculated like this: 100% * (144 ms * 31 requests/sec) / (4 * 1000) = 112%.
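
If you prefer to let code do the arithmetic, here is a minimal C# sketch; the values are taken from the 12:00 interval in the table above and are purely illustrative.

using System;

class CpuUtilization
{
    static void Main()
    {
        // Values from the 12:00 interval in the Log Parser output above.
        double avgTimeTakenMs = 144.0;    // AVG(time-taken) column
        double requestsPerSec = 30.94;    // Requests/sec column
        int cpuCount = 4;                 // processors in the server

        // 100% * Avg(time-taken) * (requests per second) / (#CPU * 1000)
        double utilization = 100.0 * avgTimeTakenMs * requestsPerSec / (cpuCount * 1000.0);
        Console.WriteLine("CPU-Utilization: {0:F0}%", utilization);   // prints about 111%, i.e. above 100%
    }
}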

What does this number tell you? Well, you would not expect to utilize your server above 100%, and a number above this can only mean that requests have been queued or sat idle within the server before a response was delivered. I would say that any CPU-Utilization exceeding 80% is not sound over longer periods of time, and in these cases you should seriously consider whether the workload can be decreased or more hardware should be added to the installation. The limit of 80% is chosen somewhat arbitrarily and may well be set too high. At a CPU-Utilization of 112%, something is terribly wrong and your users are probably suffering badly.

Summary:

This super cool tool allows us to easily determine the historical workload of the server as well as pinpoint URLs that are very resource demanding. Standard .bat files can be created that produce detailed reports from one or more logfiles in seconds, as sketched below. If your customer or IT department insists that their single-CPU server is more than enough hardware to do the job, this may just be the evidence needed to convince them otherwise.
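
As an illustrative sketch of such a .bat file (the file name, the output file names, and the default Log Parser installation folder are my assumptions; the logfile to analyze is passed as the first argument):

rem iisreport.bat - usage: iisreport.bat ex060703.log
rem Produces the top-10 slow-URL report and the per-hour load report shown above.
"c:\program files\log parser 2.2\logparser" "SELECT TOP 10 COUNT(*), AVG(time-taken), SUM(time-taken), cs-uri-stem INTO report_sum.txt FROM %1 WHERE EXTRACT_EXTENSION(cs-uri-stem)='aspx' GROUP BY cs-uri-stem ORDER BY SUM(time-taken) DESC"
"c:\program files\log parser 2.2\logparser" "SELECT date, QUANTIZE(time,3600) AS interval, COUNT(*), AVG(time-taken), SUM(time-taken), DIV(TO_REAL(COUNT(*)), 3600) AS Requests/sec INTO report_per_hour.txt FROM %1 GROUP BY date, interval ORDER BY date, interval" -o:NAT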

