Visitor identification using the robot detection component

Abstract

How the robot detection component identifies human behavior and robots.

The Sitecore robot detection component detects robots and unwanted interactions from automated browsers. The component is enabled by default and consists of a pipeline processor, an event handler, a JavaScript file that detects human behavior, and several robot detection classes.

Every time a page is requested on your website, the following pipeline processor is activated:

  • Sitecore.Analytics.RobotDetection.Pipeline.InitializeTracker.Robots

The processor first checks whether robot detection is enabled by reading the value of the Analytics.AutoDetectBots setting in the Sitecore.Analytics.Tracking.config file. You can disable the component by changing this setting to false.
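As an illustration of changing the setting, the fragment below is a minimal sketch of a Sitecore configuration patch file that sets Analytics.AutoDetectBots to false without editing the shipped file. The file name (for example, z.DisableRobotDetection.config) and its location in the Include folder are assumptions; only the setting name comes from this article.

```xml
<!-- Hypothetical patch file; the setting name is the one described above. -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <setting name="Analytics.AutoDetectBots">
        <patch:attribute name="value">false</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```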

The Sitecore.Analytics.Pipelines.ClassificationStrategy.ContactClassification class contains the classification constants and helper methods. The following helper methods take the contact classification as a parameter and return a Boolean value indicating whether the contact is a human or a robot.

  • IsHuman

  • IsRobot

  • IsAutoDetectedRobot
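To make the role of these helpers concrete, the following JavaScript sketch models them. It is not the C# source of the ContactClassification class. The values 0 (human) and 925 (auto-detected robot) appear later in this article; the cutoff of 900 between humans and robots is an assumption made for illustration.

```javascript
// Values used in this article: 0 = human, 925 = auto-detected robot.
const HUMAN = 0;
const AUTO_DETECTED_ROBOT = 925;

// ASSUMPTION: classifications at or above this cutoff count as robots.
// The article does not state the cutoff.
const ROBOT_THRESHOLD = 900;

// Each helper takes a classification and returns a Boolean.
function isHuman(classification) {
  return classification < ROBOT_THRESHOLD;
}

function isRobot(classification) {
  return classification >= ROBOT_THRESHOLD;
}

function isAutoDetectedRobot(classification) {
  return classification === AUTO_DETECTED_ROBOT;
}
```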

The SC_ANALYTICS_GLOBAL_COOKIE persistent session cookie contains the IsClassificationGuessed field, which is set to true or false.

When a new visitor comes to the website, the IsClassificationGuessed field is set to false by default because the classification of the visitor has not yet been determined. At this stage, the visitor could be a human or a robot. When the visitor classification has been determined, this field is set to true.

When a visitor views a page on your website, the VisitorIdentification control is rendered on the page. It first checks whether the VisitorIdentification.ascx control is present in the layouts/system folder. If the control is present, the content of the VisitorIdentification.ascx user control is rendered on the page, and it:

  • Saves the current UTC time to the VICurrentDateTime meta tag.

  • Adds a reference to the layouts\system\VisitorIdentification.js to load the visitor identification JavaScript file.

When a visitor views a page, the browser loads the VisitorIdentification.js JavaScript file. Robots do not usually load CSS or JavaScript files.

The script subscribes to two events:

  • OnMouseMove event – triggered when a computer mouse is moved.

  • OnTouchStart event – triggered when the screen on a tablet or mobile phone is touched.

If the visitor moves the mouse or touches the screen of a tablet or mobile phone, the script runs code that references the VisitorIdentificationCSS.aspx page. The script does not request the page directly; it creates a URL to the page as a CSS style sheet. Unlike a robot, a human visitor's browser attempts to load this style sheet. When this happens, the VisitorIdentificationCSS.aspx page is requested and generates an empty style sheet. The page also contains code that is executed every time the page is requested.
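The interaction described above can be sketched as follows. This is an illustrative approximation, not the contents of VisitorIdentification.js: the function name watchForHumanInput and the full path of the CSS page are assumptions. The event target and document are passed in as parameters so the logic can also be exercised outside a browser.

```javascript
// ASSUMPTION: the article names the page but not its full URL.
const CSS_PAGE_URL = "/layouts/system/VisitorIdentificationCSS.aspx";

// Registers handlers for mouse movement and touch. The first such event
// adds a stylesheet link, so the browser (not the script) requests the page.
function watchForHumanInput(target, doc) {
  let done = false;

  function onInput() {
    if (done) return; // react only once
    done = true;
    target.removeEventListener("mousemove", onInput);
    target.removeEventListener("touchstart", onInput);

    const link = doc.createElement("link");
    link.rel = "stylesheet";
    link.href = CSS_PAGE_URL;
    doc.head.appendChild(link);
  }

  target.addEventListener("mousemove", onInput);
  target.addEventListener("touchstart", onInput);
}

// In a browser this could be wired up as:
//   watchForHumanInput(document, document);
```

A robot that never moves a pointer or touches the screen never triggers the handler, so the style sheet page is never requested.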

If a human visitor has caused the page to run, the code in this page makes the following changes:

  • The visitor classification code is set to 0, which means the visitor is classified as human.

     Tracker.Current.Session.SetClassification(0, 0, true);
    
  • The IsClassificationGuessed Boolean value of the cookie is set to true. This means that the visitor has now been classified, so the robot detection logic no longer needs to be executed.

    cookie.IsClassificationGuessed = true;
    
  • The ASP.NET session timeout setting is reset back to the default for human visitors (20 minutes).
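The three changes listed above can be summarized in a small model. This is a hedged sketch: a plain object stands in for the tracker session, the analytics cookie, and the ASP.NET session, and the function and property names are hypothetical. The values (classification 0, IsClassificationGuessed true, 20 minutes) are the ones given in this article.

```javascript
// Applies the changes made when a human visitor causes the CSS page to run.
// `state` is a hypothetical stand-in for the server-side objects involved.
function markAsHuman(state, humanTimeoutMinutes = 20) {
  return {
    ...state,
    classification: 0,                          // 0 = human
    isClassificationGuessed: true,              // detection no longer needs to run
    sessionTimeoutMinutes: humanTimeoutMinutes, // back to the default for humans
  };
}

// Example: a visitor whose state has not yet been confirmed as human.
const unconfirmed = {
  classification: 925,
  isClassificationGuessed: false,
  sessionTimeoutMinutes: 1,
};
const confirmed = markAsHuman(unconfirmed);
```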

The final robot detection measure is a timeout comparison. Sitecore schedules a JavaScript function to run after 30 seconds (the default setting):

timeoutSleep (30000, placeCheckerRequest);

The JavaScript function reads the UTC time from the VICurrentDateTime meta tag and makes a request to the VIChecker.aspx page sending the retrieved time in the tstamp parameter.
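A rough sketch of this client-side step is shown below. It is not the shipped script. The assumptions are: the meta tag is located by a name attribute of VICurrentDateTime and carries the time in its content attribute, the checker page lives at the path shown, and the helper names buildCheckerUrl and scheduleCheckerRequest are hypothetical.

```javascript
// ASSUMPTION: the article names the page but not its full URL.
const CHECKER_URL = "/layouts/system/VIChecker.aspx";

// Reads the render time from the meta tag and builds the request URL
// with the time in the tstamp parameter. Returns null if the tag is missing.
function buildCheckerUrl(doc) {
  const meta = doc.querySelector('meta[name="VICurrentDateTime"]');
  if (!meta) return null;
  const stamp = meta.getAttribute("content");
  return CHECKER_URL + "?tstamp=" + encodeURIComponent(stamp);
}

// Schedules the request after the delay (30 seconds by default).
// `send` is a caller-supplied function that performs the actual request.
function scheduleCheckerRequest(doc, send, delayMs = 30000) {
  return setTimeout(function () {
    const url = buildCheckerUrl(doc);
    if (url) send(url);
  }, delayMs);
}
```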

The VIChecker.aspx page checks the difference between the current UTC time and the time in the tstamp parameter. For a human visitor, this code is executed 30 seconds after the page is loaded. Robots can execute the JavaScript sooner, so if the request arrives in under 30 seconds, the contact is detected as a robot and the visitor classification is set back to robot:

Tracker.Current.Session.SetClassification(925, 925, true);
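The comparison itself is simple arithmetic, sketched here in JavaScript for illustration (the actual check runs server-side in VIChecker.aspx). The function name and the use of millisecond timestamps are assumptions; the 30-second threshold and the value 925 come from this article.

```javascript
const AUTO_DETECTED_ROBOT = 925;

// Returns 925 when the checker request arrives sooner than the scheduled
// delay, or null when the classification should be left unchanged.
function classifyCheckerRequest(renderedAtMs, receivedAtMs, minDelayMs = 30000) {
  const elapsedMs = receivedAtMs - renderedAtMs;
  return elapsedMs < minDelayMs ? AUTO_DETECTED_ROBOT : null;
}

// Example: page rendered at 12:00:00 UTC, checker request at 12:00:05 UTC.
const renderedAt = Date.UTC(2024, 0, 1, 12, 0, 0);
const receivedAt = Date.UTC(2024, 0, 1, 12, 0, 5);
const early = classifyCheckerRequest(renderedAt, receivedAt);
```

In this example the request arrives after 5 seconds, so the sketch returns the robot classification.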

In earlier robot detection logic, if a visitor made a request to download a media item, then the visitor was identified as human. In the xDB robot detection component, this approach is not enough.

In the Sitecore.Analytics.Tracking.RobotDetection.config file, the following event handler enforces this:

  • Sitecore.Analytics.RobotDetection.Media.MediaRequestEventHandler

When this event handler is loaded, it processes the tracking field of the media item but does not change the classification to human if a visitor downloads a media item.

To change the classification, you must access the session. In Sitecore, the custom media request session module (a C# class file) enables a session for requests to media items that contain something in the tracking field. If there is nothing in the tracking field, a session is not required, which in turn speeds up the processing time of the requests.
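The decision the session module makes can be sketched as a single predicate. This is an illustration in JavaScript, not the C# class itself; the function name and the trackingField property are hypothetical stand-ins for the tracking field of the media item.

```javascript
// Returns true when a media request needs a session, that is, when the
// media item's tracking field contains something other than whitespace.
function mediaRequestNeedsSession(mediaItem) {
  const tracking = mediaItem.trackingField; // hypothetical property name
  return typeof tracking === "string" && tracking.trim().length > 0;
}
```

Skipping session creation for items with an empty tracking field is what keeps those requests fast.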

Session handling

Whenever a new request is made to the website, ASP.NET checks whether a session cookie is present. If no session cookie is present, ASP.NET creates a new private session to keep information about which pages have been visited. Because it cannot be determined up front whether the request comes from a robot or a human, ASP.NET always has to create a session.

When Sitecore decides that a visitor is a robot, it automatically reduces the session timeout to 1 minute instead of using the session timeout configured for the session state provider. You can change the default timeout for robot sessions.

The end of any session, human or robot, triggers the session end event, because the session state provider raises the event and cannot determine whether the visitor is a human or a robot. However, by default, Sitecore does not preserve any experience data for robot sessions.