This post provides information about techniques that you can use to schedule processes with the Sitecore ASP.NET Web Content Management System.
Introduction
There are at least three ways to schedule processes with the Sitecore .NET CMS:
- Configure agents in web.config.
- Define scheduled tasks in a database.
- Use the Windows Task Scheduler to call a Web service in a Sitecore instance.
Agents in web.config are very straightforward and by far the most common way to schedule processes in Sitecore, especially for things that run perpetually, such as administrative work.
Scheduling tasks in the database is a good solution when you need to control tasks dynamically and/or programmatically, since this approach doesn’t require that you update web.config, which would restart the ASP.NET worker process.
The Windows Task Scheduler gets around two limitations of the ASP.NET architecture: it is unrealistic to expect ASP.NET to invoke a process at a very specific time, and the ASP.NET worker process might not be active when you want Sitecore to invoke a process.
Agents in web.config
Sitecore uses web.config to enable several default agents. You can also enable additional agents that Sitecore disables by default, or create your own agents.
Each /configuration/sitecore/scheduling/agent element defines an agent. The type attribute of each <agent> element specifies the .NET class to invoke. The method attribute defines the method of the class that Sitecore will call. The interval attribute defines the minimal interval between invocations of the agent in HH:mm:ss format. A value of 00:00:00 for the interval attribute disables an agent. Comments above the agent definitions in the web.config file describe their functions.
Sitecore passes parameters to the constructor of the agent class and sets properties of that object in the same way it does for any other type defined in web.config. You can read about how to define properties and constructor arguments here.
The polling frequency determines how often Sitecore checks for agents that it needs to invoke. You can control the polling frequency by setting the value of the /configuration/sitecore/scheduling/frequency element in the web.config file (in HH:mm:ss format). I like to set the polling frequency to half of the smallest interval attribute value among the enabled agents defined in the web.config file.
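To make that rule of thumb concrete, here is a minimal sketch of the arithmetic. It does not touch any Sitecore API; the PollingFrequency class and its Suggest method are names I made up for illustration, and the interval strings stand in for whatever values appear in your own web.config.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of Sitecore.
public static class PollingFrequency
{
    // Returns half of the smallest enabled interval. Agents with an
    // interval of 00:00:00 are disabled, so they are ignored.
    // Throws InvalidOperationException if no agent is enabled.
    public static TimeSpan Suggest(IEnumerable<string> intervals)
    {
        TimeSpan smallest = intervals
            .Select(value => TimeSpan.Parse(value, CultureInfo.InvariantCulture))
            .Where(interval => interval > TimeSpan.Zero)
            .Min();

        return TimeSpan.FromTicks(smallest.Ticks / 2);
    }
}
```

For example, PollingFrequency.Suggest(new[] { "00:01:00", "01:00:00", "00:00:00" }) yields a TimeSpan of 30 seconds, which you would write as 00:00:30 in the <frequency> element.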
To create a custom agent, just create a class that implements a method, and register that class as an agent. Here’s an example class:
namespace Sitecore.Sharedsource.Tasks
{
  public class LogSomethingAgent
  {
    public string Message
    {
      get;
      set;
    }

    public void Run()
    {
      Sitecore.Diagnostics.Log.Info(this + " : " + this.Message, this);
    }
  }
}
After setting the value of the <frequency> element to 30 seconds (00:00:30), here’s how I configured the agent in web.config for testing:
<agent type="Sitecore.Sharedsource.Tasks.LogSomethingAgent" method="Run" interval="00:01:00">
  <message>Hello, World!</message>
</agent>
With the default Sitecore verbosity level, Sitecore logs the invocation of each agent. So Sitecore actually writes three lines to the log each time it invokes this agent. For example:
ManagedPoolThread #8 13:05:11 INFO Job started: Sitecore.Sharedsource.Tasks.LogSomethingAgent
ManagedPoolThread #8 13:05:11 INFO Sitecore.Sharedsource.Tasks.LogSomethingAgent : Hello, World!
ManagedPoolThread #8 13:05:11 INFO Job ended: Sitecore.Sharedsource.Tasks.LogSomethingAgent
The UrlAgent
If you find that ASP.NET is inactive when you want your scheduled task to run, you can reduce the interval attribute of the UrlAgent. The goal of this agent is to keep the ASP.NET worker process alive by periodically requesting an ASP.NET page.
The URL that this agent requests can be incorrect in one or two ways. This is the default configuration of the UrlAgent that I found on my system:
<agent type="Sitecore.Tasks.UrlAgent" method="Run" interval="01:00:00">
  <param desc="url">/sitecore/service/keepalive.aspx</param>
  <LogActivity>true</LogActivity>
</agent>
If the URL does not contain a protocol and hostname, Sitecore assumes http://127.0.0.1, which may or may not correspond to the Sitecore instance. This will most likely show up in the Sitecore log as something like the following:
ManagedPoolThread #3 10:26:51 INFO Job started: Sitecore.Tasks.UrlAgent
ManagedPoolThread #3 10:26:51 INFO Scheduling.UrlAgent started. Url: http://127.0.0.1/sitecore/service/keepalive.aspx
ManagedPoolThread #3 10:26:51 ERROR Exception in UrlAgent (url: /sitecore/service/keepalive.aspx)
Exception: System.Net.WebException
Message: The remote server returned an error: (404) Not Found.
A solution is to update the first parameter to the constructor to include the protocol and domain:
<param desc="url">http://sitename/sitecore/service/keepalive.aspx</param>
I seem to remember some old Sitecore versions had a typo in the path in this URL, which I expect would also result in 404, so check the referenced file exists, or find it and update the path. It’s probably not a problem if the file doesn’t exist, as the ASP.NET process has to be running to handle the HTTP 404 condition, but this problem will generate some noise in the log.
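If it helps to see the fallback spelled out, the following sketch mimics the behavior described above. It is only my illustration of that behavior, not Sitecore's actual code; the KeepAliveUrl class and Resolve method are hypothetical names.

```csharp
using System;

// Hypothetical illustration of the fallback described in the text.
public static class KeepAliveUrl
{
    // Returns the URL unchanged if it already names a protocol;
    // otherwise prefixes it with http://127.0.0.1.
    public static string Resolve(string url)
    {
        if (url.IndexOf("://", StringComparison.Ordinal) >= 0)
        {
            return url;
        }

        string path = url.StartsWith("/", StringComparison.Ordinal) ? url : "/" + url;
        return "http://127.0.0.1" + path;
    }
}
```

Passing /sitecore/service/keepalive.aspx gives http://127.0.0.1/sitecore/service/keepalive.aspx, which matches the URL in the log excerpt, while a value that already includes the protocol and hostname passes through untouched.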
Scheduled Tasks
You can create items in a Sitecore database to schedule tasks. First, create a .NET class that contains the logic, then create a command definition item that references that class, and then create one or more schedule definition items to invoke that command.
To define the scheduled task logic, create a class that contains a method with the following signature:
public void MethodName(Sitecore.Data.Items.Item[] items, Sitecore.Tasks.CommandItem command, Sitecore.Tasks.ScheduleItem schedule)
The text below explains how to pass items to your command in the first parameter. The second parameter is the command definition item. The third parameter is the schedule definition item.
For example:
namespace Sitecore.Sharedsource.Tasks
{
  public class LogSomethingDatabase
  {
    public void WriteToLogFile(
      Sitecore.Data.Items.Item[] items,
      Sitecore.Tasks.CommandItem command,
      Sitecore.Tasks.ScheduleItem schedule)
    {
      if (items != null)
      {
        foreach (Sitecore.Data.Items.Item item in items)
        {
          Sitecore.Diagnostics.Log.Info(this + " : item : " + item.Paths.FullPath, this);
        }
      }

      Sitecore.Diagnostics.Log.Info(this + " : command : " + command.InnerItem.Paths.FullPath, this);
      Sitecore.Diagnostics.Log.Info(this + " : schedule : " + schedule.InnerItem.Paths.FullPath, this);
    }
  }
}
To define the command for the scheduled task to invoke:
- If appropriate, first select the appropriate database in the Sitecore desktop.
- In the Content Editor, navigate to the /Sitecore/System/Tasks/Commands item.
- In the Content Editor, insert a command definition item using the System/Tasks/Command data template.
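- In the command definition item, in the Data section, in the Type field, enter the type signature of the .NET class (the fully qualified class name, followed by a comma and the name of the assembly that contains it), such as Sitecore.Sharedsource.Tasks.LogSomethingDatabase, assembly.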
- In the command definition item, in the Data section, in the Method field, enter the name of the method to invoke in that class, such as WriteToLogFile.
That was actually the easy part.
To define the schedule to invoke the command:
- If appropriate, first select the appropriate database in the Sitecore desktop.
- In the Content Editor, navigate to the /Sitecore/System/Tasks/Schedules item.
- In the Content Editor, insert a schedule definition item using the System/Tasks/Schedule data template.
- In the schedule definition item, in the Data section, in the Command field, select the command to invoke.
- In the schedule definition item, in the Data section, in the Items field, you can specify a list of items to pass to the command, separated by pipe (“|”) characters. Alternatively, you can enter a Sitecore query (without the query: prefix), but remember that the Query.MaxItems setting in the web.config file applies to this query.
- In the schedule definition item, in the Data section, in the Schedule field, enter four parameters to control the schedule, separated by pipe (“|”) characters. The first parameter indicates the start date for the schedule in yyyyMMdd format. The second parameter indicates the end date for the schedule in the same format. The third parameter indicates the days of the week on which to run the task. This basically works like a bit mask, where 1=Sunday, 2=Monday, 4=Tuesday, 8=Wednesday, 16=Thursday, 32=Friday, and 64=Saturday. So Monday through Friday is 2+4+8+16+32=62, while every day is 1+2+4+8+16+32+64=127. These values come from the Sitecore.DaysOfWeek enum. The fourth parameter is the minimum interval between invocations of the task in HH:mm:ss format.
- In the schedule definition item, in the Data section, in the Last Run field, you can enter a date and time to control when Sitecore thinks it last invoked the command due to the existence of this schedule definition item, in the ISO format Sitecore uses for all dates (yyyyMMddTHHmmss). Sitecore automatically updates this field after invoking the command to control the processing schedule.
- In the schedule definition item, in the Data section, select the Async checkbox to cause the task to run asynchronously.
- In the schedule definition item, in the Data section, select the Auto Remove checkbox to cause Sitecore to remove the schedule definition item after invoking the task. This setting only comes into play after the expiration date defined in the Schedule field.
Obviously something that only a computer program could love.
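Since the Schedule field is easier to reason about in code than in prose, here is a minimal sketch that parses a value in the format described above. It is self-contained and does not use Sitecore's own parsing; the ScheduleDays enum and the ParsedSchedule class are hypothetical names that simply mirror the bit values listed in the steps.

```csharp
using System;
using System.Globalization;

// Hypothetical enum mirroring the bit values listed above;
// it is not the Sitecore.DaysOfWeek type itself.
[Flags]
public enum ScheduleDays
{
    None = 0, Sunday = 1, Monday = 2, Tuesday = 4, Wednesday = 8,
    Thursday = 16, Friday = 32, Saturday = 64
}

// Hypothetical holder for the four pipe-separated parameters.
public sealed class ParsedSchedule
{
    public DateTime Start { get; private set; }
    public DateTime End { get; private set; }
    public ScheduleDays Days { get; private set; }
    public TimeSpan Interval { get; private set; }

    // Parses start|end|days|interval, for example 20000101|21000101|127|00:00:01.
    public static ParsedSchedule Parse(string value)
    {
        string[] parts = value.Split('|');
        if (parts.Length != 4)
        {
            throw new FormatException("Expected start|end|days|interval.");
        }

        CultureInfo culture = CultureInfo.InvariantCulture;
        return new ParsedSchedule
        {
            Start = DateTime.ParseExact(parts[0], "yyyyMMdd", culture),
            End = DateTime.ParseExact(parts[1], "yyyyMMdd", culture),
            Days = (ScheduleDays)int.Parse(parts[2], culture),
            Interval = TimeSpan.Parse(parts[3], culture)
        };
    }

    // System.DayOfWeek numbers Sunday as 0, so shifting 1 left by the
    // day index produces the matching bit (Sunday=1, Monday=2, ...).
    public bool RunsOn(DayOfWeek day)
    {
        return ((int)this.Days & (1 << (int)day)) != 0;
    }
}
```

With this sketch, combining ScheduleDays.Monday through ScheduleDays.Friday with the | operator and casting to int gives 62, and ParsedSchedule.Parse("20000101|21000101|62|01:00:00").RunsOn(DayOfWeek.Sunday) is false.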
I created a test command definition item named Command and a test schedule definition item named Scheduled Task in the Master database. In my schedule definition item, for Command, I selected my command definition item and used /sitecore|/sitecore/content|/sitecore/content/home for Items and 20000101|21000101|127|00:00:01 for Schedule, leaving everything else blank. Afterwards, I found the following in my Sitecore log:
ManagedPoolThread #19 14:08:15 INFO Job started: Sitecore.Tasks.DatabaseAgent
ManagedPoolThread #19 14:08:15 INFO Scheduling.DatabaseAgent started. Database: master
ManagedPoolThread #19 14:08:15 INFO Examining schedules (count: 1)
ManagedPoolThread #19 14:08:15 INFO Starting: Scheduled Task
ManagedPoolThread #19 14:08:15 INFO Sitecore.Sharedsource.Tasks.LogSomethingDatabase : item : /sitecore
ManagedPoolThread #19 14:08:15 INFO Sitecore.Sharedsource.Tasks.LogSomethingDatabase : item : /sitecore/content
ManagedPoolThread #19 14:08:15 INFO Sitecore.Sharedsource.Tasks.LogSomethingDatabase : item : /sitecore/content/Home
ManagedPoolThread #19 14:08:15 INFO Sitecore.Sharedsource.Tasks.LogSomethingDatabase : command : /sitecore/system/Tasks/Commands/Command
ManagedPoolThread #19 14:08:15 INFO Sitecore.Sharedsource.Tasks.LogSomethingDatabase : schedule : /sitecore/system/Tasks/Schedules/Scheduled Task
ManagedPoolThread #19 14:08:15 INFO Ended: Scheduled Task
ManagedPoolThread #19 14:08:15 INFO Job ended: Sitecore.Tasks.DatabaseAgent (units processed: 1)
The default web.config file includes two agents that check for scheduled tasks to invoke as defined in one of the Sitecore databases. These agents define parameters to check for tasks in the Master and the Core database. By default, there don’t seem to be any scheduled task definitions in either database. You can disable, update or copy these agents and change parameters to check for tasks scheduled in a publishing target database (such as the Web database) or any other Sitecore database.
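As a rough sketch of what such a copy might look like for the Web database, assuming the agent takes the database name and the schedule root as its two constructor parameters (check this against the existing DatabaseAgent definitions in your own web.config, since the parameter descriptions and the interval shown here are my assumptions):

```xml
<!-- Sketch only: copy of a database agent, pointed at the Web database. -->
<agent type="Sitecore.Tasks.DatabaseAgent" method="Run" interval="00:10:00">
  <!-- Name of the Sitecore database that contains the schedule definition items -->
  <param desc="database">web</param>
  <!-- Root item under which the agent looks for schedule definition items -->
  <param desc="schedule root">/sitecore/system/tasks/schedules</param>
  <LogActivity>true</LogActivity>
</agent>
```

Remember that the interval of this agent, together with the polling frequency, limits how often Sitecore examines the schedule definition items in that database.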
Troubleshooting Scheduled Processes
The ASP.NET worker process must be active for Sitecore to poll for scheduled processes to invoke. If a scheduled process does not run, you have likely updated web.config or otherwise caused the ASP.NET process to terminate. If the ASP.NET worker process is not already active, you can bring it up by requesting any ASP.NET resource.
Another reason that a scheduled process might not run is that you have not set the polling frequency to a small enough value.
Windows Task Scheduler, Web Services, and Scheduled Publication
Sitecore customers frequently want to schedule publication. You can read about how to set the publishing schedule for an item and each version of an item in the Content Cookbook. The next publishing operation after the item or version publication date will publish the item or version. You can enable the PublishAgent in the web.config file to schedule publishing operations at some interval, but as I wrote earlier, it’s not feasible to get ASP.NET to do something at a very specific time. That does not mean that you should configure the PublishAgent to run extremely frequently. Maybe you could configure the database agent to run frequently, and create task schedules to publish when users update publication date values in items and versions.
Publishing evicts entries from some caches, and clears other caches. Excessively frequent publication can have a negative impact on performance. Alex Shyba blogged about how to publish at a specific time, but the code seems to have disappeared. Luckily I had kept a copy and made it available through this forum post on the Sitecore Developer Network, but I can’t vouch for the code quality (though I have to assume that Alex’s code is better than my own would be). I think his solution uses the Windows Task Scheduler to invoke a command line tool that calls a Web service in the Sitecore instance to do the processing, which seems relatively straightforward. In addition to the advantage of scheduling a task for a very specific time, the Windows Task Scheduler does not depend on the ASP.NET worker process (in fact, the Web service call will bring up ASP.NET if it is not already active). The downside is that the Windows Task Scheduler is not integrated into Sitecore, so you need to configure it separately. Maybe you could hook into Sitecore events or elsewhere to schedule the Windows task.
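To give a feel for the command line side of that approach, here is a minimal sketch of a tool that the Windows Task Scheduler could invoke. This is not Alex's code. The PublishTrigger class, the endpoint path you pass in, and the exit codes are all hypothetical; you would need to expose your own page or Web service in the Sitecore instance that performs the publishing, and secure it appropriately.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical command line helper; not part of Sitecore.
public static class PublishTrigger
{
    // Combines the site URL with the path of the (hypothetical) service.
    public static Uri BuildEndpoint(string siteUrl, string servicePath)
    {
        return new Uri(new Uri(siteUrl), servicePath);
    }

    // Requests the endpoint once. Returns 0 on an HTTP success status,
    // 1 on any other status, and 2 if the request itself fails.
    public static async Task<int> RunAsync(string siteUrl, string servicePath)
    {
        Uri endpoint = BuildEndpoint(siteUrl, servicePath);
        using (var client = new HttpClient())
        {
            try
            {
                HttpResponseMessage response = await client.GetAsync(endpoint);
                Console.WriteLine("{0} -> {1}", endpoint, (int)response.StatusCode);
                return response.IsSuccessStatusCode ? 0 : 1;
            }
            catch (HttpRequestException ex)
            {
                Console.Error.WriteLine("Request failed: " + ex.Message);
                return 2;
            }
        }
    }
}
```

A console application's Main method could call PublishTrigger.RunAsync(args[0], args[1]).GetAwaiter().GetResult() and return the result as the process exit code, so that the Windows Task Scheduler history shows whether the call succeeded.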
Conclusion
This is a complex topic and a long post, so I probably left some details out and got some others wrong, in which case please comment below. The options and possibilities for scheduling processes are just one factor that make Sitecore the best .NET CMS available!
