We use System Center Data Protection Manager (DPM) for backups. On February 1st, 2011, the annual backup of one of my protection groups was running. Since it was scheduled for January 1st, I was a bit confused. I opened a ticket with Microsoft support, and thus began a very long adventure into the DPM scheduling agent. The end result: this is a bug in the interaction between DPM and SQL Server that, under specific conditions, causes the behavior above.
The bug is triggered when you modify a protection group whose long-term retention schedule runs on anything other than a weekly or monthly basis and you do NOT adjust the timing of those backups.
The job is recreated in the SQL Server Agent without modifying the Start Date. This causes it to trigger inappropriately, both running when it shouldn’t and failing to run when it should. Although the start date didn’t change, the last run date is lost. The jobs are configured to run on a given date (or day of the week/month/year) every X periods rather than on exact dates, so my January 1st run date is actually configured as "run on the 1st of the month, every twelve months". With the last run date gone, the job runs on the 1st of the next month whenever I trigger this bug.
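To make the mechanism concrete, here is a minimal Python sketch of the behavior as I understand it. It is a simplified model and not SQL Server Agent's actual algorithm: the function names are my own, the 2010 start date is a hypothetical value, and the rule "with no last run date, fire on the first eligible 1st of the month" is an assumption taken from the behavior described above.

```python
from datetime import date
from typing import Optional


def first_of_month_on_or_after(day: date) -> date:
    """Return the first 1st-of-the-month that falls on or after `day`."""
    if day.day == 1:
        return day
    if day.month == 12:
        return date(day.year + 1, 1, 1)
    return date(day.year, day.month + 1, 1)


def add_months(day: date, months: int) -> date:
    """Shift a 1st-of-the-month date forward by `months` months."""
    index = day.year * 12 + (day.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def next_run(start: date, today: date, every_months: int,
             last_run: Optional[date]) -> date:
    """Simplified model of 'run on the 1st, every N months'.

    Assumption: with run history the schedule steps forward from the
    last run; without it, the job fires on the first eligible 1st.
    """
    if last_run is not None:
        candidate = add_months(last_run, every_months)
        while candidate < today:
            candidate = add_months(candidate, every_months)
        return candidate
    return first_of_month_on_or_after(max(start, today))


start = date(2010, 1, 1)   # hypothetical Start Date, already in the past
today = date(2011, 1, 15)  # hypothetical day the protection group is modified

# Healthy job: the last run date is still known.
print(next_run(start, today, 12, last_run=date(2011, 1, 1)))  # 2012-01-01

# Recreated job: same Start Date, but the last run date is lost.
print(next_run(start, today, 12, last_run=None))  # 2011-02-01
```

In this model the second call reproduces what I saw: an annual job firing on February 1st instead of waiting for the following January.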
How can you confirm this issue is going to happen to you? The easiest way is to use a script I was provided by Microsoft Support. A copy of the script is in the zip file available here.
The first time I ran the script I got an error. I re-ran it, however, and it executed properly. I was told this is a bug in the script as it currently stands. The output will look something like this:
| Name SQL Agent | Job Definition ID | Protection Group | Start Date | Schedule Create date/time | Last run date/time | Next run date/time | Tape_Label | Time Zone |
|---|---|---|---|---|---|---|---|---|
| GUID | GUID | PGName | 06-30-2011 | 2011-06-30 09:09:37.130 | 01-21-2012 23:00:00 | 01-28-2012 23:00:00 | Weekly | Ignore |
| GUID | GUID | PGName | 07-01-2011 | 2011-06-30 09:09:36.173 | 01-01-2012 23:00:00 | 02-01-2012 23:00:00 | Monthly | Ignore |
| GUID | GUID | PGName | 01-01-2012 | 2011-06-30 09:09:35.133 | 01-01-2012 23:00:00 | 01-01-2013 23:00:00 | Yearly | Ignore |
This is an example of one of my protection groups. The GUID and PGName values have been altered to anonymize the data. The Yearly schedule is the one that is likely to have the issue. Should I modify this protection group at this time without entering the Modify Schedule dialog, it WILL end up running a backup on the 1st of the following month, since the start date is in the past.
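If you have many protection groups, checking the output by eye gets tedious. The following Python sketch shows one way to flag rows at risk, assuming you have already copied the relevant columns into a list of dictionaries. The field names, the helper `at_risk`, and the "today" date are all hypothetical, and I am assuming the Tape_Label column reflects the schedule frequency, as it does in my output. The rule it applies is the one described above: a schedule that is neither weekly nor monthly and whose Start Date is already in the past.

```python
from datetime import date, datetime

# Frequencies described above as unaffected by the bug.
SAFE_LABELS = {"Weekly", "Monthly"}


def at_risk(rows, today):
    """Return the rows whose schedule could misfire if the protection
    group is modified without opening the Modify Schedule dialog."""
    flagged = []
    for row in rows:
        start = datetime.strptime(row["start_date"], "%m-%d-%Y").date()
        if row["tape_label"] not in SAFE_LABELS and start < today:
            flagged.append(row)
    return flagged


# Values copied from the example output; names are anonymized placeholders.
rows = [
    {"protection_group": "PGName", "start_date": "06-30-2011", "tape_label": "Weekly"},
    {"protection_group": "PGName", "start_date": "07-01-2011", "tape_label": "Monthly"},
    {"protection_group": "PGName", "start_date": "01-01-2012", "tape_label": "Yearly"},
]

# Hypothetical "today", chosen to fall after the yearly Start Date.
for row in at_risk(rows, date(2012, 1, 25)):
    print(row["protection_group"], row["tape_label"], row["start_date"])
# prints: PGName Yearly 01-01-2012
```

This only restates the rule of thumb from this post; it does not inspect the SQL Server Agent jobs themselves.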
As of now, there is no fix for this behavior. There is, however, a way to work around it. As mentioned in the Trigger section, this only happens when you don’t open the Modify Schedule dialog. So what happens if you do? The Start Date is reset and the entire schedule will run as planned. When you are editing a protection group and get to this screen:
You need to click the Modify button in the Backup Schedule area so you get this screen:
Click OK on this screen to close it out. As long as you do this, the Start Date for the backup schedule will be reset so that it is correct. This will avoid triggering the bug and keep your backups on schedule.
At this time, Microsoft has indicated that a script to fix this will be given to me in the relatively near future, once they have the bugs worked out. Additionally, the issue will be resolved in either DPM 2012 or DPM 2012 SP1 so that the scheduler no longer causes it. At this time, they are unsure whether the bug fix will be ported back to DPM 2010.