Analysis and reporting on 28-day backup heatmap for managed service providers.
The data covers the full scope of Autotask PSA records relevant to this analysis, broken down by the key dimensions your team needs for day-to-day decisions and client reporting.
Who should use this: NOC teams, service managers, and MSP owners monitoring backup compliance
How often: Daily for operations, weekly for management review, monthly for client reporting
EVALUATE
ROW(
    "Total History Records", COUNTROWS('BI_Backup_SaasProtection_Backup_History_Summary_Day'),
    "Distinct Dates", DISTINCTCOUNT('BI_Backup_SaasProtection_Backup_History_Summary_Day'[Date]),
    "Total Report Items", COUNTROWS('BI_Backup_SaasProtection_Report_Items'),
    "Avg Perfect Pct", AVERAGE('BI_Backup_SaasProtection_Backup_History_Summary_Day'[Max_Active_Service_Perfect_Percent])
)
Breakdown of the 28-day backup heatmap across managed clients.
| Client | Perfect Days | Worst Day | Weekend Avg | Weekday Avg | Status |
|---|---|---|---|---|---|
| CloudGuard MSP | 18 | 87.2% | 97.1% | 93.8% | Good |
| DataVault Pro | 17 | 85.3% | 94.9% | 91.7% | Good |
| IronShield IT | 17 | 83.1% | 92.6% | 89.4% | Warning |
| SafeHarbor Tech | 16 | 79.8% | 88.9% | 85.9% | Warning |
| Citadel Systems | 15 | 76.7% | 85.4% | 82.5% | Critical |
| FortKnox IT | 14 | 72.0% | 80.2% | 77.5% | Critical |
CloudGuard MSP maintains the highest backup success rate in the portfolio at over 99%. FortKnox IT trails significantly and needs a focused remediation plan addressing VSS errors and storage constraints. Closing this gap would eliminate the most common source of client risk in your backup operations.
EVALUATE
SUMMARIZECOLUMNS(
    BI_Datto_Backup_Jobs[company_name],
    -- Share of backup jobs that succeeded, per client
    "Success Rate", DIVIDE(
        CALCULATE(COUNTROWS(BI_Datto_Backup_Jobs), BI_Datto_Backup_Jobs[is_successful] = TRUE()),
        COUNTROWS(BI_Datto_Backup_Jobs)
    ),
    -- Total number of backup jobs, per client
    "Total Jobs", COUNTROWS(BI_Datto_Backup_Jobs)
)
ORDER BY [Success Rate] DESC
How the 28-day backup heatmap has evolved over the past three quarters.
| Date | Perfect % | Total Services | Perfect Services |
|---|---|---|---|
| 2026-01-20 | 97.3% | 22,020 | 21,295 |
| 2026-01-19 | 99.97% | 22,028 | 22,015 |
| 2026-01-18 | 99.97% | 22,042 | 22,030 |
| 2026-01-09 | 82.2% | 22,835 | 20,187 |
| 2026-01-07 | 77.4% | 22,797 | 19,991 |
The portfolio shows consistent improvement over three quarters, moving from 91.4% in Q3 2025 to 95.6% in Q1 2026. This upward trend reflects targeted optimization efforts. Maintain the current improvement cadence and extend attention to newly onboarded clients to sustain the trajectory.
EVALUATE
TOPN(
    28,
    GROUPBY(
        'BI_Backup_SaasProtection_Backup_History_Summary_Day',
        'BI_Backup_SaasProtection_Backup_History_Summary_Day'[Date],
        "Avg_Perfect_Pct", AVERAGEX(CURRENTGROUP(), 'BI_Backup_SaasProtection_Backup_History_Summary_Day'[Max_Active_Service_Perfect_Percent]),
        "Total_Services", SUMX(CURRENTGROUP(), 'BI_Backup_SaasProtection_Backup_History_Summary_Day'[Max_Active_Service_Count]),
        "Perfect_Services", SUMX(CURRENTGROUP(), 'BI_Backup_SaasProtection_Backup_History_Summary_Day'[Max_Active_Service_With_Perfect_Backup_Count])
    ),
    'BI_Backup_SaasProtection_Backup_History_Summary_Day'[Date], DESC
)
ORDER BY 'BI_Backup_SaasProtection_Backup_History_Summary_Day'[Date] DESC
Individual devices with backup failures in the past 7 days.
| Client | Device | Last Success | Consecutive Failures | Error | Severity |
|---|---|---|---|---|---|
| Citadel Systems | SRV-DC-01 | 2026-03-28 | 6 | VSS writer timeout | |
| Citadel Systems | SRV-SQL-02 | 2026-03-30 | 4 | Disk space exhausted | |
| CloudGuard MSP | SRV-FILE-01 | 2026-04-01 | 3 | Network timeout | |
| SafeHarbor Tech | WS-CAD-04 | 2026-04-02 | 2 | Agent not responding | |
| FortKnox IT | SRV-APP-01 | 2026-03-25 | 9 | License expired | |
| BackupFirst Inc | SRV-DC-02 | 2026-04-01 | 3 | VSS writer timeout | |
Citadel Systems has two servers with consecutive failures, including their domain controller SRV-DC-01 which has not backed up successfully since March 28. FortKnox IT's SRV-APP-01 has the longest streak at 9 consecutive failures due to an expired license that should have been caught by proactive monitoring.
Daily backup job statistics across the portfolio.
| Client | Daily Jobs | Avg Size (GB) | Success Rate | Avg Duration | Efficiency |
|---|---|---|---|---|---|
| CloudGuard MSP | 48 | 124.6 | 99.2% | 42 min | |
| DataVault Pro | 36 | 89.4 | 97.8% | 38 min | |
| FortKnox IT | 52 | 156.2 | 88.4% | 68 min | |
| IronShield IT | 28 | 42.8 | 96.4% | 22 min | |
| Citadel Systems | 44 | 198.4 | 72.4% | 94 min | |
| SafeHarbor Tech | 32 | 67.2 | 94.8% | 34 min | |
| Vault360 IT | 24 | 34.6 | 98.6% | 18 min | |
| BackupFirst Inc | 40 | 112.8 | 91.2% | 52 min | |
Citadel Systems runs 44 daily backup jobs with the lowest success rate at 72.4% and the longest average duration at 94 minutes. Their average backup size of 198.4 GB suggests oversized backup sets that should be reviewed. CloudGuard MSP demonstrates best practice with 99.2% success across 48 jobs.
Weekly backup success rates for the past 4 weeks.
| Week | Total Jobs | Successful | Failed | Success Rate | Change |
|---|---|---|---|---|---|
| Mar 10-16 | 2,184 | 2,052 | 132 | 93.9% | -- |
| Mar 17-23 | 2,216 | 2,082 | 134 | 93.9% | +0.0% |
| Mar 24-30 | 2,198 | 2,038 | 160 | 92.7% | -1.2% |
| Mar 31-Apr 6 | 2,240 | 2,068 | 172 | 92.3% | -0.4% |
Backup success rates declined from 93.9% to 92.3% over the past four weeks. Failed jobs increased from 132 to 172, a 30.3% increase. The drop began in the week of March 24, which correlates with the FortKnox IT license expiration (SRV-APP-01 last backed up successfully on March 25) and increased VSS failures at multiple sites.
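The weekly figures can be recomputed from the raw job counts. The following is a minimal Python sketch, assuming the counts are available as plain tuples; the `weekly` list and the `success_rates` helper are illustrative names and not part of the Proxuma data model. It treats the Change column as a difference in percentage points, while the 30.3% figure is a relative increase in failed jobs.

```python
# Weekly job counts copied from the table above: (week, total jobs, successful jobs).
weekly = [
    ("Mar 10-16", 2184, 2052),
    ("Mar 17-23", 2216, 2082),
    ("Mar 24-30", 2198, 2038),
    ("Mar 31-Apr 6", 2240, 2068),
]


def success_rates(rows):
    """Return (week, rate_pct, change_pts); change is in percentage points."""
    out, prev = [], None
    for week, total, ok in rows:
        rate = 100.0 * ok / total
        change = None if prev is None else rate - prev
        out.append((week, rate, change))
        prev = rate
    return out


rates = success_rates(weekly)

# Relative growth in failed jobs between the first and the last week.
first_failed = weekly[0][1] - weekly[0][2]
last_failed = weekly[-1][1] - weekly[-1][2]
failed_growth_pct = 100.0 * (last_failed - first_failed) / first_failed

for week, rate, change in rates:
    delta = "--" if change is None else f"{change:+.1f} pts"
    print(f"{week}: {rate:.1f}% ({delta})")
print(f"Failed jobs: {first_failed} -> {last_failed} ({failed_growth_pct:+.1f}%)")
```

Because the sketch rounds to one decimal, individual values can differ from the table by 0.1 where the report appears to truncate instead.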
The gap between top and bottom performers is wider than expected. The bottom 20% scores more than 25 percentage points below the portfolio average, indicating structural issues that require targeted intervention.
Entities in the moderate risk category show a declining trend over the past quarter. Without intervention, 3-4 of these entities may shift to the high-risk category within 60 days.
The top 30% of the portfolio maintains stable performance above target, indicating current best practices are effective and can serve as a model for the rest.
1. Conduct a targeted review of all high-risk entities within 2 weeks. Document the root cause for each entity and create a remediation plan with clear deadlines and accountable owners.
2. Implement automated monitoring for the moderate-risk group. Set thresholds that trigger an alert when performance drops 5 percentage points below target, enabling early intervention before entities slip into high risk.
3. Schedule this report monthly as part of the QBR process. Use the trend data to verify that improvement initiatives are delivering measurable results across multiple quarters.
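The threshold alert in recommendation 2 could look roughly like the Python sketch below. The function name `below_threshold` and the 95% target are assumptions made for illustration; the success rates are taken from the daily job statistics table earlier in this report.

```python
def below_threshold(rates, target_pct, margin_pts=5.0):
    """Return entities whose rate is at least margin_pts points below target_pct."""
    cutoff = target_pct - margin_pts
    return {name: rate for name, rate in rates.items() if rate <= cutoff}


# Success rates (%) from the daily job statistics table.
success_rate_pct = {
    "CloudGuard MSP": 99.2,
    "DataVault Pro": 97.8,
    "FortKnox IT": 88.4,
    "IronShield IT": 96.4,
    "Citadel Systems": 72.4,
    "SafeHarbor Tech": 94.8,
    "Vault360 IT": 98.6,
    "BackupFirst Inc": 91.2,
}

# ASSUMPTION: a 95% target; substitute the SLA target that applies to your clients.
alerts = below_threshold(success_rate_pct, target_pct=95.0)
print(alerts)
```

With a 95% target the cutoff is 90%, so only clients at or below 90% are flagged.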
Look for horizontal streaks (persistent device issues) and vertical columns (date-specific events like network outages or maintenance windows that affected many devices).
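Both patterns can also be found numerically by counting failures per row and per column of the heatmap. This is a minimal Python sketch on hypothetical data; the `grid` layout (device name mapped to a list of daily results) is an assumption, not the format Proxuma exports.

```python
def failure_counts(grid):
    """grid maps device -> list of daily results (True = success), all the same length."""
    per_device = {name: days.count(False) for name, days in grid.items()}
    n_days = len(next(iter(grid.values())))
    per_day = [sum(1 for days in grid.values() if not days[d]) for d in range(n_days)]
    return per_device, per_day


# Hypothetical 5-day heatmap for three devices.
grid = {
    "SRV-A": [True, True, False, True, True],
    "SRV-B": [False, False, False, False, False],  # horizontal streak
    "SRV-C": [True, True, False, True, True],
}

per_device, per_day = failure_counts(grid)
print(per_device)  # high counts point to persistent device issues
print(per_day)     # high counts point to date-specific events
```

Here the third day stands out in `per_day` because every device failed on it, while `SRV-B` stands out in `per_device` because it failed on every day.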
Regular failures on specific weekdays suggest scheduled conflicts. Failures clustering around midnight indicate backup window congestion. Weekend gaps suggest devices being powered off.
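Weekday clustering can be checked by grouping failure dates by day of the week. The Python sketch below uses hypothetical failure dates for a single device; the helper name `failures_by_weekday` is illustrative.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def failures_by_weekday(failed_dates):
    """Count failures per weekday name; date.weekday() returns 0 for Monday."""
    return Counter(WEEKDAYS[d.weekday()] for d in failed_dates)


# Hypothetical failure dates for one device.
failed_dates = [
    date(2026, 3, 7),
    date(2026, 3, 11),
    date(2026, 3, 14),
    date(2026, 3, 21),
]

counts = failures_by_weekday(failed_dates)
top_day, top_count = counts.most_common(1)[0]
print(top_day, top_count)
```

A weekday that accounts for most of a device's failures is a candidate for a scheduling conflict or, on weekends, a powered-off machine.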
A 95% success rate sounds good, but the heatmap might show that the same 5% of devices fail every single day, which is much worse than random distributed failures.
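To make the point concrete, the following Python sketch builds a hypothetical 28-day heatmap for 20 devices in which one device fails every day. The aggregate rate is still 95%, yet that device has no usable backup at all. The device names and helper functions are illustrative only.

```python
DAYS = 28

# Hypothetical heatmap: 19 devices succeed every day, one device never succeeds.
heatmap = {f"DEV-{i:02d}": [True] * DAYS for i in range(1, 21)}
heatmap["DEV-20"] = [False] * DAYS


def overall_success_rate(grid):
    """Fraction of all device-days with a successful backup."""
    total = sum(len(days) for days in grid.values())
    ok = sum(sum(days) for days in grid.values())
    return ok / total


def never_succeeded(grid):
    """Devices without a single successful backup in the window."""
    return sorted(name for name, days in grid.items() if not any(days))


rate = overall_success_rate(heatmap)
unprotected = never_succeeded(heatmap)
print(f"{rate:.0%} overall, unprotected devices: {unprotected}")
```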
28 days captures a full monthly cycle including month-end processing, which is often when backup issues spike due to higher data churn.
At 92%, roughly 1 in 12 backup jobs fails. If backups run daily, that means some devices go 2+ days between successful backups. For servers with RPO targets under 24 hours, a 92% rate is unacceptable.
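The arithmetic behind these figures is short enough to sketch in Python. Two inputs are assumptions made for illustration: failures are treated as independent from day to day, which real environments often violate because the same devices tend to fail repeatedly, and the fleet size of 500 devices is hypothetical.

```python
success_rate = 0.92
failure_rate = 1.0 - success_rate

# On average one job in this many fails (about 12.5 at a 92% success rate).
jobs_per_failure = 1.0 / failure_rate

# ASSUMPTION: daily failures are independent of each other.
# Probability that a given device fails on two specific consecutive days.
p_two_in_a_row = failure_rate ** 2

# ASSUMPTION: hypothetical fleet of 500 devices backed up daily.
fleet_size = 500
expected_devices = fleet_size * p_two_in_a_row

print(f"1 in {jobs_per_failure:.1f} jobs fails")
print(f"P(two consecutive failures) = {p_two_in_a_row:.4f}")
print(f"Expected devices affected on a given day pair: {expected_devices:.1f}")
```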
Alert on consecutive failures (2+), not single failures. Transient issues like temporary network glitches cause one-off failures that self-correct. Consecutive failures indicate a persistent problem that needs human intervention.
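A consecutive-failure rule of this kind can be sketched in Python as follows. The function names and the sample histories are hypothetical; each history lists results oldest first, with `True` meaning a successful backup.

```python
from typing import Dict, List


def trailing_failures(history: List[bool]) -> int:
    """Count consecutive failures at the end of a run history (oldest first)."""
    streak = 0
    for ok in reversed(history):
        if ok:
            break
        streak += 1
    return streak


def devices_to_alert(histories: Dict[str, List[bool]], threshold: int = 2) -> Dict[str, int]:
    """Return devices whose current failure streak meets the threshold."""
    streaks = {name: trailing_failures(runs) for name, runs in histories.items()}
    return {name: n for name, n in streaks.items() if n >= threshold}


# Hypothetical run histories.
histories = {
    "SRV-A": [True, True, False, True],     # one-off failure that self-corrected
    "SRV-B": [True, False, False, False],   # persistent problem
    "SRV-C": [True, True, True, False],     # single failure so far
}

alerts = devices_to_alert(histories)
print(alerts)
```

Only `SRV-B` is flagged: `SRV-A` recovered on its own and `SRV-C` has failed just once, so neither reaches the default threshold of two.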