Exoprise continuously monitors the operations of its service, using CloudReady sensors from all over the world in addition to other services. If there are any interruptions in service, planned or unplanned, a note will be posted here.
Monday May 9th, 2022 12:30 PM
A fix was deployed for an issue with the Yammer sensors that was affecting some tenants, where the group targeted for synthetic testing could not be located in the returned result set. The sensors should automatically recover as the updates are delivered. We apologize for any inconvenience this may have caused.
Thursday May 5th, 2022 10:00 PM
We are investigating issues with Yammer sensors for certain tenants and the visibility of the designated synthetic test group. We are still trying to recreate the issue.
Thursday February 24th, 2022 9:00 PM
We have continued to monitor the offline sites, and only a few remain down. Most sites have recovered; any sites that remain offline should be cycled via the Management Client, services controller or other means. We expect 100% of the sites to recover within the next 1 to 2 hours.
We will continue to investigate the root cause of the outage and publish the findings here within the next few days. We apologize for any inconvenience this outage may have caused and plan to put in place processes to prevent this from happening in the future.
[Updated 12:30 2022-02-25, Root Cause]: Root cause analysis determined that an update deployed earlier in the day on February 24 caused many older sites to temporarily go offline. The update to the core task took advantage of a newer platform call and was not supported by older sites. This caused the core task to error and prevented uploads or further operations by many sites.
The rolling outage wasn’t discovered until later in the day and once it was, the operations team initiated a rollback. Once rolled back, sites began updating again and fully recovered within 2 hours.
To prevent this from happening again, we’ve enhanced our automated testing against older versions of our sites and will keep this in place until older site versions are completely upgraded. Again, we apologize for any inconvenience the outage caused.
Thursday February 24th, 2022 7:50 PM
We have successfully rolled back system updates and are restoring previous systems. We see sites slowly coming online and sensor data being updated. We will post an update within an hour.
Thursday February 24th, 2022 7:00 PM
We continue to investigate the source of the outage and are rolling back system updates. We will post an update within an hour.
Thursday February 24th, 2022 6:00 PM
We are investigating an outage of some of the queuing and reporting subsystems for our monitoring service. We will post an update within an hour.
Friday October 14th, 2021 3:29 PM
- PUB-FRANKFURT public sites have recovered and are processing tests.
Friday October 14th, 2021 3:00 PM
- PUB-FRANKFURT public sites are experiencing availability issues and we are investigating.
Wednesday September 15th, 2021 7:15 AM
- Some machines in the PUB-TOKYO region experienced outages during a Windows update and took longer to recover than expected. We are investigating the issue to determine the root cause and to prevent this in the future.
Thursday August 19th, 2021 12:00 PM
- It looks like Microsoft has finally (after 24 hours) recognized the error of its ways and corrected an account prompt AFTER signing in. The changes appear to be laboriously rolling out across their sign-in infrastructure. We’ve made no changes to our sensors.
Thursday August 19th, 2021 9:50 AM
- We have identified a core issue that is causing the sensors to periodically fail to sign in. It looks like Microsoft has recently changed the logon procedure to include an extra prompt to select an account when it’s not necessary.
We actually don’t know why they are prompting for account selection. It seems like an inadvertent change and partial deploy on Microsoft’s part where they didn’t tell anyone about their new sign-on flow. Additionally, it appears to be only partially deployed, which is causing further problems. Finally, it appears to be affecting only Teams, which makes it seem even more like a bug.
We continue to investigate workarounds but they have to be executed carefully because Microsoft will likely roll it back once they realize their foolish mistake.
Thursday August 19th, 2021 8:30 AM
- We are investigating a Teams sign-in interaction that is causing increased sensor errors. We expect to have updates later this afternoon and will post more information here as it is discovered.
Saturday June 12th, 2021 1:00 PM
- We’ve identified a fix for the small number of customers that are affected by this condition with OWA sensors. We are continuing to test in development and stage and plan to deploy the fix at some point during the next two business days.
Friday June 11th, 2021 5:00 PM
- There are limited reports of OWA sensors periodically failing for a limited set of tenants in different regions. We are investigating the issue and attempting to pinpoint differences. We expect to have more to report on Monday, June 14th.
Thursday April 22nd, 2021 2:44 PM
- Virtual machines in the Ohio region have been rebooted and are operational. We apologize for the inconvenience caused by this outage and will investigate quicker failover detection and remediation.
Thursday April 22nd, 2021 2:00 PM
- We are investigating an outage for some of the machines in the Ohio region. We hope to have an update within the next 45 minutes.
Thursday April 1st, 2021 6:20 PM
- Azure DNS is recovering and so are the core Exoprise services. Exoprise recently aligned itself with our customers by moving pieces of the Exoprise infrastructure to Azure. This inadvertently introduced new dependencies on Azure services which we are examining for improvement. More information about the outage
Thursday April 1st, 2021 5:40 PM
- Core services that are dependent on Azure are experiencing DNS issues. We will update this page by 6:00 PM at the latest.
Wednesday March 31st, 2021 11:43 AM
- We rolled back DNS maintenance for our core services and recovery should be complete.
Wednesday March 31st, 2021 11:23 AM
- We are performing some emergency DNS maintenance for our core services. We expect recovery to be complete within the next 10-15 minutes. We will update this page within that time period.
Monday March 8th, 2021 1:00 PM
- Maintenance on the PUB-VIRGINIA public sites has completed. We apologize for any inconvenience this may have caused.
Monday March 8th, 2021 9:00 AM
- Public hosting sites in the Virginia (US-EAST) region are undergoing maintenance and load balancing. We hope to have the procedures finished by noon today. We will update this page as soon as it is finished. In the meantime, some performance-oriented alarms from sensors running in this region may be elevated.
Friday March 5th, 2021 1:26 PM
- A fix for the new Teams meeting experience has been deployed and is being rolled out across the Teams AV sensors. We continue to investigate various tenant channels for Teams.
Friday March 5th, 2021 8:29 AM
- The Teams AV sensors are failing for certain tenants as an updated Teams meeting experience is deployed across servers and tenants. We are investigating the issue and hope to have a resolution by the end of the day.
Thursday January 28th, 2021 3:00 PM
- We updated our Teams AV sensor to support newer Microsoft Teams interfaces that are being deployed. A very small subset of tenants and users were affected by the update. The Teams AV sensor should update itself within the next 2 hours.
Thursday January 28th, 2021 5:30 AM
- We are investigating interface and API upgrades to Microsoft Teams that may be causing sensor errors across different tenants. We hope to have updates available by the end of the day.