Exoprise continuously monitors the operations of its service using CloudReady sensors from around the world, in addition to other services. If there are any interruptions in service, planned or unplanned, a note will be posted here.
Core Servers | Public Sites | Alarms | Dependent Services | Service Comms API
Monday May 9th, 2022 12:30 PM
We deployed a fix for an issue affecting Yammer sensors for some tenants, where the group targeted for synthetic testing could not be located in the returned result set. The sensors should recover automatically as the update is delivered. We apologize for any inconvenience this may have caused.
Thursday May 5th, 2022 10:00 PM
We are investigating issues with Yammer sensors for certain tenants related to the visibility of the designated synthetic test group. We are still trying to reproduce the issue.
Thursday February 24th, 2022 9:00 PM
We have continued to monitor the offline sites, and only a few remain offline. Most sites have recovered; any site that is still offline should be restarted (cycled) via the Management Client, the services controller, or other means. We expect all sites to recover within the next 1 to 2 hours.
We will continue to investigate the root cause of the outage and will publish our findings here within the next few days. We apologize for any inconvenience this outage may have caused and plan to put processes in place to prevent it from happening again.
[Updated 12:30 2022-02-25, Root Cause]: Root cause analysis determined that an update deployed earlier in the day on February 24 caused many older sites to temporarily go offline. The update to the core task relied on a newer platform call that older sites did not support. As a result, the core task failed on those sites, preventing uploads and any further operations.
The rolling outage was not discovered until later in the day; once it was, the operations team initiated a rollback. After the rollback, sites began updating again and fully recovered within 2 hours.
To prevent this from happening again, we have enhanced our automated testing against older versions of our sites and will keep it in place until all older site versions have been upgraded. Again, we apologize for any inconvenience the outage caused.
Thursday February 24th, 2022 7:50 PM
We have successfully rolled back the system updates and are restoring the previous systems. Sites are slowly coming back online and sensor data is being updated. We will post an update within an hour.
Thursday February 24th, 2022 7:00 PM
We continue to investigate the source of the outage and are rolling back system updates. We will post an update within an hour.
Thursday February 24th, 2022 6:00 PM
We are investigating an outage of some of the queuing and reporting subsystems for our monitoring service. We will post an update within an hour.
Friday October 14th, 2021 3:29 PM
- PUB-FRANKFURT public sites have recovered and are processing tests.
Friday October 14th, 2021 3:00 PM
- PUB-FRANKFURT public sites are experiencing availability issues and we are investigating.
Wednesday September 15th, 2021 7:15 AM
- Some machines in the PUB-TOKYO region experienced outages during a Windows update and took longer to recover than expected. We are investigating the issue to determine the root cause and to prevent this in the future.
Thursday August 19th, 2021 12:00 PM
- It looks like Microsoft has finally (after 24 hours) recognized the error and removed the extra account-selection prompt that appeared AFTER signing in. The fix appears to be rolling out slowly across their sign-in infrastructure. We have made no changes to our sensors.
Thursday August 19th, 2021 9:50 AM
- We have identified a core issue that is causing the sensors to periodically fail to sign in. It looks like Microsoft has recently changed the logon procedure to include an extra account-selection prompt even when it is not necessary.
We don't know why they are prompting for account selection. It appears to be an inadvertent change to Microsoft's sign-on flow that was never announced. It also appears to be only partially deployed, which is causing further problems. Finally, it only seems to affect Teams, which makes it look even more like a bug.
We continue to investigate workarounds, but they must be applied carefully because Microsoft will likely roll back the change once they realize the mistake.
Thursday August 19th, 2021 8:30 AM
- We are investigating a Teams sign-in interaction that is causing increased sensor errors. We expect to have updates later this afternoon and will post more information here as it is discovered.