At approximately 11:15PM on March 13th, we were alerted to an issue affecting several legacy HELM servers in the CLE01 datacenter. We are currently investigating this issue.

We appreciate your patience; all updates will be posted here.

UPDATE 12:06AM March 14th: The connectivity issues are affecting RoadRunner, CenturyTel, and several other ISPs. All servers are operational and remain reachable from other networks. The issue appears to be related to network routing through the Level 3 core router in the Cleveland area.
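For customers who want to check whether their route is affected, a minimal sketch along these lines (Python; "server1.example.com" is a placeholder, substitute your own server's address, and port 80 assumes a standard web server) can confirm whether the server accepts connections from your network:

    # Minimal TCP reachability check. The hostname is a placeholder,
    # not a real HELM server name.
    import socket

    def is_reachable(host, port=80, timeout=5):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(is_reachable("server1.example.com"))

If this returns False from your connection but True from another network (a phone hotspot, for example), the server is up and the problem lies on the route between your ISP and the datacenter.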

UPDATE 2:00PM March 14th: We have escalated this issue to the Level 3 core team and are waiting for a resolution. At the moment our servers continue to operate and serve network traffic; however, the routing issues persist for users accessing the affected VLAN from certain ISPs.

UPDATE 7:41PM March 15th: Our team has spent the entire day troubleshooting in the CLE01 datacenter affected by the routing issues. Here is what we know. Servers are operational and are serving traffic, so this is not a complete "lights out" situation. You may see your site as "down" while someone connecting from elsewhere can access it, business as usual.

The conclusion we have arrived at, together with the upstream provider's staff, is that the upstream core router is malfunctioning. Unfortunately, because it is the weekend, no one qualified to diagnose this equipment is available. We do not own it, nor do we have access to the premises where it is located.

The scope of impact is limited to our legacy CLE01 facility, which serves a portion of the HELM cluster. If you are a legacy HELM customer, about 2/3 of our total capacity is carried in another datacenter, so you are most likely not affected. In some instances your website may be hosted in the CLE02 datacenter while its database is in CLE01; in that case your MSSQL or MySQL database may not respond and the site may show a database-related error.

We understand this is a serious situation and are working hard to resolve it. Unfortunately, we have to wait for a large telecom provider to troubleshoot their equipment and cannot speed this up as much as we would like. We ask all affected customers to limit the number and frequency of calls and support tickets opened regarding this outage. We are aware of the issues, and our staff is already under pressure and working through the weekend to solve them.
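If you suspect the split-datacenter case described above, a rough triage sketch like the following (Python; the URL is hypothetical, substitute your own site) can help distinguish the two failure modes: a connection timeout points to the routing problem, while an HTTP 5xx response means the site is reachable but something behind it, such as a database in CLE01, is not responding:

    # Rough triage: connection timeout vs. server-side error.
    # The URL is a placeholder, not a real customer site.
    import urllib.request
    import urllib.error

    def triage(url, timeout=10):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return "OK: HTTP %d" % resp.status
        except urllib.error.HTTPError as e:
            return "Site reachable, server error: HTTP %d" % e.code
        except (urllib.error.URLError, OSError) as e:
            return "Could not connect (possible routing issue): %s" % e

    print(triage("http://www.example.com/"))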

UPDATE 1:26PM March 16th: Our technicians are on site at the data center, running tests and working with the data center's technicians to resolve the network routing issues. This may involve DNS tests and switchovers that could briefly widen the range of servers or domain names affected, but those tests should be brief and temporary, and they should help the data center technicians narrow down the cause of the issue and correct it within their infrastructure. We greatly appreciate your continued patience as we work to fully resolve this issue as quickly as possible.

UPDATE 5:50PM March 16th: Our CLE01 data center provider has isolated the cause of the networking issue within CLE01 and is working to resolve it as quickly as possible. In the meantime, our own technicians have begun physically moving our DNS servers to our CLE02 data center, where they will be set up immediately to route all DNS requests through our CLE02 provider. This will resolve the DNS and networking issues caused by the ihostasp.net name servers currently not being resolvable, and should restore the majority of affected services for most customers. Once the CLE01 data center technicians have resolved the physical issue within their network, all remaining CLE01 services will return to normal as well.
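Once the DNS servers are live in CLE02, a quick way to confirm resolution is working again (assuming the standard nslookup command-line tool is installed) is to query the ihostasp.net name server records through a public resolver:

    # Query ihostasp.net NS records via Google's public resolver.
    # Requires the nslookup command-line tool on your system.
    import subprocess

    result = subprocess.run(
        ["nslookup", "-type=NS", "ihostasp.net", "8.8.8.8"],
        capture_output=True, text=True, timeout=15,
    )
    print(result.stdout or result.stderr)

If this query succeeds but the same lookup through your ISP's resolver does not, your ISP's DNS cache may simply need time to pick up the change.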

UPDATE 7:30PM March 16th: The issue has been resolved. All servers in the HELM cluster, with the exception of the control panel, are functioning. According to preliminary reports, a unidirectional link failure on the port channel connecting two devices in the upstream provider's equipment caused asymmetric routing, which resulted in bizarre and difficult-to-diagnose routing behavior.
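For the curious: asymmetric routing of this kind is hard to spot from one side alone, because the forward path can look healthy while replies take the broken route back. A rough illustration (Python; assumes the Unix traceroute tool, tracert on Windows, and a placeholder hostname):

    # One-directional path trace. Asymmetry only shows up when the
    # forward trace (client -> server) is compared with the return
    # trace run from the server toward the client's public IP.
    import subprocess

    def trace(host):
        """Return traceroute output for host (one direction only)."""
        out = subprocess.run(["traceroute", "-n", host],
                             capture_output=True, text=True, timeout=120)
        return out.stdout

    print(trace("server1.example.com"))

A hop list that diverges between the two directions is the signature of the asymmetric routing described above.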

Friday, March 13, 2015
