Note: This article is a machine translation of the original Hungarian post and may contain mistakes.

Skyguard Car Security Service Complete Outage - Critical Deficiencies in a EuroOne Partner's IT System

Consequences of Business Continuity and Disaster Recovery Deficiencies

Time investment: 10+ hours active problem-solving and communication

Duration: 40+ hours of complete service outage

Financial damage: 80,000 HUF + VAT additional cost at the service center

The Case from a Professional Perspective: IT Infrastructure and Business Continuity

On the evening of Saturday, December 2, 2023, the Skyguard car security system operated by Autosecurit Ltd. suffered a severe service outage. As an IT infrastructure expert, it was immediately clear to me that the problem stemmed from the lack of an adequate business continuity and disaster recovery (BCDR) plan, and that the recovery time objective (RTO) and recovery point objective (RPO) had evidently not been properly defined or implemented.

The severe Skyguard outage was not just a simple technical error but a systemic problem that highlights the critical role of IT infrastructure in modern services. It is particularly concerning that this involves a security service whose primary role is to guarantee safety, and which operates as a EuroOne partner, subject to strict requirements regarding service availability.

The Customer Experience: Loss of Security and Inconveniences

The real impact of the service outage was felt on the customer side. The application's authentication suddenly stopped working, which meant the car could only be used with the traditional PIN keypad. Fortunately, the keypad worked for me, as the device was in the car and I had recently replaced its battery.

"It was particularly concerning that when the car alarm went off, no call came from the service provider. The essence of a 24/7 security service would be to respond immediately in such cases."

The situation became critical on Monday, when I had to take the car to the service center. Due to the Skyguard system malfunction, I was unable to put the vehicle into service mode despite entering the correct PIN code. The service center eventually had to create a bypass for the CAN gateway, at an additional cost of 80,000 HUF + VAT. This was not just a financial loss; it demonstrated how a security system failure can cascade into problems with the car's other systems.

In addition to the loss of security, the unavailability of customer service was a serious problem. On Sunday, I attempted to contact them by phone, but all calls were immediately rejected. This is completely unacceptable for a service that performs security functions and from which 24/7 availability would be expected.

The IT Expert's Diagnosis: Technical Background of the Problem

Based on my IT professional experience, I identified several possible causes behind the collapse of the Skyguard system:

  • Complete data center shutdown: This alone does not explain the 40+ hour outage. Even in the case of a power outage, service could be restored within 4 hours with a diesel generator.
  • Lack of redundancy: A basic requirement for a security service would be the existence of a geographically separate secondary data center.
  • Possibility of a cyber attack: Upon investigation, I noticed that Skyguard's infrastructure is AWS-based. Other AWS customers reported no outages during this period, so a targeted attack or a serious configuration error is the more likely explanation.
  • Telephone system collapse: The automatic rejection of phone calls points to a VoIP or other network problem, which means the telephony was not properly redundant either.

It is particularly concerning that not only the application, but also the telephone customer service was unavailable. This indicates that there was no proper backup system or procedure in case the primary communication channel fails. A truly professional security service should have multiple, independent communication channels.

Milestones of the Service Outage: Anatomy of an IT Disaster

Phase 1: Service Collapse and Initial Reactions

  • December 2, 2023 (Saturday), 18:00 - Complete service outage: The application's authentication failed and the system became unreachable; the car could only be used with the PIN keypad.
  • December 2, 2023 (Saturday), approx. 19:00-22:00 - Alarm malfunction: The car alarm went off and could be heard from nearby, but no call came from the service provider, indicating a serious flaw in the security protocol.
  • December 3, 2023 (Sunday), 09:00-18:00 - Failed contact attempts: Multiple attempts to reach the service provider; every phone call was immediately rejected, and customer service was completely unavailable.

Phase 2: Problem Escalation and Further Consequences

  • December 4, 2023 (Monday), 08:30 - Service mode could not be activated: The car could not be put into service mode despite the correct PIN code; the system continuously showed errors.
  • December 4, 2023 (Monday), 10:00 - Bypass created at the service center: The service center had to create a bypass for the CAN gateway, resulting in significant additional costs.
  • December 4, 2023 (Monday), 14:00 - Service provider still unavailable: More than 40 hours after the start of the outage, the service provider still could not be reached by phone. They had responded on Facebook at 9:01, advising me to call the central numbers, but none of them were reachable despite repeated attempts since the weekend.

Phase 3: Partial Recovery and Evaluation

  • December 5, 2023 (Tuesday) - Partial restoration: In a phone conversation, they informed me that the service was only partially operational; according to customer service, the systems were being restored gradually, one after another.
  • December 6, 2023 (Wednesday) - Ongoing issues: Customer service claimed the system was operational again and that the PIN codes had been verified, but the PIN keypad still did not work and the Skyguard system could not be switched to service mode. Access to the logs was not possible, and the exact cause of the problems remained completely obscure.

RTO and RPO in Practice: The Cornerstones of Modern IT Security

The Skyguard case is an excellent example of why proper definition and implementation of RTO (Recovery Time Objective) and RPO (Recovery Point Objective) is critically important for every service, especially security systems.

Based on my 25+ years of IT infrastructure experience, for a security service, especially as a contracted partner of EuroOne, the following expectations would be reasonable:

  • RTO (Recovery Time Objective): maximum 4 hours. This means that the service must be operational again within 4 hours after a disaster.
  • RPO (Recovery Point Objective): maximum 1 hour. This means that a maximum of 1 hour of data loss is permissible.

In the case of Skyguard, both values were exceeded many times over: the actual recovery time exceeded 40 hours, while the effective RPO corresponded to several days, as basic customer and contract data were not available even after recovery.
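To put these numbers in context, here is a minimal sketch in Python that compares the timeline above against the 4-hour RTO proposed here. The timestamps come from the timeline; the Tuesday recovery time is an assumption, since the exact moment of restoration was never disclosed, and the multi-day effective RPO cannot be computed from the outside at all:

```python
from datetime import datetime, timedelta

# Proposed target from this article, not Skyguard's actual contractual value.
RTO_TARGET = timedelta(hours=4)

outage_start = datetime(2023, 12, 2, 18, 0)      # Saturday: complete service outage
still_down = datetime(2023, 12, 4, 14, 0)        # Monday: provider still unreachable
partial_recovery = datetime(2023, 12, 5, 12, 0)  # Tuesday: assumed time, never disclosed

for label, moment in [("Monday 14:00, still down", still_down),
                      ("Tuesday, partial restoration", partial_recovery)]:
    elapsed = moment - outage_start
    print(f"{label}: {elapsed} elapsed, {elapsed / RTO_TARGET:.1f}x the proposed RTO")
```

Even taking only the conservative Monday timestamp, the outage already ran to eleven times the proposed recovery objective.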

The critical role of IT infrastructure in modern business life cannot be emphasized enough. The lack of proper systems, protocols, and security procedures means not only technical problems but real business and security risks. The Skyguard case clearly shows that even a security provider can struggle with serious deficiencies in this area.

Today, every business is built on IT infrastructure. Take, for example, banks and financial service providers that operate at 99.999% availability, i.e. roughly five minutes of annual downtime. Healthcare systems, where an IT outage could endanger lives, often use redundant architectures with multiple data centers. For airports and airlines, a booking system failure can cause hundreds of thousands of dollars in losses per minute and affect thousands of passengers.
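The "five nines" figure mentioned for banks is simple arithmetic; this short sketch shows where the roughly-five-minutes number comes from:

```python
# Annual downtime permitted at common availability levels.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

for availability in (0.999, 0.9999, 0.99999):
    downtime = MINUTES_PER_YEAR * (1 - availability)
    print(f"{availability:.3%} availability -> {downtime:.1f} minutes of downtime per year")
```

At 99.999%, that works out to about 5.3 minutes per year; Skyguard's 40+ hour outage would have consumed more than 450 years' worth of a five-nines downtime budget.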

Previous Experiences and the Lack of Lessons Learned

It is particularly concerning that this was not the first problem of this kind in Skyguard's history. Already in 2015, more than eight years earlier, I experienced a similar outage, when neither the application nor the phone lines worked.

A memorable incident was when they woke me up at 3 AM to ask whether everything was all right with my car because it had triggered an alarm, even though their own server was down at the time. This already pointed to a serious deficiency: they were asking me about the state of my car while their own system was not working!

The fact that similar problems arose in both 2015 and 2023 suggests that the company has not learned from previous mistakes and has not adequately developed its IT infrastructure. This is particularly problematic for a provider that operates as a EuroOne partner and would be expected to continuously improve and apply the highest level of security protocols.

A proper business continuity plan (BCDR) would consist of the following elements:

  • Multi-layered redundancy: Minimum two geographically separate data centers, plus a cloud-based backup system
  • Short-TTL DNS A and CNAME records: Would allow traffic to be redirected quickly to another infrastructure, within 1-2 hours
  • "Shadow copy" database: Staff could access basic data even during a complete system outage
  • Fault-tolerant application design: The application must be able to recognize a failure in the primary system and automatically switch to the secondary system (see the sketch after this list)
  • Redundant telephone system: Communication channels must also be redundant
  • Regular DR practices: Recovery plans need to be regularly tested
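To illustrate the fault-tolerant design point, here is a minimal client-side failover sketch in Python. The hostnames are hypothetical placeholders, and a real implementation would add authentication, retry backoff, and server-side redirection on top; the point is only that the application tries the secondary site instead of failing outright:

```python
import socket

# Hypothetical endpoints for a primary and a geographically separate
# secondary data center; these hostnames are illustrative only.
ENDPOINTS = [
    ("primary.skyguard.example", 443),
    ("secondary.skyguard.example", 443),
]

def connect_with_failover(endpoints, timeout=5.0):
    """Return the first live connection, trying each endpoint in order."""
    for host, port in endpoints:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:
            # Primary unreachable: note it and fall through to the secondary,
            # rather than locking the user out as the Skyguard app did.
            print(f"{host}:{port} unreachable ({exc}), trying next endpoint")
    raise ConnectionError("all endpoints down; degrade gracefully instead of hard-failing")

if __name__ == "__main__":
    try:
        connect_with_failover(ENDPOINTS).close()
    except ConnectionError as err:
        print(err)
```

The same logic applies server-side: with DNS A or CNAME records kept at a short TTL (minutes rather than hours), repointing them at the secondary site propagates comfortably within the 1-2 hour window mentioned above.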

In the case of Skyguard, it is clear that these elements were either completely missing or not functioning properly. This is unacceptable for a provider responsible for the security of its customers, and particularly problematic from the perspective of the EuroOne partnership.

Conclusions and Recommendations: Path to a Secure Service

The 40+ hour outage of the Skyguard service and the subsequent problems clearly show that urgent changes are needed in the company's IT infrastructure and business continuity planning.

In my experience, companies are often only willing to take BCDR planning seriously after a severe incident. I remember a previous client who accepted the redundant Hyper-V and Microsoft SQL Server solutions I had proposed only after a significant system outage.

In the case of Skyguard, the question arises as to how well they comply with industry standards, especially the EuroOne requirements, which are extremely strict for security services. The service provider needs to reevaluate its business continuity plan and make significant investments in its IT infrastructure.

I would recommend the following steps to avoid similar cases:

  1. Review single points of failure (SPOFs) as part of a cybersecurity assessment
  2. Perform and document an appropriate risk analysis
  3. Implement a multi-layered, geographically separated redundant infrastructure
  4. Define and adhere to clear RTO and RPO targets
  5. Hold regular DR exercises to test recovery protocols
  6. Develop a multi-level customer communication plan for service outages

A security service, especially a EuroOne partner, should set an example in the areas of reliability, security, and business continuity. The Skyguard case clearly shows that they still have a long way to go in this area.

This case is an excellent example of how IT infrastructure and proper business continuity planning are not just technical issues but have a direct impact on customer safety, convenience, and company reputation. The consequences of IT failures extend far beyond technology and pose real business and security risks.