21 NOV, 2025

The Cybersecurity Express – November 21, 2025

CyBourn Media Hub

You arrive at the platform not because you must, but because it is there. The rails stretch out in perfect symmetry, a gesture toward reason in a world that resists it. Around you, screens glow with warnings and numbers—each an attempt to name the chaos, to make the unpredictable feel measurable. The Cybersecurity Express will come, they say, though no one can say when. Still, you wait. Waiting, after all, is what we do best when faced with the unknown.

When the train slides into view, you feel no triumph, no relief—only the quiet recognition that vigilance itself is an act of hope. You step aboard, knowing that each destination will promise safety but never deliver it completely. Yet you go anyway, because to seek security in an insecure world is not futility—it is what makes us human.

Yet Another Cloudflare Outage

On November 18, 2025, Cloudflare, the critical infrastructure provider known for its content delivery network (CDN), DDoS protection, and security services that front more than 20% of the internet, experienced a significant service outage that disrupted internet traffic globally. The outage lasted approximately six hours according to Cloudflare’s official statement, though users online reported disruptions lasting considerably longer. It impacted core Cloudflare services including HTTP traffic routing, Bot Management, Workers KV, and authentication systems such as Cloudflare Access and Turnstile. Rather than a cyberattack or other external malice, the root cause was a software bug in the generation of a critical configuration file used by Cloudflare’s Bot Management system.

At approximately 11:20 UTC, Cloudflare’s network began failing to deliver core traffic, resulting in widespread 5xx HTTP status errors. Initially, the sudden spike in errors and degraded service levels led some in the incident response team to suspect a hyper-scale distributed denial-of-service (DDoS) attack. This suspicion was compounded when Cloudflare’s status page, hosted off-network, inexplicably went offline around the same time, leading responders to suspect a coordinated attack against multiple systems.

After intensive investigation, Cloudflare identified that the outage originated from an erroneous change in database permissions on its ClickHouse system, a distributed, columnar database used for managing telemetry and configuration data. A modification aimed at improving security and reliability in distributed query processing unintentionally caused a metadata query to return duplicate rows, which more than doubled the size of the generated feature configuration file.

This configuration file, refreshed every few minutes and deployed across all of Cloudflare’s proxy machines, feeds into the Bot Management system’s machine learning model. This model generates bot scores for incoming web traffic to help clients enforce filtering rules against automated bots and cyber threats. The duplicated rows pushed the file past a pre-allocated limit of 200 features (the usual count being around 60), causing Rust-based components of the new FL2 proxy engine to panic with unhandled errors. This panic cascaded into systemic failures, manifesting as a surge in 5xx error responses and rendering large segments of Cloudflare’s network unable to route customer traffic.
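
To make the failure mode concrete, here is a minimal Rust sketch of how a hard pre-allocation limit combined with an unrecoverable error path can turn a malformed configuration file into a process-wide panic. All names here (MAX_FEATURES, FeatureTable, load_features) are hypothetical illustrations, not Cloudflare’s actual FL2 code.

```rust
// Hypothetical sketch of a feature loader with a hard pre-allocation limit.
// MAX_FEATURES mirrors the roughly 200-feature ceiling described above;
// none of these names correspond to Cloudflare's real FL2 code.
const MAX_FEATURES: usize = 200;

#[derive(Debug)]
struct FeatureTable {
    names: Vec<String>,
}

#[derive(Debug)]
enum ConfigError {
    TooManyFeatures { got: usize, limit: usize },
}

fn load_features(lines: &[&str]) -> Result<FeatureTable, ConfigError> {
    // A duplicated file inflates the feature count well past the limit.
    if lines.len() > MAX_FEATURES {
        return Err(ConfigError::TooManyFeatures { got: lines.len(), limit: MAX_FEATURES });
    }
    Ok(FeatureTable {
        names: lines.iter().map(|s| s.to_string()).collect(),
    })
}

fn main() {
    // A normal file carries around 60 features; the corrupted one carried far more.
    let oversized: Vec<&str> = (0..260).map(|_| "some_feature").collect();

    // Calling .unwrap() on the Err value panics, the class of failure that
    // cascaded into widespread 5xx responses in the FL2 proxy.
    let _table = load_features(&oversized).unwrap();
}
```

Running this sketch panics by design; it reproduces, in miniature, the unhandled-error path that Cloudflare’s follow-up measures below aim to eliminate.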

The duplicated rows stemmed from the recent ClickHouse improvement that transitioned queries to run under the original user’s context rather than a shared system account. This change unintentionally exposed underlying table metadata from an additional database (r0), which the Bot Management feature file generator then picked up as duplicate entries.

Technical Details and Affected Components:

  • Core Proxy Engines – Two proxy versions exist: FL (legacy) and FL2 (the new Rust rewrite). FL2 was heavily impacted and returned panic errors due to the feature file size limit, producing HTTP 5xx errors. FL did not return errors but assigned bot scores of zero, potentially causing false positives for customers filtering on bot score or letting automated traffic pass unfiltered.
  • Bot Management – The outage originated here due to the corrupted configuration file, which involves machine learning models relying on feature vectors representing web request characteristics for bot detection.
  • Workers KV – Cloudflare’s key-value store, dependent on the Bot Management module, experienced elevated error rates and degraded response times.
  • Cloudflare Access and Turnstile – Authentication systems experienced outages because they rely on upstream Workers KV and Bot Management modules; login and access control were disrupted.
  • Dashboard and Control Plane – User login functionality was impaired, primarily for new sessions requiring Turnstile verification, with two notable impact periods during the outage due to ongoing cascading issues and overload from login backlogs.

Cloudflare’s engineering teams acted swiftly:

  • At 13:05 UTC, a workaround was deployed to bypass affected components in Workers KV and Cloudflare Access, partially mitigating service degradation.
  • By 13:37 UTC, efforts focused on rolling back the Bot Management configuration file to a verified clean version.
  • At 14:24 UTC, automated creation and propagation of Bot Management feature files were halted to stop further dissemination of corrupt configurations.
  • A corrected configuration file was fully deployed across the network by 14:30 UTC, rapidly restoring service functionality.
  • Final system restarts and residual error resolution completed by 17:06 UTC, marking the official end of the outage.

Underlying Systems and Software Affected:

  • ClickHouse Distributed Query Engine – The root cause was traced to a change involving ClickHouse’s “Distributed” engine and metadata queries, which began exposing additional shards’ column data and produced duplicate metadata rows (a simplified illustration follows this list).
  • Rust-based Proxy Service (FL2) – The panic stemmed from an unchecked unwrap() call on an Err result in the feature vector loading code once the oversized file exceeded its memory pre-allocation limit.
  • Machine Learning Feature File – This file organizes web traffic features such as HTTP headers, IP reputation signals, and behavioral attributes critical for bot scoring algorithms.
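
As a simplified model of the duplication described above, the sketch below shows how a metadata listing that suddenly sees both the logical default database and the underlying r0 shard tables returns every column twice unless the query is constrained to one database. The structures and names (ColumnMeta, visible_columns, the table and column names) are hypothetical; this is not ClickHouse’s or Cloudflare’s actual code.

```rust
// Hypothetical model of the duplication: after the permission change, the
// feature-file generator's metadata query can "see" both the logical default
// database and the underlying r0 shard tables, so each column appears twice
// unless the query filters on the database name.
#[derive(Debug, Clone, PartialEq)]
struct ColumnMeta {
    database: String,
    table: String,
    column: String,
}

fn visible_columns() -> Vec<ColumnMeta> {
    // The same logical column, exposed through two databases.
    ["default", "r0"]
        .iter()
        .map(|db| ColumnMeta {
            database: db.to_string(),
            table: "http_requests_features".to_string(),
            column: "example_feature".to_string(),
        })
        .collect()
}

fn main() {
    let all = visible_columns();
    // Without a database filter, the generator emits duplicate feature rows.
    assert_eq!(all.len(), 2);

    // Constraining the listing to the intended database restores the expected count.
    let filtered: Vec<_> = all.into_iter().filter(|c| c.database == "default").collect();
    assert_eq!(filtered.len(), 1);
}
```

The same idea, filtering or de-duplicating on column identity at generation time, is one way the ingestion safeguards mentioned below could reject such a file before it ships.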

Cloudflare acknowledged this was its worst network outage since 2019 and issued a formal apology. Planned follow-up measures include:

  • Enhancing ingestion safeguards for system-generated configuration files to prevent malformed or unexpected inputs.
  • Introducing global kill switches to quickly disable malfunctioning features in their proxies.
  • Reviewing and improving error handling to prevent panics and uncontrolled crashes (a minimal sketch of this kind of graceful fallback appears after this list).
  • Improving system monitoring and debugging infrastructure to catch early warning signs and prevent cascading failures.
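
As a rough illustration of the error-handling improvements in that last list, the following Rust sketch treats a malformed or oversized feature file as a recoverable condition: the refresh path validates the candidate configuration and, on failure, keeps serving with the last known good version instead of panicking. All names (BotConfig, validate, refresh) are hypothetical; this sketches the general pattern rather than Cloudflare’s actual remediation code.

```rust
// Hypothetical sketch: validate a refreshed configuration and fall back to the
// last known good version on failure, instead of unwrapping and panicking.
const MAX_FEATURES: usize = 200;

#[derive(Clone, Debug, PartialEq)]
struct BotConfig {
    features: Vec<String>,
}

#[derive(Debug)]
enum ConfigError {
    Empty,
    TooManyFeatures(usize),
}

fn validate(candidate: Vec<String>) -> Result<BotConfig, ConfigError> {
    if candidate.is_empty() {
        return Err(ConfigError::Empty);
    }
    if candidate.len() > MAX_FEATURES {
        return Err(ConfigError::TooManyFeatures(candidate.len()));
    }
    Ok(BotConfig { features: candidate })
}

// Apply a refreshed configuration only if it validates; otherwise keep the
// current one and report the error instead of crashing the proxy.
fn refresh(current: &BotConfig, candidate: Vec<String>) -> BotConfig {
    match validate(candidate) {
        Ok(new_cfg) => new_cfg,
        Err(err) => {
            eprintln!("rejecting bad feature file, keeping last known good: {:?}", err);
            current.clone()
        }
    }
}

fn main() {
    let known_good = BotConfig {
        features: vec!["example_feature_a".to_string(), "example_feature_b".to_string()],
    };

    // A duplicated file with far more rows than MAX_FEATURES is rejected,
    // and traffic keeps flowing on the previous configuration.
    let oversized: Vec<String> = (0..500).map(|i| format!("feature_{i}")).collect();
    let active = refresh(&known_good, oversized);
    assert_eq!(active, known_good);
}
```

Pairing this kind of validation with the global kill switches mentioned above would let operators disable a misbehaving feature pipeline without waiting for a full configuration rollback.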

Given Cloudflare’s integral role in the global Internet ecosystem—protecting millions of websites and services—the disruption caused broad service interruptions, including inaccessible websites, degraded security enforcement, and login failures. Internet users worldwide experienced intermittent outages, illustrating how a subtle bug in a machine learning configuration pipeline can cascade into major network failures.

This incident highlights the complexity and interdependencies of modern cloud infrastructure, particularly with the adoption of machine learning-driven security mechanisms and distributed database systems. It underscores the importance of rigorous testing, pre-deployment validation, and fail-safe engineering in mission-critical infrastructure.

In sum, the November 18, 2025 Cloudflare outage was not the result of an external cyber attack but of an internal software regression triggered by a database permission change. The incident caused cascading failures in Cloudflare’s Bot Management and proxy services and produced global service disruptions. Although far less severe than the infamous July 19, 2024 CrowdStrike “bluescreen” crash, which brought large portions of affected organizations’ IT systems to a standstill and left a lasting dent in that company’s reputation, the new outage placed Cloudflare back under scrutiny and reignited criticism. The disruption contributed to a 3 percent drop in Cloudflare’s stock price, a modest decline compared with the market shock that followed the 2024 event, but still indicative of growing concerns around reliability. Swift mitigation efforts restored services, and Cloudflare is now focused on enhancing resilience to prevent future incidents of this nature. Together, these episodes illustrate the challenges of managing risk in increasingly complex cloud and security infrastructures.

Recent State of Artificial Intelligence: Rapid Growth Amidst Uncertainty and Noise

Artificial Intelligence (AI) continues to evolve at an unprecedented pace, driving transformative potential across industries and reshaping how people and organizations operate. However, beneath the surface of this rapid expansion lies a complex web of competing technologies, growing pains, and market hype that is creating confusion, uncertainty, and stalled adoption for many businesses. The state of AI in late 2025 is one of excitement mixed with frustration, as technical challenges, inconsistent capabilities, and an overheated investment climate cloud the promise of this revolutionary technology.

AI development is accelerating with remarkable speed. New AI models and frameworks emerge weekly, embodying diverse languages, architectures, and specialties—from generative language models and vision systems to reinforcement learning agents and domain-specific AI. This explosion of variety creates a noisy, cluttered ecosystem. While this diversity allows highly specialized use cases and innovations, it also overwhelms potential adopters and confounds standardization and integration efforts.

The 2025 AI Index report highlights that nearly 90% of notable AI models originated from industry research labs, a significant rise from previous years, with U.S., Chinese, and European institutions competing fiercely on both quality and quantity. This global competition drives remarkable breakthroughs but also contributes to overlapping efforts and inconsistent APIs, protocols, and performance benchmarks. Companies often face a bewildering choice among dozens of incompatible AI toolsets, each promising unique advantages, while few provide seamless interoperability or reliability at scale.

Despite considerable advances, AI today is far from a flawless or fully autonomous technology. Researchers continue identifying critical bugs and systemic errors that undermine AI reliability and trustworthiness. One prominent problem is the prevalence of “hallucinations,” where AI confidently generates inaccurate or fabricated information. This issue plagues even advanced language models like ChatGPT and Gemini, impacting their suitability for high-stakes decision-making or content generation roles.

Security experts warn of adversarial vulnerabilities where malicious actors craft inputs deliberately designed to confuse AI models, bypass filters, or exploit backdoors in training datasets. For instance, data poisoning attacks risk embedding subtle manipulations that can alter AI behavior unpredictably without immediate detection. These technical challenges expose risks not only in performance but also data privacy and corporate compliance, substantially tempering expectations for AI replacing human roles anytime soon.

Recent high-profile cases, such as AI-generated “deepfake” citations in court filings or large-scale removal of AI-generated spam tracks on platforms like Spotify, underscore weaknesses and unintended consequences still to be resolved. Tools intended to detect AI-generated content often fail or produce false positives, limiting potential use in verifying authenticity or combating misinformation.

Corporations eager to harness AI’s promise abound, yet many find themselves at an impasse. The fragmented landscape and ongoing quality concerns contribute to “analysis paralysis” where decision-makers hesitate to invest heavily in AI initiatives for fear of sunk costs, failed implementation, or harm to brand trust. Although over 40% of U.S. businesses report actively paying for AI tools, adoption beyond pilot phases remains spotty and outcomes uneven.

This ambivalence represents a standstill, as organizations wait for the technology to mature and prove clear return on investment. AI leaders emphasize the need for partnerships with experienced developers capable of navigating AI’s complexities, focusing on efficient data governance, algorithmic transparency, and scalability. Enterprises that succeed often take a modular approach, piloting narrowly defined use cases with stringent oversight before broader rollouts.

A darker facet of the AI boom is an overheated market rife with inflated promises and marketing embellishments. Recent critiques draw parallels to the dotcom bubble of the early 2000s, where speculative capital chases shiny narratives rather than fundamentals. High-profile investors like Michael Burry have publicly accused major AI hyperscalers of artificially boosting earnings and user metrics to secure higher valuations, cautioning against the risks of unsustainable growth fueled by hype rather than solid performance.

In this race for AI supremacy, companies aggressively promote minor incremental improvements as breakthroughs, inflate user engagement numbers with vanity metrics, and spotlight model size or training data volume while obscuring lingering shortcomings. This “AI bubble” breeds skepticism among industry analysts and decision-makers wary of becoming the next overvalued casualty.

Even prominent AI CEOs, including OpenAI’s Sam Altman, acknowledge the inherent risks of this enthusiasm-driven inflation, emphasizing the need for sober assessment of what AI can realistically achieve today versus aspirational forecasts of artificial general intelligence (AGI). The current market is marked by a juxtaposition of meaningful technological progress with marketing excess, requiring discerning evaluation by investors and users alike.

To make AI work effectively for businesses, collaboration between AI developers, domain experts, and users is critical. Investments in high-quality data infrastructure and transparent metrics measuring actual business impact over superficial model size are essential. Strategic specialization to solve specific problems rather than “one size fits all” AI will improve adoption success. Moreover, regulation and industry standards are necessary to ensure safety, fairness, and accountability.

In conclusion, while AI is not yet a universal replacement for human expertise and still grapples with technical and market uncertainties, it remains a cornerstone of future innovation. Companies should cautiously navigate the noise and inflated claims, ground expectations in technical realities, and embrace incremental, well-managed AI deployments to build sustainable competitive advantage in this transformative but still maturing domain.

This wraps up today’s issue. Wherever you are out there in the digital world, stay safe, install the latest patches, and keep a watchful eye out for anything that might want to deceive you. Thank you for being a wanderer on The Cybersecurity Express; we look forward to welcoming you on board next time.
