The Walls Have Ears: Why Your Penetration Testing Is Stuck in the Past
Penetration testing has long been a vital tool in our cybersecurity arsenals, helping organizations find and address vulnerabilities before attackers can exploit them. Traditionally, these tests involve simulating attacks to identify weaknesses. However, as cyber threats become more sophisticated and our technological landscapes more complex, are these conventional methods keeping pace?
In a recent paper, "Redefining Penetration Testing: A Deterministic and Non-Deterministic Approach Through the Adversarial Penetration Testing Model (APTM)," we argue that adversarial penetration testing must evolve, especially given the rapid adoption of AI/ML systems. The paper proposes a mathematically grounded framework designed to overcome the shortcomings of static testing by integrating intelligent, adaptive strategies into offensive security assessments.
The Old Playbook: Why Traditional Pen Testing Falls Short
Conventional penetration testing methodologies, while foundational, exhibit significant limitations in their capacity to model the dynamic, adaptive, and probabilistic nature of modern cyber-attacks. These traditional approaches are often constrained by a reliance on predefined toolkits, rigid, checklist-driven procedures, and an absence of real-time feedback mechanisms, resulting in a failure to adequately simulate sophisticated adversarial behavior.
In our paper we systematically deconstruct the core deficiencies of established penetration testing practices. These methodologies are characterized by several critical constraints that limit their effectiveness in the context of an evolving threat landscape, including:
Over-Reliance on Predefined Toolkits: Traditional testing is heavily dependent on tools configured with known exploits and attack patterns. This approach is inherently limited to documented vulnerabilities (CVEs) and lacks the adaptive capabilities required to respond to dynamic defenses or discover zero-day flaws.
Template-Driven Rigidity: Frameworks such as the OWASP Top 10 and the Penetration Testing Execution Standard (PTES) enforce a structured but inflexible process. This checklist-based approach cannot account for the complex interplay of system configurations or simulate an adversary who alters their tactics, techniques, and procedures (TTPs) mid-attack.
Lack of Adaptive Feedback: A significant flaw in traditional testing is the absence of a dynamic feedback loop that allows the tester to iteratively refine tactics based on environmental responses. Real-world adversaries, in contrast, continuously gather intelligence during an operation and adjust their strategies accordingly.
Limited Scope and Coverage: These tests often focus on individual systems in isolation, thereby missing complex attack paths that exploit inter-component relationships, such as lateral movement across network segments. Human-driven vectors like social engineering are also frequently under-evaluated.
The New Playbook: Inside the Adversarial Penetration Testing Model
APTM introduces a formal mathematical structure to model the penetration testing process, drawing inspiration from decision-making models and game theory. The model is defined by the 5-tuple
M=(S,A,T,R,γ),
where:
S (States): Represents the set of possible system states, such as "Firewall bypassed" or "User access gained".
A (Actions): The set of possible actions an agent can take, which is the union of two distinct subsets: deterministic actions (AD), whose outcomes are certain, satisfying P(aD∣s)=1 assuming all preconditions are met, and non-deterministic actions (AN), whose outcomes are probabilistic.
T (Transitions): The transition probability function, T:S×A×S→[0,1], which defines the probability of moving from one state to another given a specific action.
R (Rewards): A function that assigns a numerical value to a state, indicating its desirability relative to the agent's goal.
γ (Discount Factor): A value between 0 and 1 that balances the preference for immediate versus long-term rewards.
This framework mirrors a Markov Decision Process (MDP) for deterministic scenarios and incorporates principles of a Partially Observable MDP (POMDP) when dealing with the uncertainty inherent in non-deterministic actions.
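To make the tuple concrete, here is a minimal Python sketch of M. The states, actions, probabilities, and rewards below are invented for illustration and are not taken from the paper:

```python
import random

# Illustrative sketch of the APTM 5-tuple M = (S, A, T, R, gamma).
# All state and action names here are hypothetical examples.
STATES = ["initial", "firewall_bypassed", "user_access_gained"]

# T: transition probabilities, T[state][action] -> [(next_state, probability), ...]
T = {
    "initial": {
        "exploit_known_cve": [("firewall_bypassed", 1.0)],  # deterministic: P = 1
        "fuzz_service": [("user_access_gained", 0.3),       # non-deterministic
                         ("initial", 0.7)],
    },
}

# R: numerical desirability of each state relative to the agent's goal.
R = {"initial": 0.0, "firewall_bypassed": 1.0, "user_access_gained": 5.0}

GAMMA = 0.9  # discount factor balancing immediate vs. long-term reward

def step(state, action, rng=random):
    """Sample a successor state from T and return it with its reward."""
    next_states, probs = zip(*T[state][action])
    nxt = rng.choices(next_states, weights=probs, k=1)[0]
    return nxt, R[nxt]
```

Note how the deterministic/non-deterministic split lives entirely in T: a deterministic action has a single outcome with probability 1, while a non-deterministic one spreads probability mass across several successor states.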
The Environment (E) is the dynamic, multi-layered digital ecosystem in which the agent operates. It is not a static target but a reactive system that can change in response to the agent's actions. The APTM defines the environment with granular layers, including infrastructure, applications, security controls, identity management, and even a human behavioral layer, each influencing the outcome of actions.
The Agent (A) is the intelligent, goal-directed entity executing actions. It can be a human operator, an automated script, or an AI/ML-driven system. The agent is formally defined as
A=(K,Σ,Λ):
Knowledge Base (K): The information the agent possesses about the environment, which is continuously updated.
Strategy (Σ): The decision-making framework, or policy, used to select actions. This can range from a simple rule-based tree for deterministic plans (ΣD) to adaptive, probabilistic planning for non-deterministic scenarios (ΣN).
Adaptation Loop (Λ): The feedback mechanism that modifies the agent's behavior and strategy based on action outcomes.
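The triple A = (K, Σ, Λ) can be sketched as a small class. This is a structural illustration only, with a deliberately trivial exploration rule standing in for Σ; it is not the paper's implementation:

```python
class Agent:
    """Minimal sketch of A = (K, Sigma, Lambda); structure only."""

    def __init__(self):
        self.knowledge = {}  # K: facts the agent has learned, keyed by (state, action)

    def strategy(self, state, actions):
        # Sigma: a trivial rule-based policy that prefers actions whose
        # outcome in this state is still unknown (pure exploration).
        untried = [a for a in actions if (state, a) not in self.knowledge]
        return untried[0] if untried else actions[0]

    def adapt(self, state, action, outcome):
        # Lambda: the feedback loop -- record the observed outcome so
        # later strategy() calls can take it into account.
        self.knowledge[(state, action)] = outcome
```

In a real agent, Σ would range from a deterministic rule tree (ΣD) to probabilistic planning (ΣN), but the shape is the same: strategy() reads K, and adapt() writes back to it after every action.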
The feedback loop (Λ) is the core of APTM's dynamic nature, enabling the agent to learn and optimize its strategy over time. After each action, the agent observes the outcome and updates its knowledge base and policy. This can be implemented using mechanisms like Reinforcement Learning (RL). For instance, a Q-learning update rule can be used to refine the value of state-action pairs:
Q(s,a) ← Q(s,a) + α[r + γ·maxa′ Q(s′,a′) − Q(s,a)],
where α is the learning rate, s′ is the observed next state, a′ ranges over the actions available from s′, and r is the reward received.
This allows the agent to learn an optimal policy, π∗, that maximizes the expected cumulative reward while accounting for the cost of actions (in terms of time, resources, and stealth risk).
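As a sketch of how the Q-learning refinement mentioned above could be wired into Λ (the tabular dict representation and default parameters are my own simplifications):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action); unseen pairs default to 0.
    The reward r can fold in action costs (time, resources, stealth risk)
    as negative terms, matching APTM's cost-aware objective."""
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in actions), default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

Running this update after every observed transition gradually concentrates high Q-values along the most rewarding attack paths, which is exactly the learned policy π∗ the model aims for.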
The paper provides a practical application of APTM in a scenario named "The Silent Slice," which targets a private 5G network. The agent's process illustrates the model's dynamism:
Initial State (S0): The agent begins with no access. An initial non-deterministic action (API probing) fails but provides environmental feedback.
Foothold (S0 → S1): The agent adapts, using a deterministic action (OSINT and exploiting a known CVE) to compromise a staging server.
Core Network Analysis (S1 → S2): From the new foothold, the agent uses deterministic scanning to identify 5G Network Functions (NFs) and then applies a non-deterministic action (protocol fuzzing) against the Network Repository Function (NRF). The feedback loop is critical here, as the agent tunes its fuzzing parameters based on the NRF's responses to trigger an information leak.
Objective Achievement (S2 → S5): The agent uses the leaked data (a deterministic analysis) to identify a target User Plane Function (UPF). It then exploits a known CVE on the UPF (deterministic) to bypass filters, redirect traffic, and intercept the target sensor data, thus achieving its goal.
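The walkthrough above can be summarized as a simple data structure. The action labels below are my shorthand for the scenario's steps, and the deterministic/non-deterministic tags follow the narrative:

```python
# Shorthand encoding of "The Silent Slice" path; labels paraphrase the scenario.
SILENT_SLICE = [
    ("S0", "api_probe",         "non-deterministic"),  # fails, but yields feedback
    ("S0", "osint_cve_exploit", "deterministic"),      # foothold: staging server (-> S1)
    ("S1", "nf_scan",           "deterministic"),      # enumerate 5G Network Functions
    ("S1", "nrf_fuzz",          "non-deterministic"),  # tuned via the feedback loop (-> S2)
    ("S2", "upf_cve_exploit",   "deterministic"),      # redirect and intercept traffic (-> S5)
]

def tally_kinds(path):
    """Count deterministic vs. non-deterministic actions along an attack path."""
    counts = {}
    for _, _, kind in path:
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Even this crude tally makes the point of the scenario: the successful path interleaves both action classes, so a tester restricted to deterministic, checklist-style steps could never have completed it.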
The Adversarial Penetration Testing Model offers a significant methodological advancement over traditional security assessments. By formalizing the testing process with a mathematical framework that incorporates both deterministic and non-deterministic actions, APTM allows for a more realistic and effective simulation of intelligent adversaries. Its emphasis on an adaptive feedback loop enables the modeling of an attacker that learns and adjusts its strategy, providing security teams with a far more comprehensive and accurate evaluation of their system's security posture against advanced, persistent threats.
We are developing a prototype to evaluate the model's efficacy in calculating optimal attack paths given a domain's resources and environmental conditions.
Contact us for more information: