SOC 2 Penetration Testing for On-Premise Networks

Written by Ali Aleali | Apr 12, 2026 1:04:02 AM

TL;DR

Pen testing maps to CC4.1 (separate evaluations, where the AICPA names penetration testing explicitly) and CC7.1 (vulnerability detection)
Scope decisions matter more than tooling: external, internal, network segmentation, application, wireless, and out-of-band management interfaces (iDRAC, iLO, IPMI) all need to be on the table
Annual is the SOC 2 baseline cadence; findings flow back into the patch and remediation program through the same ticket workflow as scanner output
Look for CREST or CHECK accreditation and OSCP-credentialed testers, with retest included in the engagement scope
A pen test report that surfaces real findings is more valuable than a clean report; auditors read repeated clean reports skeptically

Where does an on-prem SOC 2 pen test stop? At the public load balancer? At the edge firewall appliance? At the production VLAN boundary? At the iDRAC interface on a server in the cage that nobody has touched in eighteen months? The scoping decisions made before the test starts matter more than the tooling choices made during it, and they are the part most SOC 2 pen test guidance leaves out.

The cloud-native version of this question is short. A web app behind a load balancer, a handful of public API endpoints, identity managed by an IdP, everything else abstracted away by the provider. Scope the external pen test, hand the auditor a PDF, move on. On-prem and hybrid environments do not collapse that cleanly. The team owns more of the stack, so the attack surface is wider, and the pen test that satisfies CC4.1 and CC7.1 in that environment looks different. This post covers how to scope a SOC 2 pen test for an on-prem or hybrid network, how findings flow back into the remediation program, and how to pick a testing firm that produces the evidence package SOC 2 expects.

How Penetration Testing Maps to the Trust Services Criteria

Pen testing is one of the few security activities the AICPA names explicitly in the 2017 Trust Services Criteria. It sits at the intersection of two criteria most cloud-centric content treats separately.

Two criteria, one activity

CC4.1 answers one question: does someone periodically and independently evaluate whether the controls are actually working? CC7.1 answers another: are the technical vulnerabilities being identified across the environment? A pen test produces evidence for both at the same time.

CC4.1 is the monitoring activities criterion. It expects the entity to run a mix of ongoing and separate evaluations to ascertain whether internal control components are present and functioning. One of its Points of Focus lists the acceptable separate evaluation types, and penetration testing is named among them alongside vulnerability scans, security assessments, internal audit, and third-party assessments. An annual pen test is the canonical separate evaluation a SOC 2 auditor expects to see.

CC7.1 is the vulnerability identification and monitoring criterion. It expects the entity to detect changes that introduce new vulnerabilities and to identify susceptibilities to newly discovered vulnerabilities. A pen test exercises the controls rather than just cataloguing software, so it surfaces what continuous scanning cannot.

The sibling posts on on-prem vulnerability scanning and on-prem patch management cover the ongoing-detection half of CC7.1. This post covers the separate-evaluation half and the remediation loop.

Scope: Test Types That Cover an On-Prem Attack Surface

One pen test covering everything is usually the wrong framing. On-prem environments have distinct attack surfaces that require different skill sets and different rules of engagement. A good scoping conversation separates them before the engagement starts.

Test Type	What it covers	When to include
External network	Public web servers, API endpoints, VPN gateways, mail relays, perimeter firewall ruleset	Always. Minimum baseline for every SOC 2 environment annually
Internal network	Lateral movement from a foothold, default credentials on management interfaces, weak SMB and RDP, legacy services	Always for on-prem. Produces the most valuable findings
Network segmentation	Whether the corporate or guest VLAN can reach the production VLAN without a legitimate path	Always when multiple VLANs carry different trust levels
Application	Web apps, APIs, and custom services tested to OWASP methodology; phased unauthenticated then authenticated	When the team maintains custom applications or APIs
Wireless	Rogue AP detection, WPA3 enforcement, guest-network isolation	When office wireless touches the production or management path
Out-of-band management	iDRAC, iLO, IPMI exposure and authentication, management VLAN isolation	Always on bare metal. Folded into internal or segmentation testing

External network testing works from an external IP with no credentials, probes for exposed services, checks for misconfigurations and unpatched vulnerabilities, and tries to gain a foothold. This test also validates that the perimeter firewall ruleset matches the documented design.

Internal network testing gives the tester a foothold inside the corporate or production network, through a jump host or a dropped device, and they work outward. This is where on-prem environments produce the most valuable findings, because the internal test validates a different assumption than the external one: if an attacker gets one foothold, the blast radius is contained. Internal testing typically surfaces weak segmentation, default credentials on management interfaces, exposed iDRAC and iLO ports, weak SMB and RDP exposure, and legacy services nobody remembered were still running.

Network segmentation testing is a focused subset of internal testing. It asks one question: can the tester reach the production VLAN from the corporate VLAN, or from a guest wireless segment, without a legitimate credential and path? This is the test that validates the VLAN architecture covered in the on-prem network security post. For SOC 2, segmentation evidence often matters more than raw vulnerability counts, because the segmentation boundary is the control auditors rely on when assessing blast radius.
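
A minimal sketch of how a zone-to-zone reachability check could be expressed, assuming a plain TCP connect from a host in the source zone is an acceptable probe under the rules of engagement. The function names, the allow-list shape, and the injectable probe argument are hypothetical illustrations, not part of any specific testing tool or of the methodology a firm would actually use.

```python
import socket
from typing import Callable, Iterable

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds from this machine."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable, or unresolvable all count as not reachable.
        return False

def segmentation_violations(
    targets: Iterable[tuple[str, int]],
    allowed: set[tuple[str, int]],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> list[tuple[str, int]]:
    """Run from the source zone: list reachable targets that are not on the allow list.

    targets: (host, port) pairs in the destination zone (hypothetical inventory).
    allowed: (host, port) pairs the documented design permits from the source zone.
    """
    return [
        (host, port)
        for host, port in targets
        if (host, port) not in allowed and probe(host, port)
    ]
```

Run from a host on the corporate or guest segment against a list of production addresses, an empty result supports the documented boundary for the probed ports; any entry returned is a candidate segmentation finding to verify by hand. The probe is passed in as an argument so the logic can be exercised without touching a live network.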

Application testing covers web applications, APIs, and custom services the team maintains. Methodology follows OWASP and equivalent application testing standards. Legacy applications benefit from a phased scope: unauthenticated testing first (validate authentication flows, check for MFA bypass, probe the external surface), then authenticated testing (authorization boundaries, business logic, data access controls with valid credentials). The phased approach prevents known framework-level vulnerabilities such as outdated libraries and deprecated UI components from dominating the findings and crowding out application-specific risks.

Physical testing at the colocation is rarely in scope for a baseline SOC 2 pen test. The colocation provider's own physical controls and their CSAE 3000 or SOC 2 report cover that boundary as a subservice organization control.

Why Annual Is the Baseline Cadence

SOC 2 does not prescribe a frequency. Annual is the cadence the market has settled on and the one auditors expect to see. The reasoning is pragmatic. A pen test is a point-in-time evaluation. Run it too rarely and the findings reflect a stale architecture. Run it too often and the cost outruns the value because the controls under test have not changed enough to retest. Annual matches how on-prem environments actually change: semiannual firmware rollouts, quarterly firewall reviews, major architecture changes once or twice a year.

Higher-risk environments often add a second narrower test on a six-month cadence, such as a segmentation-only retest or a focused application test on a newly released module. Two additional triggers warrant out-of-cycle tests: a material architecture change and a material incident that exposed a control gap. CC4.1's Adjusts Scope and Frequency Point of Focus is the direct hook for that decision.
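
The cadence rule and the two out-of-cycle triggers can be sketched as a small decision function. This is an illustration under stated assumptions: the 365-day threshold stands in for "annual", and the function and parameter names are hypothetical. What counts as a material change or incident remains a judgment call the code does not make.

```python
from datetime import date, timedelta

# Assumption: "annual" is approximated as 365 days since the last full test.
ANNUAL = timedelta(days=365)

def pen_test_due(
    last_full_test: date,
    today: date,
    material_architecture_change: bool = False,
    material_incident: bool = False,
) -> tuple[bool, str]:
    """Return (due, reason) for the next pen test decision."""
    if material_architecture_change:
        return True, "out-of-cycle: material architecture change"
    if material_incident:
        return True, "out-of-cycle: material incident"
    if today - last_full_test >= ANNUAL:
        return True, "annual baseline"
    return False, "not due"
```

Recording the returned reason alongside the statement of work gives the auditor a dated rationale for each test, including the out-of-cycle ones.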

Scoping the Engagement

Scoping is where most pen tests go sideways. The firm proposes a template scope, the team signs it without pushing back, and the findings come in with the wrong depth in the wrong places. A scoping conversation that reflects the environment takes a couple of hours and prevents most of the rework.

Inclusions. A clear asset list is the core of the scope. For external testing, list the IPs, hostnames, and public services, with a note on which belong to the company and which belong to a subservice provider. For internal testing, list the VLANs in scope and the applications the tester may interact with. For segmentation testing, list the explicit pairs of network zones the test will attempt to traverse. For application testing, list the specific applications, the target environment (isolated staging with an erasable database copy is the right answer for almost every legacy application), and the authentication model.

Exclusions. Systems genuinely out of scope for the SOC 2 audit, third-party services the subservice provider owns, and devices where an active test carries unacceptable production risk. Exclusions get documented in the statement of work so the auditor can see the boundary decisions were deliberate. A pen test with no exclusions is usually a red flag that the scoping conversation did not happen.

Known issues. Flag framework-level vulnerabilities and legacy components the team already tracks. Otherwise they dominate the report and crowd out findings that need action. From recent engagements, legacy applications with outdated UI libraries or deprecated frameworks are the classic case where pre-flagging known risks sharpens the report dramatically.

Rules of engagement. Testing windows, escalation contacts, communication cadence during the engagement, and techniques that are out of bounds (social engineering and DoS are the common ones).
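
One way to keep the scoping conversation honest is to capture its output as a structured record and review it before the statement of work is signed. The sketch below is illustrative only: the field names and the review checks are hypothetical, chosen to mirror the inclusions, exclusions, known issues, and rules of engagement described above.

```python
from dataclasses import dataclass, field

@dataclass
class PenTestScope:
    """Hypothetical scope record; fields mirror the scoping items above."""
    external_assets: list[str] = field(default_factory=list)   # IPs, hostnames, public services
    internal_vlans: list[str] = field(default_factory=list)    # VLANs in scope for internal testing
    zone_pairs: list[tuple[str, str]] = field(default_factory=list)  # (source, destination) zones
    exclusions: list[str] = field(default_factory=list)        # deliberate out-of-scope items
    known_issues: list[str] = field(default_factory=list)      # pre-flagged legacy or framework risks
    out_of_bounds: list[str] = field(default_factory=list)     # e.g. social engineering, DoS
    escalation_contact: str = ""

def scope_review_notes(scope: PenTestScope) -> list[str]:
    """Return points to raise with the testing firm before signing the SOW."""
    notes = []
    if not scope.external_assets:
        notes.append("no external assets listed")
    if scope.internal_vlans and not scope.zone_pairs:
        notes.append("internal VLANs in scope but no zone pairs for segmentation testing")
    if not scope.exclusions:
        notes.append("no exclusions documented; confirm the scoping conversation happened")
    if not scope.escalation_contact:
        notes.append("no escalation contact in the rules of engagement")
    return notes
```

An empty list of notes does not mean the scope is right, only that the obvious gaps are closed. The record itself can be attached to the statement of work as evidence that the boundary decisions were deliberate.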

Three deliverables to name in the SOW

An executive summary a board or auditor can read
A technical report with finding-level detail and reproduction steps
A retest after remediation, included in scope, not added as a line item later

The retest is what turns a pen test from a point-in-time snapshot into a closed-loop control.

Picking a Testing Firm

The pen test market includes a wide range of quality, from serious boutique firms producing findings that shape the next year of security work to template-driven vendors running an automated scan and rebranding the output. Four signals matter more than the sales pitch.

Accredited firm or accredited testers. CREST and CHECK are the two widely recognized independent accreditations. OSCP is the baseline credential for individual hands-on testers. None are strictly required by SOC 2, but their presence is a proxy for methodology, documentation discipline, and ongoing training.

Methodology references. The firm should work from a recognized methodology: PTES (Penetration Testing Execution Standard), NIST SP 800-115, or OWASP Testing Guide for application work. A statement of work that does not reference any methodology usually means the testing is ad hoc.

Retest included. Retest after remediation should be built into the contract from the start, not added as a line item later. Firms that charge separately for retest often underprice the initial engagement and treat findings as a volume play rather than a closed loop.

Sample report review. Ask to see a redacted sample of a prior report. A strong report has a clear executive summary, CVSS-scored findings with business-context risk ratings, concrete reproduction steps, specific remediation recommendations, and a retest section that records the closure of each finding. A weak report is a scan output with a cover page.

Common Findings on On-Prem Networks

Patterns from on-prem engagements cluster around a predictable set of findings. Knowing the list before the test runs helps the team pre-empt the low-hanging ones.

Unpatched legacy systems. Hardware or software past end of vendor support, usually running because a critical business process depends on it. Remediation is almost always network isolation plus a documented replacement plan.
Weak network segmentation. The corporate VLAN can reach ports on the production VLAN nobody intended. The management VLAN is not actually isolated because a dual-homed admin workstation bridges it. A guest wireless segment has a route to the internal subnet.
Default credentials on management interfaces. iDRAC, iLO, IPMI, switch admin accounts, UPS management, KVM consoles. Every engagement finds at least one, usually on a device installed years ago and never revisited.
Exposed out-of-band management. iDRAC or iLO accessible from more of the network than it should be, often the result of a temporary configuration that became permanent.
Weak SMB and RDP exposure. SMB signing not enforced, SMBv1 still enabled on legacy shares, RDP reachable from the corporate VLAN to production without an intermediate jump host.
Unauthenticated internal services. Printers, old web consoles for discontinued tools, internal dashboards, monitoring panels. Feeds reconnaissance and lateral movement rather than direct exploitation.

These are findings an external pen test will rarely surface and an internal test will almost always surface. That asymmetry is the reason internal testing is worth commissioning even when the external test already comes back clean.

The Remediation Loop

A pen test report that does not feed back into the security program is a compliance artifact, not a control. The loop that makes the test useful runs through the same ticketing and change workflow that handles vulnerability remediation, covered in the post on SOC 2 change management with tickets instead of CI/CD.

When the report lands, each finding gets triaged using the same severity ladder used for vulnerability scanning. Critical and high findings on internet-facing systems get the same SLA as critical vulnerabilities: a 48-hour first response and remediation tied to the next maintenance window. Medium and low findings land in the standard patch and change cycle. Exceptions get a risk acceptance ticket with compensating controls and a named review date.
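
The triage rule can be sketched as a lookup from severity and exposure to a remediation track. The 48-hour figure and the two tracks come from the paragraph above. The names are hypothetical, and the branch for critical and high findings on systems that are not internet-facing is a placeholder, since this post does not state an SLA for that case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Triage:
    first_response_hours: Optional[int]  # None where the post states no figure
    remediation_track: str

def triage_finding(severity: str, internet_facing: bool) -> Triage:
    """Map a pen test finding onto the same ladder used for scanner output."""
    sev = severity.strip().lower()
    if sev not in {"critical", "high", "medium", "low"}:
        raise ValueError(f"unknown severity: {severity!r}")
    if sev in {"critical", "high"} and internet_facing:
        return Triage(48, "next maintenance window")
    if sev in {"medium", "low"}:
        return Triage(None, "standard patch and change cycle")
    # Assumption: critical/high on internal systems follows whatever the team's
    # own severity ladder says; the source text gives no number for this case.
    return Triage(None, "set by the team's own severity ladder")
```

Exceptions are deliberately left out of the function: a risk acceptance is a decision recorded in a ticket with compensating controls and a review date, not something to derive from severity alone.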

The retest is the closing step

Without it, the team has evidence of findings but no evidence of closure. With it, the CC7.1 evidence package becomes: original pen test results, remediation tickets showing the work, scan-after-remediation output showing the technical fix, and a retest section confirming the finding is closed. That is a closed-loop control, and it is what makes a SOC 2 Type 2 sample defensible.
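
A closed loop is easy to check mechanically once each finding carries references to its four artifacts. The sketch below assumes findings are tracked as a mapping from a finding ID to artifact references; the key names and the ID format are hypothetical.

```python
# The four artifacts named above, as hypothetical keys on each finding record.
REQUIRED_ARTIFACTS = (
    "pen_test_result",      # original finding in the pen test report
    "remediation_ticket",   # ticket showing the work
    "post_fix_scan",        # scan-after-remediation output
    "retest_confirmation",  # retest section confirming closure
)

def missing_evidence(findings: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return, per finding ID, the artifacts that are absent or empty."""
    gaps = {}
    for finding_id, artifacts in findings.items():
        missing = [key for key in REQUIRED_ARTIFACTS if not artifacts.get(key)]
        if missing:
            gaps[finding_id] = missing
    return gaps
```

Running a check like this before the audit window closes shows which findings have evidence of the problem but not yet evidence of closure.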

Findings also feed durable program improvements. Any finding the pen test caught that routine scanning missed is a signal the scanning configuration needs tuning. A finding rooted in a baseline miss is a signal that the configuration baseline drifted or was never enforced across every system in the tier. Pen test findings are feedback for the program, not just items for a remediation queue.

Evidence: What the Auditor Samples

Pen test evidence under CC4.1 and CC7.1 follows the same three-part continuous evidence pattern the rest of the program uses.

Configuration proves the program exists: the pen test policy naming cadence, scope types, vendor qualifications, and remediation loop
Execution history proves the activity happens on cadence: executed statements of work, test dates, kick-off and debrief records, and any out-of-cycle tests triggered by architecture changes
Representative samples prove the output is meaningful: the executive summary and technical report from the most recent test, the remediation tickets that closed critical and high findings, and the retest report confirming closure

Where Pen Testing Lands in an Effective Security Program

Teams that get pen testing right do not treat it as a line item on the annual compliance calendar. They treat it as the separate evaluation that stress-tests the rest of the program and feeds durable improvements back into it. Scanning tells the team what is on the asset list. Pen testing tells the team what an attacker would actually find once they are in. One is continuous and broad, the other periodic and deep. Mature programs need both, and a SOC 2 program that cites CC4.1 and CC7.1 without an annual pen test is a thin program.

Build the pen test program once, with a scope that matches the real attack surface and a remediation loop that matches how the team already runs vulnerability work. The same program then maps onto SOC 2, ISO 27001, and CPCSC without starting over. Incidents and testing are feedback loops, and mature programs get stronger from both.

How CC4.1 and CC7.1 Points of Focus Show Up in Penetration Testing

CC4.1 and CC7.1 in the 2017 Trust Services Criteria (with revised Points of Focus, 2022) are the two criteria most directly engaged by a pen test program. CC4.1 frames pen testing as a monitoring activity, and the 2022 revision explicitly names penetration testing as one of the valid separate evaluation types. CC7.1 frames pen testing as a vulnerability identification activity that complements ongoing scanning.

CC4.1: Monitoring Activities

CC4.1 governs how the entity selects, develops, and performs ongoing and separate evaluations to ascertain whether internal control components are present and functioning.

A mix of ongoing and separate evaluations. Management balances continuous monitoring with point-in-time independent reviews. Scanning and SIEM monitoring are the ongoing evaluations; the annual pen test is the separate evaluation.
Rate of change drives cadence. Frequency and depth consider how quickly systems change. Environments with rapid architectural change may need tighter cadence or out-of-cycle tests after material changes.
Knowledgeable personnel. Evaluators have enough knowledge to understand what they are looking at. CREST, CHECK, and OSCP are proxies for that knowledge.
Integration with business processes. Evaluations are woven into operations, not bolted on. A pen test that triggers the same remediation ticket workflow as the vulnerability scanning program is the practical expression of this PoF.
Scope and frequency adjust to risk. This is the direct hook for out-of-cycle pen tests triggered by architecture changes or incidents, and for higher-risk environments adding a second narrower test on a six-month cadence.
Objective evaluation. Separate evaluations provide objective feedback. A pen test run by the same team that built the controls is not objective. A pen test run by an independent firm with its own methodology is, which is the structural reason pen testing is contracted out.
Different types of separate evaluations. This is the critical one. The 2022 revision explicitly names penetration testing as one of several acceptable separate evaluation types, alongside first- and second-line monitoring, internal audit, compliance assessments, resilience assessments, vulnerability scans, security assessments, and third-party assessments. This is the direct AICPA acknowledgment that a pen test is a valid CC4.1 evaluation, and the clearest signal that an environment whose only separate evaluation is the annual audit itself is under-evaluating its controls.

CC7.1: Vulnerability Detection and Monitoring

CC7.1 governs how the entity detects changes that introduce new vulnerabilities and susceptibilities to newly discovered vulnerabilities.

Defined configuration standards and monitoring for noncompliance. Pen test findings that violate a documented baseline are a signal that either the baseline was not enforced or drifted after enforcement, both of which feed back into the configuration program. Pen testing catches noncompliance scanning misses, particularly issues rooted in architecture or business logic rather than missing patches.
Change-detection mechanisms. A pen test will often surface configuration drift that change-detection tooling missed, which becomes an input into tuning that program.
Detects unknown or unauthorized components. Internal pen tests routinely find forgotten or unmanaged systems (old appliances, decommissioned-but-still-powered hardware, unauthorized wireless APs) the asset inventory does not list.
Conducts vulnerability scans. Pen testing complements rather than replaces this PoF. Scanning runs continuously and broadly. Pen testing runs periodically and deeply. Both are required to satisfy CC7.1 in full.

Source: AICPA TSP Section 100, 2017 Trust Services Criteria with Revised Points of Focus (2022). Point of Focus characteristics described in Truvo's words and mapped to an on-prem penetration testing implementation pattern. Consult the source document for the official AICPA text.

Frequently Asked Questions

How often does SOC 2 require a penetration test?

SOC 2 does not prescribe a specific cadence. Annual is the cadence the market has settled on and the one auditors expect to see. CC4.1's Points of Focus on adjusting scope and frequency to the rate of change support that baseline. Higher-risk environments often add a second narrower test on a six-month cadence, and material architecture changes or incidents should trigger out-of-cycle testing regardless of when the last full test ran.

What scope should a SOC 2 pen test cover for on-prem environments?

A baseline on-prem pen test typically covers external network testing (internet-facing services and the perimeter firewall), internal network testing (lateral movement from a foothold inside the network), and segmentation testing (whether the corporate VLAN can reach the production VLAN). Application testing is added when the team maintains custom web or API services. Wireless testing is added when office wireless touches the production or management path. Physical testing at the colocation is rarely in scope because the provider's own physical controls cover that boundary.

Does SOC 2 require a specific accreditation for pen testing firms?

No. SOC 2 does not mandate CREST, CHECK, OSCP, or any other accreditation. Those credentials are proxies for methodology, documentation discipline, and ongoing training. The real requirement is that the testing is independent, uses a recognized methodology (PTES, NIST SP 800-115, OWASP), and produces a report the auditor can sample as evidence.

How do pen test findings feed into SOC 2 remediation evidence?

Each finding gets triaged into the same ticketing workflow the team uses for vulnerability remediation, with SLAs matched to severity. The evidence package the auditor samples includes the original pen test report, the remediation tickets showing the work, scan output confirming the technical fix, and a retest section in a follow-up pen test report confirming closure. The retest is the critical closing step that turns a pen test from a point-in-time snapshot into a closed-loop control under CC7.1.

What is the difference between vulnerability scanning and penetration testing for SOC 2?

Scanning is continuous and broad. It catalogues known vulnerabilities across the asset inventory and produces the ongoing-evaluation evidence CC7.1 expects. Pen testing is periodic and deep. It exercises the controls, probes for architectural and business logic issues scanning cannot catch, and produces the separate-evaluation evidence CC4.1 expects. Mature SOC 2 programs run both.
