Where does an on-prem SOC 2 pen test stop? At the public load balancer? At the edge firewall appliance? At the production VLAN boundary? At the iDRAC interface on a server in the cage that nobody has touched in eighteen months? The scoping decisions made before the test starts matter more than the tooling choices made during it, and they are the part most SOC 2 pen test guidance leaves out.
The cloud-native version of this question is short. A web app behind a load balancer, a handful of public API endpoints, identity managed by an IdP, everything else abstracted away by the provider. Scope the external pen test, hand the auditor a PDF, move on. On-prem and hybrid environments do not collapse that cleanly. The team owns more of the stack, so the attack surface is wider, and the pen test that satisfies CC4.1 and CC7.1 in that environment looks different. This post covers how to scope a SOC 2 pen test for an on-prem or hybrid network, how findings flow back into the remediation program, and how to pick a testing firm that produces the evidence package SOC 2 expects.
Pen testing is one of the few security activities the AICPA names explicitly in the 2017 Trust Services Criteria. It sits at the intersection of two criteria most cloud-centric content treats separately.
Two criteria, one activity
CC4.1 answers the question: does someone periodically and independently evaluate whether the controls are actually working? CC7.1 answers a second one: are technical vulnerabilities being identified across the environment? A pen test produces evidence for both at the same time.
CC4.1 is the monitoring activities criterion. It expects the entity to run a mix of ongoing and separate evaluations to ascertain whether internal control components are present and functioning. One of its Points of Focus lists the acceptable separate evaluation types, and penetration testing is named among them alongside vulnerability scans, security assessments, internal audit, and third-party assessments. An annual pen test is the canonical separate evaluation a SOC 2 auditor expects to see.
CC7.1 is the vulnerability identification and monitoring criterion. It expects the entity to detect changes that introduce new vulnerabilities and to identify susceptibilities to newly discovered vulnerabilities. A pen test exercises the controls rather than just cataloguing software, so it surfaces what continuous scanning cannot.
The sibling posts on on-prem vulnerability scanning and on-prem patch management cover the ongoing-detection half of CC7.1. This post covers the separate-evaluation half and the remediation loop.
One pen test covering everything is usually the wrong framing. On-prem environments have distinct attack surfaces that require different skill sets and different rules of engagement. A good scoping conversation separates them before the engagement starts.
| Test type | What it covers | When to include |
| --- | --- | --- |
| External network | Public web servers, API endpoints, VPN gateways, mail relays, perimeter firewall ruleset | Always. Minimum baseline for every SOC 2 environment annually |
| Internal network | Lateral movement from a foothold, default credentials on management interfaces, weak SMB and RDP, legacy services | Always for on-prem. Produces the most valuable findings |
| Network segmentation | Whether the corporate or guest VLAN can reach the production VLAN without a legitimate path | Always when multiple VLANs carry different trust levels |
| Application | Web apps, APIs, and custom services tested to OWASP methodology; phased unauthenticated then authenticated | When the team maintains custom applications or APIs |
| Wireless | Rogue AP detection, WPA3 enforcement, guest-network isolation | When office wireless touches the production or management path |
| Out-of-band management | iDRAC, iLO, IPMI exposure and authentication, management VLAN isolation | Always on bare metal. Folded into internal or segmentation testing |
External network testing works from an external IP with no credentials, probes for exposed services, checks for misconfigurations and unpatched vulnerabilities, and tries to gain a foothold. This test also validates that the perimeter firewall ruleset matches the documented design.
Internal network testing gives the tester a foothold inside the corporate or production network, through a jump host or a dropped device, and the tester works outward from there. This is where on-prem environments produce the most valuable findings, because the internal test validates a different assumption than the external one: that if an attacker gets one foothold, the blast radius stays contained. Internal testing typically surfaces weak segmentation, default credentials on management interfaces, exposed iDRAC and iLO ports, weak SMB and RDP exposure, and legacy services nobody remembered were still running.
Network segmentation testing is a focused subset of internal testing. It asks one question: can the tester reach the production VLAN from the corporate VLAN, or from a guest wireless segment, without a legitimate credential and path? This is the test that validates the VLAN architecture covered in the on-prem network security post. For SOC 2, segmentation evidence often matters more than raw vulnerability counts, because the segmentation boundary is the control auditors rely on when assessing blast radius.
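One way to make segmentation results easy to read is to compare what the tester could reach against what the documented design allows. The following is a minimal sketch under stated assumptions: the zone names, the `ZonePair` type, and the `segmentation_violations` helper are hypothetical, and the observed results would come from the testing firm's report rather than from this code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZonePair:
    """A source zone and a destination zone the test attempts to traverse."""
    source: str
    destination: str


def segmentation_violations(allowed, observed):
    """Return zone pairs the tester reached that the design does not permit.

    allowed:  set of ZonePair values the documented design permits.
    observed: dict mapping ZonePair -> True if the tester reached the destination.
    """
    return sorted(
        (pair for pair, reachable in observed.items()
         if reachable and pair not in allowed),
        key=lambda p: (p.source, p.destination),
    )


# Hypothetical zones and results, for illustration only.
allowed = {ZonePair("jump-host", "production")}
observed = {
    ZonePair("corporate", "production"): True,    # reached, not in the design
    ZonePair("guest-wifi", "production"): False,  # blocked as designed
    ZonePair("jump-host", "production"): True,    # reached, legitimate path
}

violations = segmentation_violations(allowed, observed)
for pair in violations:
    print(f"VIOLATION: {pair.source} -> {pair.destination}")
```

Keeping the allowed set next to the observed results gives the auditor one artifact that shows both the intended boundary and whether it held.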
Application testing covers web applications, APIs, and custom services the team maintains. Methodology follows OWASP and equivalent application testing standards. Legacy applications benefit from a phased scope: unauthenticated testing first (validate authentication flows, check for MFA bypass, probe the external surface), then authenticated testing (authorization boundaries, business logic, data access controls with valid credentials). The phased approach prevents known framework-level vulnerabilities such as outdated libraries and deprecated UI components from dominating the findings and crowding out application-specific risks.
Physical testing at the colocation is rarely in scope for a baseline SOC 2 pen test. The colocation provider's own physical controls and their CSAE 3000 or SOC 2 report cover that boundary as a subservice organization control.
SOC 2 does not prescribe a frequency. Annual is the cadence the market has settled on and the one auditors expect to see. The reasoning is pragmatic. A pen test is a point-in-time evaluation. Run it too rarely and the findings reflect a stale architecture. Run it too often and the cost outruns the value because the controls under test have not changed enough to retest. Annual matches how on-prem environments actually change: semiannual firmware rollouts, quarterly firewall reviews, major architecture changes once or twice a year.
Higher-risk environments often add a second narrower test on a six-month cadence, such as a segmentation-only retest or a focused application test on a newly released module. Two additional triggers warrant out-of-cycle tests: a material architecture change and a material incident that exposed a control gap. CC4.1's Adjusts Scope and Frequency Point of Focus is the direct hook for that decision.
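The cadence logic above can be sketched as a small scheduling helper. This is an illustration, not a rule SOC 2 prescribes: the function name, the 365-day and 182-day intervals, and the assumption that an interim test already ran once its date has passed are all choices made for the example.

```python
from datetime import date, timedelta

ANNUAL = timedelta(days=365)      # assumed length of the annual baseline
SEMIANNUAL = timedelta(days=182)  # assumed six-month interim interval


def next_test_due(last_full_test, today, high_risk=False,
                  material_change=False, material_incident=False):
    """Return (due_date, kind) for the next pen test.

    A material architecture change or incident triggers an out-of-cycle
    test immediately. Higher-risk environments get a narrower interim test
    at roughly six months. Otherwise the annual baseline applies.
    """
    if material_change or material_incident:
        return today, "out-of-cycle"
    if high_risk:
        interim = last_full_test + SEMIANNUAL
        # Assumption: once the interim date has passed, that test already ran.
        if today <= interim:
            return interim, "narrow interim"
    return last_full_test + ANNUAL, "full annual"


# Hypothetical dates, for illustration only.
last = date(2025, 1, 15)
print(next_test_due(last, date(2025, 3, 1)))
print(next_test_due(last, date(2025, 3, 1), high_risk=True))
print(next_test_due(last, date(2025, 3, 1), material_change=True))
```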
Scoping is where most pen tests go sideways. The firm proposes a template scope, the team signs it without pushing back, and the findings come in with the wrong depth in the wrong places. A scoping conversation that reflects the environment takes a couple of hours and prevents most of the rework.
Inclusions. A clear asset list is the core of the scope. For external testing, that means IPs, hostnames, and public services, with a note on which belong to the company and which belong to a subservice provider. For internal testing, it means the VLANs in scope and the applications the tester may interact with. For segmentation testing, it means the explicit pairs of network zones the test will attempt to traverse. For application testing, it means the specific applications, the target environment (isolated staging with an erasable database copy is the right answer for almost every legacy application), and the authentication model.
Exclusions. Systems genuinely out of scope for the SOC 2 audit, third-party services the subservice provider owns, and devices where an active test carries unacceptable production risk. Exclusions get documented in the statement of work so the auditor can see the boundary decisions were deliberate. A pen test with no exclusions is usually a red flag that the scoping conversation did not happen.
Known issues. Flag framework-level vulnerabilities and legacy components the team already tracks. Otherwise they dominate the report and crowd out findings that need action. In recent engagements, legacy applications with outdated UI libraries or deprecated frameworks have been the classic case where pre-flagging known risks sharpens the report dramatically.
Rules of engagement. Testing windows, escalation contacts, communication cadence during the engagement, and techniques that are out of bounds (social engineering and DoS are the common ones).
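The four parts of the scope described above can be captured as a small structure with a sanity check that runs before the statement of work is signed. This is a sketch only: the `PenTestScope` type, the field names, the rules-of-engagement keys, and the sample values are hypothetical and not a format any testing firm requires.

```python
from dataclasses import dataclass, field


@dataclass
class PenTestScope:
    """Hypothetical container for the four parts of a pen test scope."""
    inclusions: list = field(default_factory=list)
    exclusions: list = field(default_factory=list)
    known_issues: list = field(default_factory=list)
    rules_of_engagement: dict = field(default_factory=dict)


# Assumed keys, taken from the rules-of-engagement items named in the text.
REQUIRED_ROE_KEYS = (
    "testing_windows",
    "escalation_contacts",
    "communication_cadence",
    "out_of_bounds",
)


def scope_warnings(scope):
    """Return human-readable warnings about gaps in the scope document."""
    warnings = []
    if not scope.inclusions:
        warnings.append("no assets listed in scope")
    if not scope.exclusions:
        warnings.append(
            "no exclusions documented; confirm the scoping conversation happened"
        )
    for key in REQUIRED_ROE_KEYS:
        if key not in scope.rules_of_engagement:
            warnings.append(f"rules of engagement missing: {key}")
    return warnings


# Hypothetical scope, for illustration only.
scope = PenTestScope(
    inclusions=["203.0.113.10 (public load balancer)", "vpn.example.com"],
    exclusions=[],
    known_issues=["outdated UI library in legacy portal"],
    rules_of_engagement={
        "testing_windows": "weeknights 22:00-04:00",
        "escalation_contacts": ["security-oncall@example.com"],
        "out_of_bounds": ["social engineering", "DoS"],
    },
)

for warning in scope_warnings(scope):
    print("WARNING:", warning)
```

The empty exclusions list is flagged on purpose, mirroring the point that a scope with no exclusions usually means the boundary decisions were never discussed.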
Three deliverables to name in the SOW
The retest is what turns a pen test from a point-in-time snapshot into a closed-loop control.
The pen test market includes a wide range of quality, from serious boutique firms producing findings that shape the next year of security work to template-driven vendors running an automated scan and rebranding the output. Four signals matter more than the sales pitch.
Accredited firm or accredited testers. CREST and CHECK are the two widely recognized independent accreditations. OSCP is the baseline credential for individual hands-on testers. None are strictly required by SOC 2, but their presence is a proxy for methodology, documentation discipline, and ongoing training.
Methodology references. The firm should work from a recognized methodology: PTES (Penetration Testing Execution Standard), NIST SP 800-115, or OWASP Testing Guide for application work. A statement of work that does not reference any methodology usually means the testing is ad hoc.
Retest included. Retest after remediation should be built into the contract from the start, not added as a line item later. Firms that charge separately for retest often underprice the initial engagement and treat findings as a volume play rather than a closed loop.
Sample report review. Ask to see a redacted sample of a prior report. A strong report has a clear executive summary, CVSS-scored findings with business-context risk ratings, concrete reproduction steps, specific remediation recommendations, and a retest section that records the closure of each finding. A weak report is a scan output with a cover page.
Patterns from on-prem engagements cluster around a predictable set of findings. Knowing the list before the test runs helps the team pre-empt the low-hanging ones.
These are findings an external pen test will rarely surface and an internal test will almost always surface. That asymmetry is the reason internal testing is worth commissioning even when external testing is already a clean run.
A pen test report that does not feed back into the security program is a compliance artifact, not a control. The loop that makes the test useful runs through the same ticketing and change workflow that handles vulnerability remediation, covered in SOC 2 change management with tickets instead of CI/CD.
When the report lands, each finding gets triaged using the same severity ladder used for vulnerability scanning. Critical and high findings on internet-facing systems get the same SLA as critical vulnerabilities: a 48-hour first response and remediation tied to the next maintenance window. Medium and low findings land in the standard patch and change cycle. Exceptions get a risk acceptance ticket with compensating controls and a named review date.
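The triage rules above can be sketched as a routing function. This is a minimal illustration under stated assumptions: the `Finding` type and the track names are hypothetical, and because the text does not state an SLA for critical or high findings on systems that are not internet-facing, the sketch routes those to a team-defined track instead of guessing one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

FIRST_RESPONSE_SLA = timedelta(hours=48)  # the 48-hour first response from the text


@dataclass
class Finding:
    """Hypothetical shape of one pen test finding."""
    title: str
    severity: str          # "critical", "high", "medium", or "low"
    internet_facing: bool
    risk_accepted: bool = False


def triage(finding, reported_at):
    """Return (track, first_response_deadline_or_None) for a finding."""
    if finding.risk_accepted:
        # Exceptions get a risk acceptance ticket with compensating controls.
        return "risk-acceptance", None
    if finding.severity in ("critical", "high"):
        if finding.internet_facing:
            return "urgent", reported_at + FIRST_RESPONSE_SLA
        # Assumption: SLA for internal critical/high findings is set by the team.
        return "team-defined", None
    # Medium and low findings follow the standard patch and change cycle.
    return "standard-cycle", None


# Hypothetical finding and report time, for illustration only.
reported = datetime(2025, 3, 3, 9, 0)
exposed_vpn = Finding("Unpatched VPN gateway", "critical", internet_facing=True)
print(triage(exposed_vpn, reported))
```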
The retest is the closing step
Without it, the team has evidence of findings but no evidence of closure. With it, the CC7.1 evidence package becomes: original pen test results, remediation tickets showing the work, scan-after-remediation output showing the technical fix, and a retest section confirming the finding is closed. That is a closed-loop control, and it is what makes a SOC 2 Type 2 sample defensible.
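A simple completeness check can confirm that every finding carries all four pieces of evidence before the package goes to the auditor. The sketch below assumes a hypothetical record layout; the key names, finding IDs, ticket IDs, and file names are placeholders, not a format the auditor or any tool requires.

```python
# The four evidence parts named in the text, under hypothetical key names.
EVIDENCE_PARTS = (
    "report_entry",        # original pen test result
    "remediation_ticket",  # ticket showing the work
    "post_fix_scan",       # scan-after-remediation output
    "retest_result",       # retest section confirming closure
)


def missing_evidence(findings):
    """Return {finding_id: [missing parts]} for findings with incomplete evidence."""
    gaps = {}
    for finding_id, evidence in findings.items():
        missing = [part for part in EVIDENCE_PARTS if not evidence.get(part)]
        if missing:
            gaps[finding_id] = missing
    return gaps


# Hypothetical findings, for illustration only.
findings = {
    "PT-001": {
        "report_entry": "report.pdf#4.1",
        "remediation_ticket": "TICKET-101",
        "post_fix_scan": "scan-0412.xml",
        "retest_result": "closed",
    },
    "PT-002": {
        "report_entry": "report.pdf#4.2",
        "remediation_ticket": "TICKET-102",
        "post_fix_scan": None,
        "retest_result": None,
    },
}

for finding_id, parts in missing_evidence(findings).items():
    print(f"{finding_id} is missing: {', '.join(parts)}")
```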
Findings also feed durable program improvements. Any finding the pen test caught that routine scanning missed is a signal the scanning configuration needs tuning. A finding rooted in a baseline miss is a signal that the configuration baseline drifted or was never enforced across every system in the tier. Pen test findings are feedback for the program, not just items for a remediation queue.
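Spotting what routine scanning missed can start as a plain set difference between the two result sets. This is a rough sketch: it assumes both sources can be normalized to the same (host, issue) pairs, which in practice takes manual mapping, and the hosts and issue names are hypothetical.

```python
def scanner_blind_spots(pen_test_findings, scan_findings):
    """Return (host, issue) pairs the pen test found that scanning did not."""
    return sorted(set(pen_test_findings) - set(scan_findings))


# Hypothetical, pre-normalized findings, for illustration only.
pen_test_findings = [
    ("10.0.20.5", "default credentials on iDRAC"),
    ("10.0.30.8", "SMB signing not required"),
]
scan_findings = [
    ("10.0.30.8", "SMB signing not required"),
]

blind_spots = scanner_blind_spots(pen_test_findings, scan_findings)
for host, issue in blind_spots:
    print(f"Scanner missed: {issue} on {host}")
```

Each entry in the result is a candidate for a scanning configuration change or a baseline review, not only a ticket to close.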
Pen test evidence under CC4.1 and CC7.1 follows the same three-part continuous evidence pattern the rest of the program uses.
Teams that get pen testing right do not treat it as a line item on the annual compliance calendar. They treat it as the separate evaluation that stress-tests the rest of the program and feeds durable improvements back into it. Scanning tells the team what is on the asset list. Pen testing tells the team what an attacker would actually find once they are in. One is continuous and broad, the other periodic and deep. Mature programs need both, and a SOC 2 program that cites CC4.1 and CC7.1 without an annual pen test is a thin program.
Build the pen test program once, with a scope that matches the real attack surface and a remediation loop that matches how the team already runs vulnerability work. The same program then maps onto SOC 2, ISO 27001, and CPCSC without starting over. Incidents and testing are feedback loops, and mature programs get stronger from both.
CC4.1 and CC7.1 in the 2017 Trust Services Criteria (with revised Points of Focus, 2022) are the two criteria most directly engaged by a pen test program. CC4.1 frames pen testing as a monitoring activity, and the 2022 revision explicitly names penetration testing as one of the valid separate evaluation types. CC7.1 frames pen testing as a vulnerability identification activity that complements ongoing scanning.
CC4.1 governs how the entity selects, develops, and performs ongoing and separate evaluations to ascertain whether internal control components are present and functioning.
CC7.1 governs how the entity detects changes that introduce new vulnerabilities and susceptibilities to newly discovered vulnerabilities.
Explore further in Framework Explorer: CC4.1 · CC7.1. See the full requirement, implementation guidance, evidence types, and cross-framework mappings.
Source: AICPA TSP Section 100, 2017 Trust Services Criteria with Revised Points of Focus (2022). Point of Focus characteristics described in Truvo's words and mapped to an on-prem penetration testing implementation pattern. Consult the source document for the official AICPA text.
SOC 2 does not prescribe a specific cadence. Annual is the cadence the market has settled on and the one auditors expect to see. CC4.1's Points of Focus on adjusting scope and frequency to the rate of change support that baseline. Higher-risk environments often add a second narrower test on a six-month cadence, and material architecture changes or incidents should trigger out-of-cycle testing regardless of when the last full test ran.
A baseline on-prem pen test typically covers external network testing (internet-facing services and the perimeter firewall), internal network testing (lateral movement from a foothold inside the network), and segmentation testing (whether the corporate VLAN can reach the production VLAN). Application testing is added when the team maintains custom web or API services. Wireless testing is added when office wireless touches the production or management path. Physical testing at the colocation is rarely in scope because the provider's own physical controls cover that boundary.
SOC 2 does not mandate CREST, CHECK, OSCP, or any other accreditation for the testing firm. Those credentials are proxies for methodology, documentation discipline, and ongoing training. The real requirement is that the testing is independent, uses a recognized methodology (PTES, NIST SP 800-115, OWASP), and produces a report the auditor can sample as evidence.
Each finding gets triaged into the same ticketing workflow the team uses for vulnerability remediation, with SLAs matched to severity. The evidence package the auditor samples includes the original pen test report, the remediation tickets showing the work, scan output confirming the technical fix, and a retest section in a follow-up pen test report confirming closure. The retest is the critical closing step that turns a pen test from a point-in-time snapshot into a closed-loop control under CC7.1.
Scanning is continuous and broad. It catalogues known vulnerabilities across the asset inventory and produces the ongoing-evaluation evidence CC7.1 expects. Pen testing is periodic and deep. It exercises the controls, probes for architectural and business logic issues scanning cannot catch, and produces the separate-evaluation evidence CC4.1 expects. Mature SOC 2 programs run both.