
Building a data center isn’t enough - it must prove it can perform under real conditions. That’s where commissioning and testing come in. These processes ensure critical systems like power, cooling, and controls work together seamlessly and can handle failures. Two key methods are Integrated Systems Testing (IST) and Pull-the-Plug Testing.
Integrated Systems Testing (IST) is the final and most rigorous step in the commissioning process for data centers. By this stage, individual systems have already been started up and verified in earlier phases. IST brings everything together, testing all systems collectively under realistic, operational conditions. Think of it as a full-scale dress rehearsal where power, cooling, fire suppression, and control systems are tested in scenarios that mimic real-life operations, including simulated failures.
The primary purpose of IST is to ensure that critical systems work seamlessly together. As Crestchic explains:
"Integrated Systems Testing is critical because it validates that systems have been designed and commissioned correctly to keep the environment stable, safe, and online - even under stress." [7]
By identifying potential issues early, IST helps avoid costly surprises later on.
The next section dives into the specific systems examined during IST.
Unlike earlier tests that focus on individual components, IST evaluates the entire facility's infrastructure simultaneously. As Specifying Engineer explains:
"integrated systems testing incorporates multiple related systems, rather than a single piece of equipment or a single system." [8]
Here’s a breakdown of the systems typically assessed during IST:
| System Type | Components Tested |
|---|---|
| Electrical Systems | PDUs, UPS units, backup generators |
| Mechanical Systems | Chilled water systems, heating systems, cooling units |
| Fire and Life Safety | Fire alarms, suppression systems |
| IT and Security Systems | Network infrastructure, security systems |
| Emergency Power Systems | Utility outage simulations, generator response tests |
It’s not enough for these systems to operate correctly on their own - they must also respond predictably when other systems fail or change states. For example, a generator that functions perfectly in isolation is of little use if it cannot handle the load during a simulated utility outage.
These comprehensive tests are crucial to ensure the data center meets its performance and reliability standards.
Despite its importance, IST is often misunderstood. One common misconception is viewing it as a mere formality - a simple, low-risk process to confirm that everything works as expected. However, IST is far more than routine maintenance or a vendor sign-off. It’s the ultimate stress test of how all systems interact under challenging and unpredictable conditions. As CxPlanner warns:
"The final IST is the ultimate proving ground - but it's often too clean, too rehearsed, and too optimistic." [5]
A real-world example illustrates this point. During an IST at a hyperscale data center, TechSite uncovered a critical flaw: alarm systems failed to trigger correctly during simulated failures. If this issue had gone unnoticed until after the data center went live, it could have posed serious operational risks. [4]
This example underscores the importance of conducting IST under conditions that closely resemble real-world scenarios. It’s not just about ensuring individual systems work - it’s about confirming they work together, even under stress.
Data Center Commissioning: IST Execution Sequence Step-by-Step
Executing Integrated Systems Testing (IST) effectively starts with thorough preparation. At the core of this preparation are two key documents: the Owner's Project Requirements (OPR) and the Basis of Design (BOD). These documents outline the facility's performance goals - such as target critical load, redundancy levels, and acceptance criteria. Every test script stems from these foundational documents.
The Commissioning Provider (CxP) plays a pivotal role in creating detailed IST scripts, which define the step-by-step process for each simulated failure scenario. Supporting documentation is equally important.
For large-scale data center construction projects, having these documents approved and distributed before the first day of testing is critical. Additionally, a coordination meeting should be held each day before testing begins. These meetings bring together the Owner, CxP, General Contractor, and vendors to review the day's tests, address potential obstacles, and confirm safety measures.
Finally, ensure all individual systems have passed their standalone tests before progressing to integrated testing.
Before diving into integrated testing, specific prerequisites must be met. All systems should have already completed earlier commissioning phases, with sign-offs for Level 3 (System Start-Up) and Level 4 (Functional Testing) in place.
Other key items must also be confirmed before testing begins. Among them, the rollback plan is especially important to prevent test scenarios from escalating into uncontrolled incidents.
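As a rough illustration of the prerequisite gate described above, the check can be expressed as a small readiness function. This is a minimal sketch, not a commissioning tool: the level labels `L3_startup` and `L4_functional`, the `SystemRecord` type, and the `rollback_plan_approved` flag are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the two sign-offs the text requires before IST.
REQUIRED_LEVELS = ("L3_startup", "L4_functional")


@dataclass
class SystemRecord:
    """One system (for example a UPS or generator) and its recorded sign-offs."""
    name: str
    signed_off: set = field(default_factory=set)


def ist_blockers(systems):
    """Return {system name: [missing sign-off levels]} for systems not ready for IST."""
    blockers = {}
    for system in systems:
        missing = [lvl for lvl in REQUIRED_LEVELS if lvl not in system.signed_off]
        if missing:
            blockers[system.name] = missing
    return blockers


def ready_for_ist(systems, rollback_plan_approved):
    """IST may start only if a rollback plan is approved and no system has a missing sign-off."""
    return rollback_plan_approved and not ist_blockers(systems)


if __name__ == "__main__":
    systems = [
        SystemRecord("UPS-A", {"L3_startup", "L4_functional"}),
        SystemRecord("GEN-1", {"L3_startup"}),  # functional testing not yet signed off
    ]
    print(ist_blockers(systems))
    print(ready_for_ist(systems, rollback_plan_approved=True))
```

Returning the list of blockers, rather than a bare yes or no, gives the daily coordination meeting something concrete to work from.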
Once the documentation and prerequisites are in place, IST can proceed with a standardized sequence of activities:
| Step | Activity |
|---|---|
| 1. Baseline Verification | Confirm normal operating conditions, including stable power, standard draw, and active BMS schedules. |
| 2. Utility Power Loss | Simulate a utility power shutdown; verify that UPS alerts and BMS/SCADA notifications activate as expected. |
| 3. Emergency Power | Confirm generators start, synchronize, and provide power while emergency lighting activates. |
| 4. Load Shedding | Ensure non-critical systems shut down while critical systems remain operational. |
| 5. Thermal Response | Monitor CRAC/CRAH units for stable temperature control during power transitions. |
| 6. Fault Injection | Conduct "Red Team" scenarios, such as stuck breakers, sensor signal failures, and simultaneous ATS malfunctions. |
| 7. Restoration | Restore utility power and verify systems restart and synchronize automatically. |
| 8. Data Review | Collect and review time-stamped logs from BMS/SCADA for reporting and punch list resolution. |
If any test fails, systems must be restored to their pre-test state immediately, and the root cause must be addressed before reattempting the test.
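The stop-and-restore rule above can be sketched as a simple step runner. This is an illustrative outline only: the `Step` type, `run_sequence`, and the `restore` callback are hypothetical names, and in a real IST each step is a scripted procedure carried out by people on site, not a function call.

```python
from typing import Callable, List, NamedTuple, Optional


class Step(NamedTuple):
    """One IST step: a label and a check that returns True on pass."""
    name: str
    run: Callable[[], bool]


def run_sequence(steps: List[Step], restore: Callable[[], None]) -> Optional[str]:
    """Run steps in order; on the first failure, restore and stop.

    Returns the name of the failed step, or None if every step passed.
    """
    for step in steps:
        if not step.run():
            restore()         # return systems to their pre-test state
            return step.name  # later steps are not attempted
    return None


if __name__ == "__main__":
    steps = [
        Step("Baseline Verification", lambda: True),
        Step("Utility Power Loss", lambda: False),  # simulated failure
        Step("Emergency Power", lambda: True),
    ]
    failed = run_sequence(steps, restore=lambda: print("restoring pre-test state"))
    print("failed step:", failed)
```

The runner deliberately does not retry. Consistent with the rule above, a reattempt belongs to a later run, after the root cause has been addressed.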
The Red Team fault injection phase (Step 6) is particularly valuable for identifying hidden vulnerabilities. As CxPlanner explains:
"Red Teaming involves actively probing for weaknesses. In commissioning, that means injecting controlled failures, and surfacing hidden vulnerabilities before the facility is live." [5]
Examples of edge cases to include during fault injection are a partial UPS battery failure during generator warm-up, loss of BMS communication across multiple devices, and power restoration while a system remains in failover mode.
After all steps are completed and the Commissioning Provider confirms that every Level 5 task is resolved, a "White Tag" is typically affixed to the equipment. This serves as a physical sign-off that the system has successfully passed integrated testing. [3]
After completing integrated systems testing, facilities often conduct pull-the-plug testing to simulate a complete power outage.
This test replicates a real-world electrical grid failure by physically disconnecting utility power from the facility. Unlike fault injection during integrated systems testing - which simulates failures at a software or sensor level - pull-the-plug testing directly challenges the hardware. The aim is to verify that switchgear, backup generators, and the Programmable Logic Controllers (PLCs) responsible for transitioning to on-site power all function as intended [10].
Power failure is the leading cause of major data center outages, with most issues occurring during the transition from grid power to backup systems rather than within the generators themselves [10].
There are three main approaches to conducting a pull-the-plug test, each with its own level of risk and realism:
| Method | Description | Pros | Cons |
|---|---|---|---|
| Utility Coordination | The grid provider cuts all incoming power. | Most realistic; tests the entire system. | Expensive; hard to schedule; requires utility technicians on-site [10]. |
| Isolation Device | Disconnecting power via upstream switches or breakers. | Lower cost; no utility involvement needed. | May bypass critical detection systems; prone to human error [10]. |
| Transformer Fuses | Removing fuses at the transformer to simulate power loss. | Safest option; quick recovery; less strain on UPS batteries. | Doesn't fully test UPS batteries; some processes may be skipped [10]. |
The choice of method depends on your facility's risk tolerance, budget, and any agreements with tenants or colocation clients.
Approximately 70% of data centers conduct pull-the-plug tests, and 95% of those do so annually [10]. This high adoption rate reflects the importance of verifying backup power systems.
These tests ensure that UPS systems, generators, Automatic Transfer Switches (ATS), and cooling systems work together under load. They also expose flaws that more controlled tests might miss [1][5]. As Douglas Donnellan, Senior Research Associate at the Uptime Institute, explains:
"Pull-the-plug testing provides the most comprehensive assessment of these systems... [but] organizations that interrupt their power supply independently... are at risk of performing an incomplete test." [10]
However, the risks are just as real. An improper transfer to backup power can result in unexpected downtime. Using isolation devices without verifying their functionality can also bypass key control logic, leading to a false sense of security [10]. Additionally, high-voltage environments pose safety risks for personnel conducting the test.
Thermal risks are another critical factor. While a facility might pass the electrical portion of the test, it could fail if cooling systems - like CRAC/CRAH units and chilled water systems - don’t restart quickly enough to prevent servers from overheating once power is restored [1]. Ensuring that cooling recovery is part of the test is essential.
With these benefits and risks in mind, implementing strict safety protocols is crucial for a successful pull-the-plug test.
Safety is the top priority during pull-the-plug testing. A structured approach, starting with staged testing, is essential. Each component should first pass standalone and functional tests before attempting a full power-loss scenario. Skipping these steps can turn a controlled test into a chaotic event.
Every test must include a rollback plan and predefined abort criteria. For example, if a UPS battery reaches a critical level, the team should immediately halt the test [5]. A list of potential issues, along with specific response plans, should also be prepared in advance.
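Predefined abort criteria are easiest to enforce when they are written down as explicit thresholds before the test starts. The sketch below shows one way to encode them. The threshold values, field names, and the supply-air temperature criterion are assumptions for illustration; actual limits come from the project's own acceptance criteria and equipment vendors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbortCriteria:
    """Thresholds agreed before the test (values here are placeholders)."""
    min_ups_battery_pct: float
    max_supply_air_temp_c: float


def abort_reasons(ups_battery_pct, supply_air_temp_c, criteria):
    """Return the list of abort criteria that the current readings have tripped."""
    reasons = []
    if ups_battery_pct <= criteria.min_ups_battery_pct:
        reasons.append("UPS battery at or below critical level")
    if supply_air_temp_c >= criteria.max_supply_air_temp_c:
        reasons.append("supply air temperature at or above limit")
    return reasons


if __name__ == "__main__":
    criteria = AbortCriteria(min_ups_battery_pct=30.0, max_supply_air_temp_c=32.0)
    reasons = abort_reasons(ups_battery_pct=28.5, supply_air_temp_c=25.0, criteria=criteria)
    if reasons:
        print("ABORT and execute rollback plan:", "; ".join(reasons))
```

Any non-empty result means the team halts the test and follows the rollback plan instead of debating the decision in the moment.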
Vendor support on-site is critical. Equipment vendors and safety personnel should be present to monitor systems and respond if any component fails during the transition [10]. If isolation devices are used and installed outdoors, they must be safeguarded against extreme weather to avoid hardware failures at crucial moments [10].
For colocation facilities, it’s essential to coordinate with tenants before scheduling tests. Some contracts may limit when or how these tests can be conducted, and addressing conflicts ahead of time prevents unnecessary disputes during the process [10].
The success of a commissioning program largely depends on assembling the right team. Integrated Systems Testing (IST) and pull-the-plug testing involve live electrical systems, intricate control logic, and the potential for significant risks. Because of this, every team role must be clearly defined before testing begins.
At the core of the operation is the Commissioning Provider (CxP) or Commissioning Authority (CxA). They are responsible for writing and issuing test scripts, overseeing the entire testing program, and being physically present during critical scenarios like live failover or load transfers [9]. The General Contractor manages load bank operations, coordinates on-site subcontractors, and ensures all punch list items are addressed before final sign-off [3].
Two often-overlooked roles are the Energy Marshal and the Building Operator. The Energy Marshal oversees the Energy Control Program (ECP), ensuring all Lockout-Tagout (LOTO) procedures are followed during high-risk energization phases [11]. Meanwhile, the Building Operator undergoes detailed training to troubleshoot and maintain systems independently after handover [3].
MEP specialists bring technical expertise, reviewing facility designs and ensuring the IST program aligns with the Owner’s Project Requirements (OPR) [2][11]. Equipment vendors are also key players, providing on-site support for systems like UPS units and generators, which require detailed product knowledge during testing [3].
Here’s a quick breakdown of the primary responsibilities for each role:
| Role | Primary IST Responsibility |
|---|---|
| Owner / PM | Approves scripts, signs off on level completion, and owns the facility [3] |
| CxP / CxA | Writes scripts, manages testing, and issues the final close-out report [3] |
| General Contractor | Operates load banks and removes temporary equipment after testing [3] |
| Building Operator | Supports testing and completes facility operative training [3] |
| Energy Marshal | Manages the Energy Control Program (ECP) and electrical safety protocols [11] |
| Vendors | Provides technical support and equipment-specific integration [3][11] |
A well-structured team is only effective if supported by strong governance to ensure decisions are made efficiently.
Clear governance is critical to avoid confusion during testing, especially when it comes to deciding if operations should be paused. The Owner or Project Manager has the final say on script approvals, permits, and close-out reports, while the CxP oversees day-to-day operations [3]. Using a RASCI matrix (Responsible, Accountable, Supportive, Consulted, Informed) to define roles for each IST task ensures that decisions are made quickly and by the right person.
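A RASCI matrix can be kept as plain data and checked for gaps before testing starts. The sketch below assumes two common conventions that the text does not state: each task has exactly one Accountable role and at least one Responsible role. The task names and role assignments are illustrative only.

```python
RASCI_CODES = {"R", "A", "S", "C", "I"}


def rasci_problems(matrix):
    """Check a {task: {role: code}} matrix and return a list of problem descriptions."""
    problems = []
    for task, assignments in matrix.items():
        codes = list(assignments.values())
        unknown = sorted(set(codes) - RASCI_CODES)
        if unknown:
            problems.append(f"{task}: unknown code(s) {unknown}")
        if codes.count("A") != 1:
            problems.append(
                f"{task}: expected exactly one Accountable, found {codes.count('A')}"
            )
        if "R" not in codes:
            problems.append(f"{task}: no Responsible role")
    return problems


if __name__ == "__main__":
    # Illustrative assignments, not taken from any real project.
    matrix = {
        "Approve IST scripts": {"Owner/PM": "A", "CxP": "R", "Vendors": "C"},
        "Utility power loss test": {"CxP": "A", "Owner/PM": "A", "GC": "S"},
    }
    for problem in rasci_problems(matrix):
        print(problem)
```

Running a check like this before the first test day surfaces tasks where nobody, or more than one party, would be expected to make the call to pause.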
Daily coordination workshops help the team stay aligned and address potential issues early. Additionally, two-week look-ahead schedules ensure resources are planned effectively. While AI tools can assist with administrative tasks, human experts must retain control over acceptance criteria and life-safety decisions. Ultimately, engineers and owners bear the responsibility for final outcomes.

Staffing a commissioning team comes with its own challenges. With global data center capital expenditures expected to surpass $400 billion by 2026, there’s a growing demand for skilled workers. Out of the 650,000 positions needed for data center construction and operations in 2026, approximately 340,000 are projected to remain unfilled [12]. The most difficult roles to hire for are often MEP Managers and Commissioning Managers, who are essential for ensuring facilities are energized on time [12].
iRecruit.co specializes in filling these critical roles. Founded by Dallas Bond and Tanya Runholt, the firm focuses on the intersection of construction, MEP, and systems integration - skills that are crucial for Level 1 through Level 5 commissioning. Their services range from placing embedded recruiters within client organizations to conducting executive searches for leadership roles and providing comprehensive recruiting support for field positions.
One insight from their 2026 market data is that Commissioning Managers for hyperscale projects now earn base salaries between $160,000 and $240,000, with top specialists exceeding $280,000 [12]. Additionally, roles in data center construction tend to pay 25% to 32% more than those in standard commercial construction, making it essential to benchmark offers against these rates. iRecruit.co also stresses that because commissioning specialists cannot be hired as quickly as general construction labor, securing a Commissioning Manager during the design phase - rather than waiting until construction is nearly complete - can significantly impact a project’s success [12].
Operational readiness is the hallmark of a fully functional data center. To achieve this, commissioning must go beyond simply completing construction - it demands a disciplined, step-by-step approach. When done right, thorough commissioning can enhance performance by 10–20% and significantly reduce rework costs. However, this only works if every stage is fully validated before moving forward. Skipping steps might seem like a shortcut, but it often leads to risks that show up when you least expect them.
One of the most conclusive tests is pull-the-plug testing. By physically cutting off utility power, this method ensures that generators kick in, UPS systems maintain the load, and transfer switches operate correctly - all under real-world conditions. This kind of testing is critical to preparing data centers for the challenges of tomorrow.
With advancements like AI, liquid cooling, and the massive power needs of gigawatt-scale facilities, commissioning is evolving into an ongoing process. For example, modern GPU racks, such as NVIDIA Blackwell-class units that generate between 60 and 100 kW of heat per rack [12], highlight the importance of continuous performance checks.
"In a world where downtime can cost millions per hour, precision testing is not just a safeguard - it's a competitive advantage." - Cadence [6]
The most successful teams share a few key habits: they involve the commissioning authority early, meticulously plan for every possible failure scenario, and view turnover documentation as a dynamic, ongoing record rather than a last-minute task. These practices, rooted in rigorous testing and attention to detail, are what distinguish data centers that open on schedule from those that don’t.
Integrated Systems Testing (IST) needs to be planned as the last step in the commissioning process. This should only happen after successfully completing site acceptance testing and functional verification for each individual system. To minimize risks and prevent delays, it's crucial for project teams to include these milestones in the construction schedule right from the beginning. Before starting IST, double-check that all prerequisites have been thoroughly verified to avoid wasting time or effort.
The most frequent IST setbacks that delay go-live stem from poor coordination and logic breakdowns under stress. Common challenges include unexpected load imbalances during power transfers, communication failures in the Building Management System (BMS), conflicting control setpoints, and incomplete documentation. On top of that, a lack of certified MEP specialists often stretches timelines. Hidden alarm failures within the BMS can also go unnoticed, leading to delayed responses and further disruptions during testing.
To perform a pull-the-plug test safely, start by confirming that all systems are functioning properly and are connected to the Building Management System (BMS). Conduct detailed safety checks to identify any weak points that could lead to failures. Instead of disconnecting utility power directly, use simulation techniques such as pulling transformer fuses or activating isolation devices to replicate a power outage. Make sure you have well-defined acceptance criteria, a solid rollback plan, and experienced personnel on-site to oversee the process, manage transitions, and handle any unexpected problems.



