WHY MOST PLANTS RUN REACTIVE MAINTENANCE BY DEFAULT
Reactive maintenance is not a policy choice. It is what happens in the absence of a system. When there is no PM program, technicians respond to failures. The plant learns the rhythm of breakdowns and builds around them: extra WIP staged at the bottleneck, backup suppliers for common emergency parts, maintenance staff permanently on call. The operation adapts to unreliability rather than eliminating it.
This adaptation has a cost that rarely appears explicitly in the budget. It shows up as overtime when a machine fails during a critical run, as expedite charges when a job slips because the equipment was down, and as quality escapes when a degraded process runs past the point where it should have been caught. In the plants we have assessed, these costs are distributed across labor, materials, and overhead lines rather than collected in one place where they would be visible. The manufacturing downtime true cost post covers how to quantify what reactive maintenance actually costs per event.
The shift from reactive to planned maintenance consistently produces higher equipment uptime, fewer unplanned downtime events, and lower maintenance cost per unit of output. The first 90 days of a real PM program feel like more work, not less, because the backlog of deferred maintenance becomes visible all at once. The payoff typically appears in months four through nine.
This post covers how to build a manufacturing maintenance program from scratch, in the sequence that works: assess where you are, build the equipment list and criticality ranking, build the PM schedule, implement the work order system, and add autonomous maintenance as the organization builds the capability. It is a companion to the manufacturing maintenance KPIs post, which covers how to measure whether the program is working once it is running.
THE FOUR PHASES: A REALISTIC PROGRESSION
Maintenance programs do not go from reactive to world-class in one step. A realistic progression moves through four phases, and most plants should expect to spend significant time in each before the next phase is sustainable.
Phase 1: Reactive. Equipment runs until it fails. Technicians respond. No PM schedule exists or, if one exists on paper, it is not followed. This is the starting point for most plants that have not invested in maintenance discipline.
Phase 2: Planned maintenance. Reactive response continues for emergencies, but a backlog of known repairs is identified, prioritized, and scheduled into the maintenance week in advance. Planned corrective work is reactive maintenance done on your terms rather than the machine's terms. It reduces chaos without yet preventing failures.
Phase 3: Preventive maintenance. A documented PM schedule exists for each piece of critical equipment, based on OEM recommendations or historical failure data. PMs are executed on a time-based or usage-based frequency, before failures occur. The ratio of planned to unplanned work shifts significantly toward planned. Most small and midmarket manufacturers should target Phase 3 as the operating standard.
Phase 4: Predictive maintenance. Condition monitoring (vibration analysis, thermography, oil analysis) allows the plant to predict when a failure is approaching rather than relying on time-based intervals. Predictive maintenance maximizes uptime per maintenance dollar spent, but it requires an investment in instrumentation, data infrastructure, and technician skills that only makes sense once the earlier phases are in place. Do not attempt Phase 4 discipline before Phase 3 is stable.
START WITH THE EQUIPMENT LIST AND CRITICALITY RANKING
The foundation of any maintenance program is a complete equipment list with a criticality ranking for each asset. Without this, there is no rational basis for prioritizing PM investment, scheduling maintenance windows, or stocking spare parts.
Build the equipment list by walking the plant with a clipboard. Every piece of production equipment, utility equipment, and facility equipment that requires maintenance gets a row: asset ID, description, location, year installed or purchased if known, and the name of the technician most familiar with it. If no asset IDs exist, create them. A simple numbering scheme (PRESS-001, LATHE-002) is enough to start.
Once the list exists, rank each asset by criticality using three factors: impact of failure on production output, probability of failure given current condition and age, and availability of a backup or workaround if the asset fails. A simple scoring approach: rate each factor 1 to 3, where 3 is the worst case (highest production impact, highest failure probability, no backup or workaround), and sum the scores. Assets scoring 7 to 9 are critical. Assets scoring 4 to 6 are important. Assets scoring 3, the minimum possible, are non-critical.
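The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool. It assumes that 3 is the worst case on every factor, so the backup factor is stored as a "backup gap" (3 means no workaround). The class name, field names, and the FAN-003 asset are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AssetRating:
    asset_id: str
    failure_impact: int       # 1 = minor effect on output, 3 = stops production
    failure_probability: int  # 1 = unlikely given condition and age, 3 = likely
    backup_gap: int           # 1 = backup readily available, 3 = no workaround (assumed scale)


def criticality_score(rating: AssetRating) -> int:
    """Sum the three 1-to-3 factor ratings, giving a score from 3 to 9."""
    factors = (rating.failure_impact, rating.failure_probability, rating.backup_gap)
    for value in factors:
        if value not in (1, 2, 3):
            raise ValueError(f"{rating.asset_id}: each factor must be 1, 2, or 3")
    return sum(factors)


def criticality_tier(score: int) -> str:
    """Map a summed score to the tiers used in the text."""
    if score >= 7:
        return "critical"
    if score >= 4:
        return "important"
    return "non-critical"


# Hypothetical ratings for illustration only.
ratings = [
    AssetRating("LATHE-002", 2, 2, 1),
    AssetRating("PRESS-001", 3, 2, 3),
    AssetRating("FAN-003", 1, 1, 1),
]

# Print the list with the most critical assets first.
for r in sorted(ratings, key=criticality_score, reverse=True):
    score = criticality_score(r)
    print(r.asset_id, score, criticality_tier(score))
```

A spreadsheet with three rating columns and a sum column does the same job. The point of the sketch is only that the ranking is mechanical once the three ratings are agreed.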
Critical assets get full PM programs, documented inspection procedures, and critical spare parts stocked in the storeroom. Important assets get PM schedules but may not require stocked spares for all components. Non-critical assets get a documented list of known issues and corrective action when convenient.
Most plants that complete this ranking for the first time discover they have been spending disproportionate time on non-critical equipment (because it is easy to work on or because the operator requesting repairs is persistent) and under-investing in the assets that would cause the most damage if they failed.
BUILDING THE PM SCHEDULE: OEM TASKS VS. WHAT YOU BUILD YOURSELF
For each critical asset, the starting point for a PM schedule is the OEM maintenance manual. Most equipment comes with a recommended maintenance schedule: daily checks, weekly lubrication, monthly inspections, annual overhauls. These recommendations reflect the manufacturer's design knowledge and should be the baseline.
OEM schedules have limitations. They are written for generic operating conditions and may not match the specific stress the equipment experiences in your plant. A press running two shifts in a high-particulate environment needs more frequent filter changes than the OEM schedule designed for a clean single-shift application. Use historical failure data to calibrate: if a component is consistently failing at 80 percent of the OEM replacement interval, shorten the interval. If it is consistently lasting twice as long, extend it and free up the maintenance time for other work.
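One way to make the calibration rule concrete is the sketch below. The specific choices are assumptions for illustration and do not come from any standard: it requires at least three observations before changing anything, it treats "consistently" as "every observation", and it sets the new interval to 90 percent of the shortest observed life. The function name and parameters are hypothetical.

```python
def suggest_interval(oem_interval_hours, observed_lives_hours,
                     min_samples=3, safety_factor=0.9):
    """Suggest a PM interval in hours from observed component lives.

    observed_lives_hours holds the running hours each component lasted
    before failing or being found worn out. The thresholds here are
    illustrative assumptions, not OEM guidance.
    """
    # Too little history: stay on the OEM baseline.
    if len(observed_lives_hours) < min_samples:
        return oem_interval_hours

    shortest = min(observed_lives_hours)
    longest = max(observed_lives_hours)

    # Every component failed before the OEM interval: shorten it.
    if longest < oem_interval_hours:
        return round(shortest * safety_factor)

    # Every component lasted at least twice the OEM interval: extend it.
    if shortest >= 2 * oem_interval_hours:
        return round(shortest * safety_factor)

    # Mixed evidence: keep the OEM interval.
    return oem_interval_hours


print(suggest_interval(1000, [800, 820, 790]))     # 711
print(suggest_interval(1000, [2100, 2300, 2050]))  # 1845
print(suggest_interval(1000, [900, 1200, 1100]))   # 1000
```

Anchoring on the shortest observed life is deliberately conservative. A plant with more failure history might prefer a percentile or a proper reliability analysis instead.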
For equipment where no OEM documentation exists (common with older machines and custom equipment), build the PM schedule from technician knowledge. Have the technician most familiar with each machine document what they check, when they check it, and what signs indicate an impending failure. Formalize that knowledge into a checklist. This is especially important before that technician retires or moves on.
Organize PM tasks by frequency: daily operator checks, weekly technician checks, monthly inspections, quarterly overhauls, annual rebuilds. Each task should specify the asset, the task description, the tool or material required, the time to complete, and the technician skill level required. "Lubricate main spindle bearings with Mobilux EP2 grease, 3 pumps" produces consistent results. "Grease bearings" does not.
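The task fields listed above map naturally onto a small record type. The following is a sketch of one possible structure, with hypothetical field names and example values (the time estimates, skill labels, and the hydraulic fluid task are invented for illustration). Only the spindle lubrication wording comes from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Frequencies in the order the text lists them.
FREQUENCIES = ("daily", "weekly", "monthly", "quarterly", "annual")


@dataclass(frozen=True)
class PMTask:
    asset_id: str
    description: str   # specific enough that two technicians do the same thing
    materials: str     # tool or material required
    minutes: int       # estimated time to complete
    skill_level: str   # e.g. "operator" or "technician" (assumed labels)
    frequency: str

    def __post_init__(self):
        if self.frequency not in FREQUENCIES:
            raise ValueError(f"unknown frequency: {self.frequency}")


def group_by_frequency(tasks):
    """Return tasks grouped by frequency, in daily-to-annual order."""
    grouped = defaultdict(list)
    for task in tasks:
        grouped[task.frequency].append(task)
    return {freq: grouped[freq] for freq in FREQUENCIES if freq in grouped}


tasks = [
    PMTask("LATHE-002",
           "Lubricate main spindle bearings with Mobilux EP2 grease, 3 pumps",
           "Grease gun, Mobilux EP2", 10, "technician", "weekly"),
    PMTask("PRESS-001",
           "Check hydraulic fluid level at sight glass",
           "None", 2, "operator", "daily"),
]

for frequency, items in group_by_frequency(tasks).items():
    total_minutes = sum(t.minutes for t in items)
    print(frequency, len(items), "task(s),", total_minutes, "min")
```

Summing the minutes per frequency gives a first estimate of the weekly PM workload, which feeds the staffing assessment.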
AUTONOMOUS MAINTENANCE: WHAT OPERATORS CAN OWN
Autonomous maintenance transfers basic equipment care from the maintenance department to the operators who run the machines every day. Operators perform daily inspections, cleaning, and simple lubrication tasks that previously required a maintenance call. This frees maintenance technicians for more complex work and catches problems earlier, because the operator interacts with the equipment continuously rather than waiting for a technician's scheduled visit.
The starting point for autonomous maintenance is the daily operator check: a documented list of items the operator checks at the start of each shift. Typical items are fluid levels, belt tension, guards in place, abnormal sounds or vibrations, leaks, and cleanliness around the base of the machine. The check takes five minutes and is recorded on a simple form or marked on a visual board that makes compliance visible to the supervisor.
Operators should not perform work that requires specialized tools, technical training, or exposure to hazardous energy sources. The boundary between operator maintenance and technician maintenance must be clearly defined, trained, and respected. When that boundary is clear, autonomous maintenance is a stable program. When operators attempt work beyond their training, equipment gets damaged and safety incidents follow.
The OEE framework connects directly to autonomous maintenance. Reducing the availability losses that drag down OEE often requires catching and responding to equipment condition issues faster than the PM schedule allows. Daily operator checks are the mechanism that makes this possible.
THE WORK ORDER SYSTEM FUNDAMENTALS
A work order system is how the maintenance department manages demand. Every job, whether planned or reactive, gets a work order: a record of what was requested, who was assigned, what was done, what parts were used, and how long it took. Without work orders, there is no way to know how maintenance time is being spent, which assets are consuming the most resources, or whether the PM program is actually reducing reactive work over time.
The work order system does not need to be software. A paper-based system with a simple tracking spreadsheet works at plants with one or two maintenance technicians. A computerized maintenance management system (CMMS) is appropriate when the volume of work orders, the number of assets, or the size of the maintenance team makes paper tracking unwieldy. The tool matters less than the discipline of recording every job.
Regardless of the tool, the work order system must capture: asset ID, work type (planned PM, planned corrective, or emergency reactive), labor hours, parts used, and completion date. This data, reviewed monthly, drives three conversations: which assets are consuming disproportionate reactive hours (candidates for PM program redesign or replacement evaluation), what is the ratio of planned to unplanned work (the key health metric of the maintenance program), and how is the parts spend tracking against budget.
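As a rough sketch of the monthly review, the snippet below computes two of the three figures from a list of work order records. It makes a few assumptions the text does not specify: the planned-to-unplanned ratio is expressed as the planned share of labor hours (counting work orders would be an equally valid choice), parts used are summarized as a cost, and all names and sample values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

WORK_TYPES = ("planned_pm", "planned_corrective", "emergency_reactive")


@dataclass
class WorkOrder:
    asset_id: str
    work_type: str      # one of WORK_TYPES
    labor_hours: float
    parts_cost: float   # assumption: parts used are tracked as a cost
    completed: date


def planned_share(orders):
    """Fraction of labor hours spent on planned work (PM plus planned corrective)."""
    total = sum(o.labor_hours for o in orders)
    if total == 0:
        return 0.0
    planned = sum(o.labor_hours for o in orders
                  if o.work_type != "emergency_reactive")
    return planned / total


def reactive_hours_by_asset(orders):
    """Emergency reactive labor hours per asset, highest first."""
    hours = Counter()
    for o in orders:
        if o.work_type == "emergency_reactive":
            hours[o.asset_id] += o.labor_hours
    return hours.most_common()


# Hypothetical month of work orders.
orders = [
    WorkOrder("PRESS-001", "planned_pm", 2.0, 40.0, date(2024, 3, 4)),
    WorkOrder("PRESS-001", "emergency_reactive", 6.0, 310.0, date(2024, 3, 11)),
    WorkOrder("LATHE-002", "planned_corrective", 4.0, 120.0, date(2024, 3, 14)),
    WorkOrder("LATHE-002", "emergency_reactive", 4.0, 85.0, date(2024, 3, 20)),
]

print(planned_share(orders))            # 0.375
print(reactive_hours_by_asset(orders))  # [('PRESS-001', 6.0), ('LATHE-002', 4.0)]
print(sum(o.parts_cost for o in orders))
```

The same calculations work as spreadsheet pivot tables. What matters is that every job carries an asset ID and a work type so the split can be made at all.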
PARTS AND STOREROOM MANAGEMENT BASICS
Parts availability is often the binding constraint on maintenance response time. A reactive repair that should take two hours takes eight because the part is not in the storeroom and must be rush-ordered from a distributor. Getting the storeroom right reduces mean time to repair on reactive events and enables PMs to be executed without delays when the scheduled window arrives.
The starting point is to inventory what is in the storeroom against what should be there. Critical spare parts for critical assets should be stocked at defined minimum quantities, with a reorder trigger when stock drops below the minimum. Non-critical parts for non-critical assets do not need to be stocked if procurement lead time is short.
Every part in the storeroom should be tagged with the asset numbers it serves. Parts that do not correspond to any current asset are candidates for disposal. Storerooms that have not been audited in several years typically contain significant inventory value in parts for machines that were scrapped or sold long ago. Recovering that value improves the maintenance budget and reduces the confusion that slows technicians searching for parts under time pressure.
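The two storeroom checks described above, the reorder trigger and the search for parts that no longer serve any current asset, can be sketched as follows. Part numbers, quantities, and the scrapped MILL-009 asset are hypothetical, and the record layout is an assumption rather than a recommended schema.

```python
from dataclasses import dataclass


@dataclass
class StockedPart:
    part_number: str
    on_hand: int
    minimum: int          # defined minimum quantity for this part
    serves_assets: set    # asset IDs this part is tagged to


def parts_to_reorder(parts):
    """Parts whose stock has dropped below the defined minimum."""
    return [p.part_number for p in parts if p.on_hand < p.minimum]


def disposal_candidates(parts, current_asset_ids):
    """Parts that are not tagged to any asset still on the equipment list."""
    current = set(current_asset_ids)
    return [p.part_number for p in parts
            if not (set(p.serves_assets) & current)]


# Hypothetical equipment list and storeroom contents.
current_assets = {"PRESS-001", "LATHE-002"}
storeroom = [
    StockedPart("BRG-6205", on_hand=1, minimum=2, serves_assets={"LATHE-002"}),
    StockedPart("SEAL-114", on_hand=4, minimum=2, serves_assets={"PRESS-001"}),
    StockedPart("BELT-A42", on_hand=3, minimum=0, serves_assets={"MILL-009"}),
]

print(parts_to_reorder(storeroom))                      # ['BRG-6205']
print(disposal_candidates(storeroom, current_assets))   # ['BELT-A42']
```

Both checks depend on the equipment list being complete, which is one more reason to build it first.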
THE STAFFING QUESTION
Before building the PM schedule, the plant needs an honest assessment of whether the current maintenance headcount can execute it. A PM program that requires 40 technician-hours per week of scheduled preventive work cannot run on a single technician who is already fully consumed by reactive repairs.
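The capacity check is simple arithmetic, sketched here with hypothetical parameter names. It assumes a flat number of working hours per technician per week and ignores vacation, training, and travel time, so treat the result as an optimistic estimate.

```python
def pm_hours_shortfall(technicians, hours_per_tech_week,
                       reactive_hours_week, pm_hours_required_week):
    """Weekly technician-hours the PM schedule needs beyond what is available.

    Returns 0 when the current headcount can cover the schedule.
    """
    total_capacity = technicians * hours_per_tech_week
    # Hours left for PM after reactive work; never less than zero.
    available_for_pm = max(total_capacity - reactive_hours_week, 0)
    return max(pm_hours_required_week - available_for_pm, 0)


# The case from the text: one technician fully consumed by reactive repairs.
print(pm_hours_shortfall(1, 40, 40, 40))  # 40

# Two technicians with 30 reactive hours per week can cover a 40-hour PM load.
print(pm_hours_shortfall(2, 40, 30, 40))  # 0
```

A nonzero shortfall means the schedule has to shrink, capacity has to grow, or reactive demand has to fall before the program can run as written.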
The transition from reactive to preventive requires either additional maintenance capacity or a reduction in reactive demand during the transition period. Most plants accomplish this by starting the PM program on the most critical assets only, reducing the reactive load on those assets first, and expanding the program as the reactive demand on that first wave of assets drops.
Outsourcing specific PM tasks to OEM service contracts or third-party maintenance vendors is a valid option for assets where the technical knowledge does not exist internally or where the frequency does not justify internal training. Annual overhauls on complex equipment (CNC machines, presses with custom tooling) are common candidates. The decision to outsource should be asset-specific and cost-justified, not a blanket policy.
THE 90-DAY ROLLOUT SEQUENCE
A maintenance program built all at once and deployed across all assets simultaneously will fail. The administrative overhead is too high, the training gap is too large, and the culture change required to sustain it takes time. A phased 90-day start produces a functioning program for the highest-priority assets without overwhelming the maintenance team or the operators.
Days 1 to 30: Complete the equipment list and criticality ranking. Identify the top 10 critical assets. For each, pull the OEM maintenance manual, document the recommended PM tasks, and create the first PM checklists. Implement daily operator checks for these 10 assets.
Days 31 to 60: Launch the work order system. All reactive repairs for the top 10 assets go through work orders from this point forward. Execute the first round of PMs on the documented schedule. Identify and stock critical spare parts for the top 10 assets.
Days 61 to 90: Review the first 60 days of work order data. Adjust PM frequencies based on what the checklists and work orders are revealing. Add the next tier of important assets to the PM program. Brief the maintenance team on planned-to-unplanned ratio as the primary performance metric going forward.
By day 90, the top-priority assets have a functioning PM program, the work order system is generating usable data, and the organization has built the habits required to sustain both. That is a different starting position than day zero.
P7 Equipment and Maintenance is not a ceiling pillar in the Sharpen 10-pillar framework, but it is directly load-bearing on P5 Planning and Flow, which is. A plant that cannot maintain equipment reliability cannot execute a production schedule with confidence. The ceiling pillar problem post covers why the ceiling pillars are the right starting point: fixing P5 without fixing the equipment reliability that P5 depends on produces limited results.
WHAT TO DO NEXT
Building a manufacturing maintenance program from scratch is a 6- to 12-month project, not a 90-day one. The 90-day sequence gets the foundation in place: the equipment list, the criticality ranking, the PM schedules for critical assets, the work order system, and the autonomous maintenance program for operators. The following six months are about executing that foundation consistently and using the data it generates to keep improving.
The free Sharpen diagnostic at /intake takes about 10 minutes and produces a prioritized roadmap across all ten operational pillars. If Equipment and Maintenance is a gap, the diagnostic will surface it alongside the other constraints and help you sequence the work in the order that produces the most operational impact.