A PFMEA (Process Failure Modes and Effects Analysis) is an excellent analysis tool, enabling the team populating and applying it to identify areas of weakness and/or risk associated with a specific manufacturing process. A PFMEA is most frequently applied to the analysis of a manufacturing process, but it works just as well when applied to a transactional, or business, process. This article primarily addresses the application of the PFMEA to manufacturing processes, but many of the lessons can be extrapolated to these other applications as well.
PFMEAs are widely utilized in the automotive industry, currently driven by the AIAG 4th edition FMEA reference manual. Due to the longstanding success of this tool in automotive, PFMEAs have become increasingly popular in many other industries as well. While working directly for many large automotive-related corporations, and as a quality transformation consultant, I have observed a litany of frequently occurring issues with PFMEAs. Although there is considerable training available to teach users how to populate the many segments of the PFMEA document, mismanagement of the tool remains prevalent. These universal pitfalls negate the primary purposes of the PFMEA, which include prioritizing improvement opportunities and reducing risk, both to the business and to respective customers. Two key contributors to the misuse of the PFMEA tool are the lack of training on optimizing the content of the document, rather than merely completing it, and the lack of a clear understanding of the tool's critical shortcomings. As a result, some common mistakes are made over and over, significantly reducing the effectiveness of the tool.
There are a number of different, but similar, versions of the PFMEA form. One commonly used version is shown below (Figure 1). Throughout this article, some common errors and misconceptions will be highlighted, along with the particular segment of the document in which they typically occur, addressing the content optimization mentioned above.
Figure 1.
The first common issue pertains to incoming material. Although the general rule is to assume that the material coming into the process is not defective, there are some exceptions. For example, if data is available that clearly shows a chronic quality issue with incoming material, it should be captured, and the risk of the defect escaping your process should be mitigated. Although it is not desirable to error proof for your supplier (even when that supplier is a previous process across your facility), available data should lead the team to take action. In some cases, it is not possible for the supplier to test for a particular failure mode, as it only presents itself when integrated into your system or product. This situation should be identified as a potential failure mode, and detection should be implemented until a robust preventive solution can be executed by the supplier.
Another common omission in PFMEAs pertains to test processes. Too often, when test processes (quality gates) are captured in a PFMEA, the team redundantly lists all of the failure modes/defects that should be detected by the test process. For example, if an electronic part is assembled backwards at operation 2 and is detected at operation 9, the final functional test, this should be captured in the PFMEA when operation 2 is being evaluated and appear in the detection column. It should not be listed exactly the same again when operation 9, the test process, is evaluated. When operation 9 is evaluated, the team should identify those failure modes/defects that can occur due to the test process itself. For example, perhaps the product could be overstressed electrically during the test, a critical product surface could be damaged as the product is placed in the test fixture, or the test process could apply excessive force and damage the product connector. This approach is especially critical when a final test is being evaluated, as this may be your last line of defense before the customer receives the product. Thorough evaluation of all potential failure modes that can occur at each test process is absolutely critical in a comprehensive PFMEA.
The previous two issues were related to the column listing the potential failure mode. But since we are thinking about test processes, another common shortcoming in many PFMEA documents is that a quality gate (typically a test) will be listed as the detection control for many, many different failure modes throughout the PFMEA. Particularly with a final, functional test, many teams assume that the test will detect almost every type of defect without actually proving that this is the case. Especially when a PFMEA is performed early in the development process, the team expects the final test to detect just about every non-conformance. The test capability must be proven; it must be demonstrated for each type of defect. This is a chronic issue which I have observed literally hundreds of times. For example, the PFMEA states that when component XYZ is inadvertently omitted during assembly, the test process will detect it. The team should build a product omitting component XYZ and prove that the test process actually will detect it. Without proof, a disastrous quality spill can occur because everyone mistakenly assumed that the test process would protect the customer.
Severity is an important consideration, as it may indicate the impact to your customer or the end customer. The rankings of 9 or 10 are reserved for the most severe impact, related to safety or regulatory requirements. Many PFMEA participants do not realize that there are two severity ranking tables: one pertains to the severity to your customer or the end customer; the other ranks the severity within your four walls, the impact to your manufacturing operation. I observed a great example of this in which a very inexpensive, innocuous component that required machining would frequently escape the machining process. When this occurred, the unmachined component would pass to the next process and break the stamping die, guaranteeing a 5+ hour production shutdown while the die was replaced. The PFMEA team correctly identified the minimal severity to the end customer, but failed to identify the catastrophic results of this defect for their own manufacturing facility. The actual risk in this scenario is particularly high; the top priority of fixing this issue will only be discovered when this internal severity is ranked correctly (8 to 10), or after repeatedly shutting down production to replace broken dies.
Another common misconception with severity rankings is that when a safety product is being evaluated, all severity rankings must be a 9 or 10. This is not correct; only those failure modes that impact the specific safety-related performance characteristics must be in these categories, not every single one.
The classification category is reserved for those features, functions and characteristics that are selected as critical or special, or have some other customer or internal designation identifying them as significant and essential to product performance. These items require additional attention by the PFMEA team. Routinely I have identified critical characteristics in PFMEAs that carry a completely unacceptable level of risk, for example, a defect with a relatively frequent occurrence and no offsetting detection capability at all! Critical characteristics should be considered high risk until absolutely proven otherwise, and the emphasis should always be on reducing occurrence through the prevention of defects, not on detection.
The three ranking categories, severity, occurrence and detection, are rarely completely understood by PFMEA participants and therefore are not applied correctly. When these rankings are persistently underestimated, the value of completing a PFMEA is negated. Occurrence is by far the most misused ranking, which is difficult to comprehend, as it is the only one of the three that is completely objective; severity and detection are both somewhat subjective. Occurrence is mathematically calculated; it is absolutely straightforward and simple to derive. It would be difficult to estimate the number of PFMEAs I have reviewed through my preventive error proofing training and consulting, but over 90% of them contained occurrence rankings that were drastically underestimated. I regularly see occurrence rankings of 1, 2 and 3 throughout entire PFMEAs. Unless your quality is outstanding and your manufacturing is exceptionally high volume, these numbers are most likely significantly underestimated. Thoroughly examine the AIAG 4th edition PFMEA ranking definitions for occurrence in Table 1 below.
Table 1
The most successful approach I have found to highlight the chronic artificial deflation of occurrence rankings is to customize the ranking table with additional columns that correspond to current manufacturing volumes. Most manufacturing facilities can group their products into a small number of approximated production quantities, which can then be used to calculate the frequency of defects for each quantity to correlate with the 1-10 ranking. This has been completed in Table 1 for a specific client, illustrating their low volume, medium volume and relatively high volume build quantities. Now consider an occurrence ranking of 2. Even when producing 25,000 products per week, the defect rate must be less than 1 defect in 40 weeks of production! When producing only 1,000 products per week, the occurrence cannot exceed 1 defect in almost 20 years of production! Since a PFMEA is specific to a particular product, the quantities used in the table must represent each product's volume, not the consolidated or total production volumes. Try this for your own products and production volumes; you may be amazed at the true frequency of occurrence of many of your defects. It is not unusual for rankings previously identified as 2 and 3 to become 7 or 8 when the occurrence of defects is properly calculated, which increases the risk calculation significantly! If your scrap rates are habitually high and your PFMEAs show consistent occurrence rankings of 2 and 3, this is a clear indication of improper calculations, which, once again, negates both purposes of completing PFMEAs: prioritization and risk reduction.
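To make the arithmetic above concrete, here is a minimal Python sketch. The threshold rates are the approximate AIAG 4th edition incidents-per-piece values; the helper functions are illustrative, not part of any standard.

```python
# Approximate AIAG 4th-edition occurrence thresholds (defects per piece),
# expressed as (ranking, maximum rate for that ranking). Ranking 1 is
# reserved for failure modes prevented through error proofing.
OCC_BANDS = [
    (2, 1 / 1_000_000), (3, 1 / 100_000), (4, 1 / 10_000),
    (5, 1 / 2_000), (6, 1 / 500), (7, 1 / 100),
    (8, 1 / 50), (9, 1 / 20), (10, 1 / 10),
]

def occurrence_ranking(defects: int, pieces: int) -> int:
    """Map an observed defect rate to the lowest ranking it qualifies for."""
    rate = defects / pieces
    for ranking, max_rate in OCC_BANDS:
        if rate <= max_rate:
            return ranking
    return 10  # worse than 1 in 10

def weeks_per_defect(ranking: int, weekly_volume: int) -> float:
    """Weeks of production allowed per single defect to justify a ranking."""
    max_rate = dict(OCC_BANDS)[ranking]
    return 1 / (max_rate * weekly_volume)

# At 25,000 pieces/week, a ranking of 2 allows at most one defect per 40 weeks:
print(weeks_per_defect(2, 25_000))   # ~40 weeks
# One defect per 1,000 pieces is a ranking of 6, not the 2 or 3 often claimed:
print(occurrence_ranking(1, 1_000))
```

Substituting your own weekly volumes into `weeks_per_defect` is a quick way to test whether the low occurrence rankings in an existing PFMEA are remotely plausible.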
In an effort to maintain a suitable length for this article and allow us to move from content optimization to PFMEA critical shortcomings, I have provided a few additional regularly observed mistakes below without delving into a protracted explanation of each.
- Whenever disposition of the defect is a genuine concern, it should be identified and addressed as a separate failure mode from the error proofing of the actual manufacturing process.
- Set-up/changeover errors and defects must be captured; they may be well known by operations but may not be identified when analyzing a specific product/process combination in isolation.
- Off-line repair activities must be included. These processes typically have a high occurrence of errors, as they are not well documented, operators may not be well trained, and the product may not re-enter the production line at the correct location.
- Standardized work is not preventive by default, although frequently listed in the prevention category. For example, a visual inspection operator is not achieving preventive results through standardized work.
- A human visual inspection detection ranking cannot be lower than 7, and is usually 8, since the inspection is typically downstream of where the defect is created.
- The current process controls (prevention) column can only include a vision system if it is used to prevent a defect rather than detect one (this is rare).
- When the defect is random, the prevention column should not contain items such as first and last piece inspection, auditing, set-up verification, incoming inspection, etc. Random defects will most likely remain undetected with these approaches, as they target systematic (non-random) defects.
- Many, if not most, failure modes have multiple root causes, and many root causes can result in multiple failure modes (see examples below). A PFMEA that aligns each failure mode with a single root cause is incomplete.
Although the PFMEA process can provide an organization with valuable information and data, users must realize it possesses fundamental limitations. Many companies and champions have devised a multitude of additional steps and methods to ensure the highest risk issues and items are properly identified. Common approaches include:
- Setting an RPN (Severity × Occurrence × Detection) hurdle to drive all risk levels below this hurdle (not recommended, as it encourages the team to artificially deflate the rankings)
- Setting a Severity hurdle above which these failure modes must receive additional focus
- Setting an Occurrence hurdle to focus the team on prevention versus detection and containment
- Setting a hurdle for all critical and special characteristics to ensure these features and functions receive the most attention when prioritizing action
- Adding columns to the form to capture additional data in an effort to address shortcomings with the PFMEA tool (some of these items that can be added are discussed below)
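As a sketch of how the hurdle approaches above might be combined, the following Python fragment flags a PFMEA row for additional focus using severity and occurrence hurdles plus the classification column, rather than a single RPN hurdle. The hurdle values and field names here are hypothetical examples, not standard definitions.

```python
def flag_row(row, sev_hurdle=8, occ_hurdle=4):
    """Return the reasons a PFMEA row needs additional focus, independent of RPN.

    Hurdle values and dictionary keys are hypothetical examples.
    """
    reasons = []
    if row["severity"] >= sev_hurdle:
        reasons.append("high severity")
    if row["occurrence"] >= occ_hurdle:
        reasons.append("prevention needed")
    if row.get("classification"):  # e.g. a critical/special characteristic mark
        reasons.append("special characteristic")
    return reasons

# A critical characteristic with high severity is flagged even though its
# RPN (9 x 3 x 8 = 216) might clear a typical RPN-only hurdle:
row = {"severity": 9, "occurrence": 3, "detection": 8, "classification": "CC"}
print(flag_row(row))  # ['high severity', 'special characteristic']
```

The point of separating the hurdles is that no single multiplied score can hide a severity of 9, a runaway occurrence, or an unaddressed critical characteristic.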
A significant list of deficiencies in the current PFMEA process and tool could be created. Although the process is not intended to be a standalone method to mitigate all risk for each manufacturing process, the PFMEA team should understand some of the most critical insufficiencies. Examples include the following:
- Complete lack of preference for prevention over detection and containment solutions
- Not comprehending which defects are repairable and which are guaranteed scrap
- Insufficient identification of just how far downstream a defect will be detected
- No capability to document the value of the product at the point a defect is detected
- No requirement to identify which processes are batch and which are single piece flow
- No recognition that adding further human visual inspection steps to existing inspection does not significantly increase detection capability
The lack of penalizing a defect detection solution when compared to a defect prevention solution is perhaps the most critical limitation of the PFMEA tool.
Consider the following (RPN = S x O x D):
Option 1 – RPN1 = 9 x 8 x 3 = 216 | Option 2 – RPN2 = 9 x 3 x 8 = 216
Option 1 has high Occurrence but good detection capability, whereas Option 2 has low Occurrence and poor detection. (Occ ranking of 8 = 1 in 50; Occ ranking of 3 = 1 in 100,000, per the AIAG manual.)
Option 1 is detection focused; Option 2 is prevention focused. But the PFMEA process, via the RPN, would signify that the risk is the same, indicating that the two scenarios are equal. Misunderstanding the difference between options such as these may jeopardize your business. Pursuing a detection-focused strategy will lead to significant rework and repair and/or very high scrap. In addition, this is what can lead to quality spills. You typically will not have quality spills with a prevention-focused strategy, as substantial numbers of defects must be created to enable a quality spill.
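The identical-RPN trap is easy to verify, and translating the two occurrence rankings into expected defect counts shows just how different the options really are. A short sketch, using the AIAG rates quoted above:

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number as defined in the AIAG manual: S x O x D."""
    return severity * occurrence * detection

# Identical RPNs hide radically different strategies:
assert rpn(9, 8, 3) == rpn(9, 3, 8) == 216

# Translate the occurrence rankings into expected defects per 100,000 pieces
# (AIAG rates quoted above: ranking 8 ~ 1 in 50, ranking 3 ~ 1 in 100,000):
detection_focused = 100_000 / 50        # ~2,000 defects to catch, rework or scrap
prevention_focused = 100_000 / 100_000  # ~1 defect
print(detection_focused, prevention_focused)  # 2000.0 1.0
```

Two thousand defects versus one, at the same RPN: this is the asymmetry the multiplication conceals.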
Items 2, 3, 4, and 5 above are all somewhat related. When defects occur and they are passed forward to be detected downstream, there are many different circumstances that can develop which should drive very different priority and solutions. Catching a defect downstream can describe some extremely diverse situations and a detailed understanding is required to ensure proper business decisions are made so that risk is sufficiently mitigated.
The PFMEA does not differentiate between a defect that is detected at the next process and one that is detected 10 processes later, or perhaps at the very end of the line at final test. This clearly impacts repairability, the number of defective products created (particularly for batch processes), the value of the product when detection occurs, and more. Defect detection downstream is a one-size-fits-all category that can have substantially different financial, quality, productivity and capacity results. Has your facility ever run out of raw material because it was consumed producing defective product that ended up sitting in the quality hold area?
These are situations I have observed while consulting that may not be identifiable through the use of the standard PFMEA process:
- Defects created while batch assembling a sub-component that are not detected until the sub-component is in the final assembly. This may be days later, and many processes downstream.
- Defects that are created early in the process, but are not detected until end of line testing, when the product is no longer repairable.
- A $0.05 component assembled into a product that is valued at over $5000 when the defect is detected, and the product can no longer be repaired.
- A $10,000 product that requires an entire shift to build, and all inspection and testing occurs at the end of the product build.
- Defective material from an external supplier is built into a product, after the material has tripled in value through machining and processing, and is then detected at a downstream test.
- Sub-components that are allowed to be repaired over and over and continue to re-enter the value stream when a clear quality requirement is two repair cycles only.
Through these examples it becomes clear that not all downstream defect detection solutions are equally risky or should have equal priority. Unfortunately, they will have equal detection rankings, misleading the PFMEA team. Too often, robust downstream detection capability is highly valued, such as with automated final test. When these downstream solutions are thoroughly analyzed, the unintended results can be detrimental to quality performance and can negatively impact critical metrics such as customer complaints, scrap, first time quality, productivity, capacity and more.
In summary, if applied correctly and the content is optimized, the PFMEA process can provide your organization with a great deal of knowledge and data. Mitigating the risk to your business as well as to your customers may require significant consideration beyond what is contained in many of the currently recommended PFMEA formats. A great way to supplement the existing form is to include additional columns to document repairability, product value when a defect is detected, batch or one-piece flow, the number of processes between defect creation and defect detection, and other important decision criteria. These additions will truly allow your teams to substantially reduce the risk of errors and defects in your manufacturing operations.
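As a sketch of what such a supplemented form might capture, the following Python data structure adds the suggested columns to a standard row. All field names are illustrative; real forms vary by company and industry.

```python
from dataclasses import dataclass

@dataclass
class ExtendedPfmeaRow:
    # Standard ranking fields
    failure_mode: str
    severity: int
    occurrence: int
    detection: int
    # Supplemental columns suggested in this article (illustrative names)
    repairable: bool               # can the product still be repaired?
    value_at_detection: float      # product value when the defect is caught
    batch_process: bool            # batch vs. one-piece flow
    processes_to_detection: int    # distance between creation and detection

    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

row = ExtendedPfmeaRow(
    failure_mode="component XYZ omitted",
    severity=9, occurrence=3, detection=8,
    repairable=False, value_at_detection=5000.0,
    batch_process=True, processes_to_detection=7,
)
print(row.rpn())  # 216
```

Two rows with identical RPNs can now be separated by what actually matters: whether the product is scrap, what it is worth at detection, and how far the defect traveled before anyone noticed.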