As originally published in the SMTA Proceedings.

A PACKAGING PHYSICS OF FAILURE BASED TESTING METHODOLOGY FOR SEMICONDUCTOR IC PART RELIABILITY ASSESSMENT

Jingsong Xie and Ming Sun
RelEng Technologies, Inc.
Bethesda, MD, [email protected]

Xie
Engent Inc.
Norcross, GA, USA

ABSTRACT
Functional testing is performed repeatedly at several IC part manufacturing stages, from post wafer fabrication to packaging. It is therefore very important to understand its inefficiencies and weaknesses; for example, time-zero static electrical functional testing does not find or predict potential quality and reliability issues associated with lifetime and operating stress. Reducing or eliminating these inefficiencies and weaknesses enables an IC part manufacturer to drive down the risk of delivering to customers a part that is bad, or that may become bad during its lifetime, along with the associated cost of the final product. It is also important to understand the cause and physics of failure before finalizing the testing and quality/reliability assurance flow. In this paper, a risk assessment testing methodology built on the fundamentals of packaging physics of failure is discussed in terms of reliability tests and package assembly process flows, associated with package structure, bill of materials (BOM) and failure mode effects analysis (FMEA).

Key words: physics of failure, IC testing, reliability assessment.

INTRODUCTION
There are three major manufacturing stages for semiconductor IC parts: wafer fabrication, packaging and testing. Every IC packaging technology has its own potential weaknesses induced by its structure, materials, process characteristics and assembly flow. When the parts are well fabricated, however, these weaknesses will not necessarily result in an obvious defect or failure and can be suppressed during the part's lifetime.
Due to unavoidable statistical flaws in the materials, equipment tooling and processes used to fabricate parts, it is impossible to realize 100% yield on any particular IC part, where yield refers to the ratio of good IC parts to the total number of IC parts. A good IC part is one that satisfies all of its performance specifications under all specified conditions. The probability of a bad semiconductor part increases with the complexity of its structures and materials. It also increases with the manufacturing sensitivities of semiconductor parts that rely on the control and/or matching of semiconductor components or parameters to achieve their specified functionality. The shipment of bad parts leads to an incurred replacement cost, potential loss of reputation and possibly loss of market share. The other side of this problem is not much better: when good parts are classified as bad, the part yield decreases and, correspondingly, so do the earnings of the semiconductor manufacturer.

It is well known that testing is performed repeatedly at several IC part manufacturing stages, such as wafer probing after wafer fabrication, open/short testing after the packaging process, and automated test equipment (ATE) functional testing after component level assembly. However, it is still very important to understand its inefficiencies and weaknesses; for example, time-zero static electrical functional testing does not find or predict potential quality and reliability issues associated with lifetime and operating stress. Reducing or eliminating these inefficiencies enables an IC part manufacturer to drive down the risk of delivering a bad part to customers and the associated cost of the final product.
It is also important to understand the cause and physics of failure before finalizing the testing and quality/reliability assurance flow.

Currently, the most frequent quality assurance method after component assembly is sample burn-in, with samples pulled from finished goods after final testing (FT). This method is relatively better at detecting a potential die level defect than a packaging level defect after assembly. The flow was developed for very old and simple packaging structures, such as low pin count, low complexity lead frame type packages, for instance Small Outline Integrated Circuit (SOIC), Quad Flat Package (QFP) and so on. In those decades, compared to fast growing wafer level technology, package technology was relatively mature and less complicated and challenging for most semiconductor parts. It is understandable that the semiconductor industry focused more on die level quality and reliability after FT and goods shipment. Sample burn-in was the right choice to add to the QA flow after final testing.

Nowadays, however, as packaging technology advances rapidly, packaging structures and processes are becoming more and more complicated and challenging. Low pin count, low structural complexity packaging is history, and packaging-induced defects or potential defects make up a growing share of FT rejects. Furthermore, defects that can cause long term reliability issues, in addition to time-zero quality issues, are receiving more and more concern from both component and system level manufacturers. Obviously, a risk assessment testing methodology based on the fundamentals of packaging physics of failure is also needed to detect these inefficiencies and weaknesses, especially in terms of reliability tests and package assembly process flows, associated with package structure, bill of materials (BOM) and failure mode effects analysis (FMEA).

Proceedings of SMTA International, Sep. 27 - Oct. 1, 2015, Rosemont, IL, Page 341

METHODOLOGY
Many issues affect the perceived quality and reliability of a semiconductor product that is delivered to a customer. For the manufacturer or supplier, the rules have changed quite radically with time. Failure mechanism driven packaging QA and reliability monitoring draws upon the physical concepts and implementation of process or assembly line controls, process stability and effective monitoring programs in lieu of qualifying a product based solely on a fixed list of tests. A manufacturer must identify those failure mechanisms that may be activated through a given packaging structure/process change, and design and implement reliability tests adequate to assess the impact of those failure mechanisms on component level reliability.

In the flow of Figure 1, the historical sample burn-in based quality assurance step of the major semiconductor manufacturing flow is drawn with double solid arrows, while the newly added assembly risk assessment testing is drawn with single solid arrows.

As described in this flow, finished goods shipment depends not only on the final testing result and the traditional burn-in result, but also on the assessment of packaging assembly risk, especially for complicated or new packaging structures and processes.
Well-designed reliability and monitor testing methods are essential to ensure parts shipment as well as the component and system level QA and reliability of semiconductor finished goods.

The semiconductor assembly industry uses a technique called acceleration testing to assess packaging reliability. Elevated stresses are used to produce the same failure mechanisms as would be observed under normal use conditions, but in a shorter time period. Acceleration factors are used by device and assembly manufacturers to estimate failure rates based on the results of accelerated testing. The needed QA flows and concerns are obvious to component and system level manufacturers; the problem is that it is difficult to select a risk assessment test that effectively detects the various defects induced by package structure or process. The selection requires a deep understanding of the risk and failure mechanism associated with the process, machine, procedure and criteria and, very often, experience as well. An incorrect selection of a risk assessment testing method could result in a total failure of defect detection and serious customer returns, for example applying burn-in testing to detect a potential moisture sensitivity defect, or using high temperature storage testing to detect a chemical corrosion related defect. It could also cause serious financial loss if an incorrect risk testing method is applied during lot disposition, for example applying a component level reflow method to detect a contact related failure of a lead frame type package.

Failure mechanism driven reliability monitoring draws upon the concepts and implementations of line controls, process stability, Failure Mode Effect Analysis (FMEA) and effective monitoring programs in lieu of qualifying a product based solely on a pre-designed and fixed list of tests.
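
As an illustration of how an acceleration factor relates an accelerated test to use conditions, the sketch below evaluates the standard Arrhenius relation for thermally activated mechanisms, AF = exp[(Ea/k)(1/T_use - 1/T_stress)]. It is a minimal sketch, not the methodology of this paper: the function names, the temperatures and the activation energies are illustrative assumptions, and a real activation energy has to come from experiment for the specific package and failure mechanism.

```python
import math

# Boltzmann constant in eV/K.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between use and stress temperatures.

    ea_ev      -- activation energy in eV (assumed value, mechanism specific)
    t_use_c    -- use temperature in degrees Celsius
    t_stress_c -- stress (test) temperature in degrees Celsius
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


def equivalent_use_hours(stress_hours, acceleration_factor):
    """Use-condition hours represented by a given number of stress hours."""
    return stress_hours * acceleration_factor


# Hypothetical comparison: the same 1000 h test at 125 C, judged against
# 55 C use conditions, for three assumed activation energies.
for ea in (0.5, 0.7, 0.9):
    af = arrhenius_acceleration_factor(ea, t_use_c=55.0, t_stress_c=125.0)
    hours = equivalent_use_hours(1000.0, af)
    print(f"Ea = {ea:.1f} eV: AF = {af:.1f}, 1000 stress h ~ {hours:.0f} use h")
```

The loop shows how strongly the result depends on the assumed activation energy, which is why an incorrect value can distort a lot disposition decision.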
A manufacturer must identify those failure mechanisms that may be activated through a given product or process change, and design and implement reliability tests adequate to assess the impact of those failure mechanisms on component and system level reliability. For this to be effective, the manufacturer must establish a thorough understanding of, and linkage to, its reliability monitoring program, though it is very difficult to cover all potential cases in the whole assembly process.

Figure 1. Typical QA Flow for Semiconductor Finished Goods Shipment after Consideration of Packaging Assembly Risk

Unlike Statistical Process Control (SPC), a reliability monitor program is aimed at monitoring and improving reliability. It involves identification and classification of failure mechanisms, development and use of monitors, and investigation of failure kinetics that allows prediction of the failure rate at use conditions. Failure kinetics are the characteristics of failure for a given physical failure mechanism, such as the stressing, the acceleration factor, activation energy, median life, standard deviation,
characteristic life, instantaneous failure rate and, more importantly, the predicted lifetime of a component mounted in a system level product.

In the packaging reliability monitor program and risk assessment, the reliability testing chosen to detect the failure rate and defect nature at accelerated conditions is critical to generating lifetime data in a much shorter period of time. Release of a reliable product to customers depends on this concept. Stresses experienced in the use environment are accelerated, or intentionally raised, to a level that shortens the time to failure of an individual failure mechanism. The key is to duplicate not only the same failure mechanism but also the failure rates that occur under use conditions. Development of an acceleration model relies on knowledge of physics of failure, packaging processes, structures, materials and operating conditions. An acceleration factor is calculated relative to the use conditions. Some known semiconductor failure mechanisms and their accelerating stresses are summarized in Table 1.

Through previously established models, which resulted from the extensive study of standard integrated circuit reliability science on older wafer nodes and packaging technologies, it is possible and feasible to consider the acceleration factors as composed of two physical components, one silicon device related and one packaging related. For each model in Table 1, for instance thermal effects (Arrhenius model), the final acceleration factor can be represented as the result of the dual impacts of both silicon device and packaging by means of activation energy. Different packaging structures, materials and processes apply different stresses to a silicon device with ultra-low K material, resulting in a different activation energy in the thermal effect model and, in turn, a different time to failure, failure rate and predicted lifetime.

On the other hand, for a purely packaging related failure rate and lifetime prediction, in which the silicon related activation energy can be fixed, the packaging related activation energy can be determined from experimental and reliability testing results, such as a resistance measurement in a design of experiments (DOE). The activation energy parameter has a large impact on the time to failure (TF) and acceleration factor (AF). For a failure on a copper pillar bump, for example, a well defined and accurate activation energy is critical to the failure rate and time to failure estimation.

As discussed earlier in this paper, the packaging complexity of a device has grown dramatically, and its failure phenomena and mechanisms are much more complicated than before, for example in 3D or 2.5D packaging technologies. Not only are their materials, processes and structures very different from those of traditional lead frame packages such as SOIC, TSOP and TO, but their impacts on the silicon die are also very different. In terms of reliability and lifetime prediction of a component, their stress field and their process-induced impact on the ultra-low K device are much more complicated and challenging. Obviously, it is necessary to modify the acceleration factors in Table 1 based on the new packaging structures, processes and bills of materials (BOM) to satisfy the needs of the new generation of packaging technologies and processes.

The new field of packaging technologies does not have a long history of known failure models when compared to the traditional packages described in Table 1. There are no easily obtained acceleration factors for 3D through silicon via (TSV), 2.5D silicon interposer (TSI), copper wire, silver wire, copper pillar, micro-bump, coreless substrate, multi-row lead frame packages, package on package (POP), wafer level packaging, stacked dies, etc.

TEST DESIGN AND ASSESSMENT
Traditionally, reliability tests focus on acceleration while electrical tests focus on functionality or failure modes. Achieving an electrical test design capable of IC reliability assessment requires addressing test design and assessment issues, primarily including:

- The quantity of failure modes or faults that can be detected by a set of tests is properly assessed and identified. A reliability prediction result will be impacted by those failure modes or faults that can be detected in a test.
- Those failure modes or faults to be detected in the test are also properly accelerated and duplicated in the test, to ensure their occurrence as long as defects leading to the failure modes or faults do exist.
- The dependency relationships of different failure modes and/or faults are fully understood and properly modeled, so that the detection coverage of a test is properly assessed and reliability prediction is correctly conducted.

Each issue is discussed below.

Test Coverage
Test coverage is a key quantitative measure of a traditional electrical test in terms of its capability and effectiveness in fault and failure detection. This detection capability and effectiveness is a major factor in reliability prediction, as the detection results are used to identify failures or faults in the prediction.

Test coverage is defined as the fraction of all failure modes or faults that can occur for a device under investigation which can be detected by a test. In practice, a hundred percent coverage is usually not possible to achieve in a test. Some primary reasons include, but may not be limited to:

- Insufficient knowledge or lack of understanding about root causes or mechanisms of certain failures or faults to support duplication, acceleration, and/or detection of those failures or faults in a test
- Lack of effective approaches or tools to test or detect
- Non-technical considerations such as the cost and time necessary for a test to be implemented

Fault and/or Failure Mode Duplication and Acceleration
For electrical test results to be also used in reliability assessment, failures or faults need to be accelerated, or the same failure modes or faults need to be duplicated, during a limited testing period under certain accelerated environmental stress conditions. It is necessary to ensure, at least theoretically, that test results reflect what is supposed to happen in actual field operations.

Traditionally, an electrical test is defined in association with failure modes, while a reliability test is defined primarily according to environmental conditions and acceleration factors. In an IC test capable of both parametric/functional and reliability assessment, those two definitions need to be connected and their relationship clearly stated.

Figure 2 provides a key aspect of the relationship. The figure indicates how a set of failure modes can be correlated with accelerated environmental stresses through the corresponding failure mechanisms, which can be obtained in some routine engineering practices, such as failure mode, mechanism and effect analysis (FMMEA).

Figure 2. Failure Modes vs Test Points in a Traditional Dependency Matrix

A Duplication Criterion of Known Failure Mechanism
A failure/fault duplication criterion is needed in a test design to define the information necessary to duplicate the required failure modes in an accelerated stress environment. According to the rules for designing an acceleration test and estimating the corresponding acceleration factor, the criterion can be determined and is stated as:

Failure mechanisms or the physical processes leading to the failure modes or faults to be duplicated or accelerated in a test be fully understood.

As a failure mechanism defines a physical process leading to the occurrence of certain failures under clarified conditions, identification and knowledge of the mechanisms responsible for the faults and/or failures under investigation is essential for the duplication or acceleration of the same faults and/or failures. As a failure mechanism is always associated with certain conditions under which it happens, these conditions also lead to the conditions needed to accelerate the process so that the faults and/or failures occur during a limited testing period.

With this duplication criterion applied, traditional electrical tests can be designed under certain accelerated environmental conditions and their results can be used for reliability assessment purposes.

Basic Failure Mechanisms
The term “basic failure mechanism” is defined in this study to describe those failure mechanisms that are well known in industry, well documented, and uniquely defined physical processes with known environmental stresses and acceleration factors.

Basic failure mechanisms meant in this study primarily include, but may not be limited to:

- Material fatigue and overstress mechanisms, such as
  - Mechanical vibration induced fatigue
  - Thermal fatigue
  - Creep
- Semiconductor and metallization failure mechanisms, such as
  - Electromigration (EM)
  - Hot carrier injection (HCI)
  - Time dependent dielectric breakdown (TDDB)
  - Negative-bias temperature instability (NBTI)
- Electrochemical, chemical, and oxidation processes, such as
  - Electrochemical migration
  - Dendrite growth
  - Tin whisker
  - Wet and dry corrosion etc.

This concept of basic failure mechanisms helps standardize failure mechanism information and correlate the acceleration tests that serve the purpose of this study with those of regular industry standards for reliability assessment.

Root and Induced Failure Mode
“Root failure mode” and “induced failure mode” are two additional concepts introduced in this study.
Considering that failure modes and/or faults may not necessarily be independent of each other, and that one failure/fault can be a consequence or an effect of another, these concepts are used to define such possible dependency relationships among different failure modes or faults. For example, when a part or component failure leads to malfunctioning of an assembly or equipment, the former is considered the cause and the latter the effect.

Therefore, root failure modes and induced failure modes are defined respectively as follows:

- Root failure modes are those independent failure modes which are considered sources of failures;
- Induced failure modes are those dependent failure modes which are considered consequences or effects of other failures or faults.

As a result, root failure modes and induced failure modes are associated with the following characteristics:

- Root failure modes are considered to be directly associated with some root causes of failures/faults as well as failure mechanisms at specified local sites;
- Root failure modes can be accelerated and duplicated as long as the corresponding failure mechanisms are known and failure conditions are applied, while any original induced failure modes are not considered duplicable unless their corresponding root failure modes are all identified and duplicated.
- A root failure mode, due to its relative simplicity compared to its induced failure mode counterpart, can generally be more readily defined parametrically, while an induced failure mode, depending upon the packaging level in discussion, may be defined in observations or appearance.

Based on the discussion above, the concept of root and induced failure modes helps differentiate those failure modes that are more likely associated with basic failure mechanisms (the concept of which is defined in the previous section), and can hence be ensured with clearly defined acceleration in test, from those that cannot.
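
The distinction between root and induced failure modes can be sketched as a small cause-and-effect graph. The example below is a minimal sketch under stated assumptions: the failure mode names are hypothetical, the cause relation is assumed to be acyclic, and the duplicability check simply encodes the rule stated above that an induced failure mode is duplicable only when all of its root failure modes are identified and duplicated.

```python
from typing import Dict, Set


def classify_failure_modes(causes: Dict[str, Set[str]]):
    """Split failure modes into root (no causes) and induced (has causes).

    `causes` maps each failure mode to the set of failure modes that
    directly cause it. The relation is assumed to be acyclic.
    """
    roots = {mode for mode, parents in causes.items() if not parents}
    induced = set(causes) - roots
    return roots, induced


def root_causes_of(mode: str, causes: Dict[str, Set[str]]) -> Set[str]:
    """Return every root failure mode found upstream of `mode`."""
    found, seen, stack = set(), set(), [mode]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents = causes.get(current, set())
        if parents:
            stack.extend(parents)
        else:
            found.add(current)  # no causes recorded: treat as a root mode
    return found


def is_duplicable(mode: str, causes: Dict[str, Set[str]],
                  duplicated_roots: Set[str]) -> bool:
    """True when all root failure modes behind `mode` are duplicated in test."""
    return root_causes_of(mode, causes) <= duplicated_roots


# Hypothetical example, not taken from the paper.
EXAMPLE_CAUSES = {
    "bond_wire_lift": set(),
    "die_attach_delamination": set(),
    "open_circuit": {"bond_wire_lift"},
    "overheating": {"die_attach_delamination"},
    "functional_failure": {"open_circuit", "overheating"},
}

roots, induced = classify_failure_modes(EXAMPLE_CAUSES)
print("root:", sorted(roots))
print("induced:", sorted(induced))
```

In this example, `functional_failure` would only count as duplicable once both `bond_wire_lift` and `die_attach_delamination` are reproduced under their accelerating conditions.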
This concept also helps specifying requirementsfor description and modeling of failure mode dependency,which is to be discussed in the following section.IMPLEMENTATION ISSUESTraditionally, electrical tests serve the purpose offunctionality check or verification, while reliability testsfocuses on acceleration usually with simplified functionaltests and/or parametric measurements for failure detection.Both categories of tests develop in two relativelyindependent systems with different sources of supportiveinformation. To serve the reliability assessment purpose, anelectrical test needs to be taken consideration of reliabilityinformation, such as that of FMEA, Failure analysis (FA)and root cause analysis. This need poses challenges toimplementation in existing industrial practices and systems.Some key issues that need to be properly addressed toachieve effective implementation are discussed in detail inthe following sections.To Determine the Total Number of Potentially ExistingFailures and/or FaultsA primary shortcoming of the existing approach of testcoverage assessment is the assumption of a known totalnumber of potential failure modes, which is unfortunatelymostly likely unknown and needs to be determined for agiven product. It is therefore in this study proposed that thetotal number of faults and failure modes are assessed anddetermined from some original sources of failureinformation, such as FMEA. 
This approach leads to results that are considered close to the true number, and its assessment accuracy will continuously improve as knowledge accumulates and product quality and reliability improve within existing quality systems.

Figure 3 shows the expected self-correction mechanism of the approach, with the assessment process flow to obtain the total number of potential failure modes and faults.

Figure 3. Self-correction Mechanism of the Proposed Approach to Estimate the Total Number of Product Potential Faults and Failures

Failure Mode Dependency and its Modeling
As discussed previously, not all failure modes or faults are independent. Detection of an induced failure mode can therefore also be used to sense the occurrence of other failures or faults, provided those failures or faults are a necessary condition for, and lead to, the occurrence of that induced failure mode. A dependency matrix, also known as a D matrix, provides this detection relationship. As a result, not all failure modes need a specifically designed test for detection; only the independent failure modes or failure mode sets do.

A failure mode dependency matrix is a matrix of failure modes vs test points/locations or defined test tools. A dependency matrix can be derived from the logic flow of functions and, in the case of ICs, from the relationship of logic blocks, signals, parameters and functions, which are usually defined in logic designs and schematics.
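
A dependency matrix and the test coverage derived from it can be sketched as follows. This is a minimal sketch, not the implementation used in this study: the matrix entries, failure mode names and test point names (`T1` to `T3`) are hypothetical, and coverage is computed as defined earlier, as the fraction of listed failure modes that at least one selected test can detect.

```python
from typing import Dict, Iterable, List

# failure mode -> {test point -> 1 if the test point detects the mode, else 0}
DMatrix = Dict[str, Dict[str, int]]


def detected_modes(d_matrix: DMatrix, tests: Iterable[str]) -> List[str]:
    """Failure modes detected by at least one of the selected test points."""
    selected = set(tests)
    return [mode for mode, row in d_matrix.items()
            if any(row.get(test, 0) for test in selected)]


def undetected_modes(d_matrix: DMatrix, tests: Iterable[str]) -> List[str]:
    """Failure modes that none of the selected test points can detect."""
    detected = set(detected_modes(d_matrix, tests))
    return [mode for mode in d_matrix if mode not in detected]


def coverage_fraction(d_matrix: DMatrix, tests: Iterable[str]) -> float:
    """Detected failure modes divided by all failure modes in the matrix."""
    if not d_matrix:
        return 0.0
    return len(detected_modes(d_matrix, tests)) / len(d_matrix)


# Hypothetical D matrix for illustration only.
EXAMPLE_D_MATRIX: DMatrix = {
    "open_circuit":     {"T1": 1, "T2": 0, "T3": 1},
    "short_circuit":    {"T1": 0, "T2": 1, "T3": 0},
    "leakage":          {"T1": 0, "T2": 0, "T3": 1},
    "parametric_drift": {"T1": 0, "T2": 0, "T3": 0},
}

all_tests = ["T1", "T2", "T3"]
print("coverage:", coverage_fraction(EXAMPLE_D_MATRIX, all_tests))
print("undetected:", undetected_modes(EXAMPLE_D_MATRIX, all_tests))
```

The denominator here is only as good as the list of failure modes in the matrix, which is why the total number of potential failure modes has to be estimated from sources such as FMEA.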
To Determine Failures and/or Faults Applicable to a Specific Product or a Product Design
The process of determining the failures and/or faults that are applicable to a given product or product design from all potentially existing ones is also known as reliability risk identification. The objective is to identify all reliability risks or potential failures or faults that need to be considered for a specific design or product, which in this study is an IC under investigation.

This process can hardly be conducted manually and hence, in practice, requires an automated computer process, discussed in the following section.

To Achieve Information Standardization and Processing Automation
The fundamental requirement for automated processing is information standardization. Basic information needs to be extracted from its original sources. The term “basic information” here means information that is strictly defined, standardized, and computer recognizable.

Three categories of basic information are identified to serve the objectives of this study, including:

- Knowledge or information in a knowledgebase
- Facts or input design information
- Resources or other information in libraries and databases

Here the terms “knowledge” and “facts” are concepts defined in the computer science fields of artificial intelligence (AI) and expert systems.

The libraries and the knowledgebase mentioned above include:

1. Fault and Failure Knowledgebase
This knowledgebase, as a primary part of the built-in expert system, contains information for logic reasoning and inference. It defines the following, among others:

- Definition of objects
- Required conditions
- Corresponding faults and/or failures, including descriptions of modes, sites, mechanisms and root causes
- Duplication or acceleration conditions
- Test parameters etc.

2. IC Part and Package Libraries
These two libraries define ICs and their associated configurations and features.

Information in an IC Part Library includes:

- IC part numbers (P/Ns) and codes
- Suppliers
- I/O definitions
- Logic blocks
- Packages etc.

Information in an IC Package Library includes:

- IC package names and codes
- Geometrical features and dimensions
- Materials
- Lead definitions etc.

3. Library of IC Logic Blocks
This library defines the internal logic configurations of ICs, to be used as a key source of information for IC level test modeling and test coverage assessment. It contains information including:

- IC logic blocks and categories
- Major I/Os
- Test parameters etc.

4. Basic Failure Mechanism and Acceleration Library
A failure mechanism indicates a physical process that takes place in the presence of certain conditions or stresses. This library therefore carries information correlating well acknowledged failure mechanisms with their known acceleration or duplication conditions, including:

- Failure mechanisms
- Phenomenon descriptions
- Factors and conditions to accelerate or duplicate
- Associated industrial test standards for reference etc.

5. Test Tool Library
This is a library of defined test tools for debugging, ATE and other test applications, with information including:

- Test names and codes
- Test point definitions
- Test parameters
- Required inputs and conditions
- Test descriptions etc.

To Identify Individual Tests in a Design
In an automated information processing flow, a set of applicable test tools needs to be identified and a test coverage assessment result needs to be provided in a test design. Figure 4 provides a basic processing algorithm for an automated process to identify test tools for a given IC part.

With a set of test tools identified and the test coverage also assessed for a test design, detection of failures and faults is then fully defined. With the correlation of failure modes and mechanisms identified, and the acceleration conditions and information then determined, a test can serve the purpose of reliability assessment.
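
One way the libraries described above could feed an automated selection of test tools is sketched below. This is a hypothetical illustration and not the algorithm of Figure 4, whose details are not reproduced in the text: the record types `ICPart` and `ToolEntry`, their fields, the lookup rule and all example entries are assumptions made only to show the idea of combining a part library, a failure knowledgebase and a test tool library.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class ICPart:
    """Hypothetical IC Part Library record."""
    part_number: str
    logic_blocks: FrozenSet[str]
    package: str


@dataclass(frozen=True)
class ToolEntry:
    """Hypothetical Test Tool Library record."""
    name: str
    required_blocks: FrozenSet[str]  # logic blocks the tool needs on the part
    detects: FrozenSet[str]          # failure modes the tool can detect


def applicable_failure_modes(part: ICPart,
                             knowledgebase: Dict[str, Set[str]]) -> Set[str]:
    """Collect failure modes listed for the part's package and logic blocks."""
    modes = set(knowledgebase.get(part.package, set()))
    for block in part.logic_blocks:
        modes |= knowledgebase.get(block, set())
    return modes


def identify_test_tools(part: ICPart,
                        knowledgebase: Dict[str, Set[str]],
                        tool_library: List[ToolEntry]
                        ) -> Tuple[List[ToolEntry], float]:
    """Return the applicable tools and the resulting test coverage."""
    risks = applicable_failure_modes(part, knowledgebase)
    # A tool applies if the part has every block it needs and it detects
    # at least one failure mode that is a risk for this part.
    tools = [tool for tool in tool_library
             if tool.required_blocks <= part.logic_blocks
             and tool.detects & risks]
    detected = set().union(*(tool.detects for tool in tools)) & risks
    coverage = len(detected) / len(risks) if risks else 0.0
    return tools, coverage


# Hypothetical library contents for illustration only.
KNOWLEDGEBASE = {
    "QFN": {"solder_joint_crack"},
    "adc": {"reference_drift"},
    "sram": {"bit_cell_stuck"},
}
TOOL_LIBRARY = [
    ToolEntry("adc_linearity", frozenset({"adc"}), frozenset({"reference_drift"})),
    ToolEntry("memory_bist", frozenset({"sram"}), frozenset({"bit_cell_stuck"})),
    ToolEntry("pll_lock", frozenset({"pll"}), frozenset({"pll_unlock"})),
]
PART = ICPart("PN-0001", frozenset({"adc", "sram"}), "QFN")

selected, coverage = identify_test_tools(PART, KNOWLEDGEBASE, TOOL_LIBRARY)
print([tool.name for tool in selected], round(coverage, 2))
```

In this example the package related failure mode has no matching tool, so coverage stays below one; a gap of that kind is what a packaging risk assessment test would be expected to close.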
Figure 4. An Automated Process Flow to Identify Applicable Test Tools for a Specified IC

[...] the reliability assessment purpose. An integration of traditional IC electrical tests and reliability tests can be achieved, with the theoretical issues well addressed, while a computer assisted implementation approach is yet to be achieved. This study presents a promising practical approach to provide IC designers and providers with potentially much enhanced reliability assessment information from extensive electrical tests.

SUMMARY
In this paper, we have discussed a failure mechanism and physics based risk assessment methodology and lifetime prediction for new semiconductor packaging in production quality monitoring and lot disposition. Traditional models have been examined, and modification of these models has been proposed to meet the production monitoring requirements of the new packaging technologies. Due to the complexity of new packaging technologies, materials and assembly processes, the acceleration factor and time to failure are critical to the risk assessment result and the parts shipment decision. The adequate selection of a monitoring test method, and a full understanding of the testing result and risk assessment based on the physics of failure, are not only purely technical decisions but also, very often, serious business decisions in semiconductor parts production, parts shipment and capital investment.