Updated: May 15, 2020
In this post, we'll see that in reality manufacturers are making the same mistakes that have been made for over a century, over and over. BUT WHAT MISTAKES, AND WHY? Are there design errors? Is it poor quality in manufacturing? Or is there something deeper?
Vehicle fires have always made the news. Watching the news media today, you’d be forgiven for thinking that this is a new problem that has arrived with electric vehicles, and that they are more prone to such ‘thermal events’, as they are referred to by lawyers and the politically correct. Journalists, of course, prefer the word fire; even better if they get to say deaths in the same sentence. Sensationalist tendencies and political positions result in a range of editorial viewpoints, none of them encouraging adoption of the technology: the most generous say that manufacturers need to build up more experience, the worst suggest that electric vehicles represent an inherently dangerous technology. Established vehicle manufacturers take the opportunity to have a pop at newbies, so you hear soundbites like ‘not so easy making a car, is it?’ But looking at a sample of incidents around the world that were frequent enough to prompt governmental inquiry, it can be seen that the old guard aren’t any better. They were just slower in bringing their offerings to the marketplace, and if they were slow because they were being more careful, that care doesn’t appear to have resulted in fewer incidents. It doesn’t look as if experience is worth much.
It seems like there are a lot of questions being asked about the safety of electric vehicles. But my experience of, and exposure to, real problems suggest that there is a very small number of underlying causal mechanisms, many shared with (seemingly forgotten) similar incidents on internal combustion engine vehicles. Those that are unique to battery powered vehicles are shared with manufacturing problems that have also been around for at least a century.
If this is the case, why do the same mistakes get repeated over and over? Why aren't the lessons being learned? Actually, design or manufacturing errors are not at the root of the problem; the failure to detect them in time is. Often this is excused when a fault is very rare, or intermittent, or both. Only once millions of a product have been in the field for months do these weaknesses manifest as catastrophes. Infrequent events are also regarded as special cases, but our experience is that they are not special at all.
The following hall of shame list shows such a sample of electric vehicle ‘thermal events’ over the last 10 years that prompted serious (usually governmental) investigations starting on the date shown:
Zotye M300 EV Apr 2011: short between battery cells and aluminum container
Chevrolet Volt Jun 2011: battery coolant leak
Fisker Karma Dec 2011: coolant leak at hose clamp
BYD e6 May 2012: short-circuiting of high voltage lines in distribution box
Dodge Ram 1500 Plug-in Hybrid Sep 2012: Thermal Runaway
Toyota Prius Plug-in Hybrid Oct 2012: saltwater ingress
Mitsubishi i-MiEV and Outlander P-HEV Mar 2013: Thermal Runaway
Nissan Leaf Sep 2015: Unresolved
VW e-Golf Dec 2017: Unresolved
Porsche Panamera E-Hybrid Mar 2018: Unresolved
Hyundai Kona Electric Jul 2019: Unresolved
BMW i8 Mar 2019: Unresolved
Tesla Model S and X Oct 2019: NHTSA open investigation into battery fires not related to collision or impact damage to the pack
Porsche Taycan Feb 2020: Unresolved
The list is certainly incomplete, and is purely based upon what one can find spending an hour searching in the public domain. I cannot verify it all as fact, but enough of it is verifiable that we should be able to see and trust any patterns emerging. One thing that jumps out straight away is that it appears to take years to undertake each investigation, and so for those cases after 2016, as far as the public is concerned, they remain unresolved.
Just as an aside, I am interested in currently hot scientific topics like epidemiology and climatology, and I do read background material on them. The topics are interesting, and it can also be useful to read good explanations supported by good evidence, and compare them with bad examples in order to hone my own skills. My opinions on these topics have developed as a result, but I would never dream of offering advice on problems related to them. That’s because I don’t have direct experience, and I haven’t read enough about the experiences of others. However, I was chuffed to hear this year about a classmate of mine from school who has had an Antarctic glacier named after him; the Larter Glacier.
On the other hand, reliability of manufactured products happens to be the one topic I do know a lot about. More than three quarters of my working life has been spent finding causal explanations for a combination of defective products at what is usually referred to as t=zero and also at t=n (after some period of time in the hands of customers). A fair smattering of problems I’ve seen over the last 30 years resulted in vehicle ‘thermal events’. And guess what? Years ago, the same set of causal mechanisms identified in the above list were also at play on non-electric vehicles. So, for those investigations not yet completed, I am prepared to wager my life’s savings that we will eventually see the same thing.
Over the last thirty years, I have had the benefit of being involved with many more failures than would be experienced by a typical engineer working for a large vehicle producing corporation or their supplier base. Even so, I have to admit that for the first fifteen years, I didn’t recognize what I now know to be the common underlying systemic problems. For the first couple of years, each project seemed unique. Then I started to learn which were the most effective approaches for characterizing certain types of problem: approaches that facilitated rapid diagnosis through a progressive search. After a few more years, it became clear that too few people knew what we knew; nevertheless, I was naïve enough to think that the message would spread. It clearly has not.
Let’s take another look at the electric vehicle fires for which there is an explanation (a shorter list), but this time put the causes into some broad categories.
Zotye M300 EV Apr 2011: Electrical Leak (short between battery cells and aluminum container)
Chevrolet Volt Jun 2011: Fluid Leak (battery coolant)
Fisker Karma Dec 2011: Fluid Leak (coolant leak at hose clamp)
BYD e6 May 2012: Electrical Leak (short-circuiting of high voltage lines in distribution box)
Dodge Ram 1500 Plug-in Hybrid Sep 2012: Thermal Runaway
Toyota Prius Plug-in Hybrid Oct 2012: Fluid Leak (saltwater ingress)
Mitsubishi i-MiEV and Outlander P-HEV Mar 2013: Thermal Runaway
There are a couple more that I would love to add, but I don’t know how much information about them is already in the public domain. The pattern holds for those too, and two of the three categories cover ALL of the cases involving internal combustion engine vehicle fires in my own thirty-year career, with just one exception. To date, at TNSFT we only understand the causal mechanism for one high voltage vehicle battery failure, but again I’d bet a lot of money that the rest of the thermal runaways are closely related.
Now, those of you who understand the difference between symptomatic and topographic diagnosis (see John Allen’s post The Analytic Logic Map: Symptomatic and Topographic Problem Solving or read either of my books) might possibly wonder if I’m claiming to already have the causal explanation for all of these ‘thermal events’. That’s obviously not the case, so what is the value of the categories if we still need to diagnose each problem in turn, topographically and employing a progressive search? You may also be wondering why I would choose to rename something universally known as a ‘short circuit’, and instead call it an ‘electrical leak’. That’s because I want to emphasize a couple of important characteristics common to these faults.
Design errors and poor quality manufacturing happen all the time. It’s a fact of life. Once they are understood, corrective action quickly follows. As I said, failing to detect them soon enough is the real problem. In too many cases, gaining the necessary understanding takes much longer than it should, compounding the problem. Disastrously, some things are never understood by an organisation, and a promising technology is abandoned in favor of some alternative. Sometimes the organisation is damaged beyond repair and even goes out of business.
There are two different primary causes of engineers failing to discover that there is a problem before it’s too late, and then failing to diagnose it in short order. One applies to the ‘leaks’ and the other to the battery performance. Fixing these is really easy; the main difficulty is getting folks to see what they are doing wrong, to change their mindsets or paradigms.
In this post I’ll address leaks by explaining one key principle, and in part 2 I’ll address battery performance by explaining another. Actually, the very word LEAK is itself creating the problem. It forces us into a mindset that leads down the path we really need to sidestep if we want to avoid the problems in the hall of shame list above. A leak, in common with anything that breaks, and a few other faults, is something we hope not to get. Sensible organisations ensure that they have systems in place to detect them. Some detection systems are incredibly sophisticated and expensive, so it is clear those organisations are serious about it. These systems often allow the leak to be characterized in ways that indicate the size and locations of the leak paths. This means they are approaching it scientifically, and the information sometimes leads to finding a way to fix it (but not necessarily a full causal explanation). The easy ones to understand are when a leak path has been fabricated into the part or has been caused by damage during assembly, although eliminating (as opposed to detecting) those becomes possible when we change our mindset.
It has to be recognized that once such a failure has occurred, it is already too late to avoid serious consequences. Just as important, detecting a few problems in the plant only prevents the worst products from reaching the customer. There will be many, many more whose performance is marginal (they limped past the finish line in the plant) that will fail in the hands of the customer. That is the fundamental physical nature of rare and intermittent faults.
The mindset on leaks, and indeed a range of faults, must change. That’s the hard part. Implementation of the testing is easy and doesn’t cost much. A leak (and many other types of failure) occurs only when a more fundamental or primary function has crossed a threshold. In the case of a leak, that function is to contain energy in some kind of vessel. It doesn’t matter what the domain or form of the energy is; it could be electrical, it could be hydraulic, it could be thermal and so on. This fundamental function can be characterized in ways that reveal how well the function is being performed. That means that you can see just how much margin of safety there is between performance and failure, even with a very small sample, which means it can be done very early in the product’s life cycle. Importantly, it also means that we can understand what factors really drive how well it performs (that is, diagnose performance) from both design and manufacturing viewpoints.
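As a rough illustration only, and not the characterization methodology from the book, the difference between a pass/fail leak check and a view of margin can be sketched in a few lines. Everything here is an assumption made for the example: the function name `margin_in_sigmas`, the choice of insulation resistance as the measured quantity, the readings, and the threshold are all hypothetical.

```python
from statistics import mean, stdev

def margin_in_sigmas(measurements, failure_threshold):
    """Distance from the sample mean to the failure threshold,
    expressed in sample standard deviations.

    Assumes higher readings are better, so failure means a reading
    falling below the threshold. This is a crude summary for
    illustration, not a full characterization of the function.
    """
    spread = stdev(measurements)
    if spread == 0:
        raise ValueError("no variation in the sample; margin is undefined")
    return (mean(measurements) - failure_threshold) / spread

# Hypothetical insulation resistance readings (megaohms) from eight packs
readings = [520.0, 480.0, 505.0, 350.0, 495.0, 510.0, 300.0, 490.0]
threshold = 100.0  # hypothetical pass/fail limit

# A pass/fail check reports only that every unit is above the limit
print(all(r > threshold for r in readings))

# The measured values show how much room there is before failure,
# and which units sit closest to the limit
print(round(margin_in_sigmas(readings, threshold), 2))
print(sorted(readings)[:2])
```

The pass/fail line says the same thing for a comfortable population and a marginal one. The measured values, even from eight units, show which units are drifting toward the threshold and give something to diagnose.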
Flushing out, and diagnosing, marginal performance is the only way to prevent these incidents from occurring. It is demonstrably clear that, whilst the computer aids to design, and the latest project management and manufacturing techniques, have raised productivity by an order of magnitude, the same sorts of problem are occurring today that were occurring thirty years ago (and longer, but that is outside of my experience). In fact, these types of problem now dominate the landscape, which means that efforts to prevent failure do work, but only when performance of the function is fully characterized. As long as a failure mode is the result of poor performance of a function, such as ‘contain energy’, that is not fully characterized, that failure will never be prevented. Chapter 7 of the book Diagnosing Performance and Reliability explains a methodology for identifying the at-risk functions of a system, and characterization approaches are covered in chapters 2 and 6 in great detail.
In part 2 of this post, I’ll outline a common failing in the way manufacturing processes are characterized, the outcome being that many quality problems are never fully understood, so not fixed. Sometimes, this means that no-one is even aware they exist. I’ll explain how that was the case for one electric vehicle battery manufacturer. It is likely the case for others.