Oops, Sorry

September 13th, 2010

At my first job after college we designed some relatively large inverters. I say "relatively" because no matter how big your design is, there is always some other engineer who says "that's not big, that's flea power." Anyway, over my eleven years working at that company I witnessed some rather spectacular power electronics failures, many of which I was directly responsible for. Here is one such tale of power electronics terror and hilarity.

One of the company's main product lines was built around a three-phase, hard-switched inverter. The largest system we made was 3200 Amps at 480 Volts but the biggest system we could fit in the engineering lab was two cabinets rated for a total of 800 Amps. That test unit was always in various states of being taken apart, reassembled, probed up, and put back together. We used it for all sorts of testing from general controls development, to trying out new hardware, to destructive UL testing, to debugging field returns.

Everybody who worked in the lab knew the unit was a testbed so we never took its operational state for granted. It was well understood that before you energized the unit you made the rounds around the engineering office to make sure nobody had any ongoing tests active. Then after that it was policy to have at least two people verify the test setup before throwing the breaker on.

One day I was working on some controls improvements and I needed to try them out on the "big" engineering unit. I made the usual inquiry into the state of the unit. I found out another engineer had brought back a power module from a customer site that was having some capacitor problems (vendor issue) and had installed the returned module in the engineering unit for testing. He and another engineer had just spent days getting the module back from the customer site, rebuilding it with thermal probes in all the large aluminum electrolytic capacitors, and then re-installing it with a data logger in the engineering test system. They were planning on taking data within the next day or so.

Understandably, the other engineers were a bit apprehensive about me screwing with the system controls with their test module installed, given all the work they had just done. I told them that having the probed-up module in the system shouldn't be an issue since my changes were not too experimental. I explained that I needed to perform two quick tests: one test with both cabinets of the big system in parallel at no load and one test with one cabinet at full load by itself. I assured them I would use the cabinet that did not have their test module in it for my full load test. They eventually gave me a tentative thumbs up for testing but asked me not to load that particular module too heavily. Given the condition of the weakened capacitors they didn't think that module could handle very much.

I started with the dual no-load test and got all the data I needed to take. The next step was to disable the cabinet with the "weak" test module so I could run the "good" cabinet at full load by itself. Under normal circumstances it's not easy to disable one of the cabinets in a multi-cabinet system without all sorts of alarms occurring. However, on our engineering test system a lot of the normal controls lockouts were intentionally removed to streamline testing. All I needed to do to disable the cabinet was to pull out the fiber optic cables between the central controls and the cabinet I wanted to keep off.

I pulled the fiber cables from the unit I wanted to disable, checked them twice, then loaded up the system and hit the go button. The system ran fine for the first few inverter operations so I really cranked up the load to rigorously test my controls changes. After about a half-dozen runs I heard a rather loud BANG! come from one of the cabinets. When I say "loud" I mean loud as in "my ears rang for several minutes loud" even though the steel doors of the unit were shut. The bang was so loud I really couldn't pinpoint which cabinet it came from. The failure tripped the upstream breaker and smoke kept pouring out of the combined wire-way above the unit for a good minute or two after the explosion.

The engineering lab was cordoned off from the production floor with only an open pallet rack. Whenever we had an explosion in the lab, everybody from the production floor would stop over to see what happened. If you had a really good explosion, like the one in this instance, you could hear it throughout the entire building, including the engineering cubicles and front office. In those instances a company-wide peanut gallery would show up.

Within a few minutes of the explosion I had the entire company looking over my shoulder. I had just shut down and verified the power was off when the engineering contingent arrived, the owners of the test module among them. Right about that time I remember assuring the other engineers not to worry and that their painstakingly probed test module was completely fine as I had properly disabled it before my testing. I opened the cabinet I thought I had been running and there wasn't a speck of dirt or arc damage inside. Given that the entire engineering lab and production floor were a hazy shade of blue due to all the smoke that came after the explosion, I thought the cabinet looked oddly clean and my stomach started to sink. I shut the door of the first cabinet and moved over to the second cabinet, which contained the other engineer's test module, and braced myself for the worst.

The arc-flash fireball that occurred in that cabinet had to have been impressive given the sight that materialized before all of our eyes through the smoke when I opened the door. One of the aluminum power module heatsinks had a golf-ball-sized crater eaten out of it. That marked ground zero of the failure. There were about a dozen or so vaporized stubs of thermal probes near the crater. Another dozen probe wires were still in place but were completely covered with vaporized metal and a carbon soot veneer, along with the rest of the power module. I think all the extra thermal probes that were added to the module provided extra arc paths which amplified the plasma ball during the failure. The module was completely destroyed.

It took less than a second for me to realize I had pulled the wrong fiber cables, thereby disabling the wrong unit. I thought I had been hammering on the good cabinet, but all along I was beating on the weakened cabinet with the test module that the other engineers had invested lots of time getting ready for testing.

I remember blurting out an "Oops, sorry" then announcing knowingly that I was "really glad I had the door shut during the explosion". That stupidly obvious statement was all I could manage at the time. I felt I had to say something to break the silence. The two engineers who had spent all that time getting the test module ready, the same module I had turned into molten slag in a matter of seconds, walked away without saying a word.