Originally Posted by Chronos
I've heard it argued that the reason space travel is still so dangerous is that we haven't had enough accidents yet. Once something goes wrong, you can figure out what it was, and take measures to ensure it doesn't happen again. Once that happens enough times, you've found most of the significant failures, and it becomes safe.
This is sort of true; until you experience multiple failures, you cannot make a credible assessment of actual reliability, and therefore don't know how to evaluate the most likely candidates for failure.
The way this is addressed during development, when sparse data is available, is what is called qualification testing: representative subsystems, up to the motor and stage level, are subjected to loads and environments that exceed the maximum predicted loads (MPL) and environments (MPE) by some margin. If the maximum environment is not well known or characterized, which is typical for flight environments or self-induced environments with limited testing, additional margin is applied to account for uncertainty about the variability of the environments. A typical qualification test level is roughly a factor of 3.5X the expected maximum, and is conducted for a multiple of the exposure duration in flight (usually 3X or more). The expectation is that a component or system that can survive this amplified environment and duration will be sufficiently robust to survive in flight.
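As a toy illustration of the bookkeeping involved (the 3.5X level factor and 3X duration factor are just the ones quoted above; real programs derive margins per environment, and all of the numbers here are made up):

```python
def qualification_test_plan(mpe_level, flight_duration_s,
                            level_factor=3.5, duration_factor=3.0):
    """Sketch of qualification test bookkeeping: amplify the maximum
    predicted environment (MPE) by a margin factor and stretch the
    exposure time by a duration factor. Factors are illustrative,
    taken from the discussion above, not from any standard."""
    return {
        "test_level": level_factor * mpe_level,
        "test_duration_s": duration_factor * flight_duration_s,
    }

# Hypothetical 10 g MPE seen for 120 s of flight:
plan = qualification_test_plan(mpe_level=10.0, flight_duration_s=120.0)
# -> test at 35 g for 360 s
```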
However, there are still several areas of uncertainty that may not be captured. For one, although the attempt is made to test components and systems in a "flight-like manner", the reality is that it is rarely possible to completely reproduce the inertial and dynamic conditions seen in flight. For subsystem-level testing, there may be dynamic coupling effects or constraints that differ between ground and flight. And even when you have measured flight data, you will rarely see the stressing case, so even twenty or fifty flights may not reveal a possible failure or fragility mode, which is why a log-normal probability distribution is usually used when establishing limit predictions and margins on a statistical basis.
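A rough sketch of how that log-normal assumption gets used: fit the distribution to the logarithms of the measured flight maxima, then set the limit level at an upper percentile, which can sit well above anything actually observed. The z-value and sample data below are illustrative only; real derivations also carry confidence terms on the fitted parameters.

```python
import math

def p95_limit_level(flight_maxima):
    """Illustrative P95 limit estimate assuming the measured flight
    maxima are log-normally distributed: fit mean/std in log space,
    then take the 95th percentile (z = 1.645). A sketch only; real
    practice adds confidence factors on the estimated parameters."""
    logs = [math.log(x) for x in flight_maxima]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / (n - 1))
    return math.exp(mu + 1.645 * sigma)

# Five measured flight maxima (made-up numbers, say acceleration in g):
measured = [2.0, 2.5, 3.0, 2.2, 2.8]
limit = p95_limit_level(measured)  # lands above every observed maximum
```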
With regard to reliability and failure mode analysis, you can make an estimate based upon global failure modes (demonstrated or by analysis) and the population of demonstrated successes, which gives a minimum reliability at a given confidence level depending on the sample population; say, for 20 successes and no failures you get an estimate of 97% reliability at 50% confidence, or 89% reliability at 90% confidence. However, relying on these estimates without a population that can really exercise the statistical limits is like walking blindfolded along the edge of a cliff; you'll never know how close you are to dying until you step off. Statistically, you get "safer" every time you fly successfully, but until you experience long-tail or "black swan" conditions, you never really know what failure mode is going to break you. Most detailed failure mode analysis is informed guesswork and casual inference without statistical rigor. (Efforts have been made by some, your narrator included, to institute a Bayesian-type approach to reliability, but the difficulties in performing quantifiable hypothesis testing and the desire to avoid deliberately driving to a failure condition make it impossible to really refine uncertainty bounds on the original prediction.)
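The 20-successes numbers above fall out of the standard zero-failure "success run" binomial bound, which solves R^n = 1 - C for the demonstrated reliability R:

```python
def reliability_lower_bound(successes: int, confidence: float) -> float:
    """Lower bound on reliability after n successes and zero failures.

    Solves R**n = 1 - confidence for R (the classical success-run /
    zero-failure binomial demonstration formula)."""
    return (1.0 - confidence) ** (1.0 / successes)

# 20 successes, no failures:
print(round(reliability_lower_bound(20, 0.50), 2))  # 0.97 at 50% confidence
print(round(reliability_lower_bound(20, 0.90), 2))  # 0.89 at 90% confidence
```

Note how slowly the bound tightens: it takes hundreds of consecutive successes before you can claim 99% reliability at high confidence, which is exactly why small flight histories say so little.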
The other problem is that when things fail in flight, we often are unable to determine exactly why. Root cause analysis (RCA) is performed to try to identify the most likely failure, but unless the vehicle is instrumented in such a way as to directly measure the failure mode--unlikely if you weren't expecting it--you have to infer what might have caused failure through other data, analysis, or attempting to reproduce a hypothesized failure condition in ground testing.
Originally Posted by Machine Elf
If you read the Rogers Commission Report (on the Challenger accident) and the report of the Columbia Accident Investigation Board, they paint a picture of a space agency that had become complacent about safety. Watch for the phrase "normalization of deviance": NASA had come to accept non-standard behavior of the shuttle system as normal, without fully investigating the consequences of the inevitable deviations from that new normal. In both shuttle disasters, there were BIG clues that a big disaster was looming, but NASA continually downplayed or ignored the implications of what they were seeing. Prior to Challenger, booster O-rings were known to not seal properly in cold weather, showing partial burn-through on many launches. Burn-through was NOT part of how they were designed to perform. Likewise, prior to Columbia, ice and foam peeling off of the external tank (and striking the orbiter) was a known concern.
It is absolutely true in both cases that NASA and the contractors involved had prior knowledge of the SRMs and vehicle operating in an out-of-design condition, and that a culture of accepting deviations grew up in the agency to the point that bringing these up as flight risks was avoided, a position I refer to as being "both risk-averse and risk-obtuse". However, this is a common problem, in part because the extreme variability of loads and environments never really gives you a homogeneous data set, and deviations from a theoretical specified condition (which was defined before hardware was ever built and often never updated to reflect post-production test and flight data) are often accepted if no critical failures occur.
In the case of the problems that caused the launch failure of Challenger, while the blow-by phenomenon was known beforehand, procedural changes were made to help limit it, and analysis showed that it would not result in failure of the motor case at the field joint. While the condition was undesirable, it wasn't regarded as critical, and had been seen on other solid motors before without catastrophic failure. Morton Thiokol knew of the problem with the design of the field joint (called "joint rotation") and had actually inverted the joint and made other improvements on the Advanced Filament Wound Composite Solid Rocket Motors intended for the Air Force "Blue Shuttle" program, which was to launch into polar orbit from Vandenberg AFB. Because of the cost of implementing the change (scrapping the existing casings) and the perceived lack of a critical hazard, it was not addressed prior to STS-51-L.
However, other factors contributed substantially to the ultimate failure. The temperature and wind shear conditions on the fateful launch day of STS-51-L were worse than on any previous flight, and were actually outside the thermal qualification limits of the SRM. However, what really caused the failure was a combination of phenomena: in addition to the high wind shear, which caused excessive joint rotation, a previously unsuspected pooling of cold oxygen vented from the External Tank (ET) caused that particular joint to be supercooled. Even at that, while the joint would have leaked, it would not have caused the dramatic failure had the gas jet from the leak not been pointed back at the ET, cutting through the skin of the tankage and causing structural failure of the ET, which subjected the Orbiter to excessive, unstable aerodynamic and inertial loads and therefore breakup. (While you often see ignorant people assert that the Shuttle or tanks "blew up", the fact is the spontaneous combustion of hydrogen and oxygen propellant did a negligible amount of damage to the Orbiter vehicle; structural failure was due to a sudden yaw at high dynamic pressure that broke off one wing and caused the cabin to separate from the main chassis.) The SRMs actually survived this unplanned separation and continued to fly--one end over end--until destructed by the RSO about ten seconds prior to end of action time. The SRMs themselves did not fail catastrophically, but because of a confluence of conditions at the system (total vehicle) level, they contributed to an unexpected catastrophic failure mode from which the Orbiter was never designed to recover.
The Columbia catastrophe was less excusable, as there was significant data about that particular failure mode and a broad appreciation for the failure modes that could result from damage to the RCC leading edge panels. The worst conditions for this were actually seen on the return-to-flight mission after the Challenger disaster. However, there was never a feasible design fix given the design of the Orbiter and the parallel staging mode of the launch system.
Originally Posted by Hail Ants
The two Shuttle accidents were the result of the Space Shuttle just being a terrible design from the lowest govt contract bidder. Specifically the 'strapped-on' model instead of the 'stacked' model used on all previous (and probably future) manned boosters.
I'm not sure where you have gotten your information, but it is almost completely wrong. Rockwell Space Systems, which built the Orbiter part of the vehicle (minus the Space Shuttle Main Engines, which were built by Rocketdyne), was not the lowest bidder. (All of the bidders actually came in within the ceiling, and thus price was not a deciding factor in the selection process.) The STS configuration was defined by the RFP such that a winged shuttle with parallel staging was mandatory, and the use of large solid rocket boosters was determined by NASA in order to reduce schedule risk and save on development costs versus a liquid system. The one bidder who proposed a substantially different configuration (the Chrysler Aerospace SERV) was immediately discarded, as was the original proposal by Rocketdyne to use the aerospike engines they had already developed and tested for the main engines. And despite the derision that is often lobbed at the STS for using solid propellant boosters, this allowed a high-thrust Stage 0 booster without the additional complexity of separate pressurized systems or the bulk of liquid propellant, and the RSRMs have proven to be some of the most reliable large-class motors/engines in operation.
In-line vertical motor stacks have their own set of problems which make them troublesome--specifically, a high L/D (length-to-diameter) ratio tends to produce bending modes which can cause high structural stresses and difficulty in maintaining vehicle control. Staging on an in-line vehicle is a tricky operation of separating and then flying away from the downstage without it striking the upper stack/vehicle, while parallel staging is almost trivial in comparison. Systems such as Delta, Titan, and Atlas have used parallel staging with Stage 0 boosters to get additional initial thrust for decades with only a handful of failures (like the Titan 34D-9 launch), compared to many failures in vertical staging. The Ares I vehicle--colloquially known as "The Stick" by its developers and derisively referred to as "The Corndog" by detractors--had some very substantial problems due to its in-line configuration compared to configurations like the Ares V and Jupiter-130.
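To see why high L/D is troublesome, you can idealize the stack as a uniform thin-walled free-free beam: the first bending frequency then scales as D/L², so stretching the vehicle drives the mode down toward the control system's bandwidth. This is a sketch under that idealization only; a real stack has varying stiffness, mass, and propellant loading, and the material properties below are made-up round numbers.

```python
import math

def free_free_first_bending_hz(E, rho, length, diameter, wall):
    """First free-free bending mode of a uniform thin-walled tube,
    an idealized stand-in for a launch vehicle stack:
        f1 = (beta1*L)^2 / (2*pi*L^2) * sqrt(E*I / (rho*A))
    with beta1*L ~= 4.730 for the first free-free mode.
    Thin-wall section: A = 2*pi*r*t, I = pi*r^3*t, so f1 scales
    with r/L^2 (i.e., D/L^2)."""
    r = diameter / 2.0
    area = 2.0 * math.pi * r * wall       # thin-wall cross-section area
    inertia = math.pi * r ** 3 * wall     # thin-wall area moment of inertia
    return (4.730 ** 2) / (2.0 * math.pi * length ** 2) \
        * math.sqrt(E * inertia / (rho * area))

# Doubling the length at fixed diameter cuts the bending frequency
# to one quarter (f1 scales as 1/L^2):
f_short = free_free_first_bending_hz(70e9, 2700.0, 20.0, 3.0, 0.005)
f_long = free_free_first_bending_hz(70e9, 2700.0, 40.0, 3.0, 0.005)
```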
There are many complaints that can be leveled at the STS, but most of them stem from a set of functional and performance requirements that were inconsistent, ill-considered, and often politically motivated to make the STS a do-all vehicle. That the vehicle functioned as well and as long as it did despite many of the compromises made to the design suggests that, in general, the developing contractors actually did a pretty good job with what they were given. The reality is that the STS should have flown for about ten years while the next-generation system was developed to improve upon it, just as Apollo/Saturn development was planned before being curtailed and cancelled.