ou wanted, but this is sternly frowned upon by management,
and frankly, when you've overheard one, you've pretty
much heard 'em all.
You can tell how long the conversation lasts by the
glow of the calling-cord's lamp, down on the calling-cord's
shelf. When it's over, you unplug and the calling-cord
zips back into place.
Having done this stuff a few hundred thousand times,
you become quite good at it. In fact you're plugging, and
connecting, and disconnecting, ten, twenty, forty cords at a
time. It's a manual handicraft, really, quite satisfying in
a way, rather like weaving on an upright loom.
Should a long-distance call come up, it would be
different, but not all that different. Instead of connecting
the call through your own local switchboard, you have to
go up the hierarchy, onto the long-distance lines, known as
"trunklines." Depending on how far the call goes, it may
have to work its way through a whole series of operators,
which can take quite a while. The caller doesn't wait on
the line while this complex process is negotiated across
the country by the gaggle of operators. Instead, the caller
hangs up, and you call him back yourself when the call has
finally worked its way through.
After four or five years of this work, you get married,
and you have to quit your job, this being the natural order
of womanhood in the American 1920s. The phone
company has to train somebody else -- maybe two people,
since the phone system has grown somewhat in the
meantime. And this costs money.
In fact, to use any kind of human being as a switching
system is a very expensive proposition. Eight thousand
Leticia Luthors would be bad enough, but a quarter of a
million of them is a military-scale proposition and makes
drastic measures in automation financially worthwhile.
Although the phone system continues to grow today,
the number of human beings employed by telcos has
been dropping steadily for years. Phone "operators" now
deal with nothing but unusual contingencies, all routine
operations having been shrugged off onto machines.
Consequently, telephone operators are considerably less
machine-like nowadays, and have been known to have
accents and actual character in their voices. When you
reach a human operator today, the operators are rather
more "human" than they were in Leticia's day -- but on the
other hand, human beings in the phone system are much
harder to reach in the first place.
Over the first half of the twentieth century,
"electromechanical" switching systems of growing
complexity were cautiously introduced into the phone
system. In certain backwaters, some of these hybrid
systems are still in use. But after 1965, the phone system
began to go completely electronic, and this is by far the
dominant mode today. Electromechanical systems have
"crossbars," and "brushes," and other large moving
mechanical parts, which, while faster and cheaper than
Leticia, are still slow, and tend to wear out fairly
quickly.
But fully electronic systems are inscribed on silicon
chips, and are lightning-fast, very cheap, and quite
durable. They are much cheaper to maintain than even
the best electromechanical systems, and they fit into half
the space. And with every year, the silicon chip grows
smaller, faster, and cheaper yet. Best of all, automated
electronics work around the clock and don't have salaries
or health insurance.
There are, however, quite serious drawbacks to the
use of computer-chips. When they do break down, it is a
daunting challenge to figure out what the heck has gone
wrong with them. A broken cordboard generally had a
problem in it big enough to see. A broken chip has
invisible, microscopic faults. And the faults in bad
software can be so subtle as to be practically theological.
If you want a mechanical system to do something
new, then you must travel to where it is, and pull pieces out
of it, and wire in new pieces. This costs money. However,
if you want a chip to do something new, all you have to do
is change its software, which is easy, fast and dirt-cheap.
You don't even have to see the chip to change its program.
Even if you did see the chip, it wouldn't look like much. A
chip with program X doesn't look one whit different from a
chip with program Y.
With the proper codes and sequences, and access to
specialized phone-lines, you can change electronic
switching systems all over America from anywhere you
please.
And so can other people. If they know how, and if
they want to, they can sneak into a microchip via the
special phonelines and diddle with it, leaving no physical
trace at all. If they broke into the operator's station and
held Leticia at gunpoint, that would be very obvious. If
they broke into a telco building and went after an
electromechanical switch with a toolbelt, that would at
least leave many traces. But people can do all manner of
amazing things to computer switches just by typing on a
keyboard, and keyboards are everywhere today. The
extent of this vulnerability is deep, dark, broad, almost
mind-boggling, and yet this is a basic, primal fact of life
about any computer on a network.
Security experts over the past twenty years have
insisted, with growing urgency, that this basic
vulnerability
of computers represents an entirely new level of risk, of
unknown but obviously dire potential to society. And they
are right.
An electronic switching station does pretty much
everything Leticia did, except in nanoseconds and on a
much larger scale. Compared to Miss Luthor's ten
thousand jacks, even a primitive 1ESS switching computer,
60s vintage, has 128,000 lines. And the current AT&T
system of choice is the monstrous fifth-generation 5ESS.
An Electronic Switching Station can scan every line
on its "board" in a tenth of a second, and it does this over
and over, tirelessly, around the clock. Instead of eyes, it
uses "ferrod scanners" to check the condition of local lines
and trunks. Instead of hands, it has "signal distributors,"
"central pulse distributors," "magnetic latching relays,"
and "reed switches," which complete and break the calls.
Instead of a brain, it has a "central processor." Instead of
an instruction manual, it has a program. Instead of a
handwritten logbook for recording and billing calls, it has
magnetic tapes. And it never has to talk to anybody.
Everything a customer might say to it is done by punching
the direct-dial tone buttons on your subset.
Although an Electronic Switching Station can't talk, it
does need an interface, some way to relate to its, er,
employers. This interface is known as the "master control
center." (This interface might be better known simply as
"the interface," since it doesn't actually "control" phone
calls directly. However, a term like "Master Control
Center" is just the kind of rhetoric that telco maintenance
engineers -- and hackers -- find particularly satisfying.)
Using the master control center, a phone engineer
can test local and trunk lines for malfunctions. He (rarely
she) can check various alarm displays, measure traffic on
the lines, examine the records of telephone usage and the
charges for those calls, and change the programming.
And, of course, anybody else who gets into the master
control center by remote control can also do these things,
if he (rarely she) has managed to figure them out, or, more
likely, has somehow swiped the knowledge from people
who already know.
In 1989 and 1990, one particular RBOC, BellSouth,
which felt particularly troubled, spent a purported $1.2
million on computer security. Some think it spent as
much as two million, if you count all the associated costs.
Two million dollars is still very little compared to the great
cost-saving utility of telephonic computer systems.
Unfortunately, computers are also stupid. Unlike
human beings, computers possess the truly profound
stupidity of the inanimate.
In the 1960s, in the first shocks of spreading
computerization, there was much easy talk about the
stupidity of computers -- how they could "only follow the
program" and were rigidly required to do "only what they
were told." There has been rather less talk about the
stupidity of computers since they began to achieve
grandmaster status in chess tournaments, and to manifest
many other impressive forms of apparent cleverness.
Nevertheless, computers *still* are profoundly
brittle and stupid; they are simply vastly more subtle in
their stupidity and brittleness. The computers of the
1990s are much more reliable in their components than
earlier computer systems, but they are also called upon to
do far more complex things, under far more challenging
conditions.
On a basic mathematical level, every single line of a
software program offers a chance for some possible
screwup. Software does not sit still when it works; it "runs,"
it interacts with itself and with its own inputs and outputs.
By analogy, it stretches like putty into millions of possible
shapes and conditions, so many shapes that they can never all
be successfully tested, not even in the lifespan of the
universe. Sometimes the putty snaps.
The stuff we call "software" is not like anything that
human society is used to thinking about. Software is
something like a machine, and something like
mathematics, and something like language, and
something like thought, and art, and information.... but
software is not in fact any of those other things. The
protean quality of software is one of the great sources of its
fascination. It also makes software very powerful, very
subtle, very unpredictable, and very risky.
Some software is bad and buggy. Some is "robust,"
even "bulletproof." The best software is that which has
been tested by thousands of users under thousands of
different conditions, over years. It is then known as
"stable." This does *not* mean that the software is now
flawless, free of bugs. It generally means that there are
plenty of bugs in it, but the bugs are well-identified and
fairly well understood.
There is simply no way to assure that software is free
of flaws. Though software is mathematical in nature, it
cannot by "proven" like a mathematical theorem; software
is more like language, with inherent ambiguities, with
different definitions, different assumptions, different
levels of meaning that can conflict.
Human beings can manage, more or less, with
human language because we can catch the gist of it.
Computers, despite years of effort in "artificial
intelligence," have proven spectacularly bad in "catching
the gist" of anything at all. The tiniest bit of semantic
grit
may still bring the mightiest computer tumbling down.
One of the most hazardous things you can do to a
computer program is try to improve it -- to try to make it
safer. Software "patches" represent new, untried un-
"stable" software, which is by definition riskier.
The modern telephone system has come to depend,
utterly and irretrievably, upon software. And the System
Crash of January 15, 1990, was caused by an
*improvement* in software. Or rather, an *attempted*
improvement.
As it happened, the problem itself -- the problem per
se -- took this form. A piece of telco software had been
written in C language, a standard language of the telco
field. Within the C software was a long "do... while"
construct. The "do... while" construct contained a "switch"
statement. The "switch" statement contained an "if"
clause. The "if" clause contained a "break." The "break"
was *supposed* to "break" the "if" clause. Instead, the
"break" broke the "switch" statement.
That was the problem, the actual reason why people
picking up phones on January 15, 1990, could not talk to
one another.
Or at least, that was the subtle, abstract, cyberspatial
seed of the problem. This is how the problem manifested
itself from the realm of programming into the realm of
real life.
The System 7 software for AT&T's 4ESS switching
station, the "Generic 44E14 Central Office Switch
Software," had been extensively tested, and was
considered very stable. By the end of 1989, eighty of
AT&T's switching systems nationwide had been
programmed with the new software. Cautiously, thirty-
four stations were left to run the slower, less-capable
System 6, because AT&T suspected there might be
shakedown problems with the new and unprecedentedly
sophisticated System 7 network.
The stations with System 7 were programmed to
switch over to a backup net in case of any problems. In
mid-December 1989, however, a new high-velocity, high-
security software patch was distributed to each of the 4ESS
switches that would enable them to switch over even more
quickly, making the System 7 network that much more
secure.
Unfortunately, every one of these 4ESS switches was
now in possession of a small but deadly flaw.
In order to maintain the network, switches must
monitor the condition of other switches -- whether they are
up and running, whether they have temporarily shut down,
whether they are overloaded and in need of assistance,
and so forth. The new software helped control this
bookkeeping function by monitoring the status calls from
other switches.
It only takes four to six seconds for a troubled 4ESS
switch to rid itself of all its calls, drop everything
temporarily, and re-boot its software from scratch.
Starting over from scratch will generally rid the switch of
any software problems that may have developed in the
course of running the system. Bugs that arise will be
simply wiped out by this process. It is a clever idea. This
process of automatically re-booting from scratch is known as
the "normal fault recovery routine." Since AT&T's software is
in fact exceptionally stable, systems rarely have to go into
"fault recovery" in the first place; but AT&T has always
boasted of its "real world" reliability, and this tactic is a
belt-and-suspenders routine.
The 4ESS switch used its new software to monitor its
fellow switches as they recovered from faults. As other
switches came back on line after recovery, they would
send their "OK" signals to the switch. The switch would
make a little note to that effect in its "status map,"
recognizing that the fellow switch was back and ready to
go, and should be sent some calls and put back to regular
work.
Unfortunately, while it was busy bookkeeping with
the status map, the tiny flaw in the brand-new software
came into play. The flaw caused the 4ESS switch to
interact, subtly but drastically, with incoming telephone
calls from human users. If -- and only if -- two incoming
phone-calls happened to hit the switch within a hundredth
of a second, then a small patch of data would be garbled
by the flaw.
But the switch had been programmed to monitor
itself constantly for any possible damage to its data.
When the switch perceived that its data had been
somehow garbled, then it too would go down, for swift
repairs to its software. It would signal its fellow switches
not to send any more work. It would go into the fault-
recovery mode for four to six seconds. And then the switch
would be fine again, and would send out its "OK, ready for
work" signal.
However, the "OK, ready for work" signal was the
*very thing that had caused the switch to go down in the
first place.* And *all* the System 7 switches had the same
flaw in their status-map software. As soon as they stopped
to make the bookkeeping note that their fellow switch was
"OK," then they too would become vulnerable to the slight
chance that two phone-calls would hit them within a
hundredth of a second.
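The arithmetic of that feedback loop is easier to see in a toy model
than in prose. The little C program below is purely illustrative --
the number of switches, the crash probability, and the "tick" length
are invented, and none of it resembles real switching software -- but
it shows how one "OK" broadcast, given a small chance of the fatal
coincidence at each exposure, can keep a whole population of switches
bouncing.

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy model only: all numbers are invented. Each tick, every
       switch that hears a pending "OK" broadcast has a small chance
       of catching two calls within a hundredth of a second while
       updating its status map; if it does, it crashes, recovers,
       and broadcasts its own "OK." */

    enum { SWITCHES = 80, TICKS = 20, CRASH_CHANCE_PERCENT = 15 };

    int main(void)
    {
        int pending_ok = 1;        /* the first, legitimate "I'm OK" */

        srand(1990);
        for (int tick = 0; tick < TICKS && pending_ok > 0; tick++) {
            int crashed = 0;
            for (int s = 0; s < SWITCHES; s++) {
                /* each pending "OK" is one more vulnerable moment */
                for (int m = 0; m < pending_ok && m < 5; m++) {
                    if (rand() % 100 < CRASH_CHANCE_PERCENT) {
                        crashed++; /* down for 4-6 seconds, then... */
                        break;     /* ...it will broadcast "OK" too */
                    }
                }
            }
            printf("tick %2d: %2d switches bounced and said OK\n",
                   tick, crashed);
            pending_ok = crashed;  /* those OKs drive the next tick */
        }
        return 0;
    }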
At approximately 2:25 p.m. EST on Monday, January
15, one of AT&T's 4ESS toll switching systems in New York
City had an actual, legitimate, minor problem. It went into
fault recovery routines, announced "I'm going down," then
announced, "I'm back, I'm OK." And this cheery message
then blasted throughout the network to many of its fellow
4ESS switches.
Many of the switches, at first, completely escaped
trouble. These lucky switches were not hit by the
coincidence of two phone calls within a hundredth of a
second. Their software did not fail -- at first. But three
switches -- in Atlanta, St. Louis, and Detroit -- were
unlucky, and were caught with their hands full. And they
went down. And they came back up, almost immediately.
And they too began to broadcast the lethal message that
they, too, were "OK" again, activating the lurking software
bug in yet other switches.
As more and more switches did have that bit of bad
luck and collapsed, the call-traffic became more and more
densely packed in the remaining switches, which were
groaning to keep up with the load. And of course, as the
calls became more densely packed, the switches were
*much more likely* to be hit twice within a hundredth of a
second.
It only took four seconds for a switch to get well.
There was no *physical* damage of any kind to the
switches, after all. Physically, they were working
perfectly.
This situation was "only" a software problem.
But the 4ESS switches were leaping up and down
every four to six seconds, in a virulent spreading wave all
over America, in utter, manic, mechanical stupidity. They
kept *knocking* one another down with their contagious
"OK" messages.
It took about ten minutes for the chain reaction to
cripple the network. Even then, switches would
periodically luck-out and manage to resume their normal
work. Many calls -- millions of them -- were managing to
get through. But millions weren't.
The switching stations that used System 6 were not
directly affected. Thanks to these old-fashioned switches,
AT&T's national system avoided complete collapse. This
fact also made it clear to engineers that System 7 was at
fault.
Bell Labs engineers, working feverishly in New
Jersey, Illinois, and Ohio, first tried their entire
repertoire
of standard network remedies on the malfunctioning
System 7. None of the remedies worked, of course,
because nothing like this had ever happened to any
phone system before.
By cutting out the backup safety network entirely,
they were able to reduce the frenzy of "OK" messages by
about half. The system then began to recover, as the
chain reaction slowed. By 11:30 pm on Monday January
15, sweating engineers on the midnight shift breathed a
sigh of relief as the last switch cleared up.
By Tuesday they were pulling all the brand-new 4ESS
software and replacing it with an earlier version of System
7.
If these had been human operators, rather than
computers at work, someone would simply have
eventually stopped screaming. It would have been
*obvious* that the situation was not "OK," and common
sense would have kicked in. Humans possess common
sense -- at least to some extent. Computers simply don't.
On the other hand, computers can handle hundreds
of calls per second. Humans simply can't. If every single
human being in America worked for the phone company,
we couldn't match the performance of digital switches:
direct-dialling, three-way calling, speed-calling, call-
waiting, Caller ID, all the rest of the cornucopia of digital
bounty. Replacing computers with operators is simply not
an option any more.
And yet we still, anachronistically, expect humans to
be running