Artificial Intelligence and Stenography

by Gary D. Robson
Journal of Court Reporting (Mar 2004)

CAT programs are getting smarter, but how smart are they?

Includes sidebar: Does it look smarter than it is?

In the 1960s, IBM and the CIA built a computer-aided transcription (CAT) system with a "universal dictionary" capable of translating a court reporter's machine shorthand into English. It used an IBM mainframe computer that filled a whole room, and cost a prohibitive amount of money. By the 1980s, CAT software was running on personal computers and priced so that most court reporters could afford it. Features we considered "advanced" in the late '80s included such stunning technology as spelling checkers and color screens.

Most of the CAT systems of that era were similar. They took machine shorthand in, and created printed transcripts. One may have been faster than another, or used fewer keystrokes to perform an editing operation, but they all had the same basic functionality.

As CAT entered the '90s, vendors began to differentiate their systems with productivity rather than price. Automation shaved minutes, or even hours, off complex tasks like creating transcript indices. That's when the phrase "artificial intelligence" began to creep into CAT advertising.

Conflict Resolution

Probably the most visible and mystical of automation features is conflict resolution. If a reporter writes "here" and "hear" the same way, and can watch the sentence, "I'll hear it when you get here" come up correctly on the screen, it's impressive. When a reporter comes up with a conflict the CAT system hasn't seen before and watches the program learn to resolve it automatically, it looks like nothing short of magic.

In actuality, the process used by the CAT software is quite simple, and has become highly refined in recent years. Automatic conflict resolution (ACR) begins with a big dictionary of parts of speech inside the CAT software. This dictionary tells the software that "frog" is a noun, "run" is a verb, and "lead" could be either one.

When you resolve a conflict, the program does a quick analysis of the conflict's context. Is it at the beginning of a sentence? Preceded by an adjective? Followed by a preposition? All of this information is recorded, along with which choice you selected. As you continue encountering and resolving the conflict in your writing, the program looks for patterns. If you always pick the second choice when the conflict appears at the end of a sentence, then the software will figure that out and start doing it for you.
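The record-and-look-for-patterns loop described above can be sketched in a few lines of Python. This is only an illustration, not the code of any CAT product: the class name `ConflictLearner`, the shape of the context tuple, and the three-example threshold are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class ConflictLearner:
    """Toy learner: remembers which choice was picked in each context."""

    def __init__(self, min_examples=3):
        # history[(conflict, context)] counts how often each choice was picked.
        self.history = defaultdict(Counter)
        # Hypothetical threshold: how many examples before acting alone.
        self.min_examples = min_examples

    def record(self, conflict, context, choice):
        """Store one manual resolution made by the reporter."""
        self.history[(conflict, context)][choice] += 1

    def suggest(self, conflict, context):
        """Return a choice only if the reporter has been fully consistent."""
        counts = self.history.get((conflict, context))
        if not counts:
            return None
        choice, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= self.min_examples and hits == total:
            return choice
        return None


# Context here is (position, part of speech before, part of speech after).
learner = ConflictLearner()
ctx_end = ("end_of_sentence", "verb", None)
for _ in range(3):
    learner.record("here/hear", ctx_end, "here")

print(learner.suggest("here/hear", ctx_end))                          # here
print(learner.suggest("here/hear", ("mid_sentence", "verb", "noun")))  # None
```

A real system would weigh partial matches and probabilities rather than demand perfect consistency; the all-or-nothing rule simply keeps the sketch short.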

That description is an oversimplification of the task, because the pattern recognition is complex, but for the amazing effect ACR has upon editing time, it's really quite a straightforward feature to program.

Why, then, does ACR appear to work better on some CAT systems than others? There are several reasons. One is that vendors preprogram rules for common conflicts. There were systems over 20 years ago that could resolve the a/an conflict based on whether the following word began with a vowel. Similarly, conflicts like two/2/too/to are predefined in many CAT programs.
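A preprogrammed rule of the a/an kind needs no learning at all. The sketch below is a minimal Python version of the vowel test mentioned above; the function name `resolve_a_an` is hypothetical, and the rule looks only at the first letter, so it will get sound-based cases such as "an hour" or "a university" wrong.

```python
VOWELS = "aeiou"


def resolve_a_an(next_word):
    """Pick 'an' when the following word begins with a vowel letter."""
    first = next_word[:1].lower()
    # Guard against an empty string, which would otherwise match "in VOWELS".
    return "an" if first and first in VOWELS else "a"


print(resolve_a_an("apple"))  # an
print(resolve_a_an("frog"))   # a
```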

Also, the number of factors taken into consideration in the contextual analysis and pattern recognition varies. The first ACR implementation I designed considered only fourteen factors (start of sentence; end of sentence; followed by noun, verb, adjective, adverb, article, or preposition; and preceded by one of those six parts of speech). With the speed, memory, and computing power of today's computers, far more complex analyses can be performed and stored.
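To make the fourteen factors concrete, here is one way they might be computed for a single word, sketched in Python. The factor names, the `context_factors` function, and the part-of-speech tags in the sample sentence are assumptions for illustration; the original implementation is not shown in this article.

```python
PARTS = ("noun", "verb", "adjective", "adverb", "article", "preposition")

# Two position factors plus six "followed by" and six "preceded by" factors.
FACTORS = (
    ["start_of_sentence", "end_of_sentence"]
    + [f"followed_by_{p}" for p in PARTS]
    + [f"preceded_by_{p}" for p in PARTS]
)


def context_factors(tags, index):
    """Return the factor names that hold for the word at `index`.

    `tags` is a list of part-of-speech tags for one sentence.
    """
    active = set()
    if index == 0:
        active.add("start_of_sentence")
    if index == len(tags) - 1:
        active.add("end_of_sentence")
    if index + 1 < len(tags) and tags[index + 1] in PARTS:
        active.add(f"followed_by_{tags[index + 1]}")
    if index > 0 and tags[index - 1] in PARTS:
        active.add(f"preceded_by_{tags[index - 1]}")
    return active


# Hypothetical tags for "I'll hear it when you get here".
tags = ["pronoun", "verb", "pronoun", "adverb", "pronoun", "verb", "adverb"]
print(len(FACTORS))                      # 14
print(sorted(context_factors(tags, 6)))  # ['end_of_sentence', 'preceded_by_verb']
```

Neighbors whose part of speech is not one of the six (pronouns, in this sample) contribute no factor.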

Spelling and Grammar

Few companies write their own spelling checkers anymore. A spelling checker can be licensed and dropped into a program for far less than it would cost to create one from scratch. This means CAT companies start from the same point when they implement spellcheck capability in their programs. It's what they do with the spelling checker that varies.

The simplest implementation does what Microsoft Word does: it checks the document and flags misspelled words. The potential, as we'll see later in this article, is much greater.

With the potential for increased automation and productivity comes an increased potential for error. Attorneys make up words constantly. Court reporters face marginally literate witnesses all too often. Bizarre spellings of names have become de rigueur. If a spelling checker in a CAT system "fixes" an invented word or oddly spelled name, it could well be introducing errors into the transcript. Programmers and court reporters must view automatic spelling correction with some amount of mistrust.

Word Endings

I recall in 1987 helping my wife to "clean up" her translation dictionary, which had been built by a scopist. The dictionary contained over 1,000 words ending in -ing. Since Kathy comes back for a -G in a separate stroke to write that ending, we removed all of those entries and just defined an -ing suffix. Now she could add -ing to any root word with no change to her dictionary. Unfortunately, hundreds of -ing words were now misspelled because final consonants weren't being doubled.

At the time, our solution was to add those hundreds of words back in. Shortly after that, CAT programmers began building word ending rules into their software. This is another feature, like ACR, that looks like magic. Again, the magic is in the hands of the programmer, not the software.

The most difficult part of programming smart word endings is determining the rules. When I designed my first word-ending handler, it took me longer to figure out a set of generalized rules for English word endings (along with lists of exceptions, of course) than it took my hot-shot programmer to write the code. English is complicated. Just look at this short example, taken from that old software specification:

For all root words ending in a "y" preceded by a consonant (e.g., try or hungry), treat an "s" or "d" suffix as "es" or "ed." If the suffix begins with any letter other than "y," then change the "y" at the end of the root word to an "i." Examples:
 
fly + er → flier
body + less → bodiless
happy + ness → happiness
country + fied → countrified
 
Exceptions: This does not apply to the root words lady and baby or to the suffixes like and ship.
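The excerpt above translates fairly directly into code. The Python sketch below is one reading of it, with two assumptions that go beyond the quoted text: a suffix beginning with "i" also leaves the "y" alone (so try + ing gives trying), and the lady/baby exception is applied only to suffixes other than "s" and "d" (so lady + s still gives ladies). The function name `add_suffix` is hypothetical.

```python
VOWELS = set("aeiou")
EXCEPTION_ROOTS = {"lady", "baby"}
EXCEPTION_SUFFIXES = {"like", "ship"}


def add_suffix(root, suffix):
    """Join root and suffix using the consonant-plus-y rule from the excerpt."""
    consonant_y = (
        len(root) >= 2 and root[-1] == "y" and root[-2].lower() not in VOWELS
    )
    if not suffix or not consonant_y:
        return root + suffix              # rule does not apply
    if suffix in ("s", "d"):
        # Treat "s"/"d" as "es"/"ed", then change y to i: try + s -> tries.
        return root[:-1] + "ie" + suffix
    if root.lower() in EXCEPTION_ROOTS or suffix in EXCEPTION_SUFFIXES:
        return root + suffix              # ladylike, babyhood, township-style joins
    if suffix[0] in ("y", "i"):
        # "y" is from the excerpt; "i" is an added assumption (try + ing).
        return root + suffix
    return root[:-1] + "i" + suffix       # happy + ness -> happiness


for root, suffix in [("fly", "er"), ("body", "less"),
                     ("happy", "ness"), ("country", "fied")]:
    print(root, "+", suffix, "->", add_suffix(root, suffix))
```

Roots ending in a vowel plus "y" (play, key) fall outside the rule and are joined unchanged.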

Modern systems add another twist to the algorithm. If the root word appears in the spellcheck dictionary, the software can do another lookup after the suffix is added to see if it was done correctly. If not, it can try variants on the theme and check again until it has a hit. This means that new exceptions (remember that words are constantly being added to the English language) do not have to be explicitly programmed. When the spellcheck dictionary is updated, the behavior of the word ending handler changes automatically to match it.
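A rough Python sketch of that lookup-and-retry idea follows. The tiny `WORDS` set stands in for a licensed spellcheck dictionary, and the particular variants tried (doubling the final letter, dropping a final "e", changing "y" to "i" or "ie") are assumptions chosen for the example, not a list taken from any product.

```python
def variants(root, suffix):
    """Yield candidate spellings, simplest first."""
    yield root + suffix                       # walk + ing -> walking
    if root:
        yield root + root[-1] + suffix        # run + ing -> running
        if root[-1] == "e":
            yield root[:-1] + suffix          # make + ing -> making
        if root[-1] == "y":
            yield root[:-1] + "i" + suffix    # happy + ness -> happiness
            yield root[:-1] + "ie" + suffix   # try + s -> tries


def attach(root, suffix, dictionary):
    """Return the first variant found in the dictionary, else a plain join."""
    if root not in dictionary:
        return root + suffix                  # nothing to verify against
    for candidate in variants(root, suffix):
        if candidate in dictionary:
            return candidate
    return root + suffix


# Hypothetical stand-in for a spellcheck dictionary.
WORDS = {"run", "running", "make", "making", "try", "tries"}

print(attach("run", "ing", WORDS))    # running
print(attach("make", "ing", WORDS))   # making
print(attach("blog", "ing", WORDS))   # bloging

WORDS.update({"blog", "blogging"})    # the dictionary gets updated...
print(attach("blog", "ing", WORDS))   # blogging
```

The last two calls show the point made above: no rule was changed, yet the result for the new word corrects itself once the dictionary knows it.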

Number Formatting

Proper formatting of numbers is very similar to conflict resolution. Each CAT system is given a set of possible number formats, such as currency, phone number, and spelled-out. Each format allows a specific number of digits (e.g., in the U.S. and Canada, a phone number can be 7, 10, or 11 digits, as in 555-1212, 800-555-1212, or 1-800-555-1212).

These formats are often made programmable. For example, a ten-digit phone number could be written as (800)555-1212 or 800.555.1212 or 800/555-1212 or in many other ways. In the U.S., you'd write currency as $1,500.25. In other countries, they'll use a different currency symbol or swap the meanings of the decimal and thousands separator (e.g., 1.500,25).
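One simple way to make such formats programmable is to store them as templates and separator settings rather than as code. The Python sketch below assumes a template syntax in which each "#" stands for one digit; that syntax and the function names `format_phone` and `format_currency` are inventions for this example.

```python
from decimal import Decimal


def format_phone(digits, template="(###)###-####"):
    """Fill each # in the template with the next digit."""
    if not digits.isdigit() or len(digits) != template.count("#"):
        raise ValueError("digit count does not match template")
    it = iter(digits)
    return "".join(next(it) if ch == "#" else ch for ch in template)


def format_currency(amount, symbol="$", thousands=",", decimal="."):
    """Format a non-negative amount with configurable separators."""
    value = Decimal(str(amount))
    if value < 0:
        raise ValueError("negative amounts are not handled in this sketch")
    whole, frac = f"{value:,.2f}".split(".")
    return symbol + whole.replace(",", thousands) + decimal + frac


print(format_phone("8005551212"))                  # (800)555-1212
print(format_phone("8005551212", "###.###.####"))  # 800.555.1212
print(format_phone("5551212", "###-####"))         # 555-1212
print(format_currency("1500.25"))                  # $1,500.25
print(format_currency("1500.25", symbol="", thousands=".", decimal=","))  # 1.500,25
```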

Users of the software are given edit commands and steno triggers to place numbers into specific formats, and the programmer of the CAT system can predefine triggers as well. As an example, if a number is followed by the word "dollars" it gets reformatted into currency. If a reporter writes six feet four inches, the software can turn that into 6'4".
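The two predefined triggers mentioned above can be approximated with regular expressions. This Python sketch is only an illustration: the word list stops at twelve, the "dollars" trigger handles digits only, and the names `apply_triggers`, `HEIGHT`, and `DOLLARS` are hypothetical.

```python
import re

NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}

# A number is either digits or one of the spelled-out words above.
NUM = r"(\d+|" + "|".join(NUMBER_WORDS) + r")"
HEIGHT = re.compile(rf"\b{NUM} (?:feet|foot) {NUM} inch(?:es)?\b", re.IGNORECASE)
DOLLARS = re.compile(r"\b(\d+) dollars\b", re.IGNORECASE)


def to_int(token):
    """Convert digits or a spelled-out number word to an int."""
    token = token.lower()
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def _height(match):
    feet, inches = to_int(match.group(1)), to_int(match.group(2))
    return str(feet) + "'" + str(inches) + '"'


def _dollars(match):
    return "$" + format(int(match.group(1)), ",")


def apply_triggers(text):
    """Apply the feet/inches trigger, then the dollars trigger."""
    text = HEIGHT.sub(_height, text)
    return DOLLARS.sub(_dollars, text)


print(apply_triggers("He is six feet four inches tall"))  # He is 6'4" tall
print(apply_triggers("She paid 1500 dollars"))            # She paid $1,500
```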

Just as the ACR software learns by contextual analysis of the reporter's behavior, number formatting software can learn, too. Sometimes this is automated, and sometimes reporters are given the opportunity to define the program's behavior explicitly.

But is it artificial intelligence?

In 1950, Alan Turing wrote an article called Computing Machinery and Intelligence, in which he proposed a test (now called the "Turing test" in his honor) to determine whether a machine is intelligent. He said that if a human being could carry on a conversation with a machine using a teletype keyboard, and not be able to tell whether it's human or not, then the machine has intelligence.

The term "artificial intelligence" was coined in 1955 by John McCarthy in a study proposal for Dartmouth College. The definition of AI, and of machine intelligence in general, is a fuzzy one. Few adhere to the Turing test as the sole determiner of AI. Any system capable of learning and deriving rules that the programmer didn't anticipate can be considered artificially intelligent as well.

Let's look at an example. I could program a computer to monitor data coming in from a serial port and analyze it to determine whether it is coming from a Stentura, a Flash Writer, or a StenoRam. Since it is my knowledge that allows it to make the decision, there is no AI involved, although it may look like magic to the court reporter using it.

On the other hand, let's say that I simply program the computer to monitor errors. The software "realizes" on its own that whenever a certain type of steno error occurs, the user immediately changes a setting to indicate that the steno keyboard is a Stentura. If the computer figures out what's going on without being programmed to do so, and automatically changes the setting whenever the error occurs, that is clearly artificial intelligence.

In between those extremes is the great gray area of today's CAT software. There are those who have been advertising "artificial intelligence" features in their software for over 15 years. I believe that they would be hard-pressed to find a computer scientist who would look at those old systems and back that claim. AI has been a marketing buzzword since long before aspects of it became reality. On the other hand, the speech recognition algorithms being used in several CAT systems today use technology known as neural networks, which is a branch of AI.

I remember a conversation I had with the head of another CAT company at an NCRA convention in the early '90s. We discussed how our automatic conflict resolution software worked, and determined that we used basically the same algorithm. Since we both used preprogrammed context factors and probability analysis, I asked why he called his "artificial intelligence." His very telling response was that our competitors called it AI, so he felt that he had to as well.

Is there AI in the software that handles conflict resolution, word endings, and number formatting today? I don't think there is. There may well be AI elsewhere in some CAT programs, and there almost certainly will be in the future, but saying that CAT software has artificial intelligence is mostly marketing hype, not a statement of fact.

What's Next?

I remember a phrase my mother used when I was growing up. If she mangled a sentence or I misinterpreted what she told me, she'd look at me and say, "Listen to what I mean, not what I say." This is precisely where CAT software has been heading for well over a decade.

The realtime writers of the '70s and '80s had to depend on comprehensive dictionaries, well-defined conflict-free theories, and (of course) phenomenal writing skills to produce clean realtime. Today, reporters of more modest skill using less mature dictionaries can produce realtime that looks quite good. Their software tries to find mistakes and determine what the reporter meant to write.

There are some serious challenges here for CAT programmers. Court reporters are tasked with producing verbatim transcripts. The software can't automatically fix every egregious grammatical error, because what the reporter wrote may well be what the witness actually said. The program has to be able to distinguish between the original speaker's mistakes and the court reporter's mistakes, which can be a very difficult task.

This article has only touched on a few areas of "intelligence" in modern CAT software. There are many others, such as compensation for shadowing and dropping, analyzing untranslates, and automatic generation of index tags, and there are more being added with each new release of CAT software.

Many of these improvements will be irrelevant to today's experienced reporter. Who cares if the software can automatically fix something that you fixed yourself a decade ago by changing your writing style? Many seasoned realtime reporters disable "intelligent" CAT features because they can write it faster and cleaner themselves, but new reporters are growing dependent on the types of features we've discussed here, and are often unable to use older-generation CAT software at all. This trend is sure to continue.

What lies in the future of CAT? I believe that the future lies in improving the software's ability to understand reporters through detecting and fixing misstrokes, resolving ambiguities and conflicts, and adapting itself to the way each reporter writes. The cross-pollinating of steno and voice-based CAT systems, which has already begun, is sure to bring improvements to both. In other words, we will continue down the road that we are already taking, and the stunning features of today's systems will be the mundane capabilities of the software 20 years from now.


Does it look smarter than it is?

Simple programming tricks can often make a program look much more intelligent than it actually is. When I was in high school, I got in an argument with another student about the difficulty of writing a chess-playing program. This was decades before IBM's Deep Blue supercomputer beat the reigning World Chess Champion, Garry Kasparov, and people were still arguing whether a computer would ever beat a skilled chess player. I bet my friend that I could write a chess-playing program from scratch in 24 hours or less.

He set the ground rules: Every move the computer made had to be legal. The computer had to check every move the human made to make sure it was legal. If the computer won or lost, it had to realize it. The key was, my program didn't have to win. It just had to play according to the rules.

The next day, I presented my program. He sat down to check it. The computer moved its king's pawn forward two spaces, one of the most common openings. My friend tried several illegal moves, and the program caught them all. Finally, he made one of the 20 legal initial moves, and the program promptly resigned.

The program had clearly met all of the requirements. It made only legal moves, and forced its opponent to make only legal moves. When it resigned, it "knew" that it had lost. I won the bet, even though my program was probably the most boring chess opponent in the history of the game. It knew only 20 moves (21 if you count letting the human opponent resign).

The point of the story is that it's very difficult to define what constitutes intelligent behavior. What's important isn't what you call it; it's what it can do for you. If a feature in your software saves you two hours a week, do you really care whether writing it took a programmer an afternoon or a team of 20 software engineers all year?