Transcript Format Standards

by Gary D. Robson
Journal of Court Reporting (May 2000)

In the beginning, there was paper. Typewritten transcripts on paper, that is.

As we move forward into new technologies like Legal XML, we must keep in mind the advantages of the paper system that we are starting to leave behind. Sure, paper had a lot of disadvantages, but it was standard. It didn't matter whether the transcript was prepared by dictation, by a notereader, by a mask reporter, or by a Gregg or Pitman shorthand reporter: any lawyer could read it.

When we moved into the age of CAT, we started to move away from standardization. The early CAT systems had no compatibility whatsoever. In the age of the Xscribe XEC-3, the Stenograph Cimarron I, and the Baron Solo, if a reporter purchased a standalone CAT system, and the agency had a system from a different vendor, there was simply no way to electronically transfer that transcript from one system to another.

ASCII Disks: The First Standard

Then came the ASCII disk. Court reporters were able to write out their transcripts in an electronic form that attorneys could load directly into their computers. At first, this didn't help much between court reporters on competing CAT systems, because CAT software could only write ASCII diskettes, not read them. That ability came later, and came slowly.

ASCII, the American Standard Code for Information Interchange, is a way of representing characters inside a computer. Computers store everything as numbers, and ASCII maps each text character to a number. As an example, the letter "A" is represented by a 65, and a dollar sign ($) by a 36. The ASCII standard allows text files to be easily transferred from computer system to computer system or from program to program by guaranteeing that "DOG" will always be stored internally as 68 79 71.
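The mapping is easy to see for yourself. The short Python sketch below is only an illustration (the variable names are arbitrary); it converts the examples above to their ASCII numbers and back again.

```python
# Look up the ASCII number for each character in a word.
word = "DOG"
codes = [ord(ch) for ch in word]
print(codes)                 # [68, 79, 71]

# The two single-character examples from the text.
print(ord("A"), ord("$"))    # 65 36

# Going the other way: turn the numbers back into text.
restored = "".join(chr(n) for n in codes)
print(restored)              # DOG
```

Any program that follows the ASCII standard will produce the same three numbers for "DOG," which is what makes the file portable.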

Just making a file ASCII doesn't guarantee that another computer program can read it, any more than writing a Russian word in English letters instead of Cyrillic letters makes it readable to someone who doesn't speak Russian. Reading ASCII files was a difficult proposition for CAT programmers. They had to deal with many different formats for page numbering, line numbering, question numbering, spacing, and representations of all the characters that ASCII doesn't specify, like accented letters. Not only that, but ASCII doesn't include any standard for representing character formatting, like bold, italics, underlining, superscript, or subscript.

RTF/CRE: A Solution For Court Reporters

In 1995, a group of CAT software companies decided to work together and come up with a better solution. The goal was a format that would allow a transcript to be transferred from any CAT system to any other CAT system with no loss of formatting or information.

The group began with Microsoft's RTF (Rich Text Format), which is supported by most word processors. RTF nailed down all of the various formatting options for characters, paragraphs, and pages, but didn't provide everything needed for a generalized solution.

When the CRE (Court Reporting Extensions) were developed, they added everything a transcript could want, even including such specialized information as embedded steno notes, time stamps, and global tables. Using RTF/CRE, which is now supported by every major CAT vendor, court reporters can move transcripts, steno notes, and court reporting dictionaries freely between systems.

XML: A Solution For Attorneys, Too

But what about attorneys? RTF/CRE is of little benefit to them for two reasons. First, much of the information (e.g., steno notes and global tables) is utterly useless to them. More importantly, RTF/CRE does not have any generalized way to deal with what programmers call "meta-information," or data about the transcript that isn't actually spoken during the proceedings, like the information on the caption page.

As an example, an attorney with a hard disk full of depositions may wish to pull up the deposition of Bob Jones. The format of a transcript may differ between court reporting firms, and the name of the deponent can appear in different places and different contexts. The attorney can't simply search for the text "Bob Jones," as it could appear in hundreds of other depositions in which different deponents referred to that individual.

A new solution to this problem is in the works, and it is called Legal XML. Like RTF/CRE, it began with a general standard. XML (Extensible Markup Language) bears some similarity to RTF, in that it is based on plain text with "tags" embedded in it to carry non-textual information like character formats. XML is closely related to HTML, the markup language upon which the World-Wide Web is based; both descend from the same parent standard, SGML. In fact, the next generation of Web browsers (programs like Netscape Navigator and Microsoft Internet Explorer that allow you to view Web pages) is expected to offer support for XML, making it about as universal as a format can get.

The XML format handles all of the formatting issues for the transcripts. Legal XML includes a set of extensions to XML for all of the meta-information that attorneys want.
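To make the Bob Jones problem concrete, here is a minimal Python sketch that compares a plain text search with a search of tagged meta-information. The element names (transcript, caption, deponent, testimony, answer) and the file names are invented for illustration; they are assumptions, not tags taken from the Legal XML specification.

```python
import xml.etree.ElementTree as ET

# Two tiny made-up transcripts. The tag names are hypothetical and
# are NOT drawn from the actual Legal XML standard.
TRANSCRIPTS = {
    "jones.xml": """<transcript>
  <caption><deponent>Bob Jones</deponent></caption>
  <testimony><answer>My name is Bob Jones.</answer></testimony>
</transcript>""",
    "smith.xml": """<transcript>
  <caption><deponent>Ann Smith</deponent></caption>
  <testimony><answer>I reported to Bob Jones in 1998.</answer></testimony>
</transcript>""",
}

def text_search(name):
    """Find every file that mentions the name anywhere at all."""
    return sorted(f for f, doc in TRANSCRIPTS.items() if name in doc)

def deponent_search(name):
    """Find only the files whose caption names this person as deponent."""
    hits = []
    for filename, doc in TRANSCRIPTS.items():
        root = ET.fromstring(doc)
        if root.findtext("caption/deponent") == name:
            hits.append(filename)
    return sorted(hits)

print(text_search("Bob Jones"))      # ['jones.xml', 'smith.xml']
print(deponent_search("Bob Jones"))  # ['jones.xml']
```

The text search returns both files because Ann Smith mentioned Bob Jones in her testimony. The tagged search returns only the deposition in which he was the deponent.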

This has been attempted before, with proposals like CADI and REDI, but neither achieved wide acceptance and usage. More significantly, neither tied in with the all-pervasive Web, a capability that dramatically enhances Legal XML's chances of success.

It's important to understand that Legal XML is not a program or a product. It is a format. Your recording of Beethoven's Fifth Symphony has the same music on it whether it comes from an LP, an eight-track tape, or a CD, but the CD offers advantages like better sound quality, and possibly even some computer files or video clips. Similarly, your transcript looks the same whether it comes from paper, ASCII diskette, RTF/CRE file, or Legal XML file, but the Legal XML file can have just a bit more information.

Where does this information come from? It doesn't just appear out of thin air. You'll have to put it there. When the Legal XML organization settles on which information is required, CAT vendors will need to create a way for you to enter this data. Some items, like the case number, may be entered using "fill-in-the-blanks" forms. Other information, like names of witnesses, may be automatically extracted from the transcript by your CAT software. We may even see timestamps in your realtime transcript used to automatically determine starting and ending times for testimony.

What Legal XML Means To Court Reporters

Is Legal XML the panacea we've been searching for all these years? That remains to be seen. To date, court reporters have had little involvement in the definition of Legal XML, a situation that will probably change by the time you read this. As of this writing, there is no move afoot to absorb the CRE components of RTF/CRE into Legal XML, meaning that it cannot be used as a general data interchange tool for court reporters on different CAT systems. That could change quickly, however.

Legal XML is what's known as an "open standard." No single company controls it. The Legal XML non-profit organization has open membership, so that anyone may join and contribute to the effort. The standard is extended and enhanced through cooperation and consensus.

The beauty of a tag-based format like RTF/CRE or Legal XML is that computer programs can simply ignore any information that they don't understand. To see an example of this, save a transcript from your CAT system as an RTF/CRE file, and then open it in a word processor like Microsoft Word or WordPerfect. The transcript and its formatting will be there, and the CRE information will not show up. The same can be done with Legal XML, allowing it to contain information required by CAT software, which legal software can just ignore.
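The same principle can be sketched in a few lines of Python. The tag names here (line, steno, timestamp, text) and the steno strokes are made up for the example and are not part of RTF/CRE or Legal XML. The "legal software" in this sketch understands only the text tag and passes over everything else.

```python
import xml.etree.ElementTree as ET

# A made-up transcript fragment mixing CAT-only data with the text.
DOC = """<transcript>
  <line n="1"><steno>PHRAOES/STAEUT</steno><text>Q. Please state your name.</text></line>
  <line n="2"><steno>PWOB/SKWROEPBS</steno><timestamp>09:02:21</timestamp><text>A. Bob Jones.</text></line>
</transcript>"""

# The only tag this hypothetical legal program knows how to display.
UNDERSTOOD = {"text"}

def readable_lines(xml_source):
    """Collect the text of each line, skipping tags we don't understand."""
    root = ET.fromstring(xml_source)
    result = []
    for line in root.iter("line"):
        words = [el.text for el in line if el.tag in UNDERSTOOD and el.text]
        result.append(" ".join(words))
    return result

for entry in readable_lines(DOC):
    print(entry)
# Q. Please state your name.
# A. Bob Jones.
```

The steno notes and the timestamp are still in the file for any CAT program that wants them. The reader that doesn't understand them loses nothing by skipping them.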

The court reporting profession can be and will be a part of the development of the Legal XML standard in the future. Like the paper transcripts we started with so many years ago, the Legal XML-based transcripts of the coming century will be universal, and any lawyer or court reporter will be able to work with them.