View Diary: The Joys of Speech Recognition Tech (39 comments)

  •  It's difficult to do right... (1+ / 0-)
    Recommended by:
    johnnygunn

    ...because of varying sound quality issues on phones, as well as a wide variety of accents and dialects.  But they have gotten better.  

    I use a product on my computer to transcribe interviews. Done by hand, a verbatim, word-for-word transcription generally takes about one hour per 10 minutes of interview.

    The software I use was originally designed for blind computer users, and has been developed to the point where it is now used for live broadcast transcribing and court reporting.  If you've ever been in a courtroom where the reporter is wearing a crazy mask that looks like a fighter pilot's oxygen mask (called a stenomask), he or she is using live voice transcription.  The reporter repeats every word that is said into a computer program that has been trained on his or her voice, and the computer automatically generates the text.  The mask is soundproofed so you don't hear them repeating every word.

    The software I use allows you to train it for a specific person by having them read a paragraph into the voice recorder.  That paragraph allows the software to learn the accent and speech patterns of the person by comparing what they read to an exemplar.  It learns that they clip the "g" on words ending in "ing," for instance, or that they say "Warshington" instead of "Washington," or "nukuler" instead of "nuclear."

    When it does make a mistake, you can correct it on screen, and that allows the software to fine-tune itself even more.

    My computer knows my voice well enough to transcribe what I say about 98% accurately.  With a voice that is not mine, it's about 75-85%.
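
    The percentages above are not tied to a stated measurement method. One common convention, assumed here, is word accuracy: one minus the word-level edit distance divided by the number of words in a reference transcript. A minimal sketch of that calculation follows; the function names and sample sentences are made up for illustration, and this is not a description of how the software itself scores anything.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length), both counted in words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Word accuracy in the range (-inf, 1]; 1.0 means a perfect match."""
    errors, ref_len = word_errors(reference, hypothesis)
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return 1 - errors / ref_len


spoken = "the senator from washington spoke about nuclear policy"
heard = "the senator from warshington spoke about nukuler policy"
print(word_accuracy(spoken, heard))  # 0.75 (2 wrong words out of 8)
```

    On this measure, 98% accuracy means roughly two corrections per hundred words, while 75-85% means fifteen to twenty-five.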

    Nonetheless, it cuts a 90 minute interview transcript from taking nine hours to taking two.

    I love it because it means there are 7 more hours in the day for me (or an intern - the usual suspects who suffer through transcriptions) to do research or writing.  
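
    The arithmetic behind those numbers, as a small sketch. The rate of one hour per 10 minutes of audio and the two-hour assisted figure come from the comment above; the function names are just illustrative.

```python
# Manual rate quoted above: one hour of typing per 10 minutes of audio.
AUDIO_MINUTES_PER_MANUAL_HOUR = 10


def manual_hours(audio_minutes):
    """Hours needed to transcribe the recording entirely by hand."""
    return audio_minutes / AUDIO_MINUTES_PER_MANUAL_HOUR


def hours_saved(audio_minutes, assisted_hours):
    """Hours freed up when software-assisted transcription takes assisted_hours."""
    return manual_hours(audio_minutes) - assisted_hours


print(manual_hours(90))    # 9.0
print(hours_saved(90, 2))  # 7.0
```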

    Heavy accents (whether Southern, Jersey/NYC/Boston or foreign) can play havoc with it.  It seems to be most accurate on Midwestern English.

    Poor audio quality is also a problem.  I record on a TASCAM DR-100 mk.2 with dual Sennheiser Evolution e835 Cardioid mics mounted on desktop holders.  I also choose to conduct interviews in areas where there is no background noise (fans, heating/AC ducts, hum from fluorescent lights, etc.), so my recordings are radio quality.

    Poor phone connections and the cheap mics on many phones can cause severe problems.  Mics that pick up background noise, mics with the gain set so high that clipping or wrapping occurs frequently, and mics held so close to the mouth that plosives and sibilants come through are all an issue.  For people who use their smart phones as voice recorders (many do), there are companies that sell much higher quality add-on microphones that avoid some of these recording issues.
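
    Clipping of the kind described above is easy to spot in the samples themselves: when the gain is too high, the waveform gets flattened at full scale. A rough sketch of a check, assuming the audio has already been decoded into floats between -1.0 and 1.0; the 0.99 threshold and the function names are arbitrary choices for illustration, not anything a particular recorder or transcription product uses.

```python
import math


def clipped_fraction(samples, threshold=0.99):
    """Fraction of samples sitting at or near full scale."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)


def hard_clip(samples):
    """Simulate an overdriven input by flattening everything beyond +/-1.0."""
    return [max(-1.0, min(1.0, s)) for s in samples]


# One second of a 440 Hz tone at 8 kHz, recorded at a sensible level...
clean = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
# ...and the same tone with the gain turned up four times too high.
too_hot = hard_clip([4.0 * s for s in clean])

print(clipped_fraction(clean))    # 0.0
print(clipped_fraction(too_hot))  # well over half the samples
```

    A more careful check would look for runs of consecutive full-scale samples, since a single loud peak is not necessarily clipping.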

    As the software (and to some extent the hardware) continues to progress, it will get way better.

    •  Telephone-based systems (2+ / 0-)

      have to deal with a whole host of additional problems.  Messages "spoken" to callers are usually recorded and processed in a studio.  But caller input, whether voice or keypad, arrives over low-quality lines.

      We always designed our systems to allow for callers who have the old rotary phones that use pulses instead of tones.  I wonder if anyone still uses those old phones?  

      Pulse dialing dates from 1891, well over a century ago.  Touch-Tone phones are 50 years old, which is ancient by technology standards.
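
      For anyone who has not met the two signaling schemes: a rotary dial interrupts the line once per unit of the digit (in the North American convention, ten times for 0), while a Touch-Tone (DTMF) key sends two simultaneous tones, one for its keypad row and one for its column. A small sketch of both mappings follows. The function names are hypothetical, the frequencies are the standard DTMF values, and none of this is taken from the systems described above.

```python
# Standard DTMF frequencies in Hz. The fourth column (1633 Hz, keys A-D)
# is omitted because ordinary phone keypads do not have it.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477)
KEYPAD = ("123", "456", "789", "*0#")


def dtmf_pair(key):
    """Return the (row_hz, column_hz) tone pair for one keypad key."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYPAD):
            col_index = row.find(key)
            if col_index != -1:
                return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a keypad key: {key!r}")


def pulse_count(digit):
    """Pulses a rotary dial sends for a digit (North American convention)."""
    if len(digit) != 1 or digit not in "0123456789":
        raise ValueError(f"rotary dials only send digits: {digit!r}")
    return 10 if digit == "0" else int(digit)


print(dtmf_pair("5"))    # (770, 1336)
print(pulse_count("0"))  # 10
```

      Note that "*" and "#" have no pulse equivalent, which is one reason a menu that must work for rotary callers cannot rely on those keys.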

      Even Democrats can be asses. Look at Rahm Emanuel.

      by Helpless on Sat Nov 23, 2013 at 12:43:58 PM PST

      •  Yep, like I mentioned, low quality lines plus (2+ / 0-)
        Recommended by:
        johnnygunn, Helpless

        ...low quality phones.  iPhones record fairly well.  I've had several people record interviews on their iPhones, and they came out sounding pretty good.  Not exactly radio-studio quality like my setup, but pretty usable.  (My setup also cost $1500 to put together, and is not exactly user-friendly.)

        I've had some people use other phones as recording devices and they sounded like shit.  They picked up hiss, every background sound within 30 feet, tons of plosives and sibilants, and lots of clipping.  Almost unusable.

        So between the low-quality connections on many phone calls, the wild variety of microphone and encoding quality on different phones, and the thousands of different accents and dialects, I imagine your job is very, very difficult.

        Nonetheless, considering these limitations, many of the systems I've used have worked pretty well.  
