
Sphinx-4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. It was created through a collaboration among the Sphinx group at Carnegie Mellon University, Sun Microsystems Laboratories, Mitsubishi Electric Research Labs (MERL), and Hewlett Packard (HP), with contributions from the University of California at Santa Cruz (UCSC) and the Massachusetts Institute of Technology (MIT).
The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a “research-ready” system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source under a very generous BSD-style license.
Sphinx-4 ships with several acoustic and language models capable of handling a variety of tasks, ranging from simple digit recognition to large-vocabulary n-gram recognition. Because it is written entirely in the Java programming language, Sphinx-4 can run on a variety of platforms without requiring any special compilation or changes.
In developing Sphinx-4, we often deal with large graphs that define the search space. When debugging the system, we often want to visualize these graphs to ensure that they are constructed properly. To do this, we use aiSee.
We've instrumented Sphinx-4 to dump, on request, GDL (Graph Description Language, the input format read by aiSee) for the important data structures. This lets us explore our large data structures using aiSee, which is an essential part of our toolkit for developing Sphinx-4.
The example graph above shows the various high-level components in a typical Sphinx-4 configuration and how they relate to each other. Below is an example of a very small (isolated digits) search graph.
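As a rough illustration of the kind of GDL dump described earlier, here is a minimal Java sketch that writes a tiny toy graph in GDL. It is not the actual Sphinx-4 instrumentation: the class `GdlDumpSketch` and its methods are hypothetical names invented for this example, the four-node graph is a stand-in rather than a real Sphinx-4 search graph, and the backslash escaping of quotes in strings is an assumption about what the GDL reader accepts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Sphinx-4): collects nodes and edges
 * and renders them as GDL text that a GDL viewer such as aiSee can load.
 */
class GdlDumpSketch {
    private final String title;
    // Node id -> display label, kept in insertion order for stable output.
    private final Map<String, String> nodes = new LinkedHashMap<>();
    // Each edge is a {sourceId, targetId} pair.
    private final List<String[]> edges = new ArrayList<>();

    GdlDumpSketch(String title) {
        this.title = title;
    }

    GdlDumpSketch node(String id, String label) {
        nodes.put(id, label);
        return this;
    }

    GdlDumpSketch edge(String from, String to) {
        edges.add(new String[] {from, to});
        return this;
    }

    // Assumption: backslash-escaped quotes are acceptable inside GDL strings.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Renders the collected graph as a single GDL "graph: { ... }" block. */
    String toGdl() {
        StringBuilder sb = new StringBuilder();
        sb.append("graph: {\n");
        sb.append("  title: ").append(quote(title)).append("\n");
        for (Map.Entry<String, String> n : nodes.entrySet()) {
            sb.append("  node: { title: ").append(quote(n.getKey()))
              .append(" label: ").append(quote(n.getValue())).append(" }\n");
        }
        for (String[] e : edges) {
            sb.append("  edge: { sourcename: ").append(quote(e[0]))
              .append(" targetname: ").append(quote(e[1])).append(" }\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Toy graph: a start node branching to two words that rejoin at an end node.
        GdlDumpSketch g = new GdlDumpSketch("toy search graph")
            .node("start", "start").node("one", "one")
            .node("two", "two").node("end", "end")
            .edge("start", "one").edge("start", "two")
            .edge("one", "end").edge("two", "end");
        System.out.print(g.toGdl());
    }
}
```

Redirecting the program's output to a file gives a GDL description that can be opened in a viewer. Keeping the dump as plain text emitted on request means the recognizer itself needs no dependency on the visualization tool.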
Paul Lamere, Sun Labs
