SSL-Professional Edition
A comprehensive software package for education and research in speech science

Application areas of SSL
  • Education and Research in Speech Science, Phonetics, Linguistics, Speech and Hearing.
  • Speech Analysis, Synthesis, Speech Recognition, Perception, Speaker Recognition etc.
  • Spectral Analysis of chanting, music, animal sounds etc.
  • Spectral Analysis of machinery noise etc.
  • Forensic Sciences

    Several doctoral theses and graduate-level dissertations, along with a large number of projects, have been completed using SSL. Some examples: speech analysis of Indian languages; development of speech in children; voice transformation; analysis of frog calls and cuckoo calls; quality control of automobile horns.

    SSL is an easy-to-operate, user-friendly software package. It requires no programming skills.

An Experimental Course in Speech Science

    SSL can be used as an educational tool for an experimental course in speech science. The necessary speech database is provided. A comprehensive course book guides the user while systematically explaining the concepts of speech science. It covers topics such as the temporal and spectral properties of speech signals, spectrogram reading and articulatory acoustics.

Modules

Utility Module

Signal Recording, Display and Editing: A speech signal can be recorded and saved to a file. The sampling frequency and the duration of the signal to be recorded can be specified. The signal can be displayed partly or wholly, it can be normalized or scaled, and a part or the whole of it can be played. Edit options: copy, delete, insert silence, insert file.

Signal Manipulation: Basic signal processing tasks such as adding signal files, scaling, lowpass filtering, pre-emphasis and down-sampling are provided.
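
As an illustration of one of these tasks, pre-emphasis is commonly implemented as a first-order difference filter, y[n] = x[n] - a * x[n-1]. The sketch below assumes that textbook form; the function name and the default coefficient of 0.95 are illustrative choices and are not taken from SSL.

```python
def pre_emphasis(signal, coeff=0.95):
    """Apply first-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].

    The default coefficient is an assumed, commonly used value,
    not a documented SSL setting.
    """
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor and is kept as-is
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out


# A constant (DC) input is strongly attenuated after the first sample.
flat = pre_emphasis([1.0, 1.0, 1.0], coeff=0.5)
```

The filter boosts high frequencies relative to low ones, which is why it is often applied before spectral or linear prediction analysis.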

Wavespec Module: The user can visualize the signal waveform along with the short-time spectrum, spectral envelope and auto-correlation function. The analysis variables used to obtain these functions can be set by the user. Intensity, fundamental frequency and formants for a segment around a marked location are shown. Two signals can be opened for a comparative study.
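
To show how an auto-correlation function relates to fundamental frequency, here is a minimal sketch of a standard autocorrelation peak-picking estimate. It is not SSL's algorithm, which is not described here; the function names, the 60 to 400 Hz search range and the synthetic test tone are all assumptions made for the example.

```python
import math


def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of x[n] * x[n + k]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate F0 from the lag with the largest autocorrelation value.

    Only lags that correspond to the assumed F0 search range are examined.
    The frame must be longer than sample_rate / f0_min samples.
    """
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    r = autocorrelation(frame, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: r[k])
    return sample_rate / best_lag


# Synthetic 40 msec frame: a 100 Hz sine sampled at 8000 Hz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(320)]
f0 = estimate_f0(frame, sample_rate)  # close to 100 Hz for this tone
```

Real speech needs more care (voicing decisions, octave errors, windowing), so this should be read as an outline of the idea only.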

Labeling Module

Labeling: The signal file to be labeled is opened. The program automatically marks the segments, based on the manner of production, into classes such as 'voiced', 'unvoiced', 'burst or stops', 'silence' and 'mixed'. Tentative segment boundaries are also shown. The user highlights a phonetically significant segment and attaches a phoneme or allophone label and a context. The user-assigned label, the beginning and ending locations and the phonetic context are saved in a database, together with the language and the speaker's identity taken from a header attached to the signal file. The label file can be printed in text mode with the label, context, and beginning and ending locations in msec.

Database Access

    The user can retrieve the speech signal corresponding to any desired phonetic segment. For example, the user can query the entire database for segments of the vowel /a/ occurring in a particular CVC context for a particular speaker and a particular language. All occurrences of the vowel /a/ that meet the specified conditions can be pulled out of the files and concatenated. Thus the variation in the vowel /a/ as a function of speaker, phonetic context or grammatical category can be studied.
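
The kind of query described above can be pictured with a small sketch. SSL's actual database format is not documented here, so the record fields, the function name and the sample entries below are all hypothetical and serve only to illustrate filtering labeled segments by label, context, speaker and language.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical record layout; not SSL's actual database schema.
    label: str
    left_context: str
    right_context: str
    speaker: str
    language: str
    start_ms: float
    end_ms: float
    source_file: str


def find_segments(database, **criteria):
    """Return every segment whose fields equal all the given criteria."""
    return [seg for seg in database
            if all(getattr(seg, field) == value
                   for field, value in criteria.items())]


# Made-up sample entries for demonstration only.
database = [
    Segment("a", "k", "t", "spk01", "Hindi", 120.0, 210.0, "utt1.sig"),
    Segment("a", "p", "n", "spk01", "Hindi", 340.0, 420.0, "utt1.sig"),
    Segment("a", "k", "t", "spk02", "Hindi", 95.0, 180.0, "utt2.sig"),
    Segment("i", "k", "t", "spk01", "Hindi", 500.0, 570.0, "utt3.sig"),
]

matches = find_segments(database, label="a", left_context="k",
                        right_context="t", speaker="spk01")
```

Each match carries the file name and the time span, which is the information needed to cut the corresponding samples out of the signal files and concatenate them.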

Spectrograph Module

    The spectrograph is a tool for generating a 'spectrogram' - a three-dimensional pictorial representation of a signal. The x-axis represents time, the y-axis represents frequency, and the energy at each time-frequency location is shown by the gray level or in color. The dynamic variation of the temporal and spectral properties of the signal is clearly seen in the spectrogram.
    The spectrograph was initially developed as an aid for the hearing impaired, so that they could visualize speech and read the spectrogram as a substitute for auditory processing. The spectrogram is now an indispensable tool for the phonetician, who uses it to make fine distinctions and to note the subtle variations of a given phoneme in different contexts.
    SSL provides a tool for the efficient computation and presentation of spectrograms in a variety of formats. Broad-band, narrow-band and very broad-band spectrograms of any desired duration of a signal can be obtained. Optionally, a spectral section at a marked time location can be obtained. The mouse pointer reads out the dB level at any chosen time-frequency location. Gray or color spectrograms can be obtained. The frequency axis can be in Hz, mel or Bark, and the frequency scale can be expanded. The contrast can be enhanced or reduced, and the dB level can be increased or decreased.
    If the signal file has been labeled by the user, the labels are shown below the spectrogram. The label fields can be edited using the spectrogram as a reference.
    If the signal file has been analyzed by the user, formant and pitch (F0) tracks can be superposed on the spectrogram.
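
For readers unfamiliar with the mel and Bark frequency axes mentioned above, the sketch below converts Hz to both scales. Several published approximations exist, and the ones SSL uses are not stated here; this example assumes the widely used 2595 * log10(1 + f/700) mel formula and Traunmueller's Bark approximation.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Convert Hz to Bark using Traunmueller's approximation.

    Other Bark formulas exist; this one is an assumed choice for the example.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Both scales compress high frequencies relative to a linear Hz axis.
for f in (250.0, 1000.0, 4000.0):
    print(f, round(hz_to_mel(f), 1), round(hz_to_bark(f), 2))
```

On either scale, equal distances along the axis correspond more closely to equal perceived pitch differences than they do on a linear Hz axis.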

A two-channel spectrogram is useful for comparing the temporal and spectral properties of two signals, such as original and synthesized speech. Facilities are available for contrast enhancement, gain editing, color versus gray scale, and a y-axis scale in Hz, mel or Bark.

ACOPHON

ACOPHON is an acronym for acoustic-phonetics. Analysis programs in SSL are of two types:

  1. ACOPHON-I: Block analysis, or uniform frame rate analysis. Two models are available: LP-based and formant-based.
  2. ACOPHON-II: Interactive analysis at user-specified locations, with interpolation and synthesis. This is formant-based.

ACOPHON-I Module

Block Analysis-Editing-Synthesis: In analysis, the speech signal is divided into a number of overlapping blocks or frames. For example, frames of 40 msec with an inter-frame interval (resolution) of 10 msec yield Frame #1 from 0 to 40 msec, Frame #2 from 10 to 50 msec, Frame #3 from 20 to 60 msec, and so on. Thus there are 100 frames per second of speech. For each frame the following acoustic parameters are extracted: linear prediction coefficients, cepstral coefficients, autocorrelation coefficients, PARCOR coefficients, formant frequencies, F0 (pitch), source and speech intensity, glottal parameters, and manner class (voiced, unvoiced, burst, mixed etc.). Once extracted, the parameters can be used for a variety of applications such as speech recognition, coding, efficient storage or compression of speech, voice mail and speech synthesis.
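
The framing arithmetic in the example above can be reproduced with a few lines of code. This is a minimal sketch under the stated 40 msec frame and 10 msec shift; the function name is illustrative, and the handling of the final partial frame (dropped here) is an assumption, since SSL's behavior at the end of the signal is not described.

```python
def frame_boundaries(duration_ms, frame_ms=40, shift_ms=10):
    """Return (start, end) times in msec for every frame that fits entirely
    within the signal. Trailing partial frames are dropped (an assumption)."""
    frames = []
    start = 0
    while start + frame_ms <= duration_ms:
        frames.append((start, start + frame_ms))
        start += shift_ms
    return frames


frames = frame_boundaries(1000)   # one second of speech
first_three = frames[:3]          # [(0, 40), (10, 50), (20, 60)]
frame_rate = 1000 / 10            # frames start every 10 msec: 100 per second
```

Note that the frame rate of 100 per second follows from the 10 msec shift alone. Without padding, a one-second signal holds slightly fewer complete 40 msec frames, because the last few frames would run past the end of the signal.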


    Two models are available for synthesis: a voice-source-excited linear prediction model and a formant-based model. A graphic editor and synthesis tool is available to correct, or to deliberately manipulate, the source and filter parameters. Acoustic parameters can be averaged, linearly interpolated or scaled between any two locations marked by the user, and the speech signal can then be synthesized. Estimated and edited parameters can be transferred to a database of acoustic parameters along with a phoneme (allophone) label and context. The parameters are saved in the database together with the speaker's identity and the language. The synthesized signal can be saved. Acoustic parameters from the database can be loaded at any desired location.

ACOPHON-II Module

Interactive Analysis-Synthesis: There are two models, the Cascaded Formant Model and the Hybrid (Modified Klatt's) Model. A speech signal is made up of phonemes. In an utterance, each phoneme has three major events or targets: the onset, the mid-part and the transition into the next phoneme.
    In interactive analysis, the user selects the phonetically significant events or targets in a given utterance, and analysis is performed interactively at the chosen targets. Estimated parameters can be edited and validated using a segmental analysis-synthesis approach. There is a facility to introduce pole-zero pairs. The acoustic parameters analyzed at a target can be transferred to a database, where they are labeled according to phoneme category and context. Analyzed parameters corresponding to a series of targets can also be saved in a file. An interactive editing tool can be used either to create a new parameter file or to edit an existing one. Parameters can be loaded from a database and concatenated. During editing, effects such as mixed excitation with controlled voiced versus noise intensities and source-filter interaction can be introduced. The concatenated parameter file can be saved into a database. The synthesized speech signal can also be saved.
    In ACOPHON-II synthesis, the acoustic parameters are linearly interpolated between the targets. After a sufficient number of sentences have been analyzed, a generalization can be arrived at, leading to a high-quality text-to-speech synthesis system for any desired language.
    The models are flexible enough to be adapted for the synthesis of speech sounds, such as aspirated stops, retroflex consonants and nasalized sounds, that occur in various languages.
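
The linear interpolation of acoustic parameters between targets, mentioned above for ACOPHON-II synthesis, can be sketched as follows. The function name, the 10 msec output step and the sample first-formant values are assumptions chosen for illustration; they are not taken from SSL.

```python
def interpolate_track(targets, step_ms=10):
    """Linearly interpolate a parameter between successive targets.

    targets: list of (time_ms, value) pairs with strictly increasing times.
    Returns (time_ms, value) pairs spaced step_ms apart, ending at the
    last target.
    """
    track = []
    for (t0, v0), (t1, v1) in zip(targets, targets[1:]):
        t = t0
        while t < t1:
            fraction = (t - t0) / (t1 - t0)
            track.append((t, v0 + fraction * (v1 - v0)))
            t += step_ms
    track.append(targets[-1])
    return track


# Made-up first-formant targets (Hz) at an onset, a mid-part and a transition.
f1_targets = [(0, 300.0), (50, 700.0), (100, 500.0)]
f1_track = interpolate_track(f1_targets)
```

In a full synthesizer every parameter of the model (formant frequencies, bandwidths, source intensity and so on) would be interpolated in the same way, producing one parameter set per output frame.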

Articulatory Acoustics (Vowels)

    In SSL, it is possible to position the articulators and obtain the formant frequencies for the set positions. The vowel sound can also be synthesized and played. Conversely, given a vowel, its formant frequencies can be estimated. The articulatory positions of the model can then be manipulated until the spectrum generated by the model matches that of the signal, and the articulatory positions for a given pronunciation can thereby be deduced. This approach has yet to be extended to consonant production.


Copyright 2002 Voice and Speech Systems