Reliability in the Transcription of Disordered Speech

 

1.1    Introduction

There are competing tensions in the phonetic transcription of disordered speech. First, there is the need to produce as accurate a transcription as possible, both to aid in the analysis of the speech of the patient being investigated and to inform the intervention that will be planned in remediation. Opposed to this requirement is the problem of reliability: it has been suggested that the more detailed a transcription is, the less reliable it tends to be. This research project has two main aims: to test whether the recent introduction of specialist symbols for aspects of atypical speech can produce reliable transcriptions, and to test whether aspects of acoustic instrumentation can help resolve disagreements in transcription. The overall objective, therefore, is to produce a set of measures which, if undertaken by clinical phoneticians and speech and language therapists, will improve the description of disordered speech and thereby facilitate appropriate therapeutic measures.

 

1.2  Transcription

1.2.1 Narrow and Broad Transcription

As just noted, accurate phonetic transcription is required for adequate analysis of the patterns of disordered speech in a patient, and therefore for effective and efficient intervention by the therapist. Many researchers in the field have highlighted its importance. For example, Carney (1979) warned of the dangers of inappropriate abstraction in transcription: using broad, less detailed symbolization that reflects the phonological units of the target pronunciation risks over- or underestimating the patient's phonological abilities.

        Such dangers are also discussed in Buckingham and Yule (1987), who note (p. 123) that "without good phonetics, there can be no good phonology". Their focus of interest is 'phonemic false evaluation', a process whereby listeners assign sounds to a particular category or sound unit of the target system, ignoring differences at a phonetic (or 'sub-phonemic') level. With speech-disordered patients this often results in sounds being assigned to units other than those intended by the speaker. Buckingham and Yule stress the importance of accurate phonetic description, which allows the analyst to distinguish between a disorder that involves phonological simplification (e.g. complete merger of certain phonological units) and one in which phonetic differences between the patient's system and the target system need to be highlighted.

        Ball (1988) and Ball, Rahilly and Tench (1996) illustrate some of the problems associated with a broad transcription of disordered speech data. An example from the latter source is given below, using material from disordered child phonology:

 

Subject A. Age 6;9. Broad Transcription.

                pin     [pɪn]         ten     [ten]
                bin     [pɪn]         done    [tʌn]
                cot     [kɑːt]        pea     [piː]
                got     [kɑːt]        bee     [piː]

 

This data set suggests that there is a collapse of phonological contrast, specifically the contrast between voiced and voiceless plosives in word-initial position. This leads to homonymic clashes, for example between 'pin' and 'bin' and between 'cot' and 'got'. As word-initial plosives have a high functional load in English, such a loss of the [±voice] contrast in this context clearly requires treatment. These data would suggest that an initial stage of treatment should concentrate on establishing the notion of contrast with these sounds, before going on to practise the phonetic realization of this contrast.

        However, if we look at a narrow transcription of the same data, the picture alters.

 

Subject A. Age 6;9. Narrow Transcription.

                pin     [pʰɪn]        ten     [tʰen]
                bin     [pɪn]         done    [tʌn]
                cot     [kʰɑːt]       pea     [pʰiː]
                got     [kɑːt]        bee     [piː]

 

It is clear from this transcription that there is not, in fact, a loss of contrast between initial voiced and voiceless plosives. Target voiceless plosives are realized without vocal fold vibration (voice), but with aspiration on release (as are the adult target forms). Target voiced plosives are realized without aspiration (as with the adult forms), but also without any vocal fold vibration. It is this last feature that distinguishes them from the target form: while adult English 'voiced' plosives are often devoiced for part of their duration in initial position, fully voiceless examples are rare.

        The narrow transcription shows, therefore, that the difference between the speaker's pronunciation of these sounds and the target is minimal. The notion of contrast does not need to be established and, as aspiration is the main acoustic cue used by adults to perceive the difference between these groups of plosives, the child's speech may well sound only slightly atypical.

 

1.2.3  Transcriber Reliability

While it can be demonstrated that narrow transcriptions of disordered speech are important for avoiding the kinds of misanalysis shown above, there is also evidence that narrow phonetic transcription, as opposed to broad, often produces problems of reliability. Reliability in this context has two main aspects: inter-judge and intra-judge reliability. Inter-judge reliability refers to measures of agreement between separate transcribers working on the same data. Agreement measures usually involve one-to-one comparison of the symbols and diacritics used, though such measures can be refined by including categories such as 'complete match', 'match within one phonetic feature' (such as voice or place), and 'non-match'. Intra-judge reliability refers to measures of agreement between first and subsequent transcriptions of the same material by the same judge.
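The three-way classification of agreement just described ('complete match', 'match within one phonetic feature', 'non-match') can be sketched in Python as follows. This is a minimal illustration, not the project's procedure: the feature table is a hypothetical, much-reduced inventory of English consonants described by voicing, place and manner only, and diacritics are ignored.

```python
# Hypothetical, much-reduced feature table: each symbol maps to
# (voicing, place, manner). A real study would need a far fuller inventory.
FEATURES: dict[str, tuple[str, str, str]] = {
    "p": ("voiceless", "bilabial", "plosive"),
    "b": ("voiced", "bilabial", "plosive"),
    "t": ("voiceless", "alveolar", "plosive"),
    "d": ("voiced", "alveolar", "plosive"),
    "k": ("voiceless", "velar", "plosive"),
    "g": ("voiced", "velar", "plosive"),  # ASCII g used in place of IPA ɡ
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
    "ʃ": ("voiceless", "postalveolar", "fricative"),
}


def feature_distance(a: str, b: str) -> int:
    """Number of features (voicing, place, manner) on which two symbols differ."""
    return sum(x != y for x, y in zip(FEATURES[a], FEATURES[b]))


def classify_match(a: str, b: str) -> str:
    """Assign a pair of transcribed segments to one of three agreement categories."""
    if a == b:
        return "complete match"
    if feature_distance(a, b) == 1:
        return "match within one phonetic feature"
    return "non-match"


if __name__ == "__main__":
    print(classify_match("p", "b"))  # differ in voicing only
    print(classify_match("p", "z"))  # differ in voicing, place and manner
```

Run as a script, this prints "match within one phonetic feature" for [p] against [b] and "non-match" for [p] against [z]. Weighting all features equally is a simplifying choice; other feature systems would give different distances.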

        In order for narrow transcriptions to be trusted as bases for analysis and remediation, it is important that issues of reliability are dealt with. However, as Shriberg and Lof (1991) have noted, barely three dozen studies from the 1930s onwards have addressed this issue. Their own study was based on a series of transcriptions of different patient types undertaken for other purposes over several years. They used a series of different transcriber teams to undertake broad and narrow transcriptions of the data, and then compared results across consonant symbols, vowel symbols and diacritics (the diacritics being based on the Shriberg and Kent, 1982, system). Their results cover a large range of variables, but in essence broad transcription achieved good levels of agreement (inter- and intra-judge), whereas on most measures narrow transcription did not reach acceptable levels of reliability.

        Shriberg and Lof's (1991) study is clearly an important one, but it has three limitations: the data used were not collected primarily for such an investigation, the symbols used predated recent developments towards a comprehensive set for atypical speech sounds, and the transcribers did not have access to acoustic instrumentation.

 

1.2.4  The extIPA symbols

One of the problems encountered in transcribing disordered speech is that the transcriber is likely to need to deal with non-normal speech sounds while using a transcription system devised only for the speech sounds of natural language. The International Phonetic Alphabet (IPA) is the symbol system used by most clinical phoneticians and speech-language therapists. It was, however, drawn up to transcribe the range of speech sounds found in natural language, and there are numerous possible speech sounds not found in natural language that nevertheless occur with relative frequency in a range of speech disorders. One possible explanation for the reliability results reported in Shriberg and Lof (1991), then, could lie in the fact that transcribers have not always been adequately equipped to undertake narrow transcription, through a lack of specialist symbols.

        Ball (1988, 1991), Duckworth et al (1990) and Ball et al (1994) have charted the development of specialist symbol systems for disordered speech from the 1970s to the present. This work culminated in the adoption of the 'Extensions to the International Phonetic Alphabet for the transcription of disordered speech and voice quality', now known by the abbreviation 'extIPA'. This system is described in Duckworth et al (1990), with additions noted in Bernhardt and Ball (1993); examples of the system in use with a variety of atypical segmental and suprasegmental speech are available in Ball (1991) and Ball et al (1994).

        The extIPA system introduces a range of new symbols and diacritics to cope with non-normal places and manners of articulation, phonatory activity, nasalization, nasal and velopharyngeal friction, and reiteration, as well as means of marking prosodic features such as voice quality, tempo, loudness and pausing. A range of atypical speech, including children's articulation disorders, cranio-facial disorders, fluency problems and acquired neurogenic disorders in adults, can be covered by these symbols, though it should be remembered that many phonologically disordered patients may never use such atypical sounds.

        These symbols are gradually being introduced into the training of speech-language therapists in Britain, though to date no research has been undertaken to see whether the use of this dedicated symbol set enables high inter- and intra-judge reliability scores to be obtained in the transcription of speech-disordered patients. It may well be the case that this symbolization will allow transcribers to avoid the tendency, caused by the lack of a symbol for the sound in question, to abstract away from 'difficult' sounds by using the symbol for a more familiar, similar sound. On the other hand, we may find an 'overload' effect, in that transcribers find it difficult to learn and/or apply a still larger set of symbols than the standard IPA set.

        One of the aims of this study is to investigate the effect of the extIPA system on reliability measures in the transcription of disordered speech. To this end, the speech of a variety of patient types will be investigated. Of most interest will be speech that contains at least some atypical sounds, so that reliability can be investigated for the extIPA subset of symbols specifically, as well as overall. Nevertheless, patients with less severely disordered speech will also be included, to see whether the additional training the transcribers receive in extIPA might aid their abilities with the IPA itself.

 

1.2.5 Instrumental Analysis

Shriberg and Lof (1991) conclude their study by pointing to a future 'marriage' of instrumental phonetic description with impressionistic transcription (see also Ball 1988) to overcome the problems of narrow transcription reliability. Recent studies show that this development is beginning to occur in some clinical phonetics cases (Klee and Ingrisano 1992; Ball and Rahilly 1993). Recent software development also highlights the growing use of computer technology as an aid to speech analysis and transcription. Most notable in this regard is the Kay Elemetrics CSL Phonetics Tutorial, which provides spectrographic and electropalatographic traces for a wide range of IPA symbols; these can be matched with traces captured by the user to aid in the correct assignment of symbols. The applicant is currently extending this system to the extIPA symbol set for Kay.

        Another aim of this study is to establish how far the CSL Tutorial program can aid in transcribing the speech samples collected. The program will be consulted after the impressionistic transcriptions have been analysed, to see to what extent, if at all, it can resolve uncertainties and inconsistencies between transcribers and between transcriptions made by the same transcriber.

 

1.3  Research Questions

The research questions to be addressed in this project are as follows.

1) What level of inter-judge reliability is found in the narrow phonetic description of disordered speech using additional symbols specifically designed for this area (the extIPA symbols)?

2) What level of intra-judge reliability is found in the narrow phonetic description of disordered speech using the extIPA symbols?

3) What relation, if any, exists between inter- and intra-judge disagreements and the type and severity of speech disorder, and the type of speech sample (i.e. word-list as opposed to spontaneous speech)?

4) To what extent can consensus be reached on disagreements in transcription through accessing acoustic instrumental analyses of the speech samples, and which sound types are most amenable to such consensus?

5) How can the results of the project inform a training and analysis programme to maximise reliability in the narrow transcription of disordered speech?

 

1.4  Data Collection and Method

1.4.1  Initial Training

The research assistants will undergo a short period of intensive training in the use of the extIPA symbols, conducted by the applicant.

 

1.4.2  Subjects

Subjects will be accessed through existing links with local Speech and Language Therapy services in both Health Centres and Hospitals, and through patients working with other members of the academic staff in the Department of Communication.

        Selection criteria are type of disorder and severity of disorder. We intend to investigate five types of speech disorder that should illustrate a range of atypical speech sounds covered by extIPA symbols. These are: child articulation disorders, cranio-facial disorders (cleft palate), adult disfluency (stuttering), adult apraxia of speech, and adult dysarthria. As it is the disordered speech that is the focus of the study, there is no requirement to match subjects in terms of age, sex, time since onset etc.

        In terms of severity, we wish to include both severely and moderately disordered data in our analyses. To this end we will seek to select two subjects in the severe grouping and two in the moderate grouping for each disorder type, giving 20 subjects in total (five disorder types × two severity groups × two subjects). The classification of subjects into moderate and severe groupings before undertaking an analysis of their speech is not, of course, straightforward. In this regard we will rely on the judgements of the subjects' speech and language therapists and our own informal assessment. Permission from the University Ethics Committee has been obtained for the use of subjects' speech in this study.

 

 

1.4.3  Data

The data to be collected will be of two types. First, spontaneous speech will be elicited from the subjects. This will naturally differ in amount and topic from subject to subject, but for the purposes of narrow phonetic transcription, a large amount of such material is not necessary. To aid direct comparability each subject will also be required to undertake a standard picture elicitation procedure (in this case that of the Edinburgh Articulation Test). This will also allow us to investigate the claim of Shriberg and Lof (1991) that continuous speech produces higher reliability scores in narrow transcription than do word-lists.

        The data will be recorded on high-quality portable digital audio tape (DAT) recorders. Where possible, subjects will be recorded in the Phonetics Laboratory of the University of Ulster; otherwise a quiet area will be used to ensure good-quality recordings.

        Video recordings will be made of all data acquisition sessions, as visual information is important for transcribing certain sounds (including atypical sounds such as linguolabials and dentolabials).

 

1.4.4  Analysis

All recordings will be transcribed by all three researchers as soon as possible after they are made. Transcriptions will be repeated after two months to minimize the effect of memory of the first transcription session. Only segmental information will be transcribed; an examination of reliability in suprasegmental (prosodic) transcription is beyond the scope of this project. The transcriptions will be narrow (i.e. aiming to include the maximum amount of information). There will be no comparison with broad transcriptions, as we know from previous studies (see Shriberg and Lof 1991) that these consistently produce high reliability scores, though, as noted above, their accuracy is doubtful. Finally, the focus of the transcription is on the consonant system: precise vowel values will not be sought, though features such as nasality will be marked.

        To assess reliability, a straightforward matching procedure will be undertaken. Unlike Shriberg and Lof (who were comparing broad and narrow transcription ratings), we do not intend to discriminate between symbols and diacritics; the match will be between segments, whether these are represented by a symbol alone or by a symbol plus diacritic. In cases of mismatch, we will note whether the mismatch is near (differing in one phonetic feature) or not near (differing in more than one phonetic feature).
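Treating a symbol plus its diacritics as a single segment can be illustrated with a short Python sketch. It assumes, hypothetically, that the two transcriptions being compared already contain the same number of segments; real disordered speech data would first need segment alignment. The set of spacing modifier letters treated as diacritics is likewise an illustrative assumption, not an exhaustive list.

```python
import unicodedata

# Spacing modifier letters treated here as diacritics (an illustrative, incomplete
# set): aspiration ʰ, length ː, labialization ʷ, palatalization ʲ.
MODIFIERS = {"\u02b0", "\u02d0", "\u02b7", "\u02b2"}


def segments(transcription: str) -> list[str]:
    """Split a transcription into segments, attaching each diacritic to the
    base symbol before it (so 'pʰ' and 'b̥' each count as one segment)."""
    segs: list[str] = []
    # NFD separates precomposed characters into base symbol + combining mark.
    for ch in unicodedata.normalize("NFD", transcription.strip("[]/")):
        if segs and (unicodedata.combining(ch) or ch in MODIFIERS):
            segs[-1] += ch
        else:
            segs.append(ch)
    return segs


def segment_agreement(t1: str, t2: str) -> tuple[int, int]:
    """Return (matching segments, total segments) for two transcriptions.

    Assumes both transcriptions have the same number of segments."""
    s1, s2 = segments(t1), segments(t2)
    if len(s1) != len(s2):
        raise ValueError("segment counts differ; align the transcriptions first")
    matches = sum(a == b for a, b in zip(s1, s2))
    return matches, len(s1)


if __name__ == "__main__":
    print(segments("[pʰɪn]"))
    print(segment_agreement("[pʰɪn]", "[pɪn]"))
```

Here [pʰɪn] splits into the segments pʰ, ɪ and n, so comparing it with [pɪn] yields two matches out of three segments: the aspirated and unaspirated plosives count as different segments, as the matching procedure requires.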

        As with Shriberg and Lof (1991), agreement tables for symbols and for phonetic features (such as place and voicing) will be drawn up, with word position (initial, medial and final) identified. Percentage agreement will be calculated, together with measures of near agreement. Non-parametric inferential statistics will be used to test the significance of trends in the data. As well as inter- and intra-judge reliability, we will examine how reliability relates to subject type and severity of disorder, and examine trends in disagreement between the three judges.
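Percentage agreement and near agreement by word position could be tabulated as in the following Python sketch. The record format and the category labels ('exact', 'near', 'non') are hypothetical, and the toy input is invented purely for illustration; it is not project data. The non-parametric inferential statistics mentioned above are not shown.

```python
from collections import Counter, defaultdict

CATEGORIES = {"exact", "near", "non"}


def agreement_table(comparisons):
    """Tabulate agreement by word position.

    `comparisons` is an iterable of (word_position, category) pairs, one per
    compared segment, where category is 'exact', 'near' or 'non'.
    Returns {position: (% exact agreement, % exact-or-near agreement)}.
    """
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for position, category in comparisons:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        counts[position][category] += 1
    table = {}
    for position, c in counts.items():
        total = sum(c.values())
        exact = 100 * c["exact"] / total
        exact_or_near = 100 * (c["exact"] + c["near"]) / total
        table[position] = (round(exact, 1), round(exact_or_near, 1))
    return table


if __name__ == "__main__":
    # Invented toy input for illustration only (not project data).
    toy = [
        ("initial", "exact"), ("initial", "near"),
        ("initial", "exact"), ("initial", "non"),
        ("final", "exact"), ("final", "exact"),
    ]
    print(agreement_table(toy))  # {'initial': (50.0, 75.0), 'final': (100.0, 100.0)}
```

Reporting both figures side by side keeps exact agreement distinct from the more lenient near-agreement measure, so that the contribution of near matches to overall reliability remains visible.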

        Following each transcription, acoustic instrumental analysis of relevant parts of the tape will be undertaken using the Kay Elemetrics CSL system of the Phonetics Laboratory of the University of Ulster. This will concentrate on examining areas of disagreement at both the inter- and intra-judge level. The transcribers may access the Kay Phonetics Tutorial programs for both IPA and extIPA symbols to help in examining these disagreements, and if consensus can be reached through this procedure, it will be noted separately. We will then examine any trends in the consensus reached through the use of acoustic instrumentation.

 

1.5  Strategic Implications

One of the main aims of this project is to provide principled guidance in the undertaking of narrow phonetic transcription of disordered speech. It is hoped that a programme may be drawn up to guide clinical phoneticians and speech and language therapists, as well as lecturers on communicative disorders degree courses, on how best to approach this task. It should show which sound types, both normal and disordered, regularly show high levels of disagreement, and which sound types seem most amenable to the aid of instrumental analysis. It is expected that it will also demonstrate the value of the extIPA symbols and the extIPA CSL Tutorial, and so aid in the further dissemination of this new tool.

 

1.6  Expected Outputs

Results will be made available to all those with an interest in the outputs of the research through a variety of channels. A final report on the project will form the basis of a workbook in phonetic description of disordered speech aimed at students and therapists to improve their skills in this area.

        We would also aim to publish several papers in the academic journals in the field of communication disorders and phonetics. The applicant has considerable experience in publishing in this area, and would be able to aid the research assistants to increase their publishing profile.

        We would aim to present papers at the Annual Congress of the International Clinical Phonetics and Linguistics Association (Montreal, 1998), and at the XIV International Congress of Phonetic Sciences, Berkeley (August 1999), and at an appropriate annual Convention of the American Speech-Language-Hearing Association. Work in progress would be reported to the irregular meetings of the British and Irish Group of the International Clinical Phonetics and Linguistics Association (ICPLA-BIG).

 

References

 

Ball, M. J. (1988) The contribution of speech pathology to the development of phonetic description. In Ball, M. J. (ed.), Theoretical Linguistics and Disordered Language. London: Croom Helm.

Ball, M. J. (1991) Recent developments in the transcription of non-normal speech. Journal of Communication Disorders, 24, 59-78.

Ball, M. J. and Rahilly, J. (1993) Transcribing disfluent speech: a case study. ICPLA North-West Pacific Regional Group Meeting, University of British Columbia.

Ball, M. J., Code, C., Rahilly, J. and Hazlett, D. (1994) Non-segmental aspects of disordered speech: Developments in transcription. Clinical Linguistics and Phonetics, 8, 67-83.

Ball, M. J., Rahilly, J. and Tench, P. (1996) The Phonetic Transcription of Disordered Speech. San Diego: Singular Press.

Bernhardt, B. and Ball, M. J. (1993) Characteristics of atypical speech currently not included in the Extensions to the IPA. Journal of the International Phonetic Association, 23, 35-38.

Buckingham, H. W. and Yule, G. (1987) Phonemic false evaluation: theoretical and clinical aspects. Clinical Linguistics and Phonetics, 1, 113-125.

Carney, E. (1979) Inappropriate abstraction in speech-assessment procedures. British Journal of Disorders of Communication, 14, 123-135.

Duckworth, M., Allen, G., Hardcastle, W. and Ball, M. J. (1990) Extensions to the International Phonetic Alphabet for the transcription of atypical speech. Clinical Linguistics and Phonetics, 4, 273-280.

Klee, T. and Ingrisano, D. (1992) Clarifying the transcription of indeterminable utterances. Paper presented at ASHA Convention, San Antonio.

Shriberg, L. and Kent, R. D. (1982) Clinical Phonetics. New York: Macmillan.

Shriberg, L. and Lof, G. (1991) Reliability studies in broad and narrow transcription. Clinical Linguistics and Phonetics, 5, 225-279.

 

 

 

2.1 Summary

 

The literature amply illustrates the problem of inaccurate description of disordered speech resulting from imprecise phonetic transcription of clinical speech material. Such inaccurate description will often result in misdiagnosis, and thus in an inappropriate management programme being implemented. There is also the danger of inaccurate prognosis, with a knock-on effect on resource planning.

 

This project will investigate inter- and intra-scorer reliability measures for the narrow transcription of a range of disordered speech types when transcribers are trained in the use of the new symbols: 'Extensions to the International Phonetic Alphabet for the Transcription of Disordered Speech' (extIPA). It will further ascertain the effect of access to acoustic phonetic data on the resolution of transcription disagreements.

 

3.1 Aims

1) To investigate what level of inter-judge reliability is found in the narrow phonetic description of disordered speech using additional symbols specifically designed for this area (the extIPA symbols).

2) To investigate what level of intra-judge reliability is found in the narrow phonetic description of disordered speech using the extIPA symbols.

3) To ascertain what relation, if any, exists between inter- and intra-judge disagreements and the type and severity of speech disorder, and the type of speech sample (i.e. word-list as opposed to spontaneous speech).

4) To evaluate the extent to which consensus can be reached on disagreements in transcription through accessing acoustic instrumental analyses of the speech samples, and which sound types are most amenable to such consensus.

5) To produce a programme of training and analysis to maximise reliability in the narrow transcription of disordered speech.

 

3.2 Method

The research assistant will undergo a short period of intensive training in the use of the extIPA symbols and acoustic analysis (where necessary), conducted by the applicants. The research assistant will be responsible for the data collection.

 

Subjects will be accessed through existing links with local Speech and Language Therapy services in both Health Centres and Hospitals, and through patients working with other members of the academic staff in the School of Behavioural & Communication Sciences. Selection criteria are type of disorder and severity of disorder. We intend to investigate five types of speech disorder that should illustrate a range of atypical speech sounds covered by extIPA symbols. These are: child articulation disorders (developmental verbal dyspraxia), cranio-facial disorders (cleft palate), adult disfluency (stuttering), adult apraxia of speech, and adult dysarthria. As it is the disordered speech that is the focus of the study, there is no requirement to match subjects in terms of age, sex, time since onset etc.

 

In terms of severity, we wish to include both severely and moderately disordered data in our analyses. To this end we will seek to select two subjects in the severe grouping and two in the moderate grouping for each disorder type, giving 20 subjects in total (five disorder types × two severity groups × two subjects). The classification of subjects into moderate and severe groupings before undertaking an analysis of their speech is not, of course, straightforward. In this regard we will rely on the judgements of the subjects' speech and language therapists and our own informal assessment. Permission from the University Ethics Committee will be obtained for the use of subjects' speech in this study.

 

The data to be collected will be of two types. First, spontaneous speech will be elicited from the subjects. To aid direct comparability, each subject will also be required to undertake a standard picture elicitation procedure. This will also allow us to investigate claims that continuous speech produces higher reliability scores in narrow transcription than do word-lists. The data will be recorded on high-quality digital audio tape (DAT) recorders.

 

3.3 Analysis

All recordings will be transcribed by all three researchers as soon as possible after they are made. Transcriptions will be repeated after three months to minimize the effect of memory of the first transcription session. Transcriptions will be only of segmental information; an examination of reliability in suprasegmental (prosodic) transcription is beyond the scope of this project. The transcriptions will be narrow, i.e. aiming to include the maximum amount of information. The transcription will cover both the consonant and the vowel systems.

 

Following each transcription, acoustic instrumental analysis of relevant parts of the tape will be undertaken in the Phonetics Laboratory of the University of Ulster. This will concentrate on examining areas of disagreement at both the inter- and intra-judge level. We will then examine any trends in the consensus reached through the use of acoustic instrumentation.

 

3.4 Timescale

Year 1: training of RA in use of extIPA symbols and acoustic instrumentation; commencement of data acquisition.

 

Year 2: further data acquisition; data analysis and re-analysis sessions.

 

Year 3: completion of analysis sessions; preparation of transcription guidelines programme; dissemination of results; preparation of final report.

 

 

4.1 Novelty

No research has yet been undertaken on the use of the extIPA symbols in the narrow transcription of disordered speech. While work exists on transcription reliability in both normal and disordered speech using the standard International Phonetic Alphabet, no attempt has been made to provide transcription guidelines that link impressionistic transcription with acoustic instrumentation.

 

 

4.2 Significance

Virtually all practising Speech-Language Therapists use phonetic transcription to describe the speech of their clients, as few have access to instrumental techniques. This research is therefore important, as it will provide explicit, principled guidance in the undertaking of narrow phonetic transcription of disordered speech. It is hoped that a programme may be drawn up to guide clinical phoneticians and speech and language therapists, as well as lecturers on communicative disorders degree courses, on how best to approach this task. It should show which sound types, both normal and disordered, regularly show high levels of disagreement, and which sound types seem most amenable to the aid of instrumental analysis.