METHODS: Primary care physicians in PBRNs in the Netherlands, United Kingdom, United States, and Canada enrolled 1773 children aged 6 to 180 months who contributed 6358 tympanograms during 3179 visits. The physicians were trained in the use and interpretation of tympanometry using the Modified Jerger Classification. We determined the level of agreement between physicians and experts in the interpretation of tympanograms, first using the 6358 individual ear tracings and then using the 3179 office visits as the unit of analysis.
RESULTS: The distribution of expert interpretations of all tympanograms was 35.8% A, 30.0% B, 15.5% C1, 12.0% C2, and 6.8% uninterpretable; for visits, 37.8% were normal (A or C1), 55.6% abnormal (B or C2), and 6.6% could not be classified. There was a high degree of agreement in the interpretation of tympanograms between experts and primary care physicians across networks (κ=0.70-0.77), age groups of children (κ=0.69-0.73), and types of visits (κ=0.66-0.77). This high degree of agreement was also found when children were used as the unit of analysis.
CONCLUSIONS: Interpretations of tympanograms by primary care physicians using the Modified Jerger Classification can be used with confidence. These results provide further evidence that practicing primary care physicians can provide high-quality data for research purposes.
Tympanometry has been assessed and is sometimes promoted as a useful tool in the management of children with ear infections and effusions.1-6 Recently, a group at the Centers for Disease Control and Prevention7 recommended tympanometry as a procedure of value when the diagnosis of acute otitis media is uncertain. It provides an objective assessment of the status of the middle ear8-10 and for some children correlates with hearing loss.11-12 The feasibility of using hand-held tympanometers in family practice has been established,3-13 but the accuracy of primary care physicians' interpretations of tympanograms is unknown.14 We report the level of agreement between the interpretations of tympanograms made by practicing primary care physicians and those made by experts.
Methods
As part of a study of acute otitis media, 131 primary care physicians obtained 6358 tympanograms from 1773 children aged 6 to 180 months during 3179 routine practice visits: 2236 in the Netherlands, 1594 in the United Kingdom, and 2528 in North America. Data from Canada and the United States were combined because the practices were united in one network (the Ambulatory Sentinel Practice Network) and followed the same study standards. Visits occurred either at the time of the diagnosis of a new episode of acute otitis media or at 2- or 5-month study follow-up visits. Diagnostic criteria for acute otitis media included otoscopic evidence of a bulging tympanic membrane, drainage of pus, or a red ear accompanied by ear pain.
A study coordinator trained each physician in the otoscopic examination of the ear, the use of the Welch Allyn Micro Tymp 2 (Skaneateles Falls, NY), and tympanogram interpretation. The study physicians were observed and coached as necessary until they were able to demonstrate competence to the study coordinator. The physicians were provided with a calibrated tympanometer and printer. The Modified Jerger Classification,1 which includes 5 categories (A, C1, C2, B, and uninterpretable), was used. This established classification is based primarily on the pressure at which acoustic admittance is greatest (A: -99 to 200 daPa; C1: -199 to -100 daPa; C2: -399 to -200 daPa; B: less than -399 daPa, seen as a flat tracing).
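Purely for illustration, the pressure cutoffs stated above can be encoded as a simple decision rule. This is a sketch, not the study's software: the function name, the `flat` flag for tracings with no discernible peak, and the handling of values outside the listed ranges are our own assumptions; only the daPa cutoffs come from the text.

```python
def classify_tympanogram(peak_pressure_dapa, flat=False):
    """Assign a Modified Jerger category from the peak-pressure cutoffs
    given in the text (illustrative sketch, not the study's code).

    peak_pressure_dapa -- pressure (daPa) at which acoustic admittance
                          is greatest, assumed measured in whole daPa
    flat -- True when the tracing shows no discernible admittance peak
    """
    if flat or peak_pressure_dapa < -399:
        return "B"            # flat tracing or peak below -399 daPa
    if -399 <= peak_pressure_dapa <= -200:
        return "C2"
    if -199 <= peak_pressure_dapa <= -100:
        return "C1"
    if -99 <= peak_pressure_dapa <= 200:
        return "A"
    # Peaks outside the listed ranges (e.g., above +200 daPa) are not
    # covered by the cutoffs in the text; we label them uninterpretable.
    return "uninterpretable"
```

For example, a tracing peaking at -150 daPa would fall in C1, while a flat tracing is B regardless of any nominal pressure value.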
Tympanograms were forwarded to national data centers and blindly reinterpreted by 1 of 3 national study coordinators. The study coordinators identified difficult-to-interpret tympanograms, reached agreement about rules to be used in their interpretation, and informed the participating physicians of these rules during the ongoing study. These national coordinators and the criterion referee interpreted a set of 52 tympanograms randomly selected from a pool of difficult-to-interpret tympanograms. The κ statistic, a chance-corrected measure of agreement, was calculated using SPSS software (Chicago, Ill) to determine interrater agreement.15 A κ of 0.75 or greater represents excellent agreement beyond chance, and values between 0.40 and 0.75 represent fair to good agreement. Kappas for expert interrater reliability ranged from 0.77 to 0.95. Conflicts among the interpretations of the expert national study coordinators were resolved by the most experienced investigator, who served as the study's criterion standard.13
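The study computed κ in SPSS; to illustrate what the statistic measures, a minimal Cohen's kappa for two raters can be sketched as follows (the function name is ours, and this is an illustrative sketch rather than the study's analysis code). Observed agreement is the fraction of cases where the two raters assign the same category; expected agreement is what would occur by chance given each rater's marginal category frequencies.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over the same cases.

    kappa = (p_observed - p_expected) / (1 - p_expected)
    """
    assert len(rater1) == len(rater2) and rater1, "need paired ratings"
    n = len(rater1)
    # Observed agreement: proportion of cases rated identically
    p_obs = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement under chance, from each rater's marginals
    c1, c2 = Counter(rater1), Counter(rater2)
    p_exp = sum(c1[cat] * c2[cat] for cat in c1) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)

# Toy example with Jerger-style labels (not study data):
# observed agreement 0.75, expected 0.50, so kappa = 0.50
physician = ["A", "A", "B", "B"]
expert = ["A", "B", "B", "B"]
print(cohens_kappa(physician, expert))
```

By the thresholds cited in the text, a κ of 0.50 would indicate fair to good agreement, while the study's values of 0.66 to 0.77 approach or exceed the 0.75 cutoff for excellent agreement.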