To review methods of assessing bulbar redness, particularly with respect to the practicality of comparing different rating systems.
The published literature was reviewed and discussed by a panel of experts and a narrative review prepared.
Bulbar hyperemia is a common clinical sign and an important indicator of ocular disease. As bulbar hyperemia is a frequent side effect of topical glaucoma medications, accurate objective measurement is important to allow comparison of clinical studies. A number of different measurement systems have evolved to render subjectively assessed redness into a quantified form that allows between-treatment comparisons and the assessment of longitudinal changes in both clinical research and practice. Whereas widespread use of image-based rating scales has improved the assessment of bulbar redness in clinical practice and clinical research, these techniques are less than ideal. The scales are subject to an intrinsic subjectivity and are suboptimal in differentiating the physiologic phenomenon of bulbar hyperemia. There is also a degree of interobserver and intraobserver variation; in some studies, average variation in scores exceeds half the extent of the whole scale. Moreover, a lack of interscale validation has led to confusion in comparing the results from clinical studies that use different scales. In a recent series of studies, cross-calibration between the various scales in use has been attempted.
Whereas naive comparisons between the results obtained in studies using different bulbar redness scales can lead to erroneous conclusions, the tools exist to permit meaningful comparisons between rating systems and scales.
Eur J Ophthalmol 2015; 25(4): 273 - 279
Article Type: REVIEW
Article Subject: Oculoplastic eyelid/lacrimal disease
Authors: Christophe Baudouin, Keith Barton, Michele Cucherat, Carlo Traverso
- Accepted on 22/04/2015
- Available online on 21/05/2015
- Published in print on 25/05/2015
Bulbar hyperemia, a common finding, can be an important indicator of ocular disease. Whereas it may be a sign of nothing more than mild irritation upon wakening or a small foreign body in the eye, it may also be an indication of a sight-threatening condition like an infection or acute angle-closure glaucoma. Bulbar hyperemia belongs to a large class of biomedical parameters where soft measurement—quantification using human perception rather than direct physical measurement—is the appropriate or realistic form of measurement (1).
Hyperemia is a soft physical sign but also a common patient-reported finding. Being a frequently observed adverse effect of topical glaucoma medications, its accurate objective measurement is essential in comparative studies.
Although many biomedical assessments may be rendered into objective parametric measurements, some, particularly those associated with the severity of symptoms, cannot. In the specific case of ocular hyperemia, standardization of photographs in order to make external interpretations in a reading center is highly challenging and has rarely been used in a clinical setting. Thus, there is a long history of the use of rating scales to convert such subjectively assessed parameters into an ordinal form, allowing between-treatment comparisons and enabling the assessment of changes both in clinical research and in clinical practice. A definition of a grading scale as “A tool that enables quantification of the severity of a condition with reference to a set of standardized descriptions or illustrations” has been proposed (2).
A variety of rating scales exist: 0-3 ordinal scales are commonly used in medicine (typically 0 = absent, 1 = mild, 2 = moderate, and 3 = severe), but are relatively coarse and are not sensitive to detecting changes (3); visual analogue scales allow the observer to rate signs as a continuum; image-based scales are also commonly used, where the observer is invited to choose an image that corresponds most closely to the clinical examination. The latter type is commonly used in dermatology and ophthalmology, where direct observation is possible.
Illustrative rating scales have obvious advantages over purely descriptive scales since they reduce the subjectivity of the measurement by introducing objective standardized reference points (4-6). The first photographic rating scale to be applied to bulbar redness was reported in 1987 by McMonnies and Chapman-Davies (M-CD) (4, 7). Since then, a number of predominantly image-based scales have been introduced. The interscale validation of these systems has been neglected, and a situation has arisen where it is difficult to compare results from clinical studies that use different scales, or to reliably assess the clinical course of a patient when different scales were used.
The aim of the current review is to describe the scales currently used for measuring the severity of bulbar hyperemia, to contextualize these scales, and to propose a method for comparing grading assessments between the different scales.
The conjunctiva is richly vascularized by tortuous arteries and straight veins in its submucosal layers (8). Vasodilation of these vessels results in the appearance of red eye or bulbar hyperemia. The vasodilation of conjunctival vessels results in enhanced blood flow, leakage of fluid and protein from capillaries, edema, and, in some cases, the triggering of an inflammatory cascade (9).
In general, bulbar hyperemia is an indicator of ocular irritation or inflammation and can be triggered by a variety of stimuli. Hyperemia is a feature of a number of clinical conditions, including morning eye congestion (10), allergic or infective conjunctivitis (11), dry eye (12), contact lens use (2), and as a common adverse effect of glaucoma treatment (13).
Hyperemia in glaucoma treatment
Topical intraocular pressure–lowering treatments often induce hyperemia. Hyperemia is the most frequent side effect of prostaglandin analogue therapy and a frequent cause of treatment discontinuation (14). The differences between prostaglandin analogues have been investigated in a large number of studies that were also assessed by meta-analyses (15-24). Comparison of studies is hampered by the heterogeneity of rating scales, and further confounded by naive comparisons between studies using different scales, which is discussed later.
It has become clear that the hyperemia is not a consequence of prostaglandin toxicity per se, and it has even been suggested in in vitro experiments that some prostaglandin analogues used in glaucoma treatment have antioxidative properties (25). Indeed, prostaglandin analogues themselves are not directly proinflammatory in in vivo studies, but at least some of the inflammatory effects of commercial glaucoma medications may be related to preservatives, further strengthening the argument for preservative-free preparations (26, 27).
Quantification is a necessity mainly to follow the evolution of signs in the context of clinical treatment or a study. It is desirable that the measurement technique produces accurate, precise, and reproducible results and is easy to use, but most importantly, it must be able to detect small changes in bulbar redness. Descriptive scales (mild, moderate, severe, for example) have generally given way to ordinal and interval scales based on images assessing bulbar redness and several scales have evolved.
The first attempt at an image-based bulbar hyperemia grading scale was by M-CD in 1987 (4, 7), specifically for hyperemia in contact lens wearers. In subsequent years, a number of scales (discussed later) have been introduced in an attempt to improve interobserver or intraobserver variability, but this remains a problem (28). This has led to a situation where a variety of scales are in simultaneous use and, as will be discussed, the mapping of one rating scale onto another is far from perfect. For these reasons, a number of authors have suggested that bulbar redness not be compared between rating scales. However, this leads to significant problems where it is necessary to, for example, compare the results of clinical trials that use different rating scales.
Scales for evaluating bulbar redness
McMonnies and Chapman-Davies scale
The M-CD scale was conceived as a 6-point photographic reference scale specifically for the assessment of hyperemic responses to contact lens use. Six reference photographic images are used to rate hyperemia on a 0-5 scale. In the hands of experienced observers, the intraobserver and interobserver reliability is high (4).
Brien Holden Vision Institute scales
The Brien Holden Vision Institute scales (formerly known as the Cornea and Contact Lens Research Unit [CCLRU] and prior to that as the Institute for Eye Research [IER]) are a comprehensive suite of photographic image-based assessment tools for determining the severity of contact lens complications. The suite includes a 5-point bulbar redness photographic scale (1 = very slight, 2 = slight, 3 = moderate, 4 = severe [grades less than 1 are permitted]) (29, 30). Although this commonly used scale is relatively coarse, subdivision of the ordinal points appears to be common in clinical practice (31).
Efron grading scales
Like the Brien Holden Vision Institute scales, the Efron grading scales comprise a series of pictures, in this case rendered by an artist, for describing the severity of contact lens complications. In common with the Brien Holden Vision Institute scales, the cardinal points are frequently subdivided in an attempt to obtain greater resolution and precision in clinical practice.
Validated Bulbar Redness system
This system was developed to combine psychophysical and physical attributes to produce a scale validated against objective, colorimetric data on the degree of redness (32). The scale uses a 0-100 range with which reliable and repeatable results can be obtained independently of the observer’s level of experience. The scale is currently available on a commercial basis as 5-image (Validated Bulbar Redness [VBR]-5) or 10-image (VBR-10) versions. Unlike some other scales, the VBR scale has been validated against objective photometric data with approximately equal steps between grades.
Objective and automated systems
There have been a number of attempts to develop objective and/or automated assessments of bulbar hyperemia.
The widespread availability of personal computers as well as improvements in digital video and image processing hardware and software prompted the development of a number of techniques for automated objective measurement of bulbar redness (33-36). Some of these techniques could discriminate subtle changes during the diurnal cycle (33) or measure the effects of drugs (36), for example. Some but not all approaches demonstrated good correlation with subjective rating scales (33-36). Moreover, some of the methods had relatively exotic (for the time) hardware and/or software requirements.
As well as the use of more subtle detection algorithms, later investigations benefited from the easy availability of more sophisticated software and hardware.
Fieguth and Simpson (28) developed a combined image analysis technique utilizing edge-detection methods for determining the expansion of small arteries, and integrated measures of redness to detect the enlargement of very small, unresolvable blood vessels. Thirty different eye images were analyzed by both an automated system and 72 clinicians using a 0-100 subjective rating scale. The correlation between clinician-rated redness and the image analysis technique was high and better than that observed in some earlier studies (33), perhaps because of the use of a combined edge-detection and integrated redness approach. Extraction of the red color plane from digital images offered the best combination of correlation and sensitivity for palpebral hyperemia, using an image analysis suite for quantifying changes in ocular physiology in a technique developed by Wolffsohn (42) and Purslow (31), further developed and subsequently validated in a prospective randomized study (37).
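The combination of an integrated redness measure with an edge-based measure can be illustrated with a minimal sketch. It is not the published algorithm: the function names, the use of the red-channel share R / (R + G + B), the green-channel difference as a crude edge signal, the threshold, and the test patches are all assumptions made for illustration.

```python
# Illustrative only: a toy redness-plus-edge measure, not a published method.

def relative_redness(pixels):
    """Mean share of the red channel, R / (R + G + B), over all pixels.

    `pixels` is a list of rows; each row is a list of (r, g, b) tuples (0-255).
    """
    shares = []
    for row in pixels:
        for r, g, b in row:
            total = r + g + b
            shares.append(r / total if total else 0.0)
    return sum(shares) / len(shares)


def edge_fraction(pixels, threshold=40):
    """Fraction of horizontally adjacent pixel pairs whose green-channel
    difference exceeds `threshold` (a crude stand-in for vessel-edge
    detection; the threshold is an arbitrary assumption)."""
    pairs = 0
    edges = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            pairs += 1
            if abs(left[1] - right[1]) > threshold:
                edges += 1
    return edges / pairs if pairs else 0.0


# Hypothetical 2x4 patches: plain sclera versus sclera crossed by vessels.
SCLERA = (230, 230, 230)
VESSEL = (200, 60, 60)
quiet_eye = [[SCLERA] * 4, [SCLERA] * 4]
red_eye = [[SCLERA, VESSEL, SCLERA, VESSEL],
           [VESSEL, SCLERA, VESSEL, SCLERA]]

print(relative_redness(quiet_eye), edge_fraction(quiet_eye))
print(relative_redness(red_eye), edge_fraction(red_eye))
```

The integrated measure responds to diffuse reddening even where no individual vessel can be resolved, whereas the edge measure responds to discrete vessels; using both is the idea the paragraph above describes.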
A method using a commercial spectrophotometer was proposed by Sorbara and colleagues (38) and, while the correlation with the IER/CCLRU subjective rating scale was only moderate, there was less variability in the measurement of low levels of hyperemia than with the IER/CCLRU. The authors hypothesized that the moderate correlation with the IER/CCLRU scale might be related to the difference in interpretation of redness between the 2 methods.
An image-analysis technique using edge detection and red plane analysis was 50 times more sensitive than the Efron scale employed by optometrists, although the range of severity investigated lay generally in the less-severe range of the Efron scale (39). A subsequent prospective, randomized study was employed to determine the correlation between the objective image analysis approach and ratings on the IER/CCLRU and Efron scales. There were strong correlations between the image analysis method and subjective methods for both bulbar hyperemia and palpebral redness. The reliability of the objective measurements was considerably higher than for the IER/CCLRU and Efron scales.
A proprietary automated system has been described that utilizes redness intensity and fine horizontal conjunctival vessels as key parameters in determining redness (40). The latter feature is purported to be particularly associated with bulbar redness associated with dry eye. Although the repeatability and variability of the method are excellent, the system has not been validated against widely used subjective methods. Moreover, the investigators’ deployment of the Efron and VBR scales may have been flawed.
A promising new automated Ocular Redness Index has been proposed that can utilize ocular images from different sources and in different file formats (41). This method combines a software algorithm that includes white balance correction, which allows images taken under different lighting conditions to be compared, with a sophisticated region-of-interest selection tool, so that disparate images can be objectively quantified for ocular redness without the need for clinically trained operators. A further benefit is that the Ocular Redness Index is a continuous scale. Good agreement and correlation were obtained with the Efron and VBR scales.
These methods have the benefit of objectivity and are less subject to the vagaries of observer training, experience, and attention than are the image-based rating systems that have become the de facto standard for clinical trials; nevertheless, it remains an important goal to be able to compare studies that use different rating methods. Future clinical studies may employ objective methods, but the results of these studies will need to be compared to those of historical studies.
Comparison of scales
The widespread use of image-based grading scales has improved the assessment of bulbar redness in clinical practice and clinical research. In a number of areas, however, these techniques are less than ideal. Such scales are subject to an intrinsic subjectivity and are suboptimal in differentiating the physiologic phenomenon of bulbar hyperemia (28, 35, 39). There is also a degree of interobserver and intraobserver variation (6, 28); in some studies, average variation in scores exceeds half the extent of the whole scale (28). The subjectivity in such rating scales is further demonstrated by the strong bias towards multiples of 5, which are chosen far more frequently than other numbers (see the figure below).
Whole number bias. Graph shows distribution of median-removed grades (28). A total of 72 clinicians graded 30 images for bulbar redness on a 0-100 scale. Reference images were provided at 25, 50, and 75 points. The distribution is generally Gaussian with spikes associated with multiples of 5 and 10 points.
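A bias of this kind can be checked in a set of grades with a few lines of code. The sketch below is an illustration only: it works on raw integer grades rather than the median-removed grades shown in the figure, the function name is hypothetical, and the grades are invented rather than taken from the study.

```python
def multiple_of_five_excess(grades):
    """Return the observed share of grades that are multiples of 5 and the
    ratio of that share to the share expected if every integer from 0 to 100
    were equally likely (21 of 101 values)."""
    observed = sum(1 for g in grades if g % 5 == 0) / len(grades)
    expected = 21 / 101  # 0, 5, ..., 100 are 21 of the 101 integers
    return observed, observed / expected


# Hypothetical grades from one observer, not data from the cited study.
grades = [20, 25, 25, 30, 32, 35, 40, 40, 45, 47,
          50, 50, 55, 60, 63, 65, 70, 75, 75, 80]

observed, ratio = multiple_of_five_excess(grades)
print(f"share of multiples of 5: {observed:.2f}, excess ratio: {ratio:.2f}")
```

A ratio well above 1 suggests that observers are rounding to convenient numbers, so the nominal 0-100 resolution of the scale overstates the resolution actually achieved.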
Subjective rating scales appear to be the preferred method for estimating bulbar redness for clinical appraisal and research. The plethora of available scales raises a significant problem in terms of conversion of scores from one scale to another. This is of considerable concern in clinical research where, for example, it becomes difficult or impossible to compare bulbar redness between studies using different rating scales. Differences between the scales have prompted a number of authors to propose that the results should not be compared (42-44). Until relatively recently, there has been little research on the relationship between grading scales; Efron et al (43) reported differences in score between their grading scale and the IER/CCLRU scale.
In a study in which M-CD, Efron, IER/CCLRU, and VBR scale reference images were subjected to fractal analysis and compared with photometric evaluation, the results indicated that although the scales individually could be considered accurate, differences between the scales are too severe to permit cross-calibration via photometrically derived redness (44). In a subsequent attempt at cross-calibration, the same authors developed a psychophysical method in which naive participants ranked printed copies of the reference images from the M-CD, IER/CCLRU, and Efron scales relative to each other. The ranking of the images was then used to determine average perceived redness and thence used to cross-calibrate the scales. In the first of these studies (45), participants were invited to arrange the reference images from the grading scales along a 0-100 continuum and the position of each image was used to determine its perceived redness. Although the images from each scale were arranged in the correct order, the increments between each image in a scale were far from equal, and in some cases adjacent images from the same scale were not perceived to display a different degree of redness (see the figure below).
Relative perceptual redness scores: nonanchored scaling (45). Participants ranked reference images from bulbar redness rating scales in order of perceived redness along a 0-100 continuum. Note the nonidentical redness range covered by each scale, the unequal distribution of the images along the severity scale, and overlapping or near-overlapping position of images on the McMonnies and Chapman-Davies (MC-D) and Validated Bulbar Redness (VBR) scales. IER = Institute for Eye Research.
In the most recent of this series of studies, Schulze and colleagues (46) applied their technique to sample images. In this study, participants ranked sample bulbar images on a 0-100 continuum, aided by the reference images from one of the grading scales that were placed according to the calibrated position obtained in the previous study (45). The recalibrated grading scales produced highly reproducible redness estimates; thus, redness estimates produced, for example, using the IER/CCLRU and Efron scales were similar when cross-calibrated (see the figure below).
Relative perceptual redness scores: anchored scaling (46). Methodology is the same as
How to compare hyperemia data from different studies
Because different trials use different measurement techniques for determining hyperemia, a degree of heterogeneity is likely to exist. Differences in terms of definition, measurement, follow-up duration, and other factors contribute to this heterogeneity. These differences among trials may lead to a between-trial variation in the rate or degree of hyperemia irrespective of any real differences. In order to minimize such differences, meta-analyses typically use the odds ratio (OR) or another relative measure such as relative risk. Using the OR, the observed rate of hyperemia in the experimental group of a particular trial is compared with the rate observed in the control group, thus providing a measure adjusted for the particular characteristics of the trial.
The differences in the measurement process between trials influence the absolute rate or frequency of hyperemia in each treatment group to the same extent. Thus, the OR is independent of the measurement process. The use of the OR (or risk ratio) thus avoids breaking the randomization, allows an intratrial adjustment, and permits between-trial comparisons. A more sensitive method gives a higher rate of hyperemia, but the ratio between the rates observed in the experimental and control groups will probably be the same as that observed in a trial with a less sensitive method. This is the fundamental principle of indirect comparison techniques, also known as adjusted indirect comparison. The advantage of the indirect comparison meta-analysis over the classic meta-analysis is that it permits comparison of results for treatments that were not directly compared in a single trial.
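The argument can be made concrete with a small numerical sketch. The trial counts below are hypothetical and were deliberately chosen so that the two ORs coincide; they illustrate the assumption of a constant relative effect rather than demonstrate that it holds in practice. The function name is likewise an assumption.

```python
def odds_ratio(events_exp, n_exp, events_ctrl, n_ctrl):
    """Odds ratio of hyperemia, experimental versus control arm of one trial."""
    odds_exp = events_exp / (n_exp - events_exp)
    odds_ctrl = events_ctrl / (n_ctrl - events_ctrl)
    return odds_exp / odds_ctrl


# Hypothetical trial 1, graded with a less sensitive scale:
# 20/100 hyperemia on the experimental drug, 10/100 on control.
or_trial_1 = odds_ratio(20, 100, 10, 100)

# Hypothetical trial 2, graded with a more sensitive scale:
# absolute rates are higher in both arms (36/100 and 20/100).
or_trial_2 = odds_ratio(36, 100, 20, 100)

print(or_trial_1, or_trial_2)
```

Although the absolute rates differ markedly between the two hypothetical trials, the relative measure is the same, which is why relative rather than absolute effects are carried forward into a meta-analysis.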
Meta-analysis avoids pooling rates that are confounded by the differences between trials and groups. It is the relative treatment effects observed in each trial that are pooled. The sole assumption required is the constancy of relative effects across trials, an assumption tested in meta-analysis by the heterogeneity test.
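As a minimal sketch of pooling relative effects, the following uses a fixed-effect inverse-variance average of log ORs together with Cochran's Q as the heterogeneity statistic. This is one common choice and is assumed here for illustration; the source does not state which pooling model or heterogeneity test was used, and the 2x2 tables are hypothetical.

```python
import math


def log_or_and_variance(a, b, c, d):
    """Log odds ratio and its approximate variance from a 2x2 table:
    a/b = events/non-events (experimental), c/d = events/non-events (control)."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_fixed_effect(tables):
    """Inverse-variance pooled OR and Cochran's Q across trials."""
    stats = [log_or_and_variance(*t) for t in tables]
    weights = [1 / var for _, var in stats]
    pooled = sum(w * y for w, (y, _) in zip(weights, stats)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, stats))
    return math.exp(pooled), q


# Hypothetical trials: (events_exp, non_events_exp, events_ctrl, non_events_ctrl)
tables = [(20, 80, 10, 90), (36, 64, 20, 80), (15, 85, 10, 90)]
pooled_or, q = pool_fixed_effect(tables)
print(f"pooled OR = {pooled_or:.2f}, Cochran's Q = {q:.2f}")
```

Under the hypothesis of a constant relative effect, Q is compared with a chi-square distribution with one fewer degrees of freedom than the number of trials; a large Q casts doubt on the constancy assumption.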
However, the relative effect cannot adjust for potential bias arising in trials where the evaluator is not masked and knowledge of the actual treatment received by the patients could influence the measurement.
Adjusted indirect comparison is a relatively new technique that facilitates reduction in cross-trial differences (47, 48). As an example, in the meta-analysis by Cucherat et al (24), an adjusted indirect comparison was performed between preservative-free latanoprost (T2345) and the other prostaglandin analogues. The relative tolerability of T2345 was estimated by the rate of hyperemia and intratrial ORs were pooled together in order to estimate the indirect OR of T2345 compared to the other treatments. The differences across trials in the definition and measure of hyperemia were, therefore, taken into account.
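The arithmetic behind an adjusted indirect comparison through a common comparator can be sketched as follows. The formulation shown (difference of log ORs, with variances added) is the standard one for this technique, but the treatment labels, the choice of comparator, and the numbers are hypothetical and are not taken from the meta-analysis cited above.

```python
import math


def indirect_or(or_a_vs_c, se_a, or_b_vs_c, se_b):
    """Adjusted indirect comparison of A versus B through a common comparator C.

    `se_a` and `se_b` are the standard errors of the two log odds ratios.
    Returns the indirect OR and an approximate 95% confidence interval.
    """
    log_or = math.log(or_a_vs_c) - math.log(or_b_vs_c)
    se = math.sqrt(se_a ** 2 + se_b ** 2)  # variances of independent estimates add
    low = math.exp(log_or - 1.96 * se)
    high = math.exp(log_or + 1.96 * se)
    return math.exp(log_or), (low, high)


# Hypothetical pooled results: A versus C gives OR 0.5 (SE of log OR 0.3);
# B versus C gives OR 1.0 (SE of log OR 0.4).
estimate, (low, high) = indirect_or(0.5, 0.3, 1.0, 0.4)
print(f"indirect OR = {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Because the variances add, an indirect estimate is always less precise than either of the direct estimates it is built from, which is the price paid for comparing treatments that were never studied in the same trial.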
In another meta-analysis, the same approach, based on the OR, was used, but indirect comparisons were not performed; the direct comparisons already performed in each trial of latanoprost versus travoprost and bimatoprost were pooled (49).
A number of bulbar redness grading scales have evolved, allowing a degree of flexibility in choosing the best scale for a particular task. In the context of clinical research, this makes it difficult to compare one scale with another.
Naive comparisons between scales are likely to lead to misleading conclusions; the results of the studies of Schulze and colleagues (45, 46) illustrated in
Conversion table for bulbar redness rating scale
CCLRU = Cornea and Contact Lens Research Unit; IER = Institute for Eye Research; MC-D = McMonnies and Chapman-Davies; VBR = Validated Bulbar Redness.
Adapted with permission from Schulze et al (46).
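One way a conversion table of this kind might be applied in software is piecewise-linear interpolation through a common 0-100 perceived-redness continuum. The sketch below is an assumption about how such a conversion could be implemented, not the method of Schulze and colleagues; the scale names and every anchor value are hypothetical placeholders, and real calibrated positions would have to be taken from the published table.

```python
def interpolate(x, points):
    """Piecewise-linear interpolation through sorted (x, y) pairs.
    Raises ValueError outside the calibrated range rather than extrapolating."""
    if not points[0][0] <= x <= points[-1][0]:
        raise ValueError("value outside the calibrated range")
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# HYPOTHETICAL anchors: (grade on the scale, position on a 0-100 continuum).
# The unequal steps mimic the unequal increments described in the text.
SCALE_A = [(0, 5.0), (1, 20.0), (2, 45.0), (3, 60.0), (4, 90.0)]
SCALE_B = [(1, 15.0), (2, 35.0), (3, 65.0), (4, 85.0)]


def convert(grade, source, target):
    """Convert a grade from `source` to `target` via the common continuum."""
    common = interpolate(grade, source)
    inverse_target = [(position, g) for g, position in target]
    return interpolate(common, inverse_target)


common_position = interpolate(2.5, SCALE_A)
grade_on_b = convert(2.5, SCALE_A, SCALE_B)
print(common_position, round(grade_on_b, 2))
```

Refusing to extrapolate is a deliberate choice in this sketch: the scales do not cover identical redness ranges, so a grade at the extreme of one scale may have no meaningful equivalent on another.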
Tools exist to rationally convert bulbar redness rating scale scores from one scale to another. This cannot be done naively and must be performed with due consideration of the differences between the scales. Visual rating scales look likely to remain the preferred method of assessing bulbar redness for some time to come. Standardization on one particular scale appears unlikely for the foreseeable future, so the tools developed for converting among scales will be of considerable value.
In the context of glaucoma treatment, hyperemia has considerable significance since it is a compliance-limiting factor and can decrease treatment efficacy. The implementation of more sophisticated comparisons between studies, perhaps using adjustments to ORs to compensate for the use of different rating scales, would permit a more rational interstudy comparison.
Dr. J.F. Stolz provided assistance with the preparation of the manuscript, funded by Laboratoires Théa.
- Baudouin, Christophe 1, * Corresponding Author (email@example.com)
- Barton, Keith 2
- Cucherat, Michele 3
- Traverso, Carlo 4
Quinze-Vingts National Hospital and Vision Institute, University Paris 6, Paris - France
Moorfields Eye Hospital, London - UK
Faculté de Médecine Laennec, Lyon - France
Clinica Oculistica, IRCCS Azienda Ospedaliera, Universitaria San Martino, Genoa - Italy