Changed
Fri, 09/18/2020 - 10:55

In an earlier article, we looked at the meaning of the P value.1 This time we will look at another crucial statistical concept: that of confounding.

Dr. Manol Jovani is a therapeutic endoscopy fellow at Johns Hopkins Hospital in Baltimore.

Confounding, as the name implies, is the recognition that crude associations may not reflect reality, but may instead be the result of outside factors. To illustrate, imagine that you want to study whether smoking increases the risk of death (in statistical terms, smoking is the exposure, and death is the outcome). You follow 5,000 people who smoke and 5,000 people who do not smoke for 10 years. At the end of the follow-up you find that about 40% of nonsmokers died, compared with only 10% of smokers. What do you conclude? At face value it would seem that smoking prevents death. However, before reaching this conclusion you might want to look at other factors. A look at the dataset shows that the average baseline age among nonsmokers was 60 years, whereas among smokers it was 40 years. Could this be the cause of the results? You repeat the analysis within strata of age (i.e., you compare smokers who were aged 60-70 years at baseline with nonsmokers who were aged 60-70 years, smokers who were aged 50-60 years with nonsmokers who were aged 50-60 years, and so on). What you find is that, within each age category, the percentage of deaths was higher among smokers. Hence, you now reach the opposite conclusion, namely that smoking does increase the risk of death.

What happened? Why the different result? The answer is that, in this case, age was a confounder. What we initially thought was the effect of smoking was, in reality, at least in part, the effect of age. Overall, more deaths occurred among nonsmokers in the first analysis because they were older at baseline. When we compare people of similar age who differ in smoking status, the difference in mortality between them cannot be due to age (they are the same age) and is instead attributable to smoking. Thus, in the second analysis we took age into account, or, in statistical terms, we adjusted for age, whereas the first analysis was, in statistical terms, an unadjusted or crude analysis. We should always be wary of studies that report only crude results, because they might be biased/misleading.2
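The reversal can be reproduced with a few lines of arithmetic. The sketch below uses hypothetical counts (not data from any study) chosen only to match the crude figures in the example, and it collapses age into two illustrative bands rather than the 10-year strata described above.

```python
# Hypothetical counts: (age band, smoker?) -> (deaths, people).
# These numbers are invented to mirror the example; they are not real data.
counts = {
    ("younger", True): (270, 4500),
    ("older", True): (230, 500),
    ("younger", False): (20, 500),
    ("older", False): (1980, 4500),
}


def crude_risk(smoker):
    """Risk of death ignoring age: pool all age bands together."""
    deaths = sum(d for (_, s), (d, _) in counts.items() if s == smoker)
    people = sum(n for (_, s), (_, n) in counts.items() if s == smoker)
    return deaths / people


def stratum_risks(band):
    """Risk of death among (smokers, nonsmokers) within one age band."""
    d_s, n_s = counts[(band, True)]
    d_n, n_n = counts[(band, False)]
    return d_s / n_s, d_n / n_n


# Crude analysis: smokers appear protected (10% vs. 40%).
print("crude:", crude_risk(True), crude_risk(False))

# Stratified analysis: within each band, smokers do worse.
for band in ("younger", "older"):
    print(band, stratum_risks(band))
```

With these counts, smokers have the higher risk in both bands, yet the pooled comparison points the other way because most smokers fall in the younger, lower-risk band.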

In the example above, age is not the only factor that might influence mortality. Alcohol or drug use, cancer or heart disease, body mass index, or physical activity can also influence death, independently of smoking. How can we adjust for all these factors? We cannot do stratified analyses as we did above, because there would be too many strata. The solution is to do a multivariable regression analysis. This is a statistical tool to adjust for multiple factors (or variables) at the same time. When we adjust for all these factors, we are comparing the effect of smoking in people who are the same with regard to all these factors but who differ in smoking status. In statistical terms, we study the effect of smoking, keeping everything else constant. In this way we “isolate” the effect of smoking on death by taking into account all other factors, or, in statistical terms, we study the effect of smoking independently of other factors.
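To see why stratification quickly becomes impractical, it may help to count the strata. The sketch below assumes a hypothetical number of categories for each factor (these category counts are illustrative assumptions, not taken from the example) and divides the 10,000 participants evenly among the resulting cells.

```python
from math import prod

# Hypothetical number of categories per adjustment factor (assumed values).
levels = {
    "age band": 5,
    "alcohol use": 2,
    "drug use": 2,
    "cancer": 2,
    "heart disease": 2,
    "body mass index": 4,
    "physical activity": 3,
}

# Every combination of categories is one stratum.
n_strata = prod(levels.values())

# Each stratum is split again into smokers and nonsmokers.
n_cells = n_strata * 2

# Average number of people per cell if 10,000 participants were spread evenly.
people_per_cell = 10_000 / n_cells

print(n_strata, n_cells, round(people_per_cell, 1))
```

Under these assumptions there are 960 strata and roughly five people per cell on average, far too few to estimate a percentage of deaths in each one. Regression sidesteps this by modeling all factors together rather than estimating each cell separately.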

How many factors should be included in a multivariable analysis? As a general rule, the more the better, to reduce confounding. However, the number of variables that can be included in a regression model is limited by the sample size. The general rule of thumb is that, for every 10 events (for dichotomous outcomes) or 10 people (for continuous outcomes), we can add one variable to the model. If we add more variables than that, then in statistical terms the model becomes overfitted (i.e., it gives results that are specific to that dataset, but may not be applicable to other datasets). Overfitted models can be as biased/misleading as crude models.3
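The rule of thumb amounts to a simple division. The helper below is a minimal sketch; the function name and the example event counts are hypothetical.

```python
def max_variables(n_units, per_variable=10):
    """Rough upper limit on model variables under the 10-per-variable rule.

    n_units is the number of events for a dichotomous outcome, or the
    number of people for a continuous outcome.
    """
    if n_units < 0:
        raise ValueError("n_units must be non-negative")
    return n_units // per_variable


# A study with 80 deaths could support about 8 variables.
print(max_variables(80))

# A study with only 25 deaths could support about 2.
print(max_variables(25))
```

The limit depends on the number of events rather than the number of participants, so a large study of a rare outcome may still support only a handful of variables.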

 

 


What are we to do about other factors that may affect mortality independently of smoking (e.g., diet), but which are not found in our dataset? Unfortunately, nothing. Since we do not have that information, we cannot adjust for it. In this case, diet is in statistical terms an unmeasured confounder. In all observational studies there is always at least some degree of unmeasured confounding, because there may be many factors that influence the outcome (and the exposure) but are not part of the dataset. While some statistical tools have been developed to estimate how much unmeasured confounding could affect the results, and therefore to interpret the results in its light, unmeasured confounding remains one of the major limitations of observational studies.4
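One such tool is the E-value, which we assume is the method described in reference 4 (the article itself does not name it). For an observed risk ratio, the E-value is the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both the exposure and the outcome to fully explain away the observed association. The sketch below applies the standard formula to a hypothetical risk ratio.

```python
from math import sqrt


def e_value(risk_ratio):
    """E-value for an observed risk ratio (point estimate only).

    Formula: RR + sqrt(RR * (RR - 1)) for RR >= 1.
    Protective ratios (RR < 1) are inverted first.
    """
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + sqrt(rr * (rr - 1))


# Hypothetical observed risk ratio of 2.0
print(round(e_value(2.0), 2))
```

A larger E-value means that only a strong unmeasured confounder could account for the result; a value close to 1 means that even weak unmeasured confounding could do so.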

Randomized, controlled trials (RCTs), on the other hand, do not have this problem in theory. With properly designed RCTs, all confounders, both measured and unmeasured, will be balanced between the two groups. For example, imagine an RCT where patients are randomized to take either drug A or drug B. Because patients are randomly allocated to one group or the other, it is assumed that all other factors are also randomly distributed. Hence, the two groups should be equal to each other with respect to all other factors except our active intervention, namely the type of drug they are taking (A or B). For this reason, in RCTs there is no need to adjust for multiple factors with a multivariable regression analysis, and crude unadjusted results can be presented as unbiased.
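A small simulation illustrates how random allocation tends to balance a factor that plays no part in the allocation. The sketch below is built on assumed values: the age distribution, sample size, and seed are all hypothetical.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical cohort: 10,000 patients with ages drawn around 55 years.
ages = [rng.gauss(55, 10) for _ in range(10_000)]

# Randomize each patient to drug A or drug B, ignoring age entirely.
arms = [rng.choice("AB") for _ in ages]

mean_age_a = mean(age for age, arm in zip(ages, arms) if arm == "A")
mean_age_b = mean(age for age, arm in zip(ages, arms) if arm == "B")

# With large groups, the mean ages of the two arms end up very close.
print(round(mean_age_a, 1), round(mean_age_b, 1))
```

The same reasoning applies to factors that were never measured, which is why randomization can balance unmeasured confounders as well. In small trials, chance imbalances can still occur.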

There is, however, a caveat. What happens if one patient who was randomized to take drug A takes drug B instead? Should she still be counted in the analysis under drug A (as randomized) or under drug B (as she took it)? The usual practice is to do both analyses and present both. In the first case, we will have the intention-to-treat (ITT) analysis, and in the second case, the per-protocol analysis (PPA). The advantage of the ITT is that it keeps the strength of randomization, namely the balancing of confounders, and therefore can present unbiased results. The advantage of the PPA is that it measures what was actually done in reality. However, in this case there is a departure from the original randomization, and hence there is the possibility of introducing confounding, because now patients are not randomly allocated to one treatment or the other. The larger the departure from randomization, the more probable the introduction of bias/confounding. For example, what if patients with more severe disease took drug A, even though they were randomized to take drug B? That will influence the outcome. For this reason, outcomes of the ITT analysis are considered the main results of RCTs, because PPA results can be confounded.
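The difference between the two analyses can be made concrete with hypothetical counts. In the sketch below, both drugs are assumed to be equally effective, each arm contains 80 patients with mild disease (10% mortality) and 20 with severe disease (50% mortality), and the 20 severe patients randomized to drug B take drug A instead. All numbers are invented for illustration.

```python
# Each row: (randomized arm, drug actually received, patients, deaths).
# Hypothetical counts; both drugs are assumed to have identical effects.
groups = [
    ("A", "A", 100, 18),  # arm A: 80 mild + 20 severe, all took drug A
    ("B", "B", 80, 8),    # arm B: mild patients who took drug B as assigned
    ("B", "A", 20, 10),   # arm B: severe patients who switched to drug A
]


def risk_by(column):
    """Risk of death grouped by column 0 (as randomized) or 1 (as received)."""
    totals = {}
    for row in groups:
        patients, deaths = totals.get(row[column], (0, 0))
        totals[row[column]] = (patients + row[2], deaths + row[3])
    return {key: d / n for key, (n, d) in totals.items()}


itt = risk_by(0)               # analysis by randomized arm
by_drug_received = risk_by(1)  # analysis by drug actually taken

print("as randomized:", itt)
print("as received:", by_drug_received)
```

Analyzed as randomized, the two arms have the same mortality, which matches the assumption that the drugs are equivalent. Analyzed by drug received, drug A appears worse only because the patients with severe disease ended up in that group.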

In summary, when reading studies, do not simply accept the results as they are presented, but rather ask yourself: “Could they be confounded by other factors, and therefore be unreliable? What steps did the authors take to reduce confounding? If they presented only crude analyses, and this was not justified by an RCT design, do they recognize it as a major limitation?” There are many nuances in every paper that can be appreciated only through a careful reading of the methods section. Hopefully, this article sheds some light on these issues and helps readers not to be confounded.
 

References

1. The P value: What to make of it? A simple guide for the uninitiated. GI and Hepatology News. 2019 Sep 23. https://www.mdedge.com/gihepnews/article/208601/mixed-topics/p-value-what-make-it-simple-guide-uninitiated

2. VanderWeele TJ et al. Ann Stat. 2013 Feb;41(1):196-220.

3. Concato J et al. Ann Intern Med. 1993 Feb 1;118(3):201-10.

4. VanderWeele TJ et al. Ann Intern Med. 2017 Aug 15;167(4):268-74.

Dr. Jovani is a therapeutic endoscopy fellow in the division of gastroenterology and hepatology at Johns Hopkins Hospital, Baltimore.
