General Considerations for Reviewing Research
There's a tendency on the part of most people (laypeople and professionals alike) to blindly accept the so-called "conclusions" of scientific research. In today's fast-paced media world, soundbites and headlines rule the day. We're constantly bombarded with the results of scientific research and their supposed implications for our lives.
Unfortunately, what usually passes as science in today's world is far from scientific. This is particularly true in the field of exercise. Most research studies performed in the field of exercise are largely invalid due to poor or non-existent controls, faulty design, inadequate supervision, and flawed measuring devices. In addition, researcher bias inevitably creeps in and taints conclusions. The drive for publicity, the "publish or perish" syndrome, and competition for research grants cause researchers to "rush to judgment" and draw sweeping conclusions that the evidence does not support. In the meantime, the public is bounced back and forth like a ping-pong ball between contradictory "facts" and advice.
True scientists seek the truth -- whatever the cost -- even if the truth means they end up with egg on their faces or jeopardize their grants. They take exhaustive steps to prevent personal bias from clouding their view. True scientists are skeptics -- they question everything. For them to accept the results of research, or to regard a new theory or scientific "law" or "principle" as established, the weight of the evidence must be very great indeed. True scientists welcome criticism of their work -- even applaud it. We do not see this in the exercise research field.
The information contained in this article is merely a starting point for helping you understand how to evaluate research studies and recognize both good and bad research. It will also help you detect erroneous conclusions and flaws so that bad research will not mislead you.
Clinical vs Laboratory vs Epidemiological Research
Clinical research is not technically "research" in the classic sense. Rather it is simply "observation" and "reporting" of the results of a course of treatment that has occurred in the clinical setting. Typically, there is no "control group", and no controls have been set up to isolate a particular variable. A group of people receive treatment and are observed for what occurs after. While this type of study can be very valuable, less reliable conclusions can be drawn because steps were not taken to rule out other possible reasons for the observed results or lack thereof.
In contrast, laboratory research is designed and carried out in such a way as to specifically study the cause and effect relationship, if any, of a particular treatment or course of action. Numerous steps are taken to isolate a particular variable and rule out the influence of other undesired variables. A control group, which receives no treatment, serves as a standard for comparison by showing the effects of no intervention. Strictly controlled laboratory research is the only research type capable of accurately identifying true "cause and effect" relationships.
Epidemiological research is the study of the incidence and prevalence of a specific disease in a population or group. No actual experiment is conducted. Instead, a population is studied, data are collected, and statistical analyses are performed for the purpose of determining similarities between members of the affected group. Researchers then attempt to find an "association" between a certain action, behavior, lifestyle, food, drug, etc., and the disease being studied.
Association vs Cause and Effect
Perhaps the most serious and frequent error made in interpreting research of any kind is confusing an apparent relationship between two things or events (an "association") with an actual "cause and effect" relationship in which "A" directly causes "B". The failure to distinguish association from cause has directly led to a multitude of erroneous conclusions and incorrect practices.
Strictly speaking, it is impossible to prove causality from an association, regardless of the strength of association. If there is a relationship (association) between two things or events, A and B, this relationship may be of four kinds:
1. A causes B
2. B causes A
3. A and B share a common cause (collateral)
4. A and B are associated by random chance (coincidence)
Therefore, simply because two things or events occur together, this tells us nothing of whether one caused the other, vice versa, or whether a separate thing caused both, or even whether the connection is strictly a fluke. Furthermore, no amount of additional data reinforcing, strengthening, and supporting the association can prove a cause between the two. Proof of a cause is only obtainable by direct, controlled experiment.
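The third kind of relationship, a shared common cause, can be illustrated with a small simulation. What follows is only a sketch under invented assumptions: the hidden factor C, the noise levels, and the sample size are all hypothetical and do not come from any real study. A and B never influence each other, yet they are clearly associated. When A is instead set by random assignment, as in a controlled experiment, the association disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical model: a hidden factor C drives both A and B.
rng = random.Random(0)   # fixed seed so the run is repeatable
n = 5000                 # assumed number of observations
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]   # A depends on C, never on B
b = [ci + rng.gauss(0, 1) for ci in c]   # B depends on C, never on A

# Observational data: A and B are associated although neither causes the other.
r_observed = pearson(a, b)

# Controlled experiment: A is set by random assignment, so it no longer
# depends on C. B is left exactly as it was generated above.
a_assigned = [rng.gauss(0, 1) for _ in range(n)]
r_experiment = pearson(a_assigned, b)

print(f"observed association:    r = {r_observed:.2f}")
print(f"after random assignment: r = {r_experiment:.2f}")
```

Under these assumptions the observed correlation should come out near 0.5, while the correlation after random assignment should be near zero. Collecting more observational data would only make the first number more precise. It would not reveal that C is responsible.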
Study Size
The number of subjects assigned to the various treatment and/or control groups in a study has significant implications for the ability to draw conclusions from the data collected. A large sample size is necessary to ensure that an adequate cross section of the population at large is being studied. In addition, small sample sizes do not allow for accurate statistical analysis and extrapolation of results to the population as a whole. The larger the sample size, the more confident we can be regarding the meaning of the data and any conclusions drawn from it. Generally speaking, a minimum sample size of 25 subjects is necessary for meaningful data analysis. However, even this minimum standard is suspect.
Many exercise-related studies are performed using far too few subjects (fewer than the 25 minimum standard) to draw any meaningful conclusions from the data. Therefore, this flaw must be considered when interpreting the results of these studies.
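One way to see why sample size matters is to look at how the uncertainty around a group average shrinks as subjects are added. The sketch below uses the standard normal-approximation formula for a 95% margin of error, z * s / sqrt(n). The standard deviation of 10 units is an invented figure used only for illustration, and for very small groups the true margin would be somewhat wider than this approximation suggests.

```python
import math

def margin_of_error(sample_sd, n, z=1.96):
    """Approximate 95% margin of error for a sample mean.

    Normal approximation: z * s / sqrt(n). With small n a t-based
    interval would be wider, so this understates the uncertainty.
    """
    return z * sample_sd / math.sqrt(n)

# Hypothetical spread: strength scores with a standard deviation of 10 units.
for n in (10, 25, 100, 400):
    print(f"n = {n:3d}: mean known to within +/- {margin_of_error(10.0, n):.2f} units")
```

Because the margin shrinks with the square root of n, quadrupling the number of subjects only halves the uncertainty. A difference between two groups that is smaller than this margin cannot be distinguished from chance.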
Length of Study
Particularly in health/fitness-related studies, the length of the intervention and the overall length of the study can dramatically affect the outcome. Many physiological adaptations occur slowly over time. If the study is too short, the changes that occur may not be measurable, or may be so slight as to be considered insignificant. This may lead researchers to conclude that the intervention did not produce results when in fact it did.
Additionally, when comparing the relative effects of two different types of treatment or protocols, the effects may seem similar over the short term. However, when charted over longer periods of time, one protocol or treatment may actually be shown to be significantly more effective.
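The point about short studies hiding long-term differences can be sketched with two made-up response curves. Everything here is an assumption chosen for illustration: the saturating-exponential shape, the protocol names, and the numbers are hypothetical and do not describe any real training program.

```python
import math

def gain(weeks, max_gain, time_constant):
    """Hypothetical adaptation curve: gains rise quickly, then level off."""
    return max_gain * (1.0 - math.exp(-weeks / time_constant))

# Two invented protocols with the same initial rate of improvement
# but different ceilings.
def protocol_a(weeks):
    return gain(weeks, max_gain=10.0, time_constant=4.0)

def protocol_b(weeks):
    return gain(weeks, max_gain=20.0, time_constant=8.0)

for weeks in (2, 6, 12, 24):
    a, b = protocol_a(weeks), protocol_b(weeks)
    print(f"week {weeks:2d}: A = {a:5.2f}  B = {b:5.2f}  difference = {b - a:5.2f}")
```

With these assumed curves, a two-week study would find the protocols nearly indistinguishable, while a 24-week study would show one of them producing roughly twice the improvement of the other.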
Subject Demographics
Another important consideration is the characteristics of the sample being studied. For example, untrained, sedentary individuals possess the potential for more rapid improvement in physical fitness than subjects who have been exercising consistently for years and who are already physically fit. Untrained subjects will typically respond to nearly any treatment that gets them up and moving because nearly any physical activity represents an overload stimulus to them. Conversely, the threshold level of stimulation required to cause changes in a highly trained subject is much higher. Therefore, what works with previously sedentary subjects may not be effective with highly fit subjects.
Examples of other considerations relating to subject characteristics are: men vs. women, genetically superior athletes vs. genetically average non-athletes, young subjects vs. older subjects, and motivated and/or supervised subjects vs. unmotivated and/or unsupervised subjects.
Obviously, it would not be appropriate to compare the muscle-building results of genetically superior athletes with those of a group of genetically average non-athletes. Furthermore, treatments that work well with younger individuals may not be suitable for frail, elderly subjects.
Method of Measurement
The manner in which the data was collected and the measuring tools used to acquire the data are two of the most important considerations when evaluating a research study. Due to the incredibly complex nature of human physiology and the problems surrounding invasive measuring techniques, many physiologic events and adaptations simply cannot be measured directly. Researchers are often forced to measure an indirect effect or "marker" of other physiologic effects, and then make assumptions as to the meaning of these indirect measurements relative to the studied effect.
Compounding these problems is the unfortunate fact that many so-called "measuring devices" are in and of themselves incapable of accurately measuring the variable they purport to measure. These devices allow outside influence from other factors, which in turn produces false or misleading measurements. The data collected with these devices cannot be trusted or relied upon. An analogy would be attempting to measure a subject's body weight with an inaccurate scale. You may step on the scale and obtain a "number", but if the scale is not accurate, this number cannot be considered your true body weight.
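The scale analogy can be made concrete with a short simulation. This is a sketch only: the true weight, the size of the scale's built-in error, and the random scatter are all invented values. It shows that repeating a measurement on a faulty instrument produces a stable number, but that number settles on the wrong value.

```python
import random

TRUE_WEIGHT = 80.0   # hypothetical true body weight
SCALE_BIAS = 2.0     # assumed systematic error built into the scale
SCALE_NOISE = 1.5    # assumed random scatter (standard deviation) per reading

def read_scale(rng):
    """One reading from the faulty scale: truth plus bias plus random noise."""
    return TRUE_WEIGHT + SCALE_BIAS + rng.gauss(0, SCALE_NOISE)

rng = random.Random(1)   # fixed seed so the run is repeatable
readings = [read_scale(rng) for _ in range(2000)]
average = sum(readings) / len(readings)

print(f"true weight:         {TRUE_WEIGHT:.1f}")
print(f"average of readings: {average:.1f}")
```

Averaging many readings removes the random scatter but not the systematic error, so the average lands near the true weight plus the bias rather than near the true weight. More data from a flawed device does not make the data more valid.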
In studies where changes in muscular strength are documented and compared, it is extremely important to have accurate and reliable measurements of strength before and after training. There are numerous factors that can affect muscular performance and its measurement. One-rep-max testing and comparison of pre- and post-study "training weights" from the various exercises performed are not valid methods of assessing muscular strength. These assessments fail to account for or control numerous factors other than true strength changes that will influence performance (e.g., skill, subject experience, bodily proportions, motor learning variables, and the subject's capacity at the outset). Therefore, studies that use these methods of strength measurement are invalid from the outset because the method of data collection is invalid.
Presently, the only scientifically valid, reliable, and accurate method of muscular strength measurement is multiple-position isometric testing performed with a MedX® computerized strength testing machine. Machines are currently available to test the strength of the following muscle groups: muscles that "extend" the knee (quadriceps); muscles that "flex" the knee (hamstrings); muscles that "extend" the lumbar spine (erector spinae group); muscles that "rotate" the torso; muscles that "extend" the cervical spine; and muscles that "rotate" the head.
Supervision
Particularly in exercise studies, supervision of the subjects being exercised is critical. Most subjects are highly unmotivated to exert themselves, to say nothing of their poor motor skills and inability to perform the exercise properly. Unless subjects are very closely supervised and monitored during their training (application of the treatment), it is unlikely that they will train intensely enough or with proper technique to produce optimal results.
Most exercise studies, if they are supervised at all, are supervised by undergraduate and graduate-level college students of questionable supervisory and/or instructional abilities. Others involve "self-reporting", where the subjects go home and exercise on their own and simply report to the researchers what they did.
Another point to be made with regard to supervision and data collection is that many studies are performed and supervised almost entirely by students, who also collect the data. The teacher/professor authoring the study sometimes has had very little, if anything, to do with the actual operation of the study. He or she relies on the information submitted by the students, and merely "writes up" the paper and submits it for publication in a research journal. Obviously, having inexperienced students supervise the subjects and collect the data casts considerable doubt on the reliability and accuracy of the study.