A type II error occurs when a researcher fails to reject a false null hypothesis.

The type II error (beta) is the probability of inappropriately accepting the null hypothesis (no difference in treatment effect) when a true difference in outcome exists.

From: Clinical and Translational Science, 2009

Type I and Type II Error

Alesha E. Doan, in Encyclopedia of Social Measurement, 2005

Conclusion: The Trade-off between the Two Errors

Type I and type II errors present unique problems to a researcher. Unfortunately, there is not a cure-all solution for preventing either error; moreover, reducing the probability of one of the errors increases the probability of committing the other type of error. Although a researcher can take several measures to lower type I error, or alternatively, a type II error, empirical research always contains an element of uncertainty, which means that neither type of error can be completely avoided.

Type I error has historically been the primary concern for researchers. In the presence of a type I error, statistical significance is attributed to findings when in reality no effect exists. Researchers are generally averse to committing this type of error; consequently, they tend to take a conservative approach, preferring to err on the side of committing a type II error. The major drawback to exclusively emphasizing type I error over type II error is that interesting findings may simply be overlooked. Typically, once statistical relationships are discovered, more studies follow that confirm, build upon, or challenge the original findings. In other words, scientific research is cumulative; therefore, false positives tend to be revealed in subsequent studies. Unfortunately, in the presence of a type II error, the line of inquiry is often discarded, because in most fields of research a premium is placed on statistically significant results. If a type II error has been committed and that particular line of inquiry is not pursued further, the scientific community may miss valuable information.

Ultimately, the scientist must decide which type of error is more problematic to his or her research. Essentially, the investigator is confronted with the question of which type of error is more costly. The answer depends on the purpose of the research as well as the potential implications of a false-positive (type I error) or false-negative (type II error) finding.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B0123693985001109

Brain Imaging

Eileanoir B. Johnson, Sarah Gregory, in Progress in Molecular Biology and Translational Science, 2019

3.7.3 Correction for multiple comparisons

The results of statistical analyses are susceptible to both Type I and Type II errors. A Type I error refers to the incorrect rejection of a true null hypothesis (a false positive). A Type II error is the acceptance of the null hypothesis when a true effect is present (a false negative). The more statistical comparisons performed in a given analysis, the more likely a Type I or Type II error is to occur. While an understanding of these two scenarios is necessary for all researchers undertaking statistical analysis, the nature of neuroimaging analyses and the volume of statistical comparisons they involve mean that these errors are more likely to occur than in other fields (Lindquist and Mejia, 2015; Hupé, 2015). To account for this, statistical adjustments can be made to correct for multiple comparisons by tightening the statistical threshold according to the number of comparisons being performed. The Bonferroni correction is the most widely known such adjustment (Lindquist and Mejia, 2015); within neuroimaging, correction for the family-wise error rate (FWE) and correction for the false discovery rate (FDR) are the two most widely used methods.

FWE refers to the likelihood of making one or more Type I errors when conducting multiple tests (a family of tests); the Bonferroni method is one way of controlling it. Correcting for FWE aims to limit the risk of a Type I error occurring in any of the statistical tests being performed, but it is often too stringent, thereby increasing the risk of Type II errors. In comparison, FDR controls the proportion of Type I errors among all significant results, aiming to reduce the number of false positives only within the subset of voxels found to be significant. The choice between FWE and FDR is often dictated by the software used, since many software tools include one or the other as the default option for controlling multiple comparisons.
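As an illustrative sketch (not tied to any particular neuroimaging package), the contrast between FWE control via Bonferroni and FDR control via the Benjamini–Hochberg step-up rule can be shown on a small list of p-values:

```python
from typing import List

def bonferroni_reject(pvals: List[float], alpha: float = 0.05) -> List[bool]:
    """Family-wise error control: compare each p-value to alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def benjamini_hochberg_reject(pvals: List[float], q: float = 0.05) -> List[bool]:
    """False discovery rate control via the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k_max = 0  # largest rank k with p_(k) <= (k / m) * q
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

pvals = [0.01, 0.02, 0.03, 0.04, 0.25]       # illustrative p-values
print(sum(bonferroni_reject(pvals)))          # FWE control: 1 of 5 survives
print(sum(benjamini_hochberg_reject(pvals)))  # FDR control: 4 of 5 survive
```

On the same five p-values the Bonferroni threshold (0.05/5 = 0.01) retains a single result, while the step-up FDR rule retains four, illustrating why FWE correction is the more stringent of the two.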

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/S1877117319300638

Data Quality Control

Carl A. Anderson, in Analysis of Complex Disease Association Studies, 2011

Post-analysis QC

Even the most stringent QC protocol will not eliminate all type I and type II error, so care is still needed when interpreting association signals. Intensity data should be manually inspected for genotype clustering errors prior to designing replication studies, which should ideally use a different genotyping platform from that used in the GWA study. Cluster plots should be checked on a per-cohort basis (for example, at a single SNP the cluster plot for the case genotypes should be compared with that for the controls). Markers that are inconsistently called should be removed from further analysis and not taken forward for replication.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780123751423100070

Design and Conduct of Clinical Trials for Breast Cancer

V. Suzanne Klimberg, ... Thomas Wells, in The Breast (Fifth Edition), 2018

Simon's Minimax Design

Simon's minimax two-stage design8 minimizes the maximum sample size subject to type I and type II error probability constraints. The following values provide an example of Simon's minimax design:

Clinically uninteresting level = 5% response rate

Clinically interesting level = 20% response rate

Type I error (α) = 0.05

Type II error (β) = 0.20

Power = 1 − type II error = 0.80

Stage I: Reject the drug if the response rate is ≤0/13

Stage II: Reject the drug if the response rate is ≤3/27

In this example, the first stage consists of 13 patients. If no responses are seen in the first 13 patients, the trial is terminated. Otherwise, accrual continues to a total of 27 patients. If there are at least four responses in the total of 27 patients, the trial may move to a phase III study. The average sample size is 19.8, and the probability of early termination is 51% for a drug with a response rate of 5% (low activity).
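The quoted operating characteristics can be checked directly from the binomial distribution. A minimal sketch using only the design values above (stage 1 bound 0/13, total sample 27, null response rate 5%):

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n1, r1, n_total = 13, 0, 27  # stage 1 size, stage 1 rejection bound, total size
p_null = 0.05                # clinically uninteresting response rate

# Probability of early termination: <= r1 responses among the first n1 patients
pet = binom_cdf(r1, n1, p_null)

# Expected sample size under the null response rate
expected_n = n1 + (1 - pet) * (n_total - n1)

print(round(pet, 2))         # ~0.51, the 51% quoted in the text
print(round(expected_n, 1))  # ~19.8, the average sample size quoted in the text
```

The probability of stopping after stage 1 is simply the chance of seeing zero responses in 13 patients when the true response rate is 5%, i.e. 0.95^13 ≈ 0.51, which in turn gives the expected sample size of about 19.8.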

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780323359559000271

Introduction to Clinical Trial Statistics

Richard Chin, Bruce Y. Lee, in Principles and Practice of Clinical Trial Medicine, 2008

3.6.2 Type I and II Errors

When hypothesis testing arrives at the wrong conclusion, two types of errors can result: Type I and Type II errors (Table 3.4). Incorrectly rejecting the null hypothesis is a Type I error, and incorrectly failing to reject a null hypothesis is a Type II error. In general, Type I errors are more serious than Type II errors; seeing an effect when there isn't one (e.g., believing an ineffectual drug works) is worse than missing an effect (e.g., an effective drug fails a clinical trial). But this is not always the case. One of the major decisions before conducting a clinical study is choosing a significance level. As seen in Table 3.5, changing the significance level affects the Type I error rate (α), which is the probability of a Type I error, and the Type II error rate (β), which is the probability of a Type II error, in opposite directions. In other words, you have to decide whether you are willing to tolerate more Type I or more Type II errors. Type II errors may be more tolerable when studying interventions that address an urgent and unmet need.

TABLE 3.4. Hypothesis Testing Errors

                                  Null hypothesis true    Null hypothesis false
Reject null hypothesis            Type I error            Correct
Fail to reject null hypothesis    Correct                 Type II error

TABLE 3.5. The Significance Level and Error Rates

                                             Type I error rate (α)    Type II error rate (β)
Higher significance level (e.g., p < 0.05)   Higher                   Lower
Lower significance level (e.g., p < 0.01)    Lower                    Higher
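The opposite movement of α and β can be illustrated numerically. A sketch for a one-sided z-test with known variance; the standardized effect of 0.5 and the sample size of 30 are illustrative assumptions, not values from the text:

```python
from math import sqrt
from statistics import NormalDist

def type2_error_rate(alpha: float, effect: float = 0.5, n: int = 30) -> float:
    """Beta for a one-sided z-test: P(Z < z_crit - effect * sqrt(n)) under H1."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(z_crit - effect * sqrt(n))

beta_05 = type2_error_rate(alpha=0.05)
beta_01 = type2_error_rate(alpha=0.01)
print(round(beta_05, 3))  # beta at the higher significance level
print(round(beta_01, 3))  # beta at the lower significance level: larger
```

Tightening α from 0.05 to 0.01 moves the critical value further out, so with everything else held fixed the probability of missing the effect (β) rises, exactly the trade-off the table summarizes.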

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B978012373695600003X

Challenging Behavior

Laura Lee McIntyre, in International Review of Research in Developmental Disabilities, 2013

2.7.4 Power, Moderators and Mediators

Related to sample size is the issue of power to detect significant treatment effects. Power is influenced by type I and type II error, sample size, and the magnitude of treatment effects (Cohen, 1992). Thus, when the sample size is small, power to detect small to medium treatment effects is compromised. When power is low, it is virtually impossible to conduct analyses that examine moderators and mechanisms of treatment outcomes. Larger scale studies that are adequately powered can explore variables predictive of treatment outcomes as well as variables that account for changes in dependent measures. For example, it could be hypothesized that parent training interventions are effective in reducing children’s challenging behavior because of changes in parenting and parent–child interactions. Parenting behavior is assumed to mediate, or partially mediate, the relation between treatment and outcome, yet these assumptions are not often empirically tested in the IDD parent training literature. Likewise, studies that examine subgroup differential outcomes or explore variables predictive of treatment outcome (moderational analyses) are rare in the IDD literature. Although studies of treatment moderators and mediators are rare in the IDD parent training literature, three exceptions are worth highlighting. Two papers from the RUPP PT studies report on moderators and mediators of treatment outcomes for children with PDD and challenging behavior. Farmer et al. (2012) reported on the original sample (described by Aman et al., 2009) and conducted moderational analyses examining the effects of demographic variables and noncompliance on treatment outcomes. Farmer et al. found that more noncompliance at baseline, regardless of assigned treatment group (medication alone vs combined medication and parent training), predicted better outcomes posttreatment. Scahill et al. (2012) reported on the original RUPP PT sample and investigated the effects of treatment on children’s adaptive behavior.
Scahill et al. concluded that a reduction in children’s challenging behavior promotes improvements in adaptive behavior. Although not tested in a mediational model, this relation could be investigated empirically in a larger sample with greater power. Hudson, Reece, Cameron, and Matthews (2009) reported on moderational analyses using a subsample of participants with complete pre- and posttreatment data from the large-scale implementation trial (Hudson et al., 2008). Child gender, age, and disability diagnosis were explored as possible moderators of treatment outcomes. Results from these analyses suggest main effects for the Signposts parent training intervention but no significant moderators of treatment (Hudson et al., 2009). Studies examining moderators and mediators are important in enhancing our knowledge of intervention effectiveness for subgroups and in enhancing our understanding of the underlying mechanisms of treatment outcomes.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780124016620000087

Barbara C. Tilley, Yuko Palesch, in Stroke (Fourth Edition), 2004

Design of Phase II Trials

Sample size for Phase II trials can be computed through the use of standard methods for one-sided tests with modification to the type I and type II error. In Phase II trials, the null hypothesis is that the treatment equals some minimal acceptable success measure or maximum acceptable failure measure, a single number derived from historical data (proportion or mean). The alternative hypothesis is that the treatment is worse than the historical control rate.33 Based on study results, if the null hypothesis is rejected by the trial investigator, the therapy will not be studied further. If the null hypothesis is not rejected, the therapy is carried forward into a Phase III trial. Thus, false-positive treatments are tested further in a Phase III study, in which the treatment effect is scrutinized with smaller error probabilities at the expense of a larger sample size. One-tailed alpha (chance of calling effective treatment ineffective) and beta (chance of missing an ineffective treatment) levels in the range of 0.10 to 0.15 are recommended.32

In December 1993, the Alteplase ThromboLysis for Acute Noninterventional Therapy in Ischemic Stroke (ATLANTIS) Part B Study33 began to evaluate the efficacy of giving intravenous recombinant t-PA (rt-PA), or alteplase, for ischemic stroke 3 to 5 hours after symptom onset. The primary favorable outcome measure was a National Institutes of Health (NIH) Stroke Scale (NIHSS) score of 0 or 1 at 90 days after stroke. Thirty-two percent of participants receiving placebo and 34% of participants receiving alteplase had a favorable outcome at 3 months (P = .65). The estimated sample size for the Phase III trial was 968, but only 613 participants were enrolled. The data and safety monitoring committee terminated participant enrollment in the trial for lack of efficacy.

We used the effect sizes from the original trial, with one-sided alpha = 0.10 and beta = 0.15, to estimate the required sample size for a Phase II trial of 169 participants. We estimated the sample size for this two-stage Phase II design, allowing one look at the data halfway through the study, through the use of EAST 2000 (Cytel Software Corporation) with an O'Brien-Fleming stopping boundary,34 as described later (see “Interim Analysis”). Under this Phase II design, after enrollment of 85 consecutively treated participants, the study would continue if the number of positive outcomes (NIHSS score ≤1 at 90 days after stroke) was 29 or higher. In the Phase III trial, the observed number of successes in the first 85 consecutively treated patients was 27. If we used these Phase III data as if they were from a Phase II trial, the same conclusion would have been reached as was reached in the Phase III trial, but with 85 rather than 613 participants.35 Broderick and colleagues36 completed a Phase II trial of combined intravenous and intra-arterial t-PA using the design just described and determined that continuing to Phase III was worthwhile.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B0443066000500699

Critical appraisal

William Lee, Matthew Hotopf, in Core Psychiatry (Third Edition), 2012

How does it fit in with the rest of the literature?

In any literature, differences in findings between studies are inevitable. This should not be seen as a problem, or even as necessarily requiring explanation beyond the issues of Type 1 and Type 2 errors described above. It is expected and normal for well-conducted studies with the same aims and methodologies both to miss true findings and to detect false ones. In this case, one would expect the estimates of effect that the studies produce to be grouped around a single ‘best guess’ value.

In reality, however, one is often faced with studies with different sample sizes and methodologies in different settings. If the body of the literature is pointing in the same direction, then the message can be clear, but if studies disagree, then sense needs to be made of the disagreement. First, the powerful biases in case–control studies can produce spurious results (Mayes et al 1988), and only the results of the very best conducted case–control studies should be considered of equivalent value to a cohort study. Further, no single example of observational research should be given equivalent weight when compared with a well-conducted randomized controlled trial.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780702033971000100

Hypothesis Testing

Laura Lee Johnson, ... Pamela A. Shaw, in Principles and Practice of Clinical Research (Fourth Edition), 2018

Abstract

Several key statistical concepts are fundamental not only for hypothesis tests but also for most statistical analyses that arise in clinical studies. Commonly used terms, such as critical values, p-values, and type I and type II errors are defined. We summarize examples of hypothesis testing for the one-sample and two-sample settings and consider methods for dichotomous (binomial) data and continuous data (modeled by the normal distribution, also known as the bell curve). Throughout the chapter as we introduce new ideas we illustrate them with examples from clinical research studies. We sum up the chapter with a list of common mistakes, misconceptions, and a few special considerations.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780128499054000241

Biostatistics—Part II

Dr. Jennifer Elder, Ph.D., in Clinical Trials (Second Edition), 2016

XV Writing a Sample Size Section of a Clinical Study Protocol

Every Clinical Study Protocol should include some justification of the sample size utilized in the study. The justification should include the following information: the primary endpoint, the null and alternative hypotheses being tested, the test statistic, the type I and type II error rates, estimates of treatment effect and/or variability, and any other information that would be helpful should someone need to reproduce the calculation. An example of a sample size justification that could be included in a Protocol is as follows.

For this phase III, randomized, multicenter study of HIV-infected subjects, the null hypothesis (H0) is that the proportion of subjects who have HIV-1 RNA below the lower limit of detection at 48 weeks when taking a drug candidate added to standard of care is the same as the proportion of subjects who have HIV-1 RNA below the lower limit of detection at 48 weeks when taking standard of care alone. From a review of the available literature, standard of care alone is expected to yield 80% of subjects with HIV-1 RNA below the lower limit of detection at 48 weeks. Based on the results of the recently completed phase II study, this proportion is expected to be 90% in subjects receiving the drug candidate added to standard of care. Based on these assumptions and using a z-test statistic, 325 subjects per group (650 total subjects) in the ITT analysis are sufficient to detect a 10% difference in the proportion of subjects with HIV-1 RNA below the lower limit of detection at 48 weeks with 95% power and a 5% type I error rate.
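As a check, the 325-per-group figure can be reproduced with the standard (unpooled) normal-approximation formula for comparing two proportions; this sketch assumes a two-sided 5% α and 95% power, as stated above:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05,
                                power: float = 0.95) -> int:
    """Unpooled normal-approximation sample size per group (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha/2
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of binomial variances
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

n = n_per_group_two_proportions(0.80, 0.90)
print(n)      # per-group sample size, 325
print(2 * n)  # total sample size, 650
```

Plugging in p1 = 0.80 and p2 = 0.90 gives (1.96 + 1.645)^2 × 0.25 / 0.10^2 ≈ 324.9, which rounds up to the 325 per group quoted in the example.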

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780128042175000102

Does type 2 error reject null hypothesis?

A type II error occurs when the null hypothesis is not rejected even though the alternative hypothesis is the true state of nature; in other words, a false finding is accepted as true. The risk of a type II error can be reduced by making the criteria for rejecting the null hypothesis (H0) less stringent (for example, using a larger significance level) or by increasing the power of the test.

When can a type 2 error occur?

A type II error occurs when a false null hypothesis is accepted, also known as a false negative. In effect, the alternative hypothesis is rejected even though it reflects the true state of nature rather than a chance occurrence.

What causes a type 2 error to occur?

Type II error is mainly caused by low statistical power: a Type II error will occur if the statistical test is not powerful enough. An insufficient sample size can also lead to a Type II error, because the power of the test depends on the sample size.
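The link between sample size, power, and Type II error can be sketched with a small simulation: a one-sample z-test with known σ = 1 and a true effect of 0.5 (all values here are illustrative assumptions), counting how often the test fails to reject the false null hypothesis:

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def estimated_type2_rate(n: int, effect: float = 0.5, alpha: float = 0.05,
                         trials: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated studies that fail to reject a false H0 (mean = 0)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    misses = 0
    for _ in range(trials):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]  # H1 is true
        z = fmean(sample) * sqrt(n)                          # z-test, sigma = 1
        if abs(z) < z_crit:
            misses += 1  # failed to reject: a Type II error
    return misses / trials

print(estimated_type2_rate(n=10))  # small sample: high Type II error rate
print(estimated_type2_rate(n=50))  # larger sample: much lower Type II error rate
```

With the same effect and significance level, the smaller study misses the effect far more often, which is exactly the mechanism by which an inadequate sample size produces Type II errors.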

What is a Type 2 hypothesis error?

A Type II error means not rejecting the null hypothesis when it's actually false. This is not quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell you whether to reject the null hypothesis.