Chapter 6: Choosing effect measures and computing estimates of effect

Julian PT Higgins, Tianjing Li, Jonathan J Deeks

Key Points:
  • The types of outcome data that review authors are likely to encounter are dichotomous data, continuous data, ordinal data, counts and rates, and time-to-event data.
  • There are several different ways of comparing outcome data between two intervention groups (‘effect measures’) for each data type. For example, dichotomous outcomes can be compared between intervention groups using a risk ratio, an odds ratio, a risk difference or a number needed to treat. Continuous outcomes can be compared between intervention groups using a mean difference or a standardized mean difference.
  • Effect measures are either ratio measures (e.g. risk ratio, odds ratio) or difference measures (e.g. mean difference, risk difference). Ratio measures are typically analysed on a logarithmic scale.
  • Results extracted from study reports may need to be converted to a consistent, or usable, format for analysis.

Cite this chapter as: Higgins JPT, Li T, Deeks JJ (editors). Chapter 6: Choosing effect measures and computing estimates of effect. In: Higgins JPT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, Welch VA (editors). Cochrane Handbook for Systematic Reviews of Interventions version 6.4 (updated August 2023). Cochrane, 2023. Available from www.training.cochrane.org/handbook.

6.1 Types of data and effect measures

6.1.1 Types of data

A key early step in analysing the results of studies of effectiveness is identifying the data type for the outcome measurements. Throughout this chapter we consider outcome data of five common types:

  1. dichotomous (or binary) data, where each individual’s outcome is one of only two possible categorical responses;
  2. continuous data, where each individual’s outcome is a measurement of a numerical quantity;
  3. ordinal data (including measurement scales), where each individual’s outcome is one of several ordered categories, or generated by scoring and summing categorical responses;
  4. counts and rates calculated from counting the number of events experienced by each individual; and
  5. time-to-event (typically survival) data that analyse the time until an event occurs, where not all individuals in the study experience the event (censored data).

The ways in which the effect of an intervention can be assessed depend on the nature of the data being collected. In this chapter, for each of the above types of data, we review definitions, properties and interpretation of standard measures of intervention effect, and provide tips on how effect estimates may be computed from data likely to be reported in sources such as journal articles. Formulae to estimate effects (and their standard errors) for the commonly used effect measures are provided in the RevMan Web Knowledge Base under Statistical algorithms and calculations used in Review Manager (https://documentation.cochrane.org/revman-kb/statistical-methods-210600101.html), as well as in other standard texts (Deeks et al 2001). Chapter 10 discusses issues in the selection of one of these measures for a particular meta-analysis.

6.1.2 Effect measures

By effect measures, we refer to statistical constructs that compare outcome data between two intervention groups. Examples include odds ratios (which compare the odds of an event between two groups) and mean differences (which compare mean values between two groups). Effect measures can broadly be divided into ratio measures and difference measures (sometimes also called relative and absolute measures, respectively). For example, the odds ratio is a ratio measure and the mean difference is a difference measure.

Estimates of effect describe the magnitude of the intervention effect in terms of how different the outcome data were between the two groups. For ratio effect measures, a value of 1 represents no difference between the groups. For difference measures, a value of 0 represents no difference between the groups. Values higher and lower than these ‘null’ values may indicate either benefit or harm of an experimental intervention, depending both on how the interventions are ordered in the comparison (e.g. A versus B rather than B versus A), and on the nature of the outcome.

The true effects of interventions are never known with certainty, and can only be estimated from the available studies. Every estimate should always be accompanied by a measure of its uncertainty, such as a confidence interval or standard error (SE).

6.1.2.1 A note on ratio measures of intervention effect: the use of log scales

Ratio measures of intervention effect (such as the odds ratio, risk ratio, rate ratio and hazard ratio) usually undergo log transformations before being analysed, and they may occasionally be referred to in terms of their log transformed values (e.g. log odds ratio). Typically the natural log transformation (log base e, written ‘ln’) is used.

Ratio summary statistics all have the common features that the smallest value they can take is 0, that the value 1 corresponds to no intervention effect, and that the highest value they can take is infinity. This number scale is not symmetric. For example, whilst an odds ratio (OR) of 0.5 (a halving) and an OR of 2 (a doubling) are opposites such that they should average to no effect, the average of 0.5 and 2 is not an OR of 1 but an OR of 1.25. The log transformation makes the scale symmetric: the log of 0 is minus infinity, the log of 1 is zero, and the log of infinity is infinity. In the example, the log of the OR of 0.5 is –0.69 and the log of the OR of 2 is 0.69. The average of –0.69 and 0.69 is 0, which is the log transformed value of an OR of 1, correctly implying no intervention effect on average.
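The symmetry argument above is easy to verify numerically. The short Python sketch below (not part of the Handbook, and purely illustrative) averages the two example odds ratios on the natural and on the log scale.

```python
import math

ors = [0.5, 2.0]  # a halving and a doubling

# Averaging on the natural scale is misleading
naive_average = sum(ors) / len(ors)            # 1.25, wrongly suggesting an effect

# Averaging on the log scale respects the symmetry of ratio measures
log_ors = [math.log(value) for value in ors]   # [-0.69, 0.69]
mean_log = sum(log_ors) / len(log_ors)         # 0.0
back_transformed = math.exp(mean_log)          # 1.0, i.e. no effect on average

print(naive_average, back_transformed)
```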

Graphical displays for meta-analyses performed on ratio measures usually use a log scale. This has the effect of making the confidence intervals appear symmetric, for the same reasons.

6.1.2.2 A note on effects of interest

Review authors should not confuse effect measures with effects of interest. The effect of interest in any particular analysis of a randomized trial is usually either the effect of assignment to intervention (the ‘intention-to-treat’ effect) or the effect of adhering to intervention (the ‘per-protocol’ effect). These effects are discussed in Chapter 8, Section 8.2.2. The data collected in a systematic review, and the computations performed to produce effect estimates, will differ according to the effect of interest to the review authors. Most often in Cochrane Reviews the effect of interest will be the effect of assignment to intervention, for which an intention-to-treat analysis will be sought. Most of this chapter relates to this situation. However, specific analyses that estimate the effect of adherence to intervention may be encountered.

6.2 Study designs and identifying the unit of analysis

6.2.1 Unit-of-analysis issues

An important principle in randomized trials is that the analysis should take into account the level at which randomization occurred. In most circumstances the number of observations in the analysis should match the number of ‘units’ that were randomized. In a simple parallel group design for a clinical trial, participants are individually randomized to one of two intervention groups, and a single measurement for each outcome from each participant is collected and analysed. However, there are many variations on this design. Review authors should consider whether in each study:

  1. groups of individuals were randomized together to the same intervention (i.e. cluster-randomized trials);
  2. individuals underwent more than one intervention (e.g. in a crossover trial, or simultaneous treatment of multiple sites on each individual); and
  3. there were multiple observations for the same outcome (e.g. repeated measurements, recurring events, measurements on different body parts).

Review authors should consider the impact on the analysis of any such clustering, matching or other non-standard design features of the included studies (see MECIR Box 6.2.a). A more detailed list of situations in which unit-of-analysis issues commonly arise follows, together with pointers to relevant discussions elsewhere in this Handbook.

MECIR Box 6.2.a Relevant expectations for conduct of intervention reviews

C70: Addressing non-standard designs (Mandatory)

Consider the impact on the analysis of clustering, matching or other non-standard design features of the included studies.

Cluster-randomized studies, crossover studies, studies involving measurements on multiple body parts, and other designs need to be addressed specifically, because a naive analysis might underestimate or overestimate the precision of the study. Failure to account for clustering is likely to overestimate the precision of the study, that is, to give it confidence intervals that are too narrow and a weight that is too large. Failure to account for correlation is likely to underestimate the precision of the study, that is, to give it confidence intervals that are too wide and a weight that is too small.

 

6.2.2 Cluster-randomized trials

In a cluster-randomized trial, groups of participants are randomized to different interventions. For example, the groups may be schools, villages, medical practices, patients of a single doctor or families (see Chapter 23, Section 23.1).

6.2.3 Crossover trials

In a crossover trial, all participants receive all interventions in sequence: they are randomized to an ordering of interventions, and participants act as their own control (see Chapter 23, Section 23.2).

6.2.4 Repeated observations on participants

In studies of long duration, results may be presented for several periods of follow-up (for example, at 6 months, 1 year and 2 years). Results from more than one time point for each study cannot be combined in a standard meta-analysis without a unit-of-analysis error. Some options when selecting and computing effect estimates are as follows:

  1. Obtain individual participant data and perform an analysis (such as a time-to-event analysis) that uses the whole follow-up for each participant. Alternatively, compute an effect estimate for each individual participant that incorporates all time points, such as total number of events, an overall mean, or a trend over time. Occasionally, such analyses are available in published reports.
  2. Define several different outcomes, based on different periods of follow-up, and plan separate analyses. For example, time frames might be defined to reflect short-term, medium-term and long-term follow-up.
  3. Select a single time point and analyse only data at this time for studies in which it is presented. Ideally this should be a clinically important time point. Sometimes it might be chosen to maximize the data available, although authors should be aware of the possibility of reporting biases.
  4. Select the longest follow-up from each study. This may induce a lack of consistency across studies, giving rise to heterogeneity.

6.2.5 Events that may re-occur

If the outcome of interest is an event that can occur more than once, then care must be taken to avoid a unit-of-analysis error. Count data should not be treated as if they are dichotomous data (see Section 6.7).

6.2.6 Multiple treatment attempts

Similarly, multiple treatment attempts per participant can cause a unit-of-analysis error. Care must be taken to ensure that the number of participants randomized, and not the number of treatment attempts, is used to calculate confidence intervals. For example, in subfertility studies, women may undergo multiple cycles, and authors might erroneously use cycles as the denominator rather than women. This is similar to the situation in cluster-randomized trials, except that each participant is the ‘cluster’ (see methods described in Chapter 23, Section 23.1).

6.2.7 Multiple body parts I: body parts receive the same intervention

In some studies, people are randomized, but multiple parts (or sites) of the body receive the same intervention, a separate outcome assessment being made for each body part, and the number of body parts is used as the denominator in the analysis. For example, eyes may be mistakenly used as the denominator without adjustment for the non-independence between eyes. This is similar to the situation in cluster-randomized trials, except that participants are the ‘clusters’ (see methods described in Chapter 23, Section 23.1).

6.2.8 Multiple body parts II: body parts receive different interventions

A different situation is that in which different parts of the body are randomized to different interventions. ‘Split-mouth’ designs in oral health are of this sort, in which different areas of the mouth are assigned different interventions. These trials have similarities to crossover trials: whereas in crossover trials individuals receive multiple interventions at different times, in these trials they receive multiple interventions at different sites. See methods described in Chapter 23, Section 23.2. It is important to distinguish these trials from those in which participants receive the same intervention at multiple sites (Section 6.2.7).

6.2.9 Multiple intervention groups

Studies that compare more than two intervention groups need to be treated with care. Such studies are often included in meta-analysis by making multiple pair-wise comparisons between all possible pairs of intervention groups. A serious unit-of-analysis problem arises if the same group of participants is included twice in the same meta-analysis (for example, if ‘Dose 1 versus Placebo’ and ‘Dose 2 versus Placebo’ are both included in the same meta-analysis, with the same placebo patients in both comparisons). Review authors should approach multiple intervention groups in an appropriate way that avoids arbitrary omission of relevant groups and double-counting of participants (see MECIR Box 6.2.b) (see Chapter 23, Section 23.3). One option is network meta-analysis, as discussed in Chapter 11.

MECIR Box 6.2.b Relevant expectations for conduct of intervention reviews

C66: Addressing studies with more than two groups (Mandatory)

If multi-arm studies are included, analyse multiple intervention groups in an appropriate way that avoids arbitrary omission of relevant groups and double-counting of participants.

Excluding relevant groups decreases precision and double-counting increases precision spuriously; both are inappropriate and unnecessary. Alternative strategies include combining intervention groups, separating comparisons into different forest plots and using multiple treatments meta-analysis.

 

6.3 Extracting estimates of effect directly

In reviews of randomized trials, it is generally recommended that summary data from each intervention group are collected as described in Sections 6.4.2 and 6.5.2, so that effects can be estimated by the review authors in a consistent way across studies. On occasion, however, it is necessary or appropriate to extract an estimate of effect directly from a study report (some might refer to this as ‘contrast-based’ data extraction rather than ‘arm-based’ data extraction). Some situations in which this is the case include:

  1. For specific types of randomized trials: analyses of cluster-randomized trials and crossover trials require allowance for clustering or matching of individuals, and it is often preferable to extract effect estimates from analyses undertaken by the trial authors (see Chapter 23).
  2. For specific analyses of randomized trials: there may be other reasons to extract effect estimates directly, such as when analyses have been performed to adjust for variables used in stratified randomization or minimization, or when analysis of covariance has been used to adjust for baseline measurements of an outcome. Other examples of sophisticated analyses include those undertaken to reduce risk of bias, to handle missing data or to estimate a ‘per-protocol’ effect using instrumental variables analysis (see also Chapter 8).
  3. For specific types of outcomes: time-to-event data are not conveniently summarized by summary data from each intervention group, and it is usually more practical to extract hazard ratios (see Section 6.8.2). Similarly, for ordinal data and rate data it may be convenient to extract effect estimates (see Sections 6.6.2 and 6.7.2).
  4. For non-randomized studies: when extracting data from non-randomized studies, adjusted effect estimates may be available (e.g. adjusted odds ratios from logistic regression analyses, or adjusted rate ratios from Poisson regression analyses). These are generally preferable to analyses based on summary statistics, because they usually mitigate the impact of confounding. The variables that have been used for adjustment should be recorded (see Chapter 24).
  5. When summary data for each group are not available: on occasion, summary data for each intervention group may be sought, but cannot be extracted. In such situations it may still be possible to include the study in a meta-analysis (using the generic inverse variance method) if an effect estimate is extracted directly from the study report.

An estimate of effect may be presented along with a confidence interval or a P value. It is usually necessary to obtain a SE from these numbers, since software procedures for performing meta-analyses using generic inverse-variance weighted averages mostly take input data in the form of an effect estimate and its SE from each study (see Chapter 10, Section 10.3). The procedure for obtaining a SE depends on whether the effect measure is an absolute measure (e.g. mean difference, standardized mean difference, risk difference) or a ratio measure (e.g. odds ratio, risk ratio, hazard ratio, rate ratio). We describe these procedures in Sections 6.3.1 and 6.3.2, respectively. However, for continuous outcome data, the special cases of extracting results for a mean from one intervention arm, and extracting results for the difference between two means, are addressed in Section 6.5.2.

A limitation of this approach is that estimates and SEs of the same effect measure must be calculated for all the other studies in the same meta-analysis, even if they provide the summary data by intervention group. For example, when numbers in each outcome category by intervention group are known for some studies, but only ORs are available for other studies, then ORs would need to be calculated for the first set of studies to enable meta-analysis with the second set of studies. Statistical software such as RevMan may be used to calculate these ORs (in this example, by first analysing them as dichotomous data), and the confidence intervals that are calculated may be transformed to SEs using the methods in Section 6.3.2.

6.3.1 Obtaining standard errors from confidence intervals and P values: absolute (difference) measures

When a 95% confidence interval (CI) is available for an absolute effect measure (e.g. standardized mean difference, risk difference, rate difference), then the SE can be calculated as

SE = (upper limit – lower limit) / 3.92.

For 90% confidence intervals 3.92 should be replaced by 3.29, and for 99% confidence intervals it should be replaced by 5.15. Specific consideration is required for continuous outcome data when extracting mean differences. This is because confidence intervals should have been computed using t distributions, especially when the sample sizes are small: see Section 6.5.2.3 for details.

Where exact P values are quoted alongside estimates of intervention effect, it is possible to derive SEs. While all tests of statistical significance produce P values, different tests use different mathematical approaches. The approach here assumes P values have been obtained through a particularly simple approach of dividing the effect estimate by its SE and comparing the result (denoted Z) with a standard normal distribution (statisticians often refer to this as a Wald test).

The first step is to obtain the Z value corresponding to the reported P value from a table of the standard normal distribution. A SE may then be calculated as

SE = intervention effect estimate / Z.

As an example, suppose a conference abstract presents an estimate of a risk difference of 0.03 (P = 0.008). The Z value that corresponds to a P value of 0.008 is Z = 2.652. This can be obtained from a table of the standard normal distribution or a computer program (for example, by entering =abs(normsinv(0.008/2)) into any cell in a Microsoft Excel spreadsheet). The SE of the risk difference is obtained by dividing the risk difference (0.03) by the Z value (2.652), which gives 0.011.
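These calculations are simple to script. The following Python sketch (not part of the Handbook) reproduces the worked example above; it assumes, as in the text, that the P value comes from a Wald-type test, and the confidence-interval helper assumes a 95% interval unless told otherwise. The CI values in the second call are invented for illustration.

```python
from scipy.stats import norm

def se_from_ci(lower, upper, level=0.95):
    """SE of an absolute (difference) measure from its confidence interval."""
    z = norm.ppf(1 - (1 - level) / 2)   # 1.96 for a 95% interval, so 2*z = 3.92
    return (upper - lower) / (2 * z)

def se_from_p(estimate, p_value):
    """SE from an exact two-sided P value, assuming a Wald-type test."""
    z = abs(norm.ppf(p_value / 2))      # e.g. 2.652 for P = 0.008
    return abs(estimate) / z

# Worked example from the text: risk difference 0.03 with P = 0.008
print(round(se_from_p(0.03, 0.008), 3))   # 0.011
# Illustrative CI example: an effect estimate with 95% CI 0.01 to 0.05
print(round(se_from_ci(0.01, 0.05), 4))   # 0.0102
```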

Where significance tests have used other mathematical approaches, the estimated SEs may not coincide exactly with the true SEs. For P values that are obtained from t-tests for continuous outcome data, refer instead to Section 6.5.2.3.

6.3.2 Obtaining standard errors from confidence intervals and P values: ratio measures

The process of obtaining SEs for ratio measures is similar to that for absolute measures, but with an additional first step. Analyses of ratio measures are performed on the natural log scale (see Section 6.1.2.1). For a ratio measure, such as a risk ratio, odds ratio or hazard ratio (which we denote generically as RR here), first calculate

lnRR = ln(RR),

and take natural logs of the lower and upper confidence limits in the same way.

Then the formulae in Section 6.3.1 may be used. Note that the SE refers to the log of the ratio measure. When using the generic inverse variance method in RevMan, the data should be entered on the natural log scale, that is as lnRR and the SE of lnRR, as calculated here (see Chapter 10, Section 10.3).
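As an illustration of this two-step process, the Python sketch below (not from the Handbook; the hazard ratio and its confidence limits are invented for illustration) converts a ratio estimate and its 95% confidence interval to the log scale and computes the SE of the log effect.

```python
import math

def log_effect_and_se(rr, lower, upper, divisor=3.92):
    """Convert a ratio measure and its 95% CI to the log scale.

    Returns (lnRR, SE of lnRR). For 90% or 99% confidence intervals,
    replace the divisor 3.92 with 3.29 or 5.15 respectively.
    """
    ln_rr = math.log(rr)
    se_ln_rr = (math.log(upper) - math.log(lower)) / divisor
    return ln_rr, se_ln_rr

# Illustrative values only: a hazard ratio of 0.75 with 95% CI 0.60 to 0.94
ln_rr, se = log_effect_and_se(0.75, 0.60, 0.94)
print(round(ln_rr, 3), round(se, 3))
```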

6.4 Dichotomous outcome data

6.4.1 Effect measures for dichotomous outcomes

Dichotomous (binary) outcome data arise when the outcome for every participant is one of two possibilities, for example, dead or alive, or clinical improvement or no clinical improvement. This section considers the possible summary statistics to use when the outcome of interest has such a binary form. The most commonly encountered effect measures used in randomized trials with dichotomous data are:

  1. the risk ratio (RR; also called the relative risk);
  2. the odds ratio (OR);
  3. the risk difference (RD; also called the absolute risk reduction); and
  4. the number needed to treat for an additional beneficial or harmful outcome (NNT).

Details of the calculations of the first three of these measures are given in Box 6.4.a. Numbers needed to treat are discussed in detail in Chapter 15, Section 15.4, as they are primarily used for the communication and interpretation of results.

Methods for meta-analysis of dichotomous outcome data are covered in Chapter 10, Section 10.4.

Aside: as events of interest may be desirable rather than undesirable, it would be preferable to use a more neutral term than risk (such as probability), but for the sake of convention we use the terms risk ratio and risk difference throughout. We also use the term ‘risk ratio’ in preference to ‘relative risk’ for consistency with other terminology. The two are interchangeable and both conveniently abbreviate to ‘RR’. Note also that we have been careful with the use of the words ‘risk’ and ‘rate’. These words are often treated synonymously. However, we have tried to reserve use of the word ‘rate’ for the data type ‘counts and rates’, where it describes the frequency of events in a measured period of time.

Box 6.4.a Calculation of risk ratio (RR), odds ratio (OR) and risk difference (RD) from a 2×2 table

The results of a two-group randomized trial with a dichotomous outcome can be displayed as a 2×2 table:

 

                              Event          No event      Total
                              (‘Success’)    (‘Fail’)
Experimental intervention     SE             FE            NE
Comparator intervention       SC             FC            NC

where SE, SC, FE and FC are the numbers of participants with each outcome (‘S’ or ‘F’) in each group (‘E’ or ‘C’). The following summary statistics can be calculated:

risk ratio (RR) = (SE / NE) / (SC / NC)

odds ratio (OR) = (SE / FE) / (SC / FC) = (SE × FC) / (FE × SC)

risk difference (RD) = (SE / NE) – (SC / NC)

6.4.1.1 Risk and odds

In general conversation the terms ‘risk’ and ‘odds’ are used interchangeably (and also with the terms ‘chance’, ‘probability’ and ‘likelihood’) as if they describe the same quantity. In statistics, however, risk and odds have particular meanings and are calculated in different ways. When the difference between them is ignored, the results of a systematic review may be misinterpreted.

Risk is the concept more familiar to health professionals and the general public. Risk describes the probability with which a health outcome will occur. In research, risk is commonly expressed as a decimal number between 0 and 1, although it is occasionally converted into a percentage. In ‘Summary of findings’ tables in Cochrane Reviews, it is often expressed as a number of people per 1000 (see Chapter 14, Section 14.1.4). It is simple to grasp the relationship between a risk and the likely occurrence of events: in a sample of 100 people the number of events observed will on average be the risk multiplied by 100. For example, when the risk is 0.1, about 10 people out of every 100 will have the event; when the risk is 0.5, about 50 people out of every 100 will have the event. In a sample of 1000 people, these numbers are 100 and 500 respectively.

Odds is a concept that may be more familiar to gamblers. The odds refers to the ratio of the probability that a particular event will occur to the probability that it will not occur, and can be any number between zero and infinity. In gambling, the odds describes the ratio of the size of the potential winnings to the gambling stake; in health care it is the ratio of the number of people with the event to the number without. It is commonly expressed as a ratio of two integers. For example, an odds of 0.01 is often written as 1:100, odds of 0.33 as 1:3, and odds of 3 as 3:1. Odds can be converted to risks, and risks to odds, using the formulae:

risk = odds / (1 + odds);   odds = risk / (1 – risk).

The interpretation of an odds is more complicated than for a risk. The simplest way to ensure that the interpretation is correct is first to convert the odds into a risk. For example, when the odds are 1:10, or 0.1, one person will have the event for every 10 who do not, and, using the formula, the risk of the event is 0.1/(1+0.1)=0.091. In a sample of 100, about 9 individuals will have the event and 91 will not. When the odds are equal to 1, one person will have the event for every person who does not, so in a sample of 100, 100×1/(1+1)=50 will have the event and 50 will not.
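The conversions between odds and risks are one-line calculations. A minimal Python sketch (not part of the Handbook), reproducing the examples above:

```python
def odds_to_risk(odds):
    return odds / (1 + odds)

def risk_to_odds(risk):
    return risk / (1 - risk)

print(round(odds_to_risk(0.1), 3))   # 0.091: odds of 1:10, as in the example above
print(risk_to_odds(0.5))             # 1.0: a risk of 0.5 corresponds to an odds of 1
```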

The difference between odds and risk is small when the event is rare (as illustrated in the example above, where a risk of 0.091 was seen to be similar to an odds of 0.1). When events are common, as is often the case in clinical trials, the differences between odds and risks are large. For example, a risk of 0.5 is equivalent to an odds of 1; and a risk of 0.95 is equivalent to an odds of 19.

Effect measures for randomized trials with dichotomous outcomes involve comparing either risks or odds from two intervention groups. To compare them we can look at their ratio (risk ratio or odds ratio) or their difference in risk (risk difference).

6.4.1.2 Measures of relative effect: the risk ratio and odds ratio

Measures of relative effect express the expected outcome in one group relative to that in the other. The risk ratio (RR, or relative risk) is the ratio of the risk of an event in the two groups, whereas the odds ratio (OR) is the ratio of the odds of an event (see Box 6.4.a). For both measures a value of 1 indicates that the estimated effects are the same for both interventions.
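As a small illustration of the formulae in Box 6.4.a, the Python sketch below (not part of the Handbook) computes the risk ratio, odds ratio and risk difference from a 2×2 table; the counts used are invented for illustration.

```python
def dichotomous_effects(s_e, f_e, s_c, f_c):
    """Risk ratio, odds ratio and risk difference from a 2x2 table.

    s_e, f_e: events ('successes') and non-events in the experimental group
    s_c, f_c: events and non-events in the comparator group
    """
    n_e, n_c = s_e + f_e, s_c + f_c
    risk_e, risk_c = s_e / n_e, s_c / n_c
    rr = risk_e / risk_c                    # risk ratio
    odds_ratio = (s_e * f_c) / (f_e * s_c)  # odds ratio
    rd = risk_e - risk_c                    # risk difference
    return rr, odds_ratio, rd

# Illustrative counts only: 20/100 events versus 10/100 events
print(dichotomous_effects(20, 80, 10, 90))  # (2.0, 2.25, 0.1)
```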

Neither the risk ratio nor the odds ratio can be calculated for a study if there are no events in the comparator group. This is because, as can be seen from the formulae in Box 6.4.a, we would be attempting to divide by zero. The odds ratio also cannot be calculated if everybody in the intervention group experiences an event. In these situations, and others where SEs cannot be calculated, it is customary to add ½ to each cell of the 2×2 table (for example, RevMan automatically makes this correction when necessary). In the case where no events (or all events) are observed in both groups the study provides no information about relative probability of the event and is omitted from the meta-analysis. This is entirely appropriate. Zeros arise particularly when the event of interest is rare, such as unintended adverse outcomes. For further discussion of choice of effect measures for such sparse data (often with lots of zeros) see Chapter 10, Section 10.4.4.

Risk ratios describe the multiplication of the risk that occurs with use of the experimental intervention. For example, a risk ratio of 3 for an intervention implies that events with the intervention are three times more likely than events without the intervention. Alternatively we can say that the intervention increases the risk of events by 100×(RR–1)% = 200%. Similarly, a risk ratio of 0.25 is interpreted as the probability of an event with the intervention being one-quarter of that without the intervention. This may be expressed alternatively by saying that the intervention decreases the risk of events by 100×(1–RR)% = 75%. This is known as the relative risk reduction (see also Chapter 15, Section 15.4.1). The interpretation of the clinical importance of a given risk ratio cannot be made without knowledge of the typical risk of events without intervention: a risk ratio of 0.75 could correspond to a clinically important reduction in events from 80% to 60%, or a small, less clinically important reduction from 4% to 3%. What constitutes clinically important will depend on the outcome and the values and preferences of the patient or population.

The numerical value of the observed risk ratio must always be between 0 and 1/CGR, where CGR (abbreviation of ‘comparator group risk’, sometimes referred to as the control group risk or the control event rate) is the observed risk of the event in the comparator group expressed as a number between 0 and 1. This means that for common events large values of risk ratio are impossible. For example, when the observed risk of events in the comparator group is 0.66 (or 66%) then the observed risk ratio cannot exceed 1.5. This boundary applies only for increases in risk, and can cause problems when the results of an analysis are extrapolated to a different population in which the comparator group risks are above those seen in the study.

Odds ratios, like odds, are more difficult to interpret (Sinclair and Bracken 1994, Sackett et al 1996). Odds ratios describe the multiplication of the odds of the outcome that occurs with use of the intervention. To understand what an odds ratio means in terms of changes in numbers of events it is simplest to convert it first into a risk ratio, and then interpret the risk ratio in the context of a typical comparator group risk, as outlined here. The formula for converting an odds ratio to a risk ratio is given in Chapter 15, Section 15.4.4. Sometimes it may be sensible to calculate the RR for more than one assumed comparator group risk.
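The conversion formula itself is given in Chapter 15, Section 15.4.4. The Python sketch below uses the standard epidemiological identity RR = OR / (1 – CGR × (1 – OR)); treating this as the relevant formula is an assumption on my part, and the OR and comparator group risks shown are purely illustrative. It shows how the same odds ratio corresponds to increasingly different risk ratios as events become more common.

```python
def or_to_rr(odds_ratio, comparator_risk):
    """Convert an odds ratio to a risk ratio for an assumed comparator group risk (CGR)."""
    return odds_ratio / (1 - comparator_risk * (1 - odds_ratio))

# The same OR of 0.6 corresponds to different RRs as events become more common
for cgr in (0.05, 0.2, 0.5):
    print(cgr, round(or_to_rr(0.6, cgr), 3))   # 0.612, 0.652, 0.75
```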

6.4.1.3 Warning: OR and RR are not the same

Because risk and odds are different when events are common, the risk ratio and the odds ratio also differ when events are common. This non-equivalence does not indicate that either is wrong: both are entirely valid ways of describing an intervention effect. Problems may arise, however, if the odds ratio is misinterpreted as a risk ratio. For interventions that increase the chances of events, the odds ratio will be larger than the risk ratio, so the misinterpretation will tend to overestimate the intervention effect, especially when events are common (with, say, risks of events more than 20%). For interventions that reduce the chances of events, the odds ratio will be smaller than the risk ratio, so that, again, misinterpretation overestimates the effect of the intervention. This error in interpretation is unfortunately quite common in published reports of individual studies and systematic reviews.

6.4.1.4 Measures of absolute effect: the risk difference

The risk difference is the difference between the observed risks (proportions of individuals with the outcome of interest) in the two groups (see Box 6.4.a). The risk difference can be calculated for any study, even when there are no events in either group. The risk difference is straightforward to interpret: it describes the difference in the observed risk of events between experimental and comparator interventions; for an individual it describes the estimated difference in the probability of experiencing the event. However, the clinical importance of a risk difference may depend on the underlying risk of events in the population. For example, a risk difference of 0.02 (or 2%) may represent a small, clinically insignificant change from a risk of 58% to 60% or a proportionally much larger and potentially important change from 1% to 3%. Although the risk difference provides more directly relevant information than relative measures (Laupacis et al 1988, Sackett et al 1997), it is still important to be aware of the underlying risk of events, and the consequences of the events, when interpreting a risk difference. Absolute measures, such as the risk difference, are particularly useful when considering trade-offs between likely benefits and likely harms of an intervention.

The risk difference is naturally constrained (like the risk ratio), which may create difficulties when applying results to other patient groups and settings. For example, if a study or meta-analysis estimates a risk difference of –0.1 (or –10%), then for a group with an initial risk of, say, 7% the outcome will have an impossible estimated negative probability of –3%. Similar scenarios for increases in risk occur at the other end of the scale. Such problems can arise only when the results are applied to populations with different risks from those observed in the studies.

The number needed to treat is obtained from the risk difference. Although it is often used to summarize results of clinical trials, NNTs cannot be combined in a meta-analysis (see Chapter 10, Section 10.4.3). However, odds ratios, risk ratios and risk differences may be usefully converted to NNTs and used when interpreting the results of a meta-analysis as discussed in Chapter 15, Section 15.4.

6.4.1.5 What is the event?

In the context of dichotomous outcomes, healthcare interventions are intended either to reduce the risk of occurrence of an adverse outcome or to increase the chance of a good outcome. It is common to use the term ‘event’ to describe whatever the outcome or state of interest is in the analysis of dichotomous data. For example, when participants have particular symptoms at the start of the study the event of interest is usually recovery or cure. If participants are well or, alternatively, at risk of some adverse outcome at the beginning of the study, then the event is the onset of disease or occurrence of the adverse outcome.

It is possible to switch events and non-events and consider instead the proportion of participants not recovering or not experiencing the event. For meta-analyses using risk differences or odds ratios the impact of this switch is of no great consequence: the switch simply changes the sign of the risk difference, indicating an identical effect size in the opposite direction, while for odds ratios the new odds ratio is the reciprocal (1/x) of the original odds ratio.

In contrast, switching the outcome can make a substantial difference for risk ratios, affecting the effect estimate, its statistical significance, and the consistency of intervention effects across studies. This is because the precision of a risk ratio estimate differs markedly between situations where risks are low and those where risks are high. In a meta-analysis, the effect of this switch cannot be predicted easily. The identification, before data analysis, of which risk ratio is more likely to be the most relevant summary statistic is therefore important. It is often convenient to choose to focus on the event that represents a change in state. For example, in treatment studies where everyone starts in an adverse state and the intention is to ‘cure’ it, it may be more natural to focus on ‘cure’ as the event. Alternatively, in prevention studies where everyone starts in a ‘healthy’ state and the intention is to prevent an adverse event, it may be more natural to focus on the ‘adverse event’ as the event. A general rule of thumb is to focus on the less common state as the event of interest. This reduces the problems associated with extrapolation (see Section 6.4.1.2) and may lead to less heterogeneity across studies. Where interventions aim to reduce the incidence of an adverse event, there is empirical evidence that risk ratios of the adverse event are more consistent than risk ratios of the non-event (Deeks 2002).
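A small numerical sketch (invented counts, not from the Handbook) makes the contrast concrete: switching the event simply negates the risk difference and inverts the odds ratio, but changes the risk ratio in a way that is not a simple reciprocal.

```python
def effects(events_e, n_e, events_c, n_c):
    p_e, p_c = events_e / n_e, events_c / n_c
    rr = p_e / p_c
    odds_ratio = (p_e / (1 - p_e)) / (p_c / (1 - p_c))
    rd = p_e - p_c
    return round(rr, 3), round(odds_ratio, 3), round(rd, 3)

# Invented data: 80/100 'cured' with intervention versus 60/100 with comparator
print(effects(80, 100, 60, 100))   # (1.333, 2.667, 0.2)   event = cure
print(effects(20, 100, 40, 100))   # (0.5, 0.375, -0.2)    event = failure to cure
# OR: 0.375 = 1/2.667 (reciprocal); RD: sign flips; RR: 0.5 is not 1/1.333
```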

6.4.2 Data extraction for dichotomous outcomes

To calculate summary statistics and include the result in a meta-analysis, the only data required for a dichotomous outcome are the numbers of participants in each of the intervention groups who did and did not experience the outcome of interest (the numbers needed to fill in a standard 2×2 table, as in Box 6.4.a). In RevMan, these can be entered as the numbers with the outcome and the total sample sizes for the two groups. Although in theory this is equivalent to collecting the total numbers and the numbers experiencing the outcome, it is not always clear whether the reported total numbers are the whole sample size or only those for whom the outcome was measured or observed. Collecting the numbers of actual observations is preferable, as it avoids assumptions about any participants for whom the outcome was not measured. Occasionally the numbers of participants who experienced the event must be derived from percentages (although it is not always clear which denominator to use, because rounded percentages may be compatible with more than one numerator).

Sometimes the numbers of participants and numbers of events are not available, but an effect estimate such as an odds ratio or risk ratio may be reported. Such data may be included in meta-analyses (using the generic inverse variance method) only when they are accompanied by measures of uncertainty such as a SE, 95% confidence interval or an exact P value (see Section 6.3).

6.5 Continuous outcome data

6.5.1 Effect measures for continuous outcomes

The term ‘continuous’ in statistics conventionally refers to a variable that can take any value in a specified range. When dealing with numerical data, this means that a number may be measured and reported to an arbitrary number of decimal places. Examples of truly continuous data are weight, area and volume. In practice, we can use the same statistical methods for other types of data, most commonly measurement scales and counts of large numbers of events (see Section 6.6.1).

A common feature of continuous data is that a measurement used to assess the outcome of each participant is also measured at baseline, that is, before interventions are administered. This gives rise to the possibility of computing effects based on change from baseline (also called a change score). When effect measures are based on change from baseline, a single measurement is created for each participant, obtained either by subtracting the post-intervention measurement from the baseline measurement or by subtracting the baseline measurement from the post-intervention measurement. Analysis then proceeds as for any other type of continuous outcome variable.

Two summary statistics are commonly used for meta-analysis of continuous data: the mean difference and the standardized mean difference. These can be calculated whether the data from each individual are post-intervention measurements or change-from-baseline measures. It is also possible to measure effects by taking ratios of means, or to use other alternatives.

Sometimes review authors may consider dichotomizing continuous outcome measures so that the results of the trial can be expressed as an odds ratio, risk ratio or risk difference. This might be done either to improve interpretation of the results (see Chapter 15, Section 15.5), or because the majority of the studies present results after dichotomizing a continuous measure. Results reported as means and SDs can, under some assumptions, be converted to risks (Anzures-Cabrera et al 2011). Typically a normal distribution is assumed for the outcome variable within each intervention group.

Methods for meta-analysis of continuous outcome data are covered in Chapter 10, Section 10.5.

6.5.1.1 The mean difference (or difference in means)

The mean difference (MD, or more correctly, ‘difference in means’) is a standard statistic that measures the absolute difference between the mean values in two groups of a randomized trial. It estimates the amount by which the experimental intervention changes the outcome on average compared with the comparator intervention. It can be used as a summary statistic in meta-analysis when outcome measurements in all studies are made on the same scale.

Aside: analyses based on this effect measure were historically termed ‘weighted mean difference’ (WMD) analyses in the Cochrane Database of Systematic Reviews. This name is potentially confusing: although the meta-analysis computes a weighted average of these differences in means, no weighting is involved in calculation of a statistical summary of a single study. Furthermore, all meta-analyses involve a weighted combination of estimates, yet we do not use the word ‘weighted’ when referring to other methods.

6.5.1.2 The standardized mean difference

The standardized mean difference (SMD) is used as a summary statistic in meta-analysis when the studies all assess the same outcome, but measure it in a variety of ways (for example, all studies measure depression but they use different psychometric scales). In this context it is necessary to standardize the results of the studies to a uniform scale before they can be combined. The SMD expresses the size of the intervention effect in each study relative to the between-participant variability in outcome measurements observed in that study. (Again in reality the intervention effect is a difference in means and not a mean of differences.)

Thus, studies for which the difference in means is the same proportion of the standard deviation (SD) will have the same SMD, regardless of the actual scales used to make the measurements.

However, the method assumes that the differences in SDs among studies reflect differences in measurement scales and not real differences in variability among study populations. If in two trials the true effect (as measured by the difference in means) is identical, but the SDs are different, then the SMDs will be different. This may be problematic in some circumstances where real differences in variability between the participants in different studies are expected. For example, where early explanatory trials are combined with later pragmatic trials in the same review, pragmatic trials may include a wider range of participants and may consequently have higher SDs. The overall intervention effect can also be difficult to interpret as it is reported in units of SD rather than in units of any of the measurement scales used in the review, but several options are available to aid interpretation (see Chapter 15, Section 15.6).

The term ‘effect size’ is commonly used in the social sciences, particularly in the context of meta-analysis. Effect sizes typically, though not always, refer to versions of the SMD. It is recommended that the term ‘SMD’ be used in Cochrane Reviews in preference to ‘effect size’ to avoid confusion with the more general plain language use of the latter term as a synonym for ‘intervention effect’ or ‘effect estimate’.

It should be noted that the SMD method does not correct for differences in the direction of the scale. If some scales increase with disease severity (for example, a higher score indicates more severe depression) whilst others decrease (a higher score indicates less severe depression), it is essential to multiply the mean values from one set of studies by –1 (or alternatively to subtract the mean from the maximum possible value for the scale) to ensure that all the scales point in the same direction, before standardization (see MECIR Box 6.5.a). Any such adjustment should be described in the statistical methods section of the review. The standard deviation does not need to be modified.

MECIR Box 6.5.a Relevant expectations for conduct of intervention reviews

C61: Combining different scales (Mandatory)

If studies are combined with different scales, ensure that higher scores for continuous outcomes all have the same meaning for any particular outcome; explain the direction of interpretation; and report when directions are reversed.

Sometimes scales have higher scores that reflect a ‘better’ outcome and sometimes lower scores reflect a ‘better’ outcome. Nonsensical (and misleading) results arise if effect estimates with opposite clinical meanings are combined.

Different variations on the SMD are available depending on exactly which choice of SD is used in the denominator. The particular definition of SMD used in Cochrane Reviews is the effect size known in social science as Hedges’ (adjusted) g. This uses a pooled SD in the denominator, which is an estimate of the SD based on outcome data from both intervention groups, assuming that the SDs in the two groups are similar. In contrast, Glass’ delta (Δ) uses only the SD from the comparator group, on the basis that if the experimental intervention affects between-person variation, then such an impact of the intervention should not influence the effect estimate.
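For orientation, the Python sketch below computes an SMD of the Hedges’ (adjusted) g form from group-level summaries. The pooled-SD formula is standard; the small-sample correction factor 1 – 3/(4N – 9) is a commonly quoted approximation and is an assumption here rather than a formula taken from this chapter, and the input values are purely illustrative.

```python
import math

def hedges_g(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Standardized mean difference with a small-sample correction (Hedges' g).

    Uses the pooled SD across both groups; the correction factor
    1 - 3/(4N - 9) is a commonly used approximation (N = total sample size).
    """
    pooled_sd = math.sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2)
                          / (n_e + n_c - 2))
    d = (mean_e - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * (n_e + n_c) - 9)
    return d * correction

# Illustrative values only
print(round(hedges_g(32.1, 5.1, 25, 28.3, 4.1, 22), 2))
```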

To overcome problems associated with estimating SDs within small studies, and with real differences across studies in between-person variability, it may sometimes be desirable to standardize using an external estimate of SD. External estimates might be derived, for example, from a cross-sectional analysis of many individuals assessed using the same continuous outcome measure (the sample of individuals might be derived from a large cohort study). Typically the external estimate would be assumed to be known without error, which is likely to be reasonable if it is based on a large number of individuals. Under this assumption, the statistical methods used for MDs would be used, with both the MD and its SE divided by the externally derived SD.

6.5.1.3 The ratio of means

The ratio of means (RoM) is a less commonly used statistic that measures the relative difference between the mean values in two groups of a randomized trial (Friedrich et al 2008). It estimates the amount by which the average value of the outcome is multiplied for participants on the experimental intervention compared with the comparator intervention. For example, a RoM of 2 for an intervention implies that the mean score in the participants receiving the experimental intervention is on average twice as high as that of the group without the intervention. It can be used as a summary statistic in meta-analysis when outcome measurements can only be positive. Thus it is suitable for single (post-intervention) assessments but not for change-from-baseline measures (which can be negative).

An advantage of the RoM is that it can be used in meta-analysis to combine results from studies that used different measurement scales. However, it is important that these different scales have comparable lower limits. For example, a RoM might meaningfully be used to combine results from a study using a scale ranging from 0 to 10 with results from a study using a scale ranging from 1 to 50. However, it is unlikely to be reasonable to combine RoM results from a study using a scale ranging from 0 to 10 with RoM results from a study using a scale ranging from 20 to 30: it is not possible to obtain RoM values outside the range 0.67 to 1.5 in the latter study, whereas such values are readily obtained in the former study. RoM is not a suitable effect measure for the latter study.

The RoM might be a particularly suitable choice of effect measure when the outcome is a physical measurement that can only take positive values, but when different studies use different measurement approaches that cannot readily be converted from one to another. For example, it has been used in a meta-analysis where some studies assessed urine output using measures that did, and some measures that did not, adjust for body weight (Friedrich et al 2005).
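A meta-analysis of ratios of means is performed on the log scale, in the same spirit as Section 6.1.2.1. The sketch below is a rough illustration only: it computes the log RoM and an approximate SE using a standard delta-method formula, which is my assumption about a reasonable approach rather than a method prescribed in this chapter, and the inputs are invented.

```python
import math

def log_rom_and_se(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Log ratio of means and an approximate SE via a delta-method formula.

    Assumes both group means are positive; this approximation is not
    taken from this chapter.
    """
    ln_rom = math.log(mean_e / mean_c)
    se = math.sqrt(sd_e**2 / (n_e * mean_e**2) + sd_c**2 / (n_c * mean_c**2))
    return ln_rom, se

# Invented values only
ln_rom, se = log_rom_and_se(2.4, 0.8, 30, 1.6, 0.7, 28)
print(round(ln_rom, 3), round(se, 3))
```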

6.5.1.4 Other effect measures for continuous outcome data

Other effect measures for continuous outcome data include the following:

  • Standardization in terms of the minimal important difference (MID) on each scale. This expresses the MD as a proportion of the amount of change on a scale that would be considered clinically meaningful (Johnston et al 2010).
  • Prevented fraction. This expresses the MD in change scores in relation to the comparator group mean change. Thus it describes how much of the change in the comparator group might have been prevented by the experimental intervention. It has commonly been used in dentistry (Dubey et al 1965).
  • Difference in percentage change from baseline. This is a version of the MD in which each intervention group is summarized by the mean change divided by the mean baseline level, thus expressing it as a percentage. The measure has often been used, for example, for outcomes such as cholesterol level, blood pressure and glaucoma. Care is needed to ensure that the SE properly accounts for correlation between baseline and post-intervention values (Vickers 2001).
  • Direct mapping from one scale to another. If conversion factors are available that map one scale to another (e.g. pounds to kilograms) then these should be used. Methods are also available that permit these conversion factors to be estimated (Ades et al 2015).

6.5.2 Data extraction for continuous outcomes

To perform a meta-analysis of continuous data using MDs, SMDs or ratios of means, review authors should seek:

  • the mean value of the outcome measurements in each intervention group;
  • the standard deviation of the outcome measurements in each intervention group; and
  • the number of participants for whom the outcome was measured in each intervention group.

Due to poor and variable reporting it may be difficult or impossible to obtain these numbers from the data summaries presented. Studies vary in the statistics they use to summarize the average (sometimes using medians rather than means) and variation (sometimes using SEs, confidence intervals, interquartile ranges and ranges rather than SDs). They also vary in the scale chosen to analyse the data (e.g. post-intervention measurements versus change from baseline; raw scale versus logarithmic scale).

A particularly misleading error is to misinterpret a SE as a SD. Unfortunately, it is not always clear which is being reported and some intelligent reasoning, and comparison with other studies, may be required. SDs and SEs are occasionally confused in the reports of studies, and the terminology is used inconsistently.

When needed, missing information and clarification about the statistics presented should always be sought from the authors. However, for several measures of variation there is an approximate or direct algebraic relationship with the SD, so it may be possible to obtain the required statistic even if it is not published in the paper, as explained in Sections 6.5.2.1 to 6.5.2.6. More details and examples are available elsewhere (Deeks 1997a, Deeks 1997b). Section 6.5.2.7 discusses options if SDs remain missing after attempts to obtain them.

Sometimes the numbers of participants, means and SDs are not available, but an effect estimate such as an MD or SMD has been reported. Such data may be included in meta-analyses using the generic inverse variance method only when they are accompanied by measures of uncertainty such as a SE, 95% confidence interval or an exact P value. A suitable SE from a confidence interval for an MD should be obtained using the early steps of the process described in Section 6.5.2.3. For SMDs, see Section 6.3.

6.5.2.1 Extracting post-intervention versus change from baseline data

Commonly, studies in a review will have reported a mixture of change from baseline and post-intervention values (i.e. values at various follow-up time points, including ‘final values’). Some studies will report both; others will report only change scores or only post-intervention values. As explained in Chapter 10, Section 10.5.2, both post-intervention values and change scores can sometimes be combined in the same analysis, so this is not necessarily a problem. Review authors may wish to extract data on both change from baseline and post-intervention outcomes if the required means and SDs are available (see Section 6.5.2.7 for cases where the appropriate SDs are not available). The choice of metric reported in the studies might be associated with the direction and magnitude of results. Review authors should seek evidence of whether such selective reporting may be the case for one or more studies (see Chapter 8, Section 8.7).

A final problem with extracting information on change from baseline measures is that often baseline and post-intervention measurements will have been reported for different numbers of participants due to missed visits and study withdrawals. It may be difficult to identify the subset of participants who have both baseline and post-intervention measurements for whom change scores can be computed.

6.5.2.2 Obtaining standard deviations from standard errors and confidence intervals for group means

A standard deviation can be obtained from the SE of a mean by multiplying by the square root of the sample size:

SD = SE × √N.

When make this transformation, the SE must be calculated from on a single intervention group, and must not be the SE of the mean differential between two intervention groups.

The confidence interval for a mean ca also be used go calc the SD. Again, the following implement to the confidence interval for a mean value calculated during an medication group and not for estimates of differences between aids (for these, please Portion 6.5.2.3). Largest reported confidence intervals are 95% confidence intervals. If an sample size lives large (say larger than 100 in each group), the 95% confidentiality interval is 3.92 SEA wide (3.92=2✕1.96). The SD for everyone group is obtained by dividing the width of the self-confidence interval by 3.92, and then multiplying by the settle root of the sample size in this group:

.

For 90% faith intervals, 3.92 should be exchange of 3.29, and for 99% confidence intervals it should be replaced by 5.15.

If the sample size is small (say fewer than 60 participants are each group) then confidence intervals need have been calculated using a value from a t distribution. The numbers 3.92, 3.29 and 5.15 are replaced with slightly larger quantity specific to the t distribution, which can live obtained from tables away the t distribution with degrees of freedom equal to the user sample size minus 1. Relevant details out the t distribution are available as appendices of many geometric textbooks conversely from standard computer spreadsheet packages. By example the t statistic for a 95% confidence interval from a sample size for 25 can be obtained by typing =tinv(1-0.95,25-1) in an cell at a Microsoft Excel program (the result is 2.0639). The divisor, 3.92, in the formula above become will replaced by 2✕2.0639=4.128.

For moderate sample sizes (say between 60 and 100 in each group), get a t distribution or a standard normal distribution may have been used. Review authors should look for evidence of which one, the use a t distribution when in doubt.

As an example, consider dating featuring as follows:

Group                        Sample size    Mean    95% CI
Experimental intervention    25             32.1    (30.0, 34.2)
Comparator intervention      22             28.3    (26.5, 30.1)

The confidence intervals should have been based on t distributions with 24 and 21 degrees of freedom, respectively. The divisor for the experimental intervention group is 4.128, from above. The SD for this group is √25 × (34.2 − 30.0)/4.128 = 5.09. Calculations for the comparator group are performed in a similar way.
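As a computational illustration (this sketch is not part of the Handbook or RevMan), the following short Python function reproduces the worked example above for the experimental group; it assumes the scipy package is available.

# Sketch: estimate a group SD from a 95% confidence interval for its mean.
# Reproduces the worked example (N = 25, 95% CI 30.0 to 34.2).
import math
from scipy import stats

def sd_from_ci_for_mean(n, lower, upper, level=0.95):
    # For small samples the CI is assumed to be based on a t distribution
    # with n - 1 degrees of freedom; the divisor is 2 * t critical value.
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
    divisor = 2 * t_crit                      # approximately 4.128 when n = 25
    se = (upper - lower) / divisor            # SE of the mean
    return se * math.sqrt(n)                  # SD = SE * sqrt(N)

print(round(sd_from_ci_for_mean(25, 30.0, 34.2), 2))  # 5.09 for the experimental group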

It is important to check that the confidence interval is symmetrical about the mean (the distance between the lower limit and the mean is the same as the distance between the mean and the upper limit). If this is not the case, the confidence interval may have been calculated on transformed values (see Section 6.5.2.4).

6.5.2.3 Obtaining standard deviations from standard errors, confidence intervals, t statistics and P values for differences in means

Standard deviations can be obtained from a SE, confidence interval, t statistic or P value that relates to a difference between means in two groups (i.e. the MD). The MD is required in the calculations from the t statistic or the P value. An assumption that the SDs of outcome measurements are the same in both groups is required in all cases, and the same SD is then used for both intervention groups. We describe first how a t statistic can be obtained from a P value, then how a SE can be obtained from a t statistic or a confidence interval, and finally how a SD is obtained from the SE. Review authors may select the appropriate steps in this process according to what results are available to them. Related methods can be used to derive SDs from certain F statistics, since taking the square root of an F statistic may produce the same t statistic. Care is often required to ensure that an appropriate F statistic is used. Advice from a knowledgeable statistician is recommended.

(1) From P value to t statistic

Where actual P values obtained from t-tests are quoted, the corresponding t statistic may be obtained from a table of the t distribution. The degrees of freedom are given by NE + NC − 2, where NE and NC are the sample sizes in the experimental and comparator groups. We will illustrate with an example. Consider a trial of an experimental intervention (NE = 25) versus a comparator intervention (NC = 22), in which the MD = 3.8. The P value for the comparison was P = 0.008, obtained using a two-sample t-test.

The t statistic that corresponds with a P value of 0.008 and 25 + 22 − 2 = 45 degrees of freedom is t = 2.78. This can be obtained from a table of the t distribution with 45 degrees of freedom or a computer (for example, by typing =tinv(0.008, 45) in a cell in a Microsoft Excel spreadsheet).

Difficulties are encountered when levels of significance are reported (such as P < 0.05 or even P = NS, 'not significant', which usually implies P > 0.05) rather than exact P values. A conservative approach would be to take the P value at the upper bound (e.g. for P < 0.05 take P = 0.05, for P < 0.01 take P = 0.01 and for P < 0.001 take P = 0.001). However, this is not a solution for results that are reported as P = NS or P > 0.05 (see Section 6.5.2.7).

(2) From t statistic to standard error

The t statistic is the ratio of the MD to the SE of the MD. The SE of the MD can therefore be obtained by dividing the MD by the t statistic:

SE = |MD| / t,

where |MD| denotes the absolute value of the MD. In the example, where MD = 3.8 and t = 2.78, the SE of the MD is obtained by dividing 3.8 by 2.78, which gives 1.37.

(3) From confidence interval to standard error

If a 95% confidence interval is available for the MD, then the SE can be calculated as:

SE = (upper limit − lower limit) / 3.92,

as long as the trial is large. For 90% confidence intervals divide by 3.29 rather than 3.92; for 99% confidence intervals divide by 5.15. If the sample size is small (say fewer than 60 participants in each group) then confidence intervals should have been calculated using a t distribution. The numbers 3.92, 3.29 and 5.15 are replaced with larger numbers specific to both the t distribution and the sample size, and can be obtained from tables of the t distribution with degrees of freedom equal to NE + NC − 2, where NE and NC are the sample sizes in the two groups. Relevant details of the t distribution are available as appendices of many statistical textbooks or from standard computer spreadsheet packages. For example, the t statistic for a 95% confidence interval for a comparison of a sample size of 25 with a sample size of 22 can be obtained by typing =tinv(1-0.95,25+22-2) in a cell in a Microsoft Excel spreadsheet.

(4) From standard error to standard deviation

The within-group SD can be obtained from the SE of the MD using the following formula:

SD = SE / √(1/NE + 1/NC).

In the example, SD = 1.37 / √(1/25 + 1/22) = 4.69.

Note that this SD is the average of the SDs of the experimental and comparator arms, and should be entered into RevMan twice (once for each intervention group).
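The full chain of calculations in steps (1) to (4) can be scripted. The sketch below (Python, assuming the scipy package is available; a worked illustration rather than part of RevMan) follows the example with NE = 25, NC = 22, MD = 3.8 and P = 0.008.

# Sketch: from an exact two-sided P value for a mean difference to a within-group SD.
import math
from scipy import stats

def sd_from_p_value(p, md, n_e, n_c):
    df = n_e + n_c - 2
    t = stats.t.ppf(1 - p / 2, df)            # step (1): P value -> t statistic
    se = abs(md) / t                          # step (2): t statistic -> SE of the MD
    sd = se / math.sqrt(1 / n_e + 1 / n_c)    # step (4): SE of the MD -> within-group SD
    return t, se, sd

t, se, sd = sd_from_p_value(0.008, 3.8, 25, 22)
# Prints approximately 2.78, 1.37 and 4.68; the 4.69 in the text above arises from
# carrying the rounded SE of 1.37 forward into the final step.
print(round(t, 2), round(se, 2), round(sd, 2))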

6.5.2.4 Transformed and skewed data

Studies may present summary statistics calculated after a transformation has been applied to the raw data. For example, means and SDs of logarithmic values may be available (or, equivalently, a geometric mean and its confidence interval). Such results should be collected, as they may be included in meta-analyses, or – with certain assumptions – may be transformed back to the raw scale (Higgins et al 2008).

For example, a trial reported meningococcal antibody responses 12 months after vaccination with meningitis C vaccine and a control vaccine (MacLennan et al 2000), as geometric mean titres of 24 and 4.2 with 95% confidence intervals of 17 to 34 and 3.9 to 4.6, respectively. These results were obtained by finding the means and confidence intervals of the natural logs of the antibody responses (for vaccine 3.18 (95% CI 2.83 to 3.53), and control 1.44 (1.36 to 1.53)), and taking their exponentials (anti-logs). A meta-analysis may be performed on the scale of these natural log antibody responses, rather than the geometric means. SDs of the log-transformed data may be derived from the latter pair of confidence intervals using the methods described in Section 6.5.2.2. For further discussion of meta-analysis with skewed data, see Chapter 10, Section 10.5.3.
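For illustration only, the sketch below applies the Section 6.5.2.2 relationship to the log-scale confidence interval for the vaccine group. The group sample size of 182 is hypothetical, because it is not reported in the text above; the function itself is a simple restatement of SD = √N × (upper − lower)/3.92 for a large sample.

# Sketch: SD of log-transformed responses from the reported CI of the log-scale mean.
import math

def sd_log_scale(n, log_lower, log_upper):
    # Large-sample approximation; for small groups a t-based divisor should be used.
    return math.sqrt(n) * (log_upper - log_lower) / 3.92

# Vaccine group: natural-log mean 3.18 (95% CI 2.83 to 3.53); N = 182 is assumed.
print(round(sd_log_scale(182, 2.83, 3.53), 2))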

6.5.2.5 Interquartile ranges

Interquartile ranges describe where the central 50% of participants' outcomes lie. When sample sizes are large and the distribution of the outcome is similar to the normal distribution, the width of the interquartile range will be approximately 1.35 SDs. In other situations, and especially when the outcome's distribution is skewed, it is not possible to estimate a SD from an interquartile range. Note that the use of interquartile ranges rather than SDs can often indicate that the outcome's distribution is skewed. Wan and colleagues provide a sample size-dependent extension to the formula for approximating the SD using the interquartile range (Wan et al 2014).

6.5.2.6 Ranges

Ranges are very unstable and, unlike other measures of variation, increase as the sample size increases. They describe the extremes of observed outcomes rather than the average variation. One common approach has been to make use of the fact that, with normally distributed data, 95% of values will lie within 2 × SD either side of the mean. The SD may therefore be estimated to be approximately one-quarter of the typical range of data values. This method is not robust and we recommend that it is not used. Walter and Yao based an imputation method on the minimum and maximum observed values. Their 'range' method uses a lookup table, according to sample size, of conversion factors from range to SD (Walter and Yao 2007). Alternative methods have been proposed to estimate SDs from ranges and quantiles (Hozo et al 2005, Wan et al 2014, Bland 2015), but to our knowledge these have not been evaluated using empirical data. As a general rule, we recommend that ranges should not be used to estimate SDs.

6.5.2.7 No information on variability

Missing SDs are a common feature of meta-analyses of continuous outcome data. When none of the above methods allows calculation of the SDs from the study report (and the information is not available from the trialists) then a review author may be forced to impute ('fill in') the missing data if the study is not to be excluded from the meta-analysis.

The simplest imputation is to borrow the SD from one or more other studies. Furukawa and colleagues found that imputing SDs either from other studies in the same meta-analysis, or from studies in another meta-analysis, yielded approximately correct results in two case studies (Furukawa et al 2006). If several candidate SDs are available, review authors should decide whether to use their average, the highest, a 'reasonably high' value, or some other strategy. For meta-analyses of MDs, choosing a higher SD down-weights a study and yields a wider confidence interval. However, for SMD meta-analyses, choosing a higher SD will bias the result towards a lack of effect. More complicated alternatives are available for making use of multiple candidate SDs. For example, Marinho and colleagues implemented a linear regression of log(SD) on log(mean), because of a strong linear relationship between the two (Marinho et al 2003).

All imputation techniques involve making assumptions about unknown statistics, and it is best to avoid using them wherever possible. If the majority of studies in a meta-analysis have missing SDs, these values should not be imputed. A narrative approach might then be needed for the synthesis (see Chapter 12). However, imputation may be reasonable for a small proportion of studies comprising a small proportion of the data if it enables them to be combined with other studies for which full data are available. Sensitivity analyses should be used to assess the impact of changing the assumptions made.

6.5.2.8 Imputing standard deviations for changes from baseline

A special case of missing SDs is for changes from baseline measurements. Often, only the following information is available:

 

                                             Baseline     Final        Change
Experimental intervention (sample size)      mean, SD     mean, SD     mean
Comparator intervention (sample size)        mean, SD     mean, SD     mean

Note that the mean change in each group can be obtained by subtracting the baseline mean from the post-intervention mean, even if it has not been presented explicitly. However, the information in this table does not allow us to calculate the SD of the changes. We cannot know whether the changes were very consistent or very variable across individuals. Some other information in a paper may help us determine the SD of the changes.

When there is not enough information available in a paper to calculate the SDs for the changes, they can be imputed, for example, by using change-from-baseline SDs for the same outcome measure from other studies in the review. However, the appropriateness of using a SD from another study depends on whether the studies used the same measurement scale, had the same degree of measurement error, had the same time interval between baseline and post-intervention measurement, and were conducted in a similar population.

When statistical analyses of the changes themselves are presented (e.g. confidence intervals, SEs, t statistics, P values, F statistics) then the techniques described in Section 6.5.2.3 may be used. Also note that an alternative to these methods is simply to use a comparison of post-intervention measurements, which in a randomized trial in theory estimates the same quantity as the comparison of changes from baseline.

The following alternative technique may be used for calculating or imputing missing SDs for changes from baseline (Follmann et al 1992, Abrams et al 2005). A typically unreported number known as the correlation coefficient describes how similar the baseline and post-intervention measurements were across participants. Here we describe (1) how to calculate the correlation coefficient from a study that is reported in considerable detail and (2) how to impute a change-from-baseline SD in another study, making use of a calculated or imputed correlation coefficient. Note that the methods in (2) are applicable both to correlation coefficients obtained using (1) and to correlation coefficients obtained in other ways (for example, by reasoned argument). Methods in (2) should be used sparingly, because one can never be sure that an imputed correlation is appropriate. This is because correlations between baseline and post-intervention values will, for example, decrease with increasing time between baseline and post-intervention measurements, as well as depending on the outcome, the characteristics of the participants and the intervention effects.

(1) Calculating a correlation coefficient from a study reported in considerable detail

Suppose a study presents means and SDs for change as well as for baseline and post-intervention ('Final') measurements, for example:

 

                                                 Baseline                 Final                    Change
Experimental intervention (sample size 129)      mean = 15.2, SD = 6.4    mean = 16.2, SD = 7.1    mean = 1.0, SD = 4.5
Comparator intervention (sample size 135)        mean = 15.7, SD = 7.0    mean = 17.2, SD = 6.9    mean = 1.5, SD = 4.2

An analysis of change from baseline is available from this study, using only the data in the final column. We can use other information in this study to calculate two correlation coefficients, one for each intervention group. Let us use the following notation:

 

                                                 Baseline                       Final                      Change
Experimental intervention (sample size NE)       M_E,baseline, SD_E,baseline    M_E,final, SD_E,final      M_E,change, SD_E,change
Comparator intervention (sample size NC)         M_C,baseline, SD_C,baseline    M_C,final, SD_C,final      M_C,change, SD_C,change

The correlation coefficient in the experimental group, Corr_E, can be calculated as:

Corr_E = (SD_E,baseline² + SD_E,final² − SD_E,change²) / (2 × SD_E,baseline × SD_E,final),

and similarly for the comparator intervention, to obtain Corr_C. In the example, these turn out to be:

Corr_E = (6.4² + 7.1² − 4.5²) / (2 × 6.4 × 7.1) = 0.78, and Corr_C = (7.0² + 6.9² − 4.2²) / (2 × 7.0 × 6.9) = 0.82.

When either the baseline or post-intervention SD is unavailable, it may be substituted by the other, providing it is reasonable to assume that the intervention does not alter the variability of the outcome measure. Assuming the correlation coefficients from the two intervention groups are reasonably similar to each other, a simple average can be taken as a reasonable measure of the similarity of baseline and final measurements across individuals in the study (in the example, the average of 0.78 and 0.82 is 0.80). It is recommended that correlation coefficients be computed for many (if not all) studies in the meta-analysis and examined for consistency. If the correlation coefficients differ, then either the sample sizes are too small for reliable estimation, the intervention is affecting the variability in outcome measures, or the intervention effect depends on baseline level; in these circumstances the use of average correlations is best avoided. In addition, if a value less than 0.5 is obtained (correlation coefficients lie between −1 and 1), then there is little benefit in using change from baseline and an analysis of post-intervention measurements will be more precise.
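As a computational illustration (a sketch, not part of RevMan), the following Python code applies the formula above to the detailed study in the table.

# Sketch: correlation coefficients between baseline and final measurements,
# from the baseline, final and change SDs reported in the detailed study above.
def corr_from_sds(sd_baseline, sd_final, sd_change):
    return (sd_baseline**2 + sd_final**2 - sd_change**2) / (2 * sd_baseline * sd_final)

corr_e = corr_from_sds(6.4, 7.1, 4.5)   # experimental group: 0.78
corr_c = corr_from_sds(7.0, 6.9, 4.2)   # comparator group: 0.82
print(round(corr_e, 2), round(corr_c, 2), round((corr_e + corr_c) / 2, 2))  # 0.78 0.82 0.8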

(2) Imputing a change-from-baseline standard deviation using a correlation coefficient

Now consider a study for which the SD of changes from baseline is missing. When baseline and post-intervention SDs are known, we can impute the missing SD using an imputed value, Corr, for the correlation coefficient. The value of Corr may be calculated from another study in the meta-analysis (using the method in (1)), imputed from elsewhere, or hypothesized based on reasoned argument. In all of these situations, a sensitivity analysis should be undertaken, trying different values of Corr, to determine whether the overall result of the analysis is robust to the use of imputed correlation coefficients.

To impute a SD of the change from baseline for the experimental intervention, use

SD_E,change = √(SD_E,baseline² + SD_E,final² − 2 × Corr × SD_E,baseline × SD_E,final),

and similarly for the comparator intervention. Again, if either of the SDs (at baseline or post-intervention) is unavailable, then one may be substituted by the other, as long as it is reasonable to assume that the intervention does not alter the variability of the outcome measure.

As an example, consider the following data:

 

                                                Baseline                 Final                    Change
Experimental intervention (sample size 35)      mean = 12.4, SD = 4.2    mean = 15.2, SD = 3.8    mean = 2.8
Comparator intervention (sample size 38)        mean = 10.7, SD = 4.0    mean = 13.8, SD = 4.4    mean = 3.1

Using the correlation coefficient calculated in step (1) above of 0.80, we can impute the change-from-baseline SD in the comparator group as:

SD_C,change = √(4.0² + 4.4² − 2 × 0.80 × 4.0 × 4.4) = 2.68.
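The same imputation can be scripted; the short Python sketch below (an illustration, not part of RevMan) reproduces the calculation for the comparator group and applies the same approach to the experimental group.

# Sketch: impute a change-from-baseline SD using the baseline SD, final SD and an
# imputed correlation coefficient (Corr = 0.80 from step (1) above).
import math

def impute_change_sd(sd_baseline, sd_final, corr):
    return math.sqrt(sd_baseline**2 + sd_final**2 - 2 * corr * sd_baseline * sd_final)

print(round(impute_change_sd(4.0, 4.4, 0.80), 2))  # comparator group: 2.68
print(round(impute_change_sd(4.2, 3.8, 0.80), 2))  # experimental group: 2.56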

6.5.2.9 Missing means

Missing mean values sometimes occur for continuous outcome data. If a median is available instead, then this will be very similar to the mean when the distribution of the data is symmetrical, and so occasionally can be used directly in meta-analyses. However, means and medians can be very different from each other when the data are skewed, and medians often are reported precisely because the data are skewed (see Chapter 10, Section 10.5.3). Nevertheless, Hozo and colleagues conclude that the median may often be a reasonable substitute for a mean (Hozo et al 2005).

Wan and colleagues propose a method for imputing a missing mean based on the lower quartile, median and upper quartile summary statistics (Wan et al 2014). Bland derived an approximation for a missing mean using the sample size, the minimum and maximum values, the lower and upper quartile values, and the median (Bland 2015). Both of these approaches assume normally distributed outcomes but have been observed to perform well when analysing skewed outcomes; the same simulation study indicated that the Wan method had better properties (Weir et al 2018). The cautions about imputing values summarized in Section 6.5.2.7 should be observed.
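For illustration only, the sketch below implements two commonly cited approximations: an estimate based on the minimum, median and maximum attributed to Hozo and colleagues, and an estimate based on the three quartiles attributed to Wan and colleagues. The exact formulae and example numbers here are assumptions rather than quotations from the Handbook; review authors should check the original papers (and Weir et al 2018) before relying on either.

# Sketch (assumed formulae, for illustration): approximations for a missing mean.
def mean_from_min_median_max(minimum, median, maximum):
    # Hozo et al (2005), commonly stated form: mean approximately (min + 2*median + max) / 4
    return (minimum + 2 * median + maximum) / 4

def mean_from_quartiles(q1, median, q3):
    # Wan et al (2014), commonly stated form: mean approximately (q1 + median + q3) / 3
    return (q1 + median + q3) / 3

print(mean_from_min_median_max(2.0, 10.0, 30.0))  # 13.0 (illustrative numbers only)
print(mean_from_quartiles(6.0, 10.0, 18.0))       # approximately 11.3 (illustrative)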

6.5.2.10 Combining groups

Sometimes it is desirable to combine two reported subgroups into a single group. For example, a study may report results separately for men and women in each of the intervention groups. The formulae in Table 6.5.a can be used to combine the numbers into a single sample size, mean and SD for each intervention group (i.e. combining across men and women in each intervention group in this example). Note that the rather complex-looking formula for the SD produces the SD of outcome measurements as if the combined group had never been divided into two. This SD is different from the usual pooled SD that is used to calculate a confidence interval for a MD or as the denominator in computing the SMD. The usual pooled SD provides a within-subgroup SD rather than a SD for the combined group, so provides an underestimate of the desired SD.

These formulae are also appropriate for use in studies that compared three or more interventions, two of which represent the same intervention category as defined for the purposes of the review. In that case, it may be reasonable to combine these two groups and consider them as a single intervention (see Chapter 23, Section 23.3). For example, 'Group 1' and 'Group 2' may refer to two slightly different variants of an intervention to which participants were randomized, such as different doses of the same drug.

When there are more than two groups to combine, the simplest strategy is to apply the above formulae sequentially (i.e. combine Group 1 and Group 2 to create Group '1+2', then combine Group '1+2' and Group 3 to create Group '1+2+3', and so on).

Table 6.5.a Formulae for combining summary statistics across two groups: Group 1 (with sample size = N1, mean = M1 and SD = SD1) and Group 2 (with sample size = N2, mean = M2 and SD = SD2)

Combined groups
Sample size:   N1 + N2
Mean:          (N1M1 + N2M2) / (N1 + N2)
SD:            √{[(N1 − 1)SD1² + (N2 − 1)SD2² + (N1N2/(N1 + N2))(M1² + M2² − 2M1M2)] / (N1 + N2 − 1)}
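A short computational sketch of these formulae follows (Python, for illustration; the subgroup numbers are invented and not from the Handbook). Note that M1² + M2² − 2M1M2 in the SD formula is simply (M1 − M2)², which is how it is written in the code.

# Sketch: combine two reported subgroups into a single group using the
# formulae in Table 6.5.a (sample size, mean and SD of the combined group).
import math

def combine_groups(n1, m1, sd1, n2, m2, sd2):
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    ss = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2
          + (n1 * n2 / n) * (m1 - m2)**2)
    sd = math.sqrt(ss / (n - 1))
    return n, mean, sd

# Hypothetical subgroups (e.g. men and women within one intervention arm):
print(combine_groups(40, 10.0, 2.0, 60, 12.0, 3.0))  # (100, 11.2, approximately 2.81)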

 

6.6 Ordinal outcome data and measurement scales

6.6.1 Effect measures for ordinal outcomes and measurement scales

Ordinal outcome data arise when each participant is classified in a category and when the categories have a natural order. For example, a 'trichotomous' outcome, such as the classification of disease severity into 'mild', 'moderate' or 'severe', is of ordinal type. As the number of categories increases, ordinal outcomes acquire properties similar to continuous outcomes, and probably will have been analysed as such in a randomized trial.

Measurement scales are one particular type of ordinal outcome frequently used to measure conditions that are difficult to quantify, such as behaviour, depression and cognitive abilities. Measurement scales typically involve a series of questions or tasks, each of which is scored, with the scores then summed to yield a total 'score'. If the items are not considered to be of equal importance a weighted sum may be used.

Methods are available for analysing ordinal outcome data that describe effects in terms of proportional odds ratios (Agresti 1996). Suppose that there are three categories, which are ordered in terms of desirability such that 1 is the best and 3 the worst. The data could be dichotomized in two ways: either category 1 constitutes a success and categories 2 and 3 a failure; or categories 1 and 2 constitute a success and category 3 a failure. A proportional odds model assumes that there is an equal odds ratio for both dichotomies of the data. Therefore, the odds ratio calculated from the proportional odds model can be interpreted as the odds of success on the experimental intervention relative to comparator, irrespective of how the ordered categories might be split into success or failure. Methods (specifically polychotomous logistic regression models) are available for calculating study estimates of the log odds ratio and its SE.

Methods specific to ordinal data become unwieldy (and unnecessary) when the number of categories is large. In practice, longer ordinal scales acquire properties similar to continuous outcomes, and are often analysed as such, while shorter ordinal scales are often made into dichotomous data by combining adjacent categories together until only two remain. The latter is especially appropriate if an established, defensible cut-point is available. However, inappropriate choice of a cut-point can induce bias, particularly if it is chosen to maximize the difference between two intervention arms in a randomized trial.

Where ordinal scales are summarized using methods for dichotomous data, one of the two sets of grouped categories is defined as the event and intervention effects are described using risk ratios, odds ratios or risk differences (see Section 6.4.1). When ordinal scales are summarized using methods for continuous data, the mean score is calculated in each group and intervention effect is expressed as a MD or SMD, or possibly a RoM (see Section 6.5.1). Difficulties will be encountered if studies have summarized their results using medians (see Section 6.5.2.5). Methods for meta-analysis of ordinal outcome data are covered in Chapter 10, Section 10.7.

6.6.2 Data extraction for ordinal outcomes

The data to be extracted for ordinal outcomes depend on whether the ordinal scale will be dichotomized for analysis (see Section 6.4), treated as a continuous outcome (see Section 6.5.2) or analysed directly as ordinal data. This decision, in turn, will be influenced by the way in which study authors analysed and reported their data. It may be impossible to pre-specify whether data extraction will involve calculation of the numbers of participants above and below a defined threshold, or mean values and SDs. In practice, it is wise to extract data in all forms in which they are given, as it will not be clear which is the most common form until all studies have been reviewed. In some circumstances more than one form of analysis may justifiably be included in a review.

Where ordinal data are to be dichotomized and there are several options for selecting a cut-point (or the choice of cut-point is arbitrary) it is sensible to plan from the outset to investigate the impact of the choice of cut-point in a sensitivity analysis (see Chapter 10, Section 10.14). To collect the data that would be used for each alternative dichotomization, it is necessary to record the numbers in each category of short ordinal scales, to avoid having to extract data from a paper more than once. This approach of recording all categorizations is also sensible when studies used slightly different short ordinal scales and it is not clear whether there is a cut-point common across all the studies that can be used for dichotomization.

It is also necessary to record the numbers in each category of the ordinal scale for each intervention group if the proportional odds ratio method will be used (see Chapter 10, Section 10.7).

6.7 Count and rate data

6.7.1 Effect measures for counts and rates

Some types of event can happen to a person more than once, for example, a myocardial infarction, an adverse reaction or a hospitalization. It may be preferable, or necessary, to address the number of times these events occur rather than simply whether each person experienced an event or not (that is, rather than treating them as dichotomous data). We refer to this type of data as count data. For practical purposes, count data may be conveniently divided into counts of rare events and counts of common events.

Counts of rare events are often referred to as 'Poisson data' in statistics. Analyses of rare events often focus on rates. Rates relate the counts to the amount of time during which they could have happened. For example, the result of one arm of a clinical trial might be that 18 myocardial infarctions (MIs) were experienced, across all participants in that arm, during a period of 314 person-years of follow-up (that is, the total number of years for which all the participants were collectively followed). The rate is 0.057 per person-year or 5.7 per 100 person-years. The summary statistic usually used in meta-analysis is the rate ratio (also abbreviated to RR), which compares the rate of events in the two groups by dividing one by the other.

Suppose EE events occurred during TE person-years of follow-up in the experimental intervention group, and EC events during TC person-years in the comparator intervention group. The rate ratio is:

rate ratio = (EE / TE) / (EC / TC).

As a ratio measure, the rate ratio should be log transformed for analysis (see Section 6.3.2). An approximate SE of the log rate ratio is given by:

SE of ln(rate ratio) = √(1/EE + 1/EC).

A correction of 0.5 may be added to each count in the case of zero events. Note that the choice of time unit (e.g. patient-months, woman-years) is irrelevant, since it cancels out of the rate ratio and does not appear in the SE. However, the unit should still be displayed when presenting the study results.

It is also possible to use a rate difference (or difference in rates) as a summary statistic, although this is much less common:

rate difference = (EE / TE) − (EC / TC).

An approximate SE for the rate difference is:

SE of rate difference = √(EE/TE² + EC/TC²).
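The two summary statistics and their SEs can be computed directly from the four numbers EE, TE, EC and TC, as in the Python sketch below (an illustration only; the comparator arm's counts are hypothetical, since the text above reports only the experimental arm).

# Sketch: rate ratio and rate difference with approximate SEs, as defined above,
# suitable for input to a generic inverse variance meta-analysis.
import math

def rate_effects(e_e, t_e, e_c, t_c):
    rate_ratio = (e_e / t_e) / (e_c / t_c)
    se_log_rr = math.sqrt(1 / e_e + 1 / e_c)         # SE of ln(rate ratio)
    rate_diff = e_e / t_e - e_c / t_c
    se_rd = math.sqrt(e_e / t_e**2 + e_c / t_c**2)   # SE of the rate difference
    return rate_ratio, se_log_rr, rate_diff, se_rd

# 18 MIs in 314 person-years (experimental arm, from the text); comparator arm hypothetical.
print(rate_effects(18, 314, 30, 320))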

Counts of more common events, such as numbers of decayed, missing or filled teeth, may often be treated in the same way as continuous outcome data. The intervention effect used will be the MD, which compares the difference in the mean number of events (possibly standardized to a unit time period) experienced by participants in the intervention group with that of participants in the comparator group.

6.7.2 Data extraction for counts and rates

Data that are inherently counts may have been analysed in several ways. Both primary investigators and review authors will need to decide whether to make the outcome of interest dichotomous, continuous, time-to-event or a rate (see Section 6.8).

Although it is desirable to decide how count data will be analysed in a review in advance, the choice is often determined by the format of the available data, and thus cannot be decided until the majority of studies have been reviewed. Review authors should plan to extract count data in the form in which they are reported.

Sometimes detailed data on events and person-years at risk are not available, but summary statistics calculated from them are. For example, an estimate of a rate ratio or rate difference might be presented. Such data may be included in meta-analyses only when they are accompanied by measures of uncertainty such as a 95% confidence interval (see Section 6.3), from which a SE can be obtained and the generic inverse variance method used for meta-analysis.

6.7.2.1 Extracting counts as dichotomous data

A common error is to attempt to treat count data as dichotomous data. Suppose that in the example just presented, the 18 MIs in 314 person-years arose from 157 patients observed on average for 2 years. One might be tempted to quote the results as 18/157, or even 18/314. This is inappropriate if multiple MIs from the same patient could have contributed to the total of 18 (say, if the 18 arose through 12 patients having single MIs and 3 patients each having 2 MIs). The total number of events could theoretically exceed the number of patients, making the results nonsensical. For example, over the course of one year, 35 epileptic participants in a study could experience a total of 63 seizures.

To consider the outcome as a dichotomous outcome, the author must determine the number of participants in each intervention group, and the number of participants in each intervention group who experienced at least one event (or some other relevant criterion that classifies all participants into one of two possible groups). Any time element in the data is lost through this approach, although it may be possible to create a series of dichotomous outcomes, for example at least one stroke during the first year of follow-up, at least one stroke during the first two years of follow-up, and so on. It may be difficult to derive such data from published reports.

6.7.2.2 Extracting counts as continuous data

To extract counts as continuous data (i.e. the mean number of events per patient), guidance in Section 6.5.2 should be followed, although particular attention should be paid to the likelihood that the data will be highly skewed.

6.7.2.3 Extracting counts as time-to-event data

For rare events that can happen more than once, an author may be faced with studies that treat the data as time-to-first-event. To extract counts as time-to-event data, guidance in Section 6.8.2 should be followed.

6.7.2.4 Extracting counts as rate data

When it is possible to extract the total number of events in each group, and the total amount of person-time at risk in each group, then count data can be analysed as rates (see Chapter 10, Section 10.8). Note that the total number of participants is not required for an analysis of rate data, but should be recorded as part of the description of the study.

6.8 Time-to-event data

6.8.1 Effect measures for time-to-event outcomes

Time-to-event data arise when interest is focused on the time elapsing before an event is experienced. They are known generically as survival data in the medical statistics literature, since death is often the event of interest, particularly in cancer and heart disease. Time-to-event data consist of pairs of observations for each individual: first, a length of time during which no event was observed, and second, an indicator of whether the end of that time period corresponds to an event or just the end of observation. Participants who contribute some period of time that does not end in an event are said to be 'censored'. Their event-free time contributes information and they are included in the analysis. Time-to-event data may be based on events other than death, such as recurrence of a disease event (for example, time to the end of a period free of epileptic fits) or discharge from hospital.

Time-to-event data can sometimes be analysed as dichotomous data. This requires the status of all patients in a study to be known at a fixed time point. For example, if all patients have been followed for at least 12 months, and the proportion who have incurred the event before 12 months is known for both groups, then a 2×2 table can be constructed (see Box 6.4.a) and intervention effects expressed as risk ratios, odds ratios or risk differences.

It is not appropriate to analyse time-to-event data using methods for continuous outcomes (e.g. using mean times-to-event), as the relevant times are only known for the subset of participants who have had the event. Censored participants must be excluded, which almost certainly will introduce bias.

The most appropriate way of summarizing time-to-event data is to use methods of survival analysis and express the intervention effect as a hazard ratio. Hazard is similar in notion to risk, but is subtly different in that it measures instantaneous risk and may change continuously (for example, one's hazard of death changes as one crosses a busy road). A hazard ratio describes how many times more (or less) likely a participant is to suffer the event at a particular point in time if they receive the experimental rather than the comparator intervention. When comparing interventions in a study or meta-analysis, a simplifying assumption is often made that the hazard ratio is constant across the follow-up period, even though hazards themselves may vary continuously. This is known as the proportional hazards assumption.

6.8.2 Data extraction for time-to-event outcomes

Meta-analysis of time-to-event data commonly involves obtaining individual patient data from the original trialists, re-analysing the data to obtain estimates of the hazard ratio and its statistical uncertainty, and then performing a meta-analysis (see Chapter 26). Conducting a meta-analysis using summary information from published papers or trial reports is often problematic as the most appropriate summary statistics often are not presented.

Where summary statistics are presented, three approaches can be used to obtain estimates of hazard ratios and their uncertainty from study reports for inclusion in a meta-analysis using the generic inverse variance methods. For practical guidance, review authors should consult Tierney and colleagues (Tierney et al 2007).

The first approach can be used when trialists have analysed the data using a Cox proportional hazards model (or some other regression model for survival data). Cox models produce direct estimates of the log hazard ratio and its SE, which are sufficient to perform a generic inverse variance meta-analysis. If the hazard ratio is quoted in a report together with a confidence interval or P value, an estimate of the SE can be obtained as described in Section 6.3.

The second approach is to estimate the hazard ratio approximately using statistics computed during a log-rank analysis. Collaboration with a knowledgeable statistician is advised if this approach is followed. The log hazard ratio (experimental relative to comparator) is estimated by (O − E)/V, which has SE = 1/√V, where O is the observed number of events on the experimental intervention, E is the log-rank expected number of events on the experimental intervention, O − E is the log-rank statistic and V is the variance of the log-rank statistic (Simmonds et al 2011); a small computational sketch follows below.

These statistics can sometimes be extracted from quoted statistics and survival curves (Parmar et al 1998, Williamson et al 2002). Alternatively, use can sometimes be made of aggregated data for each intervention group in each trial. For example, suppose that the data comprise the number of participants who have the event during the first year, second year, etc., and the number of participants who are event free and still being followed up at the end of each year. A log-rank analysis can be performed on these data, to provide the O − E and V values, although careful thought needs to be given to the handling of censored times. Because of the coarse grouping, the log hazard ratio is estimated only approximately. In some reviews it has been referred to as a log odds ratio (Early Breast Cancer Trialists' Collaborative Group 1990). When the time intervals are large, a more appropriate approach is one based on interval-censored survival (Collett 1994).
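As a computational note (a sketch, not from the Handbook), the Python code below shows the log-rank based estimate described above; the values of O, E and V supplied in the example are hypothetical.

# Sketch: approximate hazard ratio and its SE from log-rank statistics
# (O = observed events on the experimental arm, E = log-rank expected events, V = variance).
import math

def hazard_ratio_from_logrank(o, e, v):
    log_hr = (o - e) / v           # log hazard ratio, experimental relative to comparator
    se_log_hr = 1 / math.sqrt(v)   # SE of the log hazard ratio
    return math.exp(log_hr), se_log_hr

# Hypothetical numbers for illustration only:
print(hazard_ratio_from_logrank(o=30, e=40, v=20))  # HR = exp(-0.5) = 0.61, SE = 0.224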

The third approach is to reconstruct approximate individual participant data from published Kaplan-Meier curves (Guyot et al 2012). This allows re-analysis of the data to estimate the hazard ratio, and also allows alternative approaches to analysis of the time-to-event data.

6.9 Conditional outcomes only available for subsets of participants

Some study outcomes may only be applicable to a proportion of participants. For example, in subfertility trials the proportion of clinical pregnancies that miscarry following treatment is often of interest to clinicians. By definition this outcome excludes participants who do not achieve an interim state (clinical pregnancy), so the comparison is not of all randomized participants. As a general rule it is better to re-define such outcomes so that the analysis includes all randomized participants. In this example, the outcome could be whether the woman has a 'successful pregnancy' (becoming pregnant and reaching, say, 24 weeks or term). If miscarriage is an outcome of interest, then an appropriate analysis can be performed using individual participant data, but is rarely possible using summary data. Another example is provided by a morbidity outcome measured in the medium or long term (e.g. chronic lung disease), where there is a distinct possibility of a death preventing assessment of the morbidity. A convenient way to deal with such situations is to combine the outcomes, for example as 'death or chronic lung disease'.

Challenges arise when a continuous outcome (say a measure of functional ability or quality of life following stroke) is measured only in those who survive to the end of follow-up. Two unsatisfactory options are: (i) imputing zero functional ability scores for those who die (which may not appropriately represent the death state and will make the scores severely skewed), and (ii) analysing the available data (which must be interpreted as a non-randomized comparison applicable only to survivors). The results of these analyses must be interpreted taking into account any disparity in the proportion of deaths between the two intervention groups. More sophisticated options are available, which may increasingly be applied by trial authors (Colantuoni et al 2018).

6.10 Chapter information

Authors: Julian PT Higgins, Tianjing Li, Jonathan J Deeks

Acknowledgements: This chapter builds on earlier versions of the Handbook. For details of previous authors and editors of the Handbook, see Preface. We are grateful to Judith Anzures, Mike Clarke, Miranda Cumpston, Peter Gøtzsche and Christopher Weir for helpful comments.

Funding: JPTH is a member of the National Institute for Health Research (NIHR) Biomedical Research Centre at University Hospitals Bristol NHS Foundation Trust and the University of Bristol. JJD received support from the NIHR Birmingham Biomedical Research Centre at the University Hospitals Birmingham NHS Foundation Trust and the University of Birmingham. JPTH received funding from National Institute for Health Research Senior Investigator award NF-SI-0617-10145. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.

6.11 References

Abrams KR, Gillies CL, Lambert PC. Meta-analysis of heterogeneously reported trials assessing change from baseline. Statistics in Medicine 2005; 24: 3823–3844.

Ades AE, Lu G, Dias S, Mayo-Wilson E, Kounali D. Simultaneous synthesis of treatment effects and mapping to a common scale: an alternative to standardisation. Research Synthesis Methods 2015; 6: 96–107.

Agresti A. An Introduction to Categorical Data Analysis. New York (NY): John Wiley & Sons; 1996.

Anzures-Cabrera J, Sarpatwari A, Higgins JPT. Expressing findings from meta-analyses of continuous outcomes in terms of risks. Statistics in Medicine 2011; 30: 2967–2985.

Bland M. Estimating mean and standard deviation from the sample size, three quartiles, minimum, and maximum. International Journal of Statistics in Medical Research 2015; 4: 57–64.

Colantuoni E, Scharfstein DO, Wang C, Hashem MD, Leroux A, Needham DM, Girard TD. Statistical methods to compare functional outcomes in randomized controlled trials with high mortality. BMJ 2018; 360: j5748.

Collett D. Modelling Survival Data in Medical Research. London (UK): Chapman & Hall; 1994.

Deeks J. Are you sure that's a standard deviation? (part 1). Cochrane News 1997a; 10: 11–12.

Deeks J. Are you sure that's a standard deviation? (part 2). Cochrane News 1997b; 11: 11–12.

Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Egger M, Davey Smith G, Altman DG, editors. Systematic Reviews in Health Care: Meta-analysis in Context. 2nd edition. London (UK): BMJ Publication Group; 2001. pp. 285–312.

Deeks JJ. Issues in the selection of a summary statistic for meta-analysis of clinical trials with binary outcomes. Statistics in Medicine 2002; 21: 1575–1600.

Dubey SD, Lehnhoff RW, Radike AW. A statistical confidence interval for true per cent reduction in caries-incidence studies. Journal of Dental Research 1965; 44: 921–923.

Early Breast Cancer Trialists' Collaborative Group. Treatment of Early Breast Cancer. Volume 1: Worldwide Evidence 1985–1990. Oxford (UK): Oxford University Press; 1990.

Follmann D, Elliott P, Suh I, Cutler J. Variance imputation for overviews of clinical trials with continuous response. Journal of Clinical Epidemiology 1992; 45: 769–773.

Friedrich JO, Adhikari N, Herridge MS, Beyene J. Meta-analysis: low-dose dopamine increases urine output but does not prevent renal dysfunction or death. Annals of Internal Medicine 2005; 142: 510–524.

Friedrich JO, Adhikari NK, Beyene J. The ratio of means method as an alternative to mean differences for analyzing continuous outcome variables in meta-analysis: a simulation study. BMC Medical Research Methodology 2008; 8: 32.

Furukawa TA, Barbui C, Cipriani A, Brambilla P, Watanabe N. Imputing missing standard deviations in meta-analyses can provide accurate results. Journal of Clinical Epidemiology 2006; 59: 7–10.

Guyot P, Ades AE, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Medical Research Methodology 2012; 12: 9.

Higgins JPT, White IR, Anzures-Cabrera J. Meta-analysis of skewed data: combining results reported on log-transformed or raw scales. Statistics in Medicine 2008; 27: 6072–6092.

Hozo SP, Djulbegovic B, Hozo I. Estimating the mean and variance from the median, range, and the size of a sample. BMC Medical Research Methodology 2005; 5: 13.

Johnston BC, Thorlund K, Schünemann HJ, Xie F, Murad MH, Montori VM, Guyatt GH. Improving the interpretation of quality of life evidence in meta-analyses: the application of minimal important difference units. Health and Quality of Life Outcomes 2010; 8: 116.

Laupacis A, Sackett DL, Roberts RS. An assessment of clinically useful measures of the consequences of treatment. New England Journal of Medicine 1988; 318: 1728–1733.

MacLennan JM, Shackley F, Heath PT, Deeks JJ, Flamank C, Herbert M, Griffiths H, Hatzmann E, Goilav C, Moxon ER. Safety, immunogenicity, and induction of immunologic memory by a serogroup C meningococcal conjugate vaccine in infants: a randomized controlled trial. JAMA 2000; 283: 2795–2801.

Marinho VCC, Higgins JPT, Logan S, Sheiham A. Fluoride toothpastes for preventing dental caries in children and adolescents. Cochrane Database of Systematic Reviews 2003; 1: CD002278.

Parmar MKB, Torri V, Stewart L. Extracting summary statistics to perform meta-analyses of the published literature for survival endpoints. Statistics in Medicine 1998; 17: 2815–2834.

Sackett DL, Deeks JJ, Altman DG. Down with odds ratios! Evidence Based Medicine 1996; 1: 164–166.

Sackett DL, Richardson WS, Rosenberg W, Haynes BR. Evidence-Based Medicine: How to Practice and Teach EBM. Edinburgh (UK): Churchill Livingstone; 1997.

Simmonds MC, Tierney J, Bowden J, Higgins JPT. Meta-analysis of time-to-event data: a comparison of two-stage methods. Research Synthesis Methods 2011; 2: 139–149.

Sinclair JC, Bracken MB. Clinically useful measures of effect in binary analyses of randomized trials. Journal of Clinical Epidemiology 1994; 47: 881–889.

Tierney JF, Stewart LA, Ghersi D, Burdett S, Sydes MR. Practical methods for incorporating summary time-to-event data into meta-analysis. Trials 2007; 8: 16.

Vickers AJ. The use of percentage change from baseline as an outcome in a controlled trial is statistically inefficient: a simulation study. BMC Medical Research Methodology 2001; 1: 6.

Walter SD, Yao X. Effect sizes can be calculated for studies reporting ranges for outcome variables in systematic reviews. Journal of Clinical Epidemiology 2007; 60: 849–852.

Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology 2014; 14: 135.

Weir CJ, Butcher I, Assi V, Lewis SC, Murray GD, Langhorne P, Brady MC. Dealing with missing standard deviation and mean values in meta-analysis of continuous outcomes: a systematic review. BMC Medical Research Methodology 2018; 18: 25.

Williamson PR, Smith CT, Hutton JL, Marson AG. Aggregate data meta-analysis with time-to-event outcomes. Statistics in Medicine 2002; 21: 3337–3351.

For permission to re-use material from the Handbook (either academic or commercial), please see here for full details.