Modern Mathematical Statistics with Applications

Jay L. Devore
California Polytechnic State University

Kenneth N. Berk
Illinois State University

Australia • Canada • Mexico • Singapore • Spain • United Kingdom • United States

Acquisitions Editor: Carolyn Crockett
Editorial Assistant: Daniel Geller
Technology Project Manager: Fiona Chong
Senior Assistant Editor: Ann Day
Marketing Manager: Joseph Rogove
Marketing Assistant: Brian Smith
Marketing Communications Manager: Darlene Amidon-Brent
Manager, Editorial Production: Kelsey McGee
Creative Director: Rob Hugel
Art Director: Lee Friedman
Print Buyer: Rebecca Cross
Permissions Editor: Joohee Lee
Production Service and Composition: G&S Book Services
Text Designer: Carolyn Deacy
Copy Editor: Anita Wagner
Cover Designer: Eric Adigard
Cover Image: Carl Russo
Cover Printer: Phoenix Color Corp
Printer: RR Donnelley-Crawfordsville

© 2007 Duxbury, an imprint of Thomson Brooks/Cole, a part
of The Thomson Corporation. Thomson, the Star logo, and
Brooks/Cole are trademarks used herein under license.

Thomson Higher Education
10 Davis Drive
Belmont, CA 94002-3098
USA

ALL RIGHTS RESERVED. No part of this work covered
by the copyright hereon may be reproduced or used in any
form or by any means (graphic, electronic, or mechanical,
including photocopying, recording, taping, web distribution,
information storage and retrieval systems, or in any other
manner) without the written permission of the publisher.
Printed in the United States of America
1 2 3 4 5 6 7 09 08 07 06 05

For more information about our products, contact us at:
Thomson Learning Academic Resource Center
1-800-423-0563
For permission to use material from this text or product,
submit a request online at .
Any additional questions about permissions can be
submitted by e-mail to [email protected]

Library of Congress Control Number: 2005929405
ISBN 0-534-40473-1

To my wife Carol
whose continuing support of my writing efforts
over the years has made all the difference.

To my wife Laura
who, as a successful author, is my mentor and role model.

About the Authors

Jay L. Devore
Jay Devore received a B.S. in Engineering Science from the University of California,
Berkeley, and a Ph.D. in Statistics from Stanford University. He previously taught at
the University of Florida and Oberlin College, and has had visiting positions at
Stanford, Harvard, the University of Washington, and New York University. He has
been at California Polytechnic State University, San Luis Obispo, since 1977, where he
is currently a professor and chair of the Department of Statistics.
Jay has previously authored five other books, including Probability and Statistics
for Engineering and the Sciences, currently in its 6th edition. He is a Fellow of the
American Statistical Association, an associate editor for the Journal of the American
Statistical Association, and received the Distinguished Teaching Award from Cal Poly
in 1991. His recreational interests include reading, playing tennis, traveling, and cooking and eating good food.

Kenneth N. Berk
Ken Berk has a B.S. in Physics from Carnegie Tech (now Carnegie Mellon) and a Ph.D.
in Mathematics from the University of Minnesota. He is Professor Emeritus of
Mathematics at Illinois State University and a Fellow of the American Statistical
Association. He founded the Software Reviews section of The American Statistician
and edited it for six years. He served as secretary/treasurer, program chair, and chair of
the Statistical Computing Section of the American Statistical Association, and he twice
co-chaired the Interface Symposium, the main annual meeting in statistical computing.
His published work includes papers on time series, statistical computing, regression
analysis, and statistical graphics and the book Data Analysis with Microsoft Excel (with
Patrick Carey).

Brief Contents
1 Overview and Descriptive Statistics 1
2 Probability 49
3 Discrete Random Variables and Probability Distributions 94
4 Continuous Random Variables and Probability Distributions 154
5 Joint Probability Distributions 229
6 Statistics and Sampling Distributions 278
7 Point Estimation 325
8 Statistical Intervals Based on a Single Sample 375
9 Tests of Hypotheses Based on a Single Sample 417
10 Inferences Based on Two Samples 472
11 The Analysis of Variance 539
12 Regression and Correlation 599
13 Goodness-of-Fit Tests and Categorical Data Analysis 707
14 Alternative Approaches to Inference 743
Appendix Tables 781
Answers to Odd-Numbered Exercises 809
Index 829

Contents
Preface viii

1 Overview and Descriptive Statistics 1
Introduction 1
1.1 Populations and Samples 2
1.2 Pictorial and Tabular Methods in Descriptive Statistics 9
1.3 Measures of Location 25
1.4 Measures of Variability 33

2 Probability 49
Introduction 49
2.1 Sample Spaces and Events 50
2.2 Axioms, Interpretations, and Properties of Probability 56
2.3 Counting Techniques 65
2.4 Conditional Probability 73
2.5 Independence 83

3 Discrete Random Variables and Probability Distributions 94
Introduction 94
3.1 Random Variables 95
3.2 Probability Distributions for Discrete Random Variables 99
3.3 Expected Values of Discrete Random Variables 109
3.4 Moments and Moment Generating Functions 118
3.5 The Binomial Probability Distribution 125
3.6 *Hypergeometric and Negative Binomial Distributions 134
3.7 *The Poisson Probability Distribution 142

4 Continuous Random Variables and Probability Distributions 154
Introduction 154
4.1 Probability Density Functions and Cumulative Distribution Functions 155
4.2 Expected Values and Moment Generating Functions 167
4.3 The Normal Distribution 175
4.4 *The Gamma Distribution and Its Relatives 190
4.5 *Other Continuous Distributions 198
4.6 *Probability Plots 206
4.7 *Transformations of a Random Variable 216

5 Joint Probability Distributions 229
Introduction 229
5.1 Jointly Distributed Random Variables 230
5.2 Expected Values, Covariance, and Correlation 242
5.3 *Conditional Distributions 249
5.4 *Transformations of Random Variables 262
5.5 *Order Statistics 267

6 Statistics and Sampling Distributions 278
Introduction 278
6.1 Statistics and Their Distributions 279
6.2 The Distribution of the Sample Mean 291
6.3 The Distribution of a Linear Combination 300
6.4 Distributions Based on a Normal Random Sample 309
Appendix: Proof of the Central Limit Theorem 323

7 Point Estimation 325
Introduction 325
7.1 General Concepts and Criteria 326
7.2 *Methods of Point Estimation 344
7.3 *Sufficiency 355
7.4 *Information and Efficiency 364

8 Statistical Intervals Based on a Single Sample 375
Introduction 375
8.1 Basic Properties of Confidence Intervals 376
8.2 Large-Sample Confidence Intervals for a Population Mean and Proportion 385
8.3 Intervals Based on a Normal Population Distribution 393
8.4 *Confidence Intervals for the Variance and Standard Deviation of a Normal Population 401
8.5 *Bootstrap Confidence Intervals 404

9 Tests of Hypotheses Based on a Single Sample 417
Introduction 417
9.1 Hypotheses and Test Procedures 418
9.2 Tests About a Population Mean 428
9.3 Tests Concerning a Population Proportion 442
9.4 P-Values 448
9.5 *Some Comments on Selecting a Test Procedure 456

10 Inferences Based on Two Samples 472
Introduction 472
10.1 z Tests and Confidence Intervals for a Difference Between Two Population Means 473
10.2 The Two-Sample t Test and Confidence Interval 487
10.3 Analysis of Paired Data 497
10.4 Inferences About Two Population Proportions 507
10.5 *Inferences About Two Population Variances 515
10.6 *Comparisons Using the Bootstrap and Permutation Methods 520

11 The Analysis of Variance 539
Introduction 539
11.1 Single-Factor ANOVA 540
11.2 *Multiple Comparisons in ANOVA 552
11.3 *More on Single-Factor ANOVA 560
11.4 *Two-Factor ANOVA with Kij = 1 570
11.5 *Two-Factor ANOVA with Kij > 1 584

12 Regression and Correlation 599
Introduction 599
12.1 The Simple Linear and Logistic Regression Models 600
12.2 Estimating Model Parameters 611
12.3 Inferences About the Regression Coefficient β1 626
12.4 Inferences Concerning μY·x* and the Prediction of Future Y Values 640
12.5 Correlation 648
12.6 *Aptness of the Model and Model Checking 660
12.7 *Multiple Regression Analysis 668
12.8 *Regression with Matrices 689

13 Goodness-of-Fit Tests and Categorical Data Analysis 707
Introduction 707
13.1 Goodness-of-Fit Tests When Category Probabilities Are Completely Specified 708
13.2 *Goodness-of-Fit Tests for Composite Hypotheses 716
13.3 Two-Way Contingency Tables 729

14 Alternative Approaches to Inference 743
Introduction 743
14.1 *The Wilcoxon Signed-Rank Test 744
14.2 *The Wilcoxon Rank-Sum Test 752
14.3 *Distribution-Free Confidence Intervals 757
14.4 *Bayesian Methods 762
14.5 *Sequential Methods 770

Appendix Tables 781
A.1 Cumulative Binomial Probabilities 782
A.2 Cumulative Poisson Probabilities 784
A.3 Standard Normal Curve Areas 786
A.4 The Incomplete Gamma Function 788
A.5 Critical Values for t Distributions 789
A.6 Tolerance Critical Values for Normal Population Distributions 790
A.7 Critical Values for Chi-Squared Distributions 791
A.8 t Curve Tail Areas 792
A.9 Critical Values for F Distributions 794
A.10 Critical Values for Studentized Range Distributions 800
A.11 Chi-Squared Curve Tail Areas 801
A.12 Critical Values for the Ryan–Joiner Test of Normality 803
A.13 Critical Values for the Wilcoxon Signed-Rank Test 804
A.14 Critical Values for the Wilcoxon Rank-Sum Test 805
A.15 Critical Values for the Wilcoxon Signed-Rank Interval 806
A.16 Critical Values for the Wilcoxon Rank-Sum Interval 807
A.17 β Curves for t Tests 808

Answers to Odd-Numbered Exercises 809
Index 829

Preface
Purpose
Our objective is to provide a postcalculus introduction to the discipline of statistics that
• Has mathematical integrity and contains some underlying theory.
• Shows students a broad range of applications involving real data.
• Is very current in its selection of topics.
• Illustrates the importance of statistical software.
• Is accessible to a wide audience, including mathematics and statistics majors (yes,
there are a few of the latter), prospective engineers and scientists, and those business
and social science majors interested in the quantitative aspects of their disciplines.

A number of currently available mathematical statistics texts are heavily oriented toward a rigorous mathematical development of probability and statistics, with
much emphasis on theorems, proofs, and derivations. The emphasis is more on mathematics than on statistical practice. Even when applied material is included, the scenarios are often contrived (many examples and exercises involving dice, coins, cards,
widgets, or a comparison of treatment A to treatment B).
So in our exposition we have tried to achieve a balance between mathematical
foundations and statistical practice. Some may feel discomfort on the grounds that, because
a mathematical statistics course has traditionally been a feeder into graduate programs
in statistics, students coming out of such a course must be well prepared for that path.
But that view presumes that the mathematics will provide the hook to get students
interested in our discipline. That may happen for a few mathematics majors. However,
our experience is that the application of statistics to real-world problems is far more
persuasive in getting quantitatively oriented students to pursue a career or take further
coursework in statistics. Let's first draw them in with intriguing problem scenarios and
applications. Opportunities for exposing them to mathematical foundations will follow
in due course. In our view it is more important for students coming out of this course
to be able to carry out and interpret the results of a two-sample t test or simple regression analysis than to manipulate joint moment generating functions or discourse on various modes of convergence.

Content
The book certainly does include core material in probability (Chapter 2), random variables and their distributions (Chapters 3–5), and sampling theory (Chapter 6). But our
desire to balance theory with application/data analysis is reflected in the way the book
starts out, with a chapter on descriptive and exploratory statistical techniques rather
than an immediate foray into the axioms of probability and their consequences. After
the distributional infrastructure is in place, the remaining statistical chapters cover the
basics of inference. In addition to introducing core ideas from estimation and hypothesis testing (Chapters 7–10), there is emphasis on checking assumptions and looking at
the data prior to formal analysis. Modern topics such as bootstrapping, permutation
tests, residual analysis, and logistic regression are included. Our treatment of regression, analysis of variance, and categorical data analysis (Chapters 11–13) is definitely
more oriented to dealing with real data than with theoretical properties of models. We
also show many examples of output from commonly used statistical software packages,
something noticeably absent in most other books pitched at this audience and level.
(Figures 10.1 and 11.14 have been reproduced here for illustrative purposes.) For example, the first section on multiple regression toward the end of Chapter 12 uses no matrix
algebra but instead relies on output from software as a basis for making inferences.
[Figure 10.1: software plot of Final for the Control and Exper groups]
[Figure 11.14: Interaction Plot (data means) for vibration, by Source (1 to 5) and Material (A, P, S)]

Mathematical Level
The challenge for students at this level should lie with mastery of statistical concepts
as well as with mathematical wizardry. Consequently, the mathematical prerequisites
and demands are reasonably modest. Mathematical sophistication and quantitative reasoning ability are, of course, crucial to the enterprise. Students with a solid grounding
in univariate calculus and some exposure to multivariate calculus should feel comfortable with what we are asking of them. The several sections where matrix algebra
appears (transformations in Chapter 5 and the matrix approach to regression in the last
section of Chapter 12) can easily be deemphasized or skipped entirely.
Our goal is to redress the balance between mathematics and statistics by putting
more emphasis on the latter. The concepts, arguments, and notation contained herein
will certainly stretch the intellects of many students. And a solid mastery of the material will be required in order for them to solve many of the roughly 1300 exercises
included in the book. Proofs and derivations are included where appropriate, but we
think it likely that obtaining a conceptual understanding of the statistical enterprise will
be the major challenge for readers.

Recommended Coverage
There should be more than enough material in our book for a year-long course. Those
wanting to emphasize some of the more theoretical aspects of the subject (e.g., moment
generating functions, conditional expectation, transformations, order statistics,
sufficiency) should plan to spend correspondingly less time on inferential methodology in the latter part of the book. We have tried to help instructors by marking certain
sections as optional (using an *). “Optional” is not synonymous with “unimportant”;
an * is just an indication that what comes afterward makes at most minimal use of what
is contained in a section so marked. Other than that, we prefer to rely on the experience
and tastes of individual instructors in deciding what should be presented. We would
also like to think that students could be asked to read an occasional subsection or even
section on their own and then work exercises to demonstrate understanding, so that not
everything would need to be presented in class. Remember that there is never enough
time in a course of any duration to teach students all that we'd like them to know!

Acknowledgments
We gratefully acknowledge the plentiful feedback provided by the following reviewers:
Bhaskar Bhattacharya, Southern Illinois University; Ann Gironella, Idaho State
University; Tiefeng Jiang, University of Minnesota; Iwan Praton, Franklin & Marshall
College; and Bruce Trumbo, California State University, East Bay.
A special salute goes to Bruce Trumbo for going way beyond his mandate in providing us an incredibly thoughtful review of 40+ pages containing many wonderful
ideas and pertinent criticisms. Matt Carlton, a Cal Poly colleague of one of the authors,
has provided stellar service as an accuracy checker, and has also prepared a solutions
manual.
Our emphasis on real data would not have come to fruition without help from the
many individuals who provided us with data in published sources or in personal communications; we greatly appreciate all their contributions.
We very much appreciate the production services provided by the folks at G&S
Book Services. Our production editor, Gretchen Otto, did a first-rate job of moving the
book through the production process, and was always prompt and considerate in her
communications with us. Thanks to our copy editor, Anita Wagner, for employing a
light touch and not taking us too much to task for our occasional grammatical and technical lapses. The staff at Brooks/Cole-Duxbury has been as supportive on this project
as on ones with which we have previously been involved. Special kudos go to Carolyn
Crockett, Ann Day, Dan Geller, and Kelsey McGee, and apologies to any whose names
were inadvertently omitted from this list.

A Final Thought
It is our hope that students completing a course taught from this book will feel as passionately about the subject of statistics as we still do after so many years in the profession. Only teachers can really appreciate how gratifying it is to hear from a student after
he or she has completed a course that the experience had a positive impact and maybe
even affected a career choice.
Jay Devore
Ken Berk

CHAPTER ONE
Overview and Descriptive Statistics

Introduction
Statistical concepts and methods are not only useful but indeed often indispensable in understanding the world around us. They provide ways of gaining new
insights into the behavior of many phenomena that you will encounter in your
chosen field of specialization.
The discipline of statistics teaches us how to make intelligent judgments
and informed decisions in the presence of uncertainty and variation. Without
uncertainty or variation, there would be little need for statistical methods or
statisticians. If the yield of a crop were the same in every field, if all individuals reacted the same way to a drug, if everyone gave the same response to an
opinion survey, and so on, then a single observation would reveal all desired
information.
An interesting example of variation arises in the course of performing
emissions testing on motor vehicles. The expense and time requirements of the
Federal Test Procedure (FTP) preclude its widespread use in vehicle inspection
programs. As a result, many agencies have developed less costly and quicker tests,
which, it is hoped, replicate FTP results. According to the journal article “Motor
Vehicle Emissions Variability” (J. Air Waste Manag. Assoc., 1996: 667–675), the
acceptance of the FTP as a gold standard has led to the widespread belief that
repeated measurements on the same vehicle would yield identical (or nearly
identical) results. The authors of the article applied the FTP to seven vehicles characterized as “high emitters.” Here are the results of four hydrocarbon and carbon monoxide tests on one such vehicle:

HC (g/mile)    13.8    18.3    32.2    32.5
CO (g/mile)     118     149     232     236

The substantial variation in both the HC and CO measurements casts considerable doubt
on conventional wisdom and makes it much more difficult to make precise assessments
about emissions levels.
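To put a number on this variation, here is a minimal sketch (not taken from the book) that summarizes the four readings for each pollutant with Python's standard-library statistics module. It assumes the sample standard deviation with divisor n - 1, which is what statistics.stdev computes. The dictionary name emissions and the helper summarize are illustrative choices, not anything defined in the text.

```python
from statistics import mean, stdev

# Four repeated FTP measurements on one "high emitter" vehicle (values from the text above).
emissions = {
    "HC (g/mile)": [13.8, 18.3, 32.2, 32.5],
    "CO (g/mile)": [118, 149, 232, 236],
}


def summarize(values):
    """Return the sample mean, sample standard deviation (n - 1 divisor), and range."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "range": max(values) - min(values),
    }


# Hypothetical summary table keyed by measurement name.
summary = {name: summarize(vals) for name, vals in emissions.items()}

for name, s in summary.items():
    print(f"{name}: mean={s['mean']:.2f}, sd={s['sd']:.2f}, range={s['range']:.1f}")
```

For HC, the mean is about 24.2 g/mile and the standard deviation is about 9.6 g/mile. For CO, the mean is 183.75 g/mile and the standard deviation is about 59.4 g/mile. In both cases the standard deviation is roughly 30% to 40% of the mean, which is hard to reconcile with the belief that repeated FTP runs give nearly identical results.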
Ho...