A Course in Statistics with R.pdf

Book Description

Synopsis

Integrates the theory and applications of statistics using R.

A Course in Statistics with R has been written to bridge the gap between theory and applications and to explain how mathematical expressions are converted into R programs. The book is designed primarily as a companion for a Master's student during each semester of the course, but it will also help applied statisticians revisit the underpinnings of the subject. With this dual goal in mind, the book begins with R basics and quickly covers visualization and exploratory analysis. Probability and statistical inference, including the classical, nonparametric, and Bayesian schools, are developed with definitions, motivations, mathematical expressions, and R programs in a way that helps the reader understand both the mathematical development and the R implementation. Linear regression models, experimental designs, multivariate analysis, and categorical data analysis are treated through practical applications that make effective use of visualization and the statistical techniques underlying it, helping the reader achieve a clear understanding of the associated statistical models.

Key features:

  • Integrates R basics with statistical concepts
  • Provides graphical presentations inclusive of mathematical expressions
  • Aids understanding of limit theorems of probability with and without the simulation approach
  • Presents detailed algorithmic development of statistical models from scratch
  • Includes practical applications with over 50 data sets

Table of Contents

I The Preliminaries 1

1 Why R? 2

1.1 Why R? 2

1.2 R Installation 4

1.3 There is Nothing Such as PRACTICALS 5

1.4 Data Sets in R and Internet 6

1.4.1 List of Web-sites Containing DATA SETS 7

1.4.2 Antique Datasets 8

1.5 http://cran.r-project.org 10

1.5.1 http://r-project.org 11

1.5.2 http://www.cran.r-project.org/web/views/ 11

1.5.3 Is subscribing to R-Mailing List useful? 12

1.6 R and Its Interface with Other Software 12

1.7 help and/or ? 13

1.8 R Books 14

1.9 A Road Map 15

2 The R Basics 18

2.1 Introduction 18

2.2 Simple Arithmetics and a Little Beyond 19

2.2.1 Absolute Values, Remainders, etc 20

2.2.2 round, floor, etc 21

2.2.3 Summary Functions 21

2.2.4 Trigonometric Functions 22

2.2.5 Complex Numbers* 23

2.2.6 Special Mathematical Functions 25

2.3 Some Basic R Functions 27

2.3.1 Summary Statistics 27

2.3.2 is, as, is.na, etc 29

2.3.3 factors, levels, etc 31

2.3.4 Control Programming 32

2.3.5 Other Useful Functions 34

2.3.6 Calculus* 37

2.4 Vectors and Matrices in R 38

2.4.1 Vectors 39

2.4.2 Matrices 43

2.5 Data Entering and Reading from Files 48

2.5.1 Data Entering 48

2.5.2 Reading Data from External Files 51

2.6 Working with Packages 52

2.7 R Session Management 54

2.8 Bibliography 54

2.9 Complements, Problems, and Programs 55

3 Data Preparation and Other Tricks 57

3.1 Introduction 57

3.2 Manipulation with Complex Format Files 58

3.3 Reading Datasets of Foreign Formats 64

3.4 Displaying R Objects 65

3.5 Manipulation Using R Functions 66

3.6 Working with Time and Date 68

3.7 Text Manipulations 71

3.8 Scripts and Text Editors for R 73

3.8.1 Text Editors for Linuxians 74

3.9 Bibliography 75

3.10 Complements, Problems, and Programs 75

4 Exploratory Data Analysis 77

4.1 Introduction: The Tukey’s School of Statistics 77

4.2 Essential Summaries of EDA 78

4.3 Graphical Techniques in EDA 81

4.3.1 Boxplot 81

4.3.2 Histogram 86

4.3.3 Histogram Extensions and the Rootogram 90

4.3.4 Pareto Chart 93

4.3.5 Stem-and-Leaf Plot 95

4.3.6 Run Chart 100

4.3.7 Scatter Plot 101

4.4 Quantitative Techniques in EDA 103

4.4.1 Trimean 104

4.4.2 Letter Values 105

4.5 Exploratory Regression Models 107

4.5.1 Resistant Line 108

4.5.2 Median Polish 110

4.6 Bibliography 113

4.7 Complements, Problems, and Programs 114

II Probability and Inference 116

5 Probability Theory 117

5.1 Introduction 117

5.2 Sample Space, Set Algebra, and Elementary Probability 118

5.3 Counting Methods 127

5.3.1 Sampling: The Diverse Ways 128

5.3.2 The Binomial Coefficients and the Pascal's Triangle 132

5.3.3 Some Problems Based on Combinatorics 133

5.4 Probability: A Definition 137

5.4.1 The Prerequisites 137

5.4.2 The Kolmogorov Definition 142

5.5 Conditional Probability and Independence 146

5.6 Bayes Formula 147

5.7 Random Variables, Expectations, and Moments 149

5.7.1 The Definition 149

5.7.2 Expectation of Random Variables 153

5.8 Distribution Function, Characteristic Function, and Moment Generation Function 159

5.9 Inequalities 162

5.9.1 The Markov Inequality 162

5.9.2 The Jensen’s Inequality 163

5.9.3 The Chebyshev Inequality 163

5.10 Convergence of Random Variables 164

5.10.1 Convergence in Distributions 165

5.10.2 Convergence in Probability 167

5.10.3 Convergence in rth Mean 168

5.10.4 Almost Sure Convergence 169

5.11 The Law of Large Numbers 170

5.11.1 The Weak Law of Large Numbers 170

5.12 The Central Limit Theorem 172

5.12.1 The de Moivre–Laplace Central Limit Theorem 172

5.12.2 CLT for iid Case 173

5.12.3 The Lindeberg-Feller CLT 175

5.12.4 The Liapounov CLT 181

5.13 Bibliography 184

5.13.1 Intuitive, Elementary, and First Course Source 184

5.13.2 The Classics and Second Course Source 184

5.13.3 The Problem Books 185

5.13.4 Other Useful Source 185

5.13.5 R for Probability 185

5.14 Complements, Problems, and Programs 186

6 Probability and Sampling Distributions 188

6.1 Introduction 188

6.2 Discrete Univariate Distributions 189

6.2.1 The Discrete Uniform Distribution 189

6.2.2 The Binomial Distribution 190

6.2.3 The Geometric Distribution 193

6.2.4 The Negative Binomial Distribution 195

6.2.5 Poisson Distribution 197

6.2.6 The Hypergeometric Distribution 200

6.3 Continuous Univariate Distributions 201

6.3.1 The Uniform Distribution 201

6.3.2 The Beta Distribution 204

6.3.3 The Exponential Distribution 205

6.3.4 The Gamma Distribution 206

6.3.5 The Normal Distribution 207

6.3.6 The Cauchy Distribution 210

6.3.7 The t-Distribution 211

6.3.8 The Chi-square Distribution 211

6.3.9 The F-Distribution 212

6.4 Multivariate Probability Distributions 212

6.4.1 The Multinomial Distribution 213

6.4.2 Dirichlet Distribution 213

6.4.3 The Multivariate Normal Distribution 214

6.4.4 The Multivariate t Distribution 214

6.5 Populations and Samples 215

6.6 Sampling from the Normal Distributions 216

6.7 Some Finer Aspects of Sampling Distributions 219

6.7.1 Sampling Distribution of Median 219

6.7.2 Sampling Distribution of Mean of Standard Distributions 221

6.8 Multivariate Sampling Distributions 222

6.8.1 Noncentral Univariate Chi-square, t, and F Distributions 223

6.8.2 Wishart Distribution 225

6.8.3 Hotelling's T² Distribution 226

6.9 Bayesian Sampling Distributions 226

6.10 Bibliography 228

6.11 Complements, Problems, and Programs 228

7 Parametric Inference 230

7.1 Introduction 230

7.2 Families of Distribution 232

7.2.1 The Exponential Family 234

7.2.2 Pitman Family 235

7.3 Loss Functions 236

7.4 Data Reduction 239

7.4.1 Sufficiency 239

7.4.2 Minimal Sufficiency 242

7.5 Likelihood and Information 244

7.5.1 The Likelihood Principle 244

7.5.2 The Fisher Information 250

7.6 Point Estimation 255

7.6.1 Maximum Likelihood Estimation 255

7.6.2 Method of Moments Estimator 264

7.7 Comparison of Estimators 266

7.7.1 Unbiased Estimators 266

7.7.2 Improving Unbiased Estimators 269

7.8 Confidence Intervals 271

7.9 Testing Statistical Hypotheses - The Preliminaries 272

7.10 The Neyman-Pearson Lemma 277

7.11 Uniformly Most Powerful Tests 283

7.12 Uniformly Most Powerful Unbiased Tests 288

7.12.1 Tests for the Means: One- and Two-Sample t-Test 291

7.13 Likelihood Ratio Tests 293

7.13.1 Normal Distribution: One-Sample Problems 294

7.13.2 Normal Distribution: Two-Sample Problem for the Mean 297

7.14 Behrens-Fisher Problem 298

7.15 Multiple Comparison Tests 300

7.15.1 Bonferroni’s Method 301

7.15.2 Holm’s Method 302

7.16 The EM Algorithm* 303

7.16.1 Introduction 303

7.16.2 The Algorithm 304

7.16.3 Introductory Applications 305

7.17 Bibliography 311

7.17.1 Early Classics 311

7.17.2 Texts From the Last 30 Years 311

7.18 Complements, Problems, and Programs 312

8 Nonparametric Inference 314

8.1 Introduction 314

8.2 Empirical Distribution Function and Its Applications 314

8.2.1 Statistical Functionals 317

8.3 The Jackknife and Bootstrap Methods 319

8.3.1 The Jackknife 320

8.3.2 The Bootstrap 321

8.3.3 Bootstrapping Simple Linear Model* 324

8.4 Nonparametric Smoothing 326

8.4.1 Histogram Smoothing 327

8.4.2 Kernel Smoothing 330

8.4.3 Nonparametric Regression Models* 334

8.5 Nonparametric Tests 339

8.5.1 The Wilcoxon Signed-Ranks Test 339

8.5.2 The Mann-Whitney test 343

8.5.3 The Siegel-Tukey Test 344

8.5.4 The Wald-Wolfowitz Run Test 347

8.5.5 The Kolmogorov-Smirnov Test 348

8.5.6 Kruskal-Wallis Test* 350

8.6 Bibliography 352

8.7 Complements, Problems, and Programs 352

9 Bayesian Inference 354

9.1 Introduction 354

9.2 Bayesian Probabilities 354

9.3 The Bayesian Paradigm for Statistical Inference 358

9.3.1 Bayesian Sufficiency and the Principle 359

9.3.2 Bayesian Analysis and Likelihood Principle 360

9.3.3 Informative and Conjugate Prior 360

9.3.4 Noninformative Prior 361

9.4 Bayesian Estimation 361

9.4.1 Inference for Binomial Distribution 361

9.4.2 Inference for the Poisson Distribution 365

9.4.3 Inference for Uniform Distribution 366

9.4.4 Inference for Exponential Distribution 368

9.4.5 Inference for Normal Distributions 369

9.5 The Credible Intervals 371

9.6 Bayes Factors for Testing Problems 373

9.7 Bibliography 374

9.8 Complements, Problems, and Programs 375

III Stochastic Processes and Monte Carlo 376

10 Stochastic Processes 377

10.1 Introduction 377

10.2 Kolmogorov’s Consistency Theorem 378

10.3 Markov Chains 380

10.3.1 The m-Step TPM 382

10.3.2 Classification of States 383

10.3.3 Canonical Decomposition of an Absorbing Markov Chain 387

10.3.4 Stationary Distribution and Mean First Passage Time of an Ergodic Markov Chain 390

10.3.5 Time Reversible Markov Chain 391

10.4 Application of Markov Chains in Computational Statistics 392

10.4.1 The Metropolis-Hastings Algorithm 393

10.4.2 Gibbs Sampler 395

10.4.3 Illustrative Examples 395

10.5 Bibliography 403

10.6 Complements, Problems, and Programs 403

11 Monte Carlo Computations 404

11.1 Introduction 404

11.2 Generating the (Pseudo-) Random Numbers 405

11.2.1 Useful Random Generators 405

11.2.2 Probability Through Simulation 408

11.3 Simulation from Probability Distributions and Some Limit Theorems 415

11.3.1 Simulation from Discrete Distributions 415

11.3.2 Simulation from Continuous Distributions 424

11.3.3 Understanding Limit Theorems Through Simulation 426

11.3.4 Understanding The Central Limit Theorem 429

11.4 Monte Carlo Integration 431

11.5 The Accept-Reject Technique 433

11.6 Application to Bayesian Inference 438

11.7 Bibliography 441

11.8 Complements, Problems, and Programs 441

IV Linear Models 443

12 Linear Regression Models 444

12.1 Introduction 444

12.2 Simple Linear Regression Model 445

12.2.1 Fitting a Linear Model 447

12.2.2 Confidence Intervals 449

12.2.3 The Analysis of Variance (ANOVA) 452

12.2.4 The Coefficient of Determination 453

12.2.5 The "lm" Function from R 454

12.2.6 Residuals for Validation of the Model Assumptions 456

12.2.7 Prediction for the Simple Regression Model 461

12.2.8 Regression Through the Origin 462

12.3 The Anscombe Warnings and Regression Abuse 464

12.4 Multiple Linear Regression Model 467

12.4.1 Scatter Plots: A First Look 469

12.4.2 Other Useful Graphical Methods 469

12.4.3 Fitting a Multiple Linear Regression Model 473

12.4.4 Testing Hypotheses and Confidence Intervals 475

12.5 Model Diagnostics for the Multiple Regression Model 480

12.5.1 Residuals 480

12.5.2 Influence and Leverage Diagnostics 483

12.6 Multicollinearity 488

12.6.1 Variance Inflation Factor 489

12.6.2 Eigen System Analysis 491

12.7 Data Transformations 493

12.7.1 Linearization 493

12.7.2 Variance Stabilization 495

12.7.3 Power Transformation 497

12.8 Model Selection 499

12.8.1 Backward Elimination 501

12.8.2 Forward and Stepwise Selection 505

12.9 Bibliography 507

12.9.1 Early Classics 507

12.9.2 Industrial Applications 507

12.9.3 Regression Details 507

12.9.4 Modern Regression Texts 507

12.9.5 R for Regression 508

12.10 Complements, Problems, and Programs 508

13 Experimental Designs 510

13.1 Introduction 510

13.2 Principles of Experimental Design 510

13.3 Completely Randomized Designs 512

13.3.1 The CRD Model 512

13.3.2 Randomization in CRD 513

13.3.3 Inference for the CRD Models 515

13.3.4 Validation of Model Assumptions 520

13.3.5 Contrasts and Multiple Testing for the CRD Model 522

13.4 Block Designs 527

13.4.1 Randomization and Analysis of Balanced Block Designs 527

13.4.2 Incomplete Block Designs 532

13.4.3 Latin Square Design 534

13.4.4 Graeco Latin Square Design 538

13.5 Factorial Designs 542

13.5.1 Two Factorial Experiment 543

13.5.2 Three Factorial Experiment 548

13.5.3 Blocking in Factorial Experiments 554

13.6 Bibliography 556

13.7 Complements, Problems, and Programs 556

14 Multivariate Statistical Analysis - I 558

14.1 Introduction 558

14.2 Graphical Plots for Multivariate Data 559

14.3 Definitions, Notations, and Summary Statistics for Multivariate Data 562

14.3.1 Definitions and Data Visualization 562

14.3.2 Early Outlier Detection 568

14.4 Testing for Mean Vectors: One Sample 570

14.4.1 Testing for Mean Vector with Known Variance-Covariance Matrix 571

14.4.2 Testing for Mean Vectors with Unknown Variance-Covariance Matrix 572

14.5 Testing for Mean Vectors: Two-Samples 574

14.6 Multivariate Analysis of Variance 577

14.6.1 Wilks Test Statistic 578

14.6.2 Roy’s Test 580

14.6.3 Pillai’s Test Statistic 581

14.6.4 The Lawley–Hotelling Test Statistic 581

14.7 Testing for Variance-Covariance Matrix: One Sample 583

14.7.1 Testing for Sphericity 584

14.8 Testing for Variance-Covariance Matrix: k-Samples 586

14.9 Testing for Independence of Sub-vectors 589

14.10 Bibliography 592

14.11 Complements, Problems, and Programs 592

15 Multivariate Statistical Analysis - II 594

15.1 Introduction 594

15.2 Classification and Discriminant Analysis 594

15.2.1 Discrimination Analysis 595

15.2.2 Classification 596

15.3 Canonical Correlations 598

15.4 Principal Component Analysis - Theory and Illustration 601

15.4.1 The Theory 602

15.4.2 Illustration Through a Data Set 604

15.5 Applications of Principal Component Analysis 608

15.5.1 PCA for Linear Regression 608

15.5.2 Biplots 611

15.6 Factor Analysis 615

15.6.1 The Orthogonal Factor Analysis Model 616

15.6.2 Estimation of Loadings and Communalities 618

15.7 Bibliography 624

15.7.1 The Classics and Applied Perspectives 624

15.7.2 Multivariate Analysis and Software 625

15.8 Complements, Problems, and Programs 626

16 Categorical Data Analysis 627

16.1 Introduction 627

16.2 Graphical Methods for CDA 628

16.2.1 Bar and Stacked Bar Plots 628

16.2.2 Spine Plots 632

16.2.3 Mosaic Plots 634

16.2.4 Pie Charts and Dot Charts 636

16.2.5 Four Fold Plots 639

16.3 The Odds Ratio 640

16.4 The Simpson’s Paradox 644

16.5 The Binomial, Multinomial, and Poisson Models 645

16.5.1 The Binomial Model 645

16.5.2 The Multinomial Model 646

16.5.3 The Poisson Model 648

16.6 The Problem of Overdispersion 649

16.7 The χ²-Tests of Independence 650

16.8 Bibliography 652

16.9 Complements, Problems, and Programs 652

17 Generalized Linear Models 653

17.1 Introduction 653

17.2 Regression Problems in Count/Discrete Data 654

17.3 Exponential Family and the GLM 657

17.4 The Logistic Regression Model 658

17.5 Inference for the Logistic Regression Model 660

17.5.1 Estimation of the Regression Coefficients and Related Parameters 660

17.5.2 Estimation of the Variance-Covariance Matrix of β̂ 664

17.5.3 Confidence Intervals and Hypotheses Testing for the Regression Coefficients 665

17.5.4 Residuals for the Logistic Regression Model 666

17.5.5 Deviance Test and Hosmer-Lemeshow Goodness-of-Fit Test 669

17.6 Model Selection in Logistic Regression Models 671

17.7 Probit Regression 678

17.8 Poisson Regression Model 682

17.9 Bibliography 686

17.10 Complements, Problems, and Programs 687

Appendix A Open Source Software - An Epilogue 689

Appendix B The Statistical Tables 693

Bibliography 694

Author Index 712

Subject Index 718

R Codes 729
