Edition: 1
Authors: Alan Agresti, Maria Kateri
Series: Chapman & Hall/CRC Texts in Statistical Science
ISBN: 0367748452, 9780367748456
Publisher: Chapman and Hall/CRC
Publication year: 2021
Pages: 486
Language: English
File format: PDF
File size: 16 MB
Designed as a textbook for a one- or two-term introduction to mathematical statistics for students training to become data scientists, Foundations of Statistics for Data Scientists: With R and Python is an in-depth presentation of the topics in statistical science with which any data scientist should be familiar, including probability distributions, descriptive and inferential statistical methods, and linear modelling. The book assumes knowledge of basic calculus, so the presentation can focus on 'why it works' as well as 'how to do it.' Compared to traditional "mathematical statistics" textbooks, however, the book has less emphasis on probability theory and more emphasis on using software to implement statistical methods and to conduct simulations to illustrate key concepts. All statistical analyses in the book use R software, with an appendix showing the same analyses with Python.
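To make the description's emphasis on simulation concrete, here is a minimal sketch in base R (not code from the book) of the kind of simulation used to illustrate a key concept such as the Central Limit Theorem; the sample size, number of replications, and the exponential population are arbitrary choices for illustration.

    # Minimal sketch (not from the book): simulating the sampling
    # distribution of a sample mean to illustrate the Central Limit Theorem.
    set.seed(1)
    n <- 30        # sample size (arbitrary choice for illustration)
    nsim <- 10000  # number of simulated samples
    # Draw repeated samples from a right-skewed exponential population:
    ybar <- replicate(nsim, mean(rexp(n, rate = 1)))
    hist(ybar, breaks = 50,
         main = "Sampling distribution of the sample mean")
    # Empirical standard error versus the theoretical value 1/sqrt(n):
    c(se_simulated = sd(ybar), se_theory = 1 / sqrt(n))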
The book also introduces modern topics that do not normally appear in mathematical statistics texts but are highly relevant for data scientists, such as Bayesian inference, generalized linear models for non-normal responses (e.g., logistic regression and Poisson loglinear models), and regularized model fitting. The nearly 500 exercises are grouped into "Data Analysis and Applications" and "Methods and Concepts." Appendices introduce R and Python and contain solutions for odd-numbered exercises. The book's website has expanded R, Python, and Matlab appendices and all data sets from the examples and exercises.
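As a rough illustration of the GLM material mentioned above, the following is a hedged sketch of fitting a logistic regression with R's glm function; the data frame and variable names (dat, x, y) are simulated placeholders, not a dataset from the book.

    # Hypothetical sketch of a logistic-regression GLM fit in R.
    # `dat`, `x`, and `y` are simulated placeholders, not book data.
    set.seed(2)
    dat <- data.frame(x = rnorm(100))
    dat$y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.2 * dat$x))
    fit <- glm(y ~ x, family = binomial(link = "logit"), data = dat)
    summary(fit)           # ML estimates, standard errors, Wald z tests
    exp(coef(fit)["x"])    # multiplicative effect of x on the odds

Swapping family = poisson for binomial in the same call gives the Poisson loglinear models also mentioned above; regularized fits such as the lasso would require an add-on package (e.g., glmnet) rather than base R.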
Alan Agresti, Distinguished Professor Emeritus at the University of Florida, is the author of seven books, including Categorical Data Analysis (Wiley) and Statistics: The Art and Science of Learning from Data (Pearson), and has presented short courses in 35 countries. His awards include an honorary doctorate from De Montfort University (UK) and the Statistician of the Year award from the American Statistical Association (Chicago chapter). Maria Kateri, Professor of Statistics and Data Science at RWTH Aachen University, authored the monograph Contingency Table Analysis: Methods and Implementation Using R (Birkhäuser/Springer) and a textbook on mathematics for economists (in German). She has long-term experience in teaching statistics courses to students of Data Science, Mathematics, Statistics, Computer Science, and Business Administration and Engineering.
"The main goal of this textbook is to present foundational statistical methods and theory that are relevant in the field of data science. The authors depart from the typical approaches taken by many conventional mathematical statistics textbooks by placing more emphasis on providing the students with intuitive and practical interpretations of those methods with the aid of R programming codes...I find its particular strength to be its intuitive presentation of statistical theory and methods without getting bogged down in mathematical details that are perhaps less useful to the practitioners" (Mintaek Lee, Boise State University)
"The aspects of this manuscript that I find appealing: 1. The use of real data. 2. The use of R but with the option to use Python. 3. A good mix of theory and practice. 4. The text is well-written with good exercises. 5. The coverage of topics (e.g. Bayesian methods and clustering) that are not usually part of a course in statistics at the level of this book." (Jason M. Graham, University of Scranton)
Table of contents:

Cover; Half Title; Series Page; Title Page; Copyright Page; Contents; Preface

1. Introduction to Statistical Science
  1.1. Statistical Science: Description and Inference
    1.1.1. Design, Descriptive Statistics, and Inferential Statistics
    1.1.2. Populations and Samples
    1.1.3. Parameters: Numerical Summaries of the Population
    1.1.4. Defining Populations: Actual and Conceptual
  1.2. Types of Data and Variables
    1.2.1. Data Files
    1.2.2. Example: The General Social Survey (GSS)
    1.2.3. Variables
    1.2.4. Quantitative Variables and Categorical Variables
    1.2.5. Discrete Variables and Continuous Variables
    1.2.6. Associations: Response Variables and Explanatory Variables
  1.3. Data Collection and Randomization
    1.3.1. Randomization
    1.3.2. Collecting Data with a Sample Survey
    1.3.3. Collecting Data with an Experiment
    1.3.4. Collecting Data with an Observational Study
    1.3.5. Establishing Cause and Effect: Observational versus Experimental Studies
  1.4. Descriptive Statistics: Summarizing Data
    1.4.1. Example: Carbon Dioxide Emissions in European Nations
    1.4.2. Frequency Distribution and Histogram Graphic
    1.4.3. Describing the Center of the Data: Mean and Median
    1.4.4. Describing Data Variability: Standard Deviation and Variance
    1.4.5. Describing Position: Percentiles, Quartiles, and Box Plots
  1.5. Descriptive Statistics: Summarizing Multivariate Data
    1.5.1. Bivariate Quantitative Data: The Scatterplot, Correlation, and Regression
    1.5.2. Bivariate Categorical Data: Contingency Tables
    1.5.3. Descriptive Statistics for Samples and for Populations
  1.6. Chapter Summary

2. Probability Distributions
  2.1. Introduction to Probability
    2.1.1. Probabilities and Long-Run Relative Frequencies
    2.1.2. Sample Spaces and Events
    2.1.3. Probability Axioms and Implied Probability Rules
    2.1.4. Example: Diagnostics for Disease Screening
    2.1.5. Bayes’ Theorem
    2.1.6. Multiplicative Law of Probability and Independent Events
  2.2. Random Variables and Probability Distributions
    2.2.1. Probability Distributions for Discrete Random Variables
    2.2.2. Example: Geometric Probability Distribution
    2.2.3. Probability Distributions for Continuous Random Variables
    2.2.4. Example: Uniform Distribution
    2.2.5. Probability Functions (pdf, pmf) and Cumulative Distribution Function (cdf)
    2.2.6. Example: Exponential Random Variable
    2.2.7. Families of Probability Distributions Indexed by Parameters
  2.3. Expectations of Random Variables
    2.3.1. Expected Value and Variability of a Discrete Random Variable
    2.3.2. Expected Values for Continuous Random Variables
    2.3.3. Example: Mean and Variability for Uniform Random Variable
    2.3.4. Higher Moments: Skewness
    2.3.5. Expectations of Linear Functions of Random Variables
    2.3.6. Standardizing a Random Variable
  2.4. Discrete Probability Distributions
    2.4.1. Binomial Distribution
    2.4.2. Example: Hispanic Composition of Jury List
    2.4.3. Mean, Variability, and Skewness of Binomial Distribution
    2.4.4. Example: Predicting Results of a Sample Survey
    2.4.5. The Sample Proportion as a Scaled Binomial Random Variable
    2.4.6. Poisson Distribution
    2.4.7. Poisson Variability and Overdispersion
  2.5. Continuous Probability Distributions
    2.5.1. The Normal Distribution
    2.5.2. The Standard Normal Distribution
    2.5.3. Examples: Finding Normal Probabilities and Percentiles
    2.5.4. The Gamma Distribution
    2.5.5. The Exponential Distribution and Poisson Processes
    2.5.6. Quantiles of a Probability Distribution
    2.5.7. Using the Uniform to Randomly Generate a Continuous Random Variable
  2.6. Joint and Conditional Distributions and Independence
    2.6.1. Joint and Marginal Probability Distributions
    2.6.2. Example: Joint and Marginal Distributions of Happiness and Family Income
    2.6.3. Conditional Probability Distributions
    2.6.4. Trials with Multiple Categories: The Multinomial Distribution
    2.6.5. Expectations of Sums of Random Variables
    2.6.6. Independence of Random Variables
    2.6.7. Markov Chain Dependence and Conditional Independence
  2.7. Correlation between Random Variables
    2.7.1. Covariance and Correlation
    2.7.2. Example: Correlation between Income and Happiness
    2.7.3. Independence Implies Zero Correlation, but Not Converse
    2.7.4. Bivariate Normal Distribution *
  2.8. Chapter Summary

3. Sampling Distributions
  3.1. Sampling Distributions: Probability Distributions for Statistics
    3.1.1. Example: Predicting an Election Result from an Exit Poll
    3.1.2. Sampling Distribution: Variability of a Statistic’s Value among Samples
    3.1.3. Constructing a Sampling Distribution
    3.1.4. Example: Simulating to Estimate Mean Restaurant Sales
  3.2. Sampling Distributions of Sample Means
    3.2.1. Mean and Variance of Sample Mean of Random Variables
    3.2.2. Standard Error of a Statistic
    3.2.3. Example: Standard Error of Sample Mean Sales
    3.2.4. Example: Standard Error of Sample Proportion in Exit Poll
    3.2.5. Law of Large Numbers: Sample Mean Converges to Population Mean
    3.2.6. Normal, Binomial, and Poisson Sums of Random Variables Have the Same Distribution
  3.3. Central Limit Theorem: Normal Sampling Distribution for Large Samples
    3.3.1. Sampling Distribution of Sample Mean Is Approximately Normal
    3.3.2. Simulations Illustrate Normal Sampling Distribution in CLT
    3.3.3. Summary: Population, Sample Data, and Sampling Distributions
  3.4. Large-Sample Normal Sampling Distributions for Many Statistics
    3.4.1. The Delta Method
    3.4.2. Delta Method Applied to Root Poisson Stabilizes the Variance
    3.4.3. Simulating Sampling Distributions of Other Statistics
    3.4.4. The Key Role of Sampling Distributions in Statistical Inference
  3.5. Chapter Summary

4. Statistical Inference: Estimation
  4.1. Point Estimates and Confidence Intervals
    4.1.1. Properties of Estimators: Unbiasedness, Consistency, Efficiency
    4.1.2. Evaluating Properties of Estimators
    4.1.3. Interval Estimation: Confidence Intervals for Parameters
  4.2. The Likelihood Function and Maximum Likelihood Estimation
    4.2.1. The Likelihood Function
    4.2.2. Maximum Likelihood Method of Estimation
    4.2.3. Properties of Maximum Likelihood (ML) Estimators
    4.2.4. Example: Variance of ML Estimator of Binomial Parameter
    4.2.5. Example: Variance of ML Estimator of Poisson Mean
    4.2.6. Sufficiency and Invariance for ML Estimates
  4.3. Constructing Confidence Intervals
    4.3.1. Using a Pivotal Quantity to Induce a Confidence Interval
    4.3.2. A Large-Sample Confidence Interval for the Mean
    4.3.3. Confidence Intervals for Proportions
    4.3.4. Example: Atheists and Agnostics in Europe
    4.3.5. Using Simulation to Illustrate Long-Run Performance of CIs
    4.3.6. Determining the Sample Size before Collecting the Data
    4.3.7. Example: Sample Size for Evaluating an Advertising Strategy
  4.4. Confidence Intervals for Means of Normal Populations
    4.4.1. The t Distribution
    4.4.2. Confidence Interval for a Mean Using the t Distribution
    4.4.3. Example: Estimating Mean Weight Change for Anorexic Girls
    4.4.4. Robustness for Violations of Normal Population Assumption
    4.4.5. Construction of t Distribution Using Chi-Squared and Standard Normal
    4.4.6. Why Does the Pivotal Quantity Have the t Distribution?
    4.4.7. Cauchy Distribution: t Distribution with df = 1 Has Unusual Behavior
  4.5. Comparing Two Population Means or Proportions
    4.5.1. A Model for Comparing Means: Normality with Common Variability
    4.5.2. A Standard Error and Confidence Interval for Comparing Means
    4.5.3. Example: Comparing a Therapy to a Control Group
    4.5.4. Confidence Interval Comparing Two Proportions
    4.5.5. Example: Does Prayer Help Coronary Surgery Patients?
  4.6. The Bootstrap
    4.6.1. Computational Resampling and Bootstrap Confidence Intervals
    4.6.2. Example: Bootstrap Confidence Intervals for Library Data
  4.7. The Bayesian Approach to Statistical Inference
    4.7.1. Bayesian Prior and Posterior Distributions
    4.7.2. Bayesian Binomial Inference: Beta Prior Distributions
    4.7.3. Example: Belief in Hell
    4.7.4. Interpretation: Bayesian versus Classical Intervals
    4.7.5. Bayesian Posterior Interval Comparing Proportions
    4.7.6. Highest Posterior Density (HPD) Posterior Intervals
  4.8. Bayesian Inference for Means
    4.8.1. Bayesian Inference for a Normal Mean
    4.8.2. Example: Bayesian Analysis for Anorexia Therapy
    4.8.3. Bayesian Inference for Normal Means with Improper Priors
    4.8.4. Predicting a Future Observation: Bayesian Predictive Distribution
    4.8.5. The Bayesian Perspective, and Empirical Bayes and Hierarchical Bayes Extensions
  4.9. Why Maximum Likelihood and Bayes Estimators Perform Well *
    4.9.1. ML Estimators Have Large-Sample Normal Distributions
    4.9.2. Asymptotic Efficiency of ML Estimators Same as Best Unbiased Estimators
    4.9.3. Bayesian Estimators Also Have Good Large-Sample Performance
    4.9.4. The Likelihood Principle
  4.10. Chapter Summary

5. Statistical Inference: Significance Testing
  5.1. The Elements of a Significance Test
    5.1.1. Example: Testing for Bias in Selecting Managers
    5.1.2. Assumptions, Hypotheses, Test Statistic, P-Value, and Conclusion
  5.2. Significance Tests for Proportions and Means
    5.2.1. The Elements of a Significance Test for a Proportion
    5.2.2. Example: Climate Change a Major Threat?
    5.2.3. One-Sided Significance Tests
    5.2.4. The Elements of a Significance Test for a Mean
    5.2.5. Example: Significance Test about Political Ideology
  5.3. Significance Tests Comparing Means
    5.3.1. Significance Tests for the Difference between Two Means
    5.3.2. Example: Comparing a Therapy to a Control Group
    5.3.3. Effect Size for Comparison of Two Means
    5.3.4. Bayesian Inference for Comparing Two Means
    5.3.5. Example: Bayesian Comparison of Therapy and Control Groups
  5.4. Significance Tests Comparing Proportions
    5.4.1. Significance Test for the Difference between Two Proportions
    5.4.2. Example: Comparing Prayer and Non-Prayer Surgery Patients
    5.4.3. Bayesian Inference for Comparing Two Proportions
    5.4.4. Chi-Squared Tests for Multiple Proportions in Contingency Tables
    5.4.5. Example: Happiness and Marital Status
    5.4.6. Standardized Residuals: Describing the Nature of an Association
  5.5. Significance Test Decisions and Errors
    5.5.1. The α-level: Making a Decision Based on the P-Value
    5.5.2. Never “Accept H0” in a Significance Test
    5.5.3. Type I and Type II Errors
    5.5.4. As P(Type I Error) Decreases, P(Type II Error) Increases
    5.5.5. Example: Testing Whether Astrology Has Some Truth
    5.5.6. The Power of a Test
    5.5.7. Making Decisions versus Reporting the P-Value
  5.6. Duality between Significance Tests and Confidence Intervals
    5.6.1. Connection between Two-Sided Tests and Confidence Intervals
    5.6.2. Effect of Sample Size: Statistical versus Practical Significance
    5.6.3. Significance Tests Are Less Useful than Confidence Intervals
    5.6.4. Significance Tests and P-Values Can Be Misleading
  5.7. Likelihood-Ratio Tests and Confidence Intervals *
    5.7.1. The Likelihood-Ratio and a Chi-Squared Test Statistic
    5.7.2. Likelihood-Ratio Test and Confidence Interval for a Proportion
    5.7.3. Likelihood-Ratio, Wald, Score Test Triad
  5.8. Nonparametric Tests *
    5.8.1. A Permutation Test to Compare Two Groups
    5.8.2. Example: Petting versus Praise of Dogs
    5.8.3. Wilcoxon Test: Comparing Mean Ranks for Two Groups
    5.8.4. Comparing Survival Time Distributions with Censored Data
  5.9. Chapter Summary

6. Linear Models and Least Squares
  6.1. The Linear Regression Model and Its Least Squares Fit
    6.1.1. The Linear Model Describes a Conditional Expectation
    6.1.2. Describing Variation around the Conditional Expectation
    6.1.3. Least Squares Model Fitting
    6.1.4. Example: Linear Model for Scottish Hill Races
    6.1.5. The Correlation
    6.1.6. Regression toward the Mean in Linear Regression Models
    6.1.7. Linear Models and Reality
  6.2. Multiple Regression: Linear Models with Multiple Explanatory Variables
    6.2.1. Interpreting Effects in Multiple Regression Models
    6.2.2. Example: Multiple Regression for Scottish Hill Races
    6.2.3. Association and Causation
    6.2.4. Confounding, Spuriousness, and Conditional Independence
    6.2.5. Example: Modeling the Crime Rate in Florida
    6.2.6. Equations for Least Squares Estimates in Multiple Regression
    6.2.7. Interaction between Explanatory Variables in Their Effects
    6.2.8. Cook’s Distance: Detecting Unusual and Influential Observations
  6.3. Summarizing Variability in Linear Regression Models
    6.3.1. The Error Variance and Chi-Squared for Linear Models
    6.3.2. Decomposing Variability into Model Explained and Unexplained Parts
    6.3.3. R-Squared and the Multiple Correlation
    6.3.4. Example: R-Squared for Modeling Scottish Hill Races
  6.4. Statistical Inference for Normal Linear Models
    6.4.1. The F Distribution: Testing That All Effects Equal 0
    6.4.2. Example: Normal Linear Model for Mental Impairment
    6.4.3. t Tests and Confidence Intervals for Individual Effects
    6.4.4. Multicollinearity: Nearly Redundant Explanatory Variables
    6.4.5. Confidence Interval for E(Y) and Prediction Interval for Y
    6.4.6. The F Test That All Effects Equal 0 Is a Likelihood-Ratio Test *
  6.5. Categorical Explanatory Variables in Linear Models
    6.5.1. Indicator Variables for Categories
    6.5.2. Example: Comparing Mean Incomes of Racial-Ethnic Groups
    6.5.3. Analysis of Variance (ANOVA): An F Test Comparing Several Means
    6.5.4. Multiple Comparisons of Means: Bonferroni and Tukey Methods
    6.5.5. Models with Both Categorical and Quantitative Explanatory Variables
    6.5.6. Comparing Two Nested Normal Linear Models
    6.5.7. Interaction with Categorical and Quantitative Explanatory Variables
  6.6. Bayesian Inference for Normal Linear Models
    6.6.1. Prior and Posterior Distributions for Normal Linear Models
    6.6.2. Example: Bayesian Linear Model for Mental Impairment
    6.6.3. Bayesian Approach to the Normal One-Way Layout
  6.7. Matrix Formulation of Linear Models *
    6.7.1. The Model Matrix
    6.7.2. Least Squares Estimates and Standard Errors
    6.7.3. The Hat Matrix and the Leverage
    6.7.4. Alternatives to Least Squares: Robust Regression and Regularization
    6.7.5. Restricted Optimality of Least Squares: Gauss–Markov Theorem
    6.7.6. Matrix Formulation of Bayesian Normal Linear Model
  6.8. Chapter Summary

7. Generalized Linear Models
  7.1. Introduction to Generalized Linear Models
    7.1.1. The Three Components of a Generalized Linear Model
    7.1.2. GLMs for Normal, Binomial, and Poisson Responses
    7.1.3. Example: GLMs for House Selling Prices
    7.1.4. The Deviance
    7.1.5. Likelihood-Ratio Model Comparison Uses Deviance Difference
    7.1.6. Model Selection: AIC and the Bias/Variance Tradeoff
    7.1.7. Advantages of GLMs versus Transforming the Data
    7.1.8. Example: Normal and Gamma GLMs for Covid-19 Data
  7.2. Logistic Regression Model for Binary Data
    7.2.1. Logistic Regression: Model Expressions
    7.2.2. Parameter Interpretation: Effects on Probabilities and Odds
    7.2.3. Example: Dose–Response Study for Flour Beetles
    7.2.4. Grouped and Ungrouped Binary Data: Effects on Estimates and Deviance
    7.2.5. Example: Modeling Italian Employment with Logit and Identity Links
    7.2.6. Complete Separation and Infinite Logistic Parameter Estimates
  7.3. Bayesian Inference for Generalized Linear Models
    7.3.1. Normal Prior Distributions for GLM Parameters
    7.3.2. Example: Logistic Regression for Endometrial Cancer Patients
  7.4. Poisson Loglinear Models for Count Data
    7.4.1. Poisson Loglinear Models
    7.4.2. Example: Modeling Horseshoe Crab Satellite Counts
    7.4.3. Modeling Rates: Including an Offset in the Model
    7.4.4. Example: Lung Cancer Survival
  7.5. Negative Binomial Models for Overdispersed Count Data *
    7.5.1. Increased Variance Due to Heterogeneity
    7.5.2. Negative Binomial: Gamma Mixture of Poisson Distributions
    7.5.3. Example: Negative Binomial Modeling of Horseshoe Crab Data
  7.6. Iterative GLM Model Fitting *
    7.6.1. The Newton–Raphson Method
    7.6.2. Newton–Raphson Fitting of Logistic Regression Model
    7.6.3. Covariance Matrix of Parameter Estimators and Fisher Scoring
    7.6.4. Likelihood Equations and Covariance Matrix for Poisson GLMs
  7.7. Regularization with Large Numbers of Parameters *
    7.7.1. Penalized Likelihood Methods
    7.7.2. Penalized Likelihood Methods: The Lasso
    7.7.3. Example: Predicting Opinions with Student Survey Data
    7.7.4. Why Shrink ML Estimates toward 0?
    7.7.5. Dimension Reduction: Principal Component Analysis
    7.7.6. Bayesian Inference with a Large Number of Parameters
    7.7.7. Huge n: Handling Big Data
  7.8. Chapter Summary

8. Classification and Clustering
  8.1. Classification: Linear Discriminant Analysis and Graphical Trees
    8.1.1. Classification with Fisher’s Linear Discriminant Function
    8.1.2. Example: Predicting Whether Horseshoe Crabs Have Satellites
    8.1.3. Summarizing Predictive Power: Classification Tables and ROC Curves
    8.1.4. Classification Trees: Graphical Prediction
    8.1.5. Logistic Regression versus Linear Discriminant Analysis and Classification Trees
    8.1.6. Other Methods for Classification: k-Nearest Neighbors and Neural Networks *
  8.2. Cluster Analysis
    8.2.1. Measuring Dissimilarity between Observations on Binary Responses
    8.2.2. Hierarchical Clustering Algorithm and Its Dendrogram
    8.2.3. Example: Clustering States on Presidential Election Outcomes
  8.3. Chapter Summary

9. Statistical Science: A Historical Overview
  9.1. The Evolution of Statistical Science *
    9.1.1. Evolution of Probability
    9.1.2. Evolution of Descriptive and Inferential Statistics
  9.2. Pillars of Statistical Wisdom and Practice
    9.2.1. Stigler’s Seven Pillars of Statistical Wisdom
    9.2.2. Seven Pillars of Wisdom for Practicing Data Science

Appendix A: Using R in Statistical Science
  A.0. Basics of R
    A.0.1. Starting a Session, Entering Commands, and Quitting
    A.0.2. Installing and Loading R Packages
    A.0.3. R Functions and Data Structures
    A.0.4. Data Input in R
    A.0.5. R Control Flows
  A.1. Chapter 1: R for Descriptive Statistics
    A.1.1. Data Handling and Wrangling
    A.1.2. Histograms and Other Graphics
    A.1.3. Descriptive Statistics
    A.1.4. Missing Values in Data Files
    A.1.5. Summarizing Bivariate Quantitative Data
    A.1.6. Summarizing Bivariate Categorical Data
  A.2. Chapter 2: R for Probability Distributions
    A.2.1. R Functions for Probability Distributions
    A.2.2. Quantiles, Q-Q Plots, and the Normal Quantile Plot
    A.2.3. Joint and Conditional Probability Distributions
  A.3. Chapter 3: R for Sampling Distributions
    A.3.1. Simulating the Sampling Distribution of a Statistic
    A.3.2. Monte Carlo Simulation
  A.4. Chapter 4: R for Estimation
    A.4.1. Confidence Intervals for Proportions
    A.4.2. Confidence Intervals for Means of Subgroups and Paired Differences
    A.4.3. The t and Other Probability Distributions for Statistical Inference
    A.4.4. Empirical Cumulative Distribution Function
    A.4.5. Nonparametric and Parametric Bootstraps
    A.4.6. Bayesian HPD Intervals Comparing Proportions
  A.5. Chapter 5: R for Significance Testing
    A.5.1. Bayes Factors and a Bayesian t Test
    A.5.2. Simulating the Exact Distribution of the Likelihood-Ratio Statistic
    A.5.3. Nonparametric Statistics: Permutation Test and Wilcoxon Test
  A.6. Chapter 6: R for Linear Models
    A.6.1. Linear Models with the lm Function
    A.6.2. Diagnostic Plots for Linear Models
    A.6.3. Plots for Regression Bands and Posterior Distributions
  A.7. Chapter 7: R for Generalized Linear Models
    A.7.1. The glm Function
    A.7.2. Plotting a Logistic Regression Model Fit
    A.7.3. Model Selection for GLMs
    A.7.4. Correlated Responses: Marginal, Random Effects, and Transitional Models
    A.7.5. Modeling Time Series
  A.8. Chapter 8: R for Classification and Clustering
    A.8.1. Visualization of Linear Discriminant Analysis Results
    A.8.2. Cross-Validation and Model Training
    A.8.3. Classification and Regression Trees
    A.8.4. Cluster Analysis with Quantitative Variables

Appendix B: Using Python in Statistical Science
  B.0. Basics of Python
    B.0.1. Python Preliminaries
    B.0.2. Data Structures and Data Input
  B.1. Chapter 1: Python for Descriptive Statistics
    B.1.1. Random Number Generation
    B.1.2. Summary Statistics and Graphs for Quantitative Variables
    B.1.3. Descriptive Statistics for Bivariate Quantitative Data
    B.1.4. Descriptive Statistics for Bivariate Categorical Data
    B.1.5. Simulating Samples from a Bell-Shaped Population
  B.2. Chapter 2: Python for Probability Distributions
    B.2.1. Simulating a Probability as a Long-Run Relative Frequency
    B.2.2. Python Functions for Discrete Probability Distributions
    B.2.3. Python Functions for Continuous Probability Distributions
    B.2.4. Expectations of Random Variables
  B.3. Chapter 3: Python for Sampling Distributions
    B.3.1. Simulation to Illustrate a Sampling Distribution
    B.3.2. Law of Large Numbers
  B.4. Chapter 4: Python for Estimation
    B.4.1. Confidence Intervals for Proportions
    B.4.2. The t Distribution
    B.4.3. Confidence Intervals for Means
    B.4.4. Confidence Intervals Comparing Means and Comparing Proportions
    B.4.5. Bootstrap Confidence Intervals
    B.4.6. Bayesian Posterior Intervals for Proportions and Means
  B.5. Chapter 5: Python for Significance Testing
    B.5.1. Significance Tests for Proportions
    B.5.2. Chi-Squared Tests Comparing Multiple Proportions in Contingency Tables
    B.5.3. Significance Tests for Means
    B.5.4. Significance Tests Comparing Means
    B.5.5. The Power of a Significance Test
    B.5.6. Nonparametric Statistics: Permutation Test and Wilcoxon Test
    B.5.7. Kaplan-Meier Estimation of Survival Functions
  B.6. Chapter 6: Python for Linear Models
    B.6.1. Fitting Linear Models
    B.6.2. The Correlation and R-Squared
    B.6.3. Diagnostics: Residuals and Cook’s Distances for Linear Models
    B.6.4. Statistical Inference and Prediction for Linear Models
    B.6.5. Categorical Explanatory Variables in Linear Models
    B.6.6. Bayesian Fitting of Linear Models
  B.7. Chapter 7: Python for Generalized Linear Models
    B.7.1. GLMs with Identity Link
    B.7.2. Logistic Regression: Logit Link with Binary Data
    B.7.3. Separation and Bayesian Fitting in Logistic Regression
    B.7.4. Poisson Loglinear Model for Counts
    B.7.5. Negative Binomial Modeling of Count Data
    B.7.6. Regularization: Penalized Logistic Regression Using the Lasso
  B.8. Chapter 8: Python for Classification and Clustering
    B.8.1. Linear Discriminant Analysis
    B.8.2. Classification Trees and Neural Networks for Prediction
    B.8.3. Cluster Analysis

Appendix C: Brief Solutions to Exercises
  C.1. Chapter 1: Solutions to Exercises
  C.2. Chapter 2: Solutions to Exercises
  C.3. Chapter 3: Solutions to Exercises
  C.4. Chapter 4: Solutions to Exercises
  C.5. Chapter 5: Solutions to Exercises
  C.6. Chapter 6: Solutions to Exercises
  C.7. Chapter 7: Solutions to Exercises
  C.8. Chapter 8: Solutions to Exercises

Bibliography
Example Index
Subject Index