Authors: Joni Shreve, Donna Dea Holland; ISBN: 1629603813, 9781629603810; Publisher: SAS Institute; Year: 2018; Pages: 414; Language: English; File format: PDF (converted to PDF, EPUB, or AZW3 on request); File size: 10 MB
If you need SAS Certification Prep Guide: Statistical Business Analysis Using SAS9 converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert the file for you.
Note that SAS Certification Prep Guide: Statistical Business Analysis Using SAS9 is the original-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
Must-have study guide for the SAS Certified Statistical Business Analyst Using SAS9: Regression and Modeling exam! Written for both new and experienced SAS programmers, the SAS Certification Prep Guide: Statistical Business Analysis Using SAS9 is an in-depth prep guide for the SAS Certified Statistical Business Analyst Using SAS9: Regression and Modeling exam.
Contents About This Book What Does This Book Cover? Requirements and Details Exam Objectives Take a Practice Exam Registering for the Exam Syntax Conventions What Should You Know about the Examples? Software Used to Develop the Book's Content Example Code and Data SAS University Edition Where Are the Exercise Solutions? We Want to Hear from You Chapter 1: Statistics and Making Sense of Our World Introduction What Is Statistics? The Two Branches of Statistics Variable Types and SAS Data Types Variable Types Table 1.1 Data for the Study of Diabetes SAS Data Types The Data Analytics Process Defining the Purpose Table 1.2 Examples of Analyses by Purpose for Various Industries Data Preparation Sampling Cleaning the Data Exploring the Data Analyzing the Data and Roadmap to the Book Table 1.3 Summary of Statistical Models for Business Analysis Certification by Variable Role Conclusions and Interpretation Getting Started with SAS Diabetic Care Management Case Ames Housing Case Table 1.4 List of Data Sets Used in the Book by Chapter Accessing the Data in the SAS Environment Program 1.1 PROC CONTENTS of the Diabetes Care Management Case Data Set SAS Log 1.1 PROC CONTENTS of the Diabetes Care Management Case Data Set Output 1.1 PROC CONTENTS of the Diabetes Care Management Case Data Set Key Terms Chapter 2: Summarizing Your Data with Descriptive Statistics Introduction Measures of Center Mean Figure 2.1 Time to Process Online Orders (in Hours) Median Mode Table 2.1 Number of Deaths for Top Ten Causes – 2014 United States Measures of Variation Range Table 2.2 Time to Process Orders (in Hours) by Retailer Figure 2.2 Time to Process Orders (in Hours) Variance Table 2.3 Descriptive Statistics for Time to Process Orders Table 2.4 Calculations for Variance as Average Squared Deviations Standard Deviation Measures of Shape Skewness Figure 2.3 Examples of Symmetric and Asymmetric Distributions Table 2.5 Sum of Z3 Values for Calculating Skewness Kurtosis Figure 2.4 Examples of 
Kurtosis as Compared to the Normal Distribution Table 2.6 Sum of Z4 Values for Calculating Kurtosis Other Descriptive Measures Percentiles, the Five-Number-Summary, and the Interquartile Range (IQR) Percentiles The Five-Number-Summary and the Interquartile Range (IQR) Figure 2.5 Time to Process Online Orders (in Hours) for Retailer 2 Outliers The MEANS Procedure Procedure Syntax for PROC MEANS Program 2.1 PROC MEANS of Process Time and Amount Spent for Retailer 1 Output 2.1 PROC MEANS of Process Time and Amount Spent for Retailer 1 Customizing Output with the VAR statement and Statistics Keywords Program 2.2 PROC MEANS with Additional Descriptive Statistics of Process Time for Retailer 1 Output 2.2 PROC MEANS with Additional Descriptive Statistics of Process Time for Retailer 1 Key Words for Generating Desired Statistics Table 2.7 Keywords for Requesting Statistics in the MEANS Procedure Comparing Groups Using the CLASS Statement or the BY Statement PROC MEANS Using the CLASS Statement Program 2.3 PROC MEANS of Process Time for Retailers 1 and 2 Using the CLASS Statement Output 2.3 PROC MEANS of Process Time for Retailers 1 and 2 Using the CLASS Statement PROC MEANS Using the BY Statement Program 2.4 PROC MEANS of Process Time for Retailers 1 and 2 Using the BY Statement Output 2.4 PROC MEANS of Process Time for Retailers 1 and 2 Using the BY Statement Program 2.5 Analysis of Process Time for Retailers 1 and 2 Using BY DESCENDING Output 2.5 Analysis of Process Time For Retailers 1 and 2 Using BY DESCENDING Multiple Classes and Customizing Output Using the WAYS and TYPES Statements Using Multiple Classes in the CLASS Statement Program 2.6 Three-Way Analysis of Ketones by Diabetes Status, Renal Disease, and Gender Output 2.6 Three-Way Analysis of Ketones by Diabetes Status, Renal Disease, and Gender The WAYS Statement for Multiple Classes Program 2.7 Two-Way Analysis of Ketones by Diabetes Status, Renal Disease, and Gender Output 2.7 Two-Way Analysis of Ketones by 
Diabetes Status, Renal Disease, and Gender The TYPES Statement for Multiple Classes Program 2.8 One- and Two-Way Analyses of Ketones by Diabetes Status, Renal Disease, and Gender Output 2.8 One- and Two-Way Analyses of Ketones by Diabetes Status, Renal Disease, and Gender Saving Your Results Using the OUTPUT Statement Program 2.9 Ketones for the Diabetic Care Management Case Output 2.9 Ketones for the Diabetic Care Management Case The CLASS Statement and the _TYPE_ and _FREQ_ Variables Program 2.10 Ketones by the Class Controlled_Diabetic Output 2.10 Ketones by the Class Controlled_Diabetic Program 2.11 Ketones by the Classes Controlled_Diabetic and Renal_Disease Output 2.11 Ketones by the Classes Controlled_Diabetic and Renal_Disease SAS Log 2.1 Ketone Analysis by Two Classes Program 2.12 Ketones by the Classes Controlled_Diabetic, Renal_Disease, and Gender Output 2.12 Ketones by the Classes Controlled_Diabetic, Renal_Disease, and Gender Table 2.8 TYPE Values and the Subgroups Produced by Three-Way Analyses SAS Log 2.2 Ketone Analysis by the Classes Controlled_Diabetic, Renal_Disease, and Gender Table 2.9 TYPE, WAYS, Subgroups, and Number of Observations for One-, Two-, and Three-Way Analyses The CLASS Statement and Filtering the Output Data Set Program 2.13 Ketone Analysis by Four Classes SAS Log 2.3 Ketone Analysis by Four Classes Output 2.13 Filter of Output File for Only One-Way Analyses (_TYPE_ = 1, 2, 4, 8) The NWAY Option and Comparisons to the WAYS and TYPES Statements Program 2.14 Three-Way Analysis of Ketones Using the NWAY Option Output 2.14 Three-Way Analysis of Ketones Using the NWAY Option Program 2.15 Alternative 1 for Three-Way Analysis of Ketones Using the NWAY Option Program 2.16 Three Class Variables Connected by the Asterisk (*) in the TYPES Statement The BY Statement and the _TYPE_ and _FREQ_ Variables Program 2.17 Ketones by Controlled_Diabetic Output 2.15 Ketones by Controlled_Diabetic Program 2.18 Ketones by Controlled_Diabetic for Two 
Classes Output 2.16 Ketones by Controlled_Diabetic for Two Classes Handling Missing Data with the MISSING Option Program 2.19 The MEANS Procedure of Glucose by AE_DURATION Including Missing Values Output 2.17a The MEANS Procedure of Glucose by AE_DURATION Including Missing Values Output 2.17b Glucose by AE_DURATION Including Missing Values Key Terms Chapter Quiz Chapter 3: Data Visualization Introduction View and Interpret Categorical Data Frequency and Crosstabulation Tables Using the FREQ Procedure Procedure Syntax for PROC FREQ Figure 3.1 Diabetic Care Management Case Data Program 3.1 Frequency Tables of GENDER, AGE_RANGE, and CONTROLLED_DIABETIC Output 3.1 Frequency Tables of GENDER, AGE_RANGE, and CONTROLLED_DIABETIC PLOTS Options within the TABLES Statement Program 3.2 Frequency Table and Bar Chart of GENDER Output 3.2 Frequency Table and Bar Chart of GENDER Crosstabulations for Illustrating Associations between Two Categorical Variables Program 3.3 Crosstabulation of Gender by Diabetes Status Output 3.3a Crosstabulation of Gender by Diabetes Status Output 3.3b Crosstabulation of Gender by Diabetes Status: Frequency Plots of Gender by Diabetes Status Program 3.4 Cross Tabs and Frequency Plots of Diabetes Status and Renal Disease Output 3.4 Cross Tabs and Frequency Plots of Diabetes Status and Renal Disease MISSING Option within the TABLES Statement Program 3.5 Crosstabulation of Diabetes Status and Primary Medication with Missing Obs Excluded Output 3.5 Crosstabulation of Diabetes Status and Primary Medication with Missing Obs Excluded Program 3.6 Crosstabulation of Diabetes Status and Primary Medication with Missing Obs Included Output 3.6 Crosstabulation of Diabetes Status and Primary Medication with Missing Obs Included View and Interpret Numeric Data Histograms Using the UNIVARIATE Procedure Figure 3.2 Histogram for Numeric Data Procedure Syntax for PROC UNIVARIATE Program 3.7 Univariate Statistics on BMI for 200 Diabetic Patients Output 3.7 Univariate 
Statistics on BMI for 200 Diabetic Patients Table 3.1 Summary Data for the Variable BMI Program 3.8 Histogram of the Variable BMI Output 3.8 Histogram of the Variable BMI Q-Q Plots Using the UNIVARIATE Procedure Table 3.2 Expected Z-Scores for Number of Texts Figure 3.3 Q-Q Plot for Number of Texts Interpreting the Q-Q Plots Program 3.9 Q-Q Plot for the Variable BMI Output 3.9 Q-Q Plot for the Variable BMI Box-and-Whisker Plot Using the UNIVARIATE Procedure Calculating Quartiles for Five-Number Summary Figure 3.4 Box Plot for Number of Texts Interpreting the Box Plot Program 3.10 Distribution and Probability Plot for BMI Output 3.10 Distribution and Probability Plot for BMI UNIVARIATE Procedures Using the INSET Statement Program 3.11 Histogram with Descriptive Statistics of BMI Output 3.11 Histogram with Descriptive Statistics of BMI UNIVARIATE Procedures Using the CLASS Statement Program 3.12 Histogram of Pounds with Descriptive Statistics by Gender Output 3.12 Histogram of Pounds with Descriptive Statistics by Gender Visual Analyses Using the SGPLOT Procedure Procedure Syntax for PROC SGPLOT Exploring Bivariate Relationships with Basic Plots, Fits, and Confidence The SCATTER and REG Statements Program 3.13 Scatter Plot of Systolic and Diastolic Blood Pressure Output 3.13 Scatter Plot of Systolic and Diastolic Blood Pressure Program 3.14 Regression Line and Confidence Limits on Bivariate Scatter Plot Output 3.14 Regression Line and Confidence Limits on Bivariate Scatter Plot Program 3.15 Scatter Plot of Price by Quantity Sold Output 3.15 Scatter Plot of Price by Quantity Sold Program 3.16 Scatter Plot of Weight and Blood Pressure by Gender Output 3.16a Scatter Plot of Weight and Blood Pressure by Gender Output 3.16b Scatter Plot of Weight by Systolic Blood Pressure by Gender Exploring Other Relationships Using SGPLOT Program 3.17 Vertical Bar Charts for Diabetes Status Output 3.17 Vertical Bar Charts for Diabetes Status Program 3.18 Bar Chart of Diabetes Status by 
Renal Disease Output 3.18a Bar Chart of Diabetes Status by Renal Disease Output 3.18b Numbers with Renal Diseases by Diabetes Status Program 3.19 Bar Charts for Diastolic and Systolic BP by Diabetes Status Output 3.19 Bar Charts for Diastolic and Systolic BP by Diabetes Status Key Terms Chapter Quiz Chapter 4: The Normal Distribution and Introduction to Inferential Statistics Introduction Continuous Random Variables Normal Random Variables Figure 4.1 Distributions of Adult Weights for Three Populations The Empirical Rule Figure 4.2 Visualization of the Empirical Rule Figure 4.3 Empirical Rule Applied to Height of Diabetic Males Program 4.1 Actual Percentage of Males Having Heights within 1, 2, and 3 Standard Deviations from the Mean Output 4.1 Actual Percentage of Males Having Heights within 1, 2, and 3 Standard Deviations from the Mean The Standard Normal Distribution Figure 4.4 Proportion of Z-values Less Than -1.15, P(Z < -1.15) Table 4.1 Excerpt from Standard Normal Cumulative Area (for Z ≤ 0) Figure 4.5 Proportion of Z-values Less Than 1.15, P(Z < +1.15) Table 4.2 Excerpt from Standard Normal Cumulative Area (for Z ≥ 0) Figure 4.6 Proportion of Z-Values Greater Than 1.15, P(Z > +1.15) Figure 4.7 Proportion of Z-Values between -1.00 and +1.00, P(-1.00 < Z < +1.00) Figure 4.8 Proportion of Z-Values between -1.96 and +1.96, P(-1.96 < Z < +1.96) Applying the Standard Normal Distribution to Answer Probability Questions Figure 4.9 Proportion of Americans Exceeding Recommended Daily Sugar Consumption Figure 4.10 Proportion of College Students Spending More Than 14 Hours Using Digital Devices The Sampling Distribution of the Mean Characteristics of the Sampling Distribution of the Mean Figure 4.11 Distribution of Wait-Times at a Casual-Dining Restaurant Program 4.2 Description of the Sampling Distribution of Mean Wait-Times Output 4.2 Description of the Sampling Distribution of Mean Wait-Times The Central Limit Theorem Figure 4.12 Sampling Distribution of Average 
Wait-Times by Sample Size Application of the Sampling Distribution of the Mean Figure 4.13 Sample Distribution of the Mean Based upon a Sample Size of 50 Figure 4.14 Probability That Z > +1.77 Effects of Sample Size on the Sampling Distribution Figure 4.15 Sampling Distribution of the Mean for Two Sample Sizes Introduction to Hypothesis Testing Defining the Null and Alternative Hypotheses Figure 4.16 Rejection Region for a Two-Tailed Test Figure 4.17 Rejection Region for a Lower-Tailed Test Figure 4.18 Rejection Region for an Upper-Tailed Test Defining and Controlling Errors in Hypothesis Testing Hypothesis Testing for the Population Mean (σ Known) Two-Tailed Tests for the Population Mean (µ) Figure 4.19 Rejection Region for a Two-Tailed Test at α = 0.05 Table 4.3 Finding Z-Value Associated with 0.025 Area in the Lower Tail Figure 4.20 Critical Values for a Two-Tailed Test at α = 0.05 One-Tailed Tests for the Population Mean (µ) Figure 4.21 Critical Value for a One-Tailed Test at α = 0.05 Figure 4.22 Test Statistic Compared to the Critical Value Table 4.4 Critical Values Based upon α-Level and One-Tailed versus Two-Tailed Tests Hypothesis Testing Using the P-Value Approach Figure 4.23 p-Value for a One-Tailed Test The P-Value for the Two-Tailed Hypothesis Test Figure 4.24 p-Value for a Two-Tailed Test Hypothesis Testing for the Population Mean (σ Unknown) One-Tailed Tests for the Population Mean (µ) Figure 4.25 The t-Distribution for Various Sample Sizes Table 4.5 Descriptive Statistics of BMI for 25 Female Diabetic Patients Table 4.6 Excerpt from the t-Table Figure 4.26 t-Test Statistic Compared to the Critical Value Procedure Syntax for PROC TTEST Program 4.3 t-Test of BMI for Female Diabetics Output 4.3 t-Test of BMI for Female Diabetics Confidence Intervals for Estimating the Population Mean Confidence Interval for the Population Mean (σ Known) Figure 4.27 Confidence Intervals as Related to the Sampling Distribution Effects of Level of Confidence and Sample 
Size on Confidence Intervals Confidence Interval for the Population Mean (σ Unknown) Key Terms Chapter Quiz Chapter 5: Analysis of Categorical Variables Introduction Testing the Independence of Two Categorical Variables Hypothesis Testing and the Chi-Square Test Table 5.1 Expected Frequency Count of Online Shopping by Gender Table 5.2 Observed and Expected Frequencies Count of Online Shopping by Gender Figure 5.1 Bivariate Bar Charts of Gender and Online Shopping The Chi-Square Test Using the FREQ Procedure Procedure Syntax for PROC FREQ Program 5.1 Testing Association between Bonus and Kitchen Quality Output 5.1a Testing Association between Bonus and Kitchen Quality Output 5.1b Testing Association between Bonus and Kitchen Quality: Bivariate Bar Charts of Bonus and Kitchen Quality Assumptions Measuring the Strength of Association between Two Categorical Variables Cramer’s V The Odds Ratio Table 5.3 General Form of the 2x2 Contingency Table Using Chi-Square Tests for Exploration Prior to Predictive Analytics Program 5.2 Testing Association between Bonus and Corner Lot Output 5.2a Testing Association between Bonus and Corner Lot Output 5.2b Testing Association between Bonus and Corner Lot: Bivariate Bar Charts of Bonus and Corner Lot Key Terms Chapter Quiz Chapter 6: Two-Sample t-Test Introduction Independent Samples The Pooled Variance t-Test Assumptions Procedure Syntax of PROC TTEST Procedure Program 6.1 Independent Samples t-Test for Mean Differences in Above Ground Living Area Output 6.1a Independent Samples t-Test for Ames Housing, Above Ground Living Area by Bonus Testing the Equal Variance Assumption Using the Folded F-Test Verifying the Assumptions of a Two-Sample t-Test Output 6.1b Normal Probability Plots for Above Ground Living Area by Bonus Supplemental Plots for Data Visualization Output 6.1c Histograms and Box Plots for Above Ground Living Area by Bonus Testing the Normality Assumption Using the Kolmogorov-Smirnov Test Program 6.2 Kolmogorov-Smirnov 
Test of Normality for Above Ground Living Area by Bonus Output 6.2 Kolmogorov-Smirnov Test of Normality for Above Ground Living Area by Bonus Satterthwaite t-Test for Unequal Variances Program 6.3 Independent Samples t-Test for Mean Differences in Total Basement Area Output 6.3 Independent Sample t-Test for Ames Housing, Total Basement Area by Bonus Summary of Steps for the t-Test of Two Independent Populations Paired Samples Assumptions The Paired-Sample t-Test Using the PAIRED Statement in the TTEST Procedure Table 6.1 Whitley County, Indiana, 2012 and 2016 Tax Assessed Property Values Sample Data Program 6.4 Kolmogorov-Smirnov Test of Normality Assumption on the Difference Score Using the UNIVARIATE Procedure Output 6.4a Kolmogorov-Smirnov Test of Normality Assumption on the Difference Score Using the UNIVARIATE Procedure Output 6.4b Paired t-Test Results for Differences in Tax Assessed Property Values Output 6.4c Accompanying Plots for the Paired-Sample t-Test Key Terms Chapter Quiz Chapter 7: Analysis of Variance (ANOVA) Introduction One-Factor Analysis of Variance The One-Factor ANOVA Model Constructing the Test Statistic: Estimating Variance among Groups and Variance within Groups Table 7.1 Deviations within and across Groups Table 7.2 Squared Deviations within and across Groups Figure 7.1 The F-Distribution Table 7.3 General Form of the Analysis of Variance Table The GLM Procedure for Investigating Mean Differences Program 7.1 Descriptive Statistics for Computer Anxiety by Academic Major Output 7.1 Exploration of Computer Anxiety by Academic Major Program 7.2 One-Way ANOVA for Testing Differences in Computer Anxiety Output 7.2 One-Way ANOVA for Testing Differences in Computer Anxiety Predicted Values and Residuals Using the OUTPUT Statement Program 7.3 Predicted Values and Residuals for Computer Anxiety Scores Output 7.3 Predicted Values and Residuals for Computer Anxiety Scores Measures of Fit The Normality Assumption and the PLOTS Option Output 7.4 Fit 
Diagnostics for the One-Way Analysis of Variance Levene’s Test for Equal Variances and the MEANS Statement Program 7.4 The MEANS Statement for Additional Tests of Computer Anxiety Scores Output 7.5 Levene’s Homogeneity of Variance Test for Computer Anxiety Scores Post Hoc Tests: The Tukey-Kramer Procedure and the MEANS Statement Output 7.6 Tukey-Kramer for Testing Pairwise Differences in Computer Anxiety Other Post Hoc Procedures, the LSMEANS Statement, and the Diffogram Output 7.7 LSMEANS Statement for Testing Pairwise Differences in Computer Anxiety Output 7.8 Dunnett Adjustment for Testing Pairwise Differences in Computer Anxiety Program 7.5 Complete Analysis of Difference in Computer Anxiety Scores Across Academic Majors The Randomized Block Design The ANOVA Model for the Randomized Block Design Example and Interpretation of the Randomized Block Design Program 7.6 Exploration of Computer Anxiety by Academic Major and Block Output 7.9 Exploration of Computer Anxiety by Academic Major and Block Table 7.4 The ANOVA Table for the Randomized Block Design Program 7.7 Randomized Block Design for Testing Differences in Computer Anxiety Output 7.10 Randomized Block Design for Testing Differences in Computer Anxiety Post Hoc Tests Using the LSMEANS Statement Output 7.11 LSMEANS Statement for Testing Pairwise Differences in Computer Anxiety When Blocking Assessing the Assumptions of a Randomized Block Design Using the PLOTS Option Unbalanced Designs, the LSMEANS Statement, and Type III Sums of Squares Table 7.5 Cell Means and Sample Sizes for Computer Anxiety Scores Two-Factor Analysis of Variance The Two-Factor ANOVA Model Table 7.6 General Form of the Two-Factor ANOVA Table Example and Interpretation of the Two-Factor ANOVA Program 7.8 Exploration of Computer Anxiety by Academic Major and Gender Output 7.12 Descriptive Statistics for Computer Anxiety by Academic Major and Gender Figure 7.2 Mean Computer Anxiety Scores by Academic Major and Gender Program 7.9 Two-Factor 
ANOVA for Testing Differences in Computer Anxiety Output 7.13a Two-Factor ANOVA for Testing Differences in Computer Anxiety Output 7.13b Least Squares Means for Major by Gender Interaction Effects Output 7.13c Diffogram of MAJOR by GENDER Means Analyzing Simple Effects When Interaction Exists Using the LSMEANS Statement with the SLICE Option Output 7.13d Analysis of Simple Effects in the Presence of Interaction Assessing the Assumptions of a Two-Factor Analysis of Variance Key Terms Chapter Quiz Chapter 8: Preparing the Input Variables for Prediction Introduction Missing Values Complete-Case Analysis Using Imputation with a Missing Value Indicator Program 8.1 Ames Housing Data with Missing Values Output 8.1 Ames Housing Data with Missing Values Program 8.2 Ames Housing with Imputed Data Output 8.2 Ames Housing with Imputed Data Categorical Input Variables Sparse Events and Quasi-Complete Separation Greenacre’s Method Using the CLUSTER Procedure Table 8.1 Contingency Table of Bonus by Neighborhood Program 8.3 Combining Neighborhoods from Ames Data Housing Using Greenacre’s Method Output 8.3a Chi-square for Bonus by Neighborhood Output 8.3b Proportion of Houses with Bonus by Neighborhood Output 8.3c Results of Cluster Analysis on Ames Neighborhoods Output 8.3d Dendrogram of Cluster Analysis Results by Neighborhoods Output 8.3e Contents of the Cluster History Output 8.3f Log P-Value Information and the Cluster History Output 8.3g Plot of Log P-Value by Number of Clusters Output 8.3h List of Neighborhoods by Cluster Table 8.2 Contingency Table of Bonus by Clustered Neighborhoods Variable Clustering The VARCLUS Procedure for Variable Reduction Table 8.3 Correlation Matrix for Variables Q1 through Q6 Procedure Syntax for PROC VARCLUS Program 8.4 The VARCLUS Procedure for Reducing Ames Housing Inputs Output 8.4a Summary Information for VARCLUS Procedure for Ames Housing Input Data Output 8.4b Cluster Summary for 2 Clusters for Ames Housing Input Data Output 8.4c Cluster 
Summary for 23 Clusters for Ames Housing Input Data Output 8.4d R-Squared with Own Cluster and Next Closest Cluster for Ames Housing Input Data Output 8.4e Summary of Cluster Splitting by Stage Output 8.4f Dendrogram Illustration of Cluster Splits for Ames Housing Input Data Cluster Representative and Best Variable Selection Table 8.4 Reduced Set of Inputs After Deleting Redundant Variables for Ames Housing Variable Screening The CORR Procedure for Detecting Associations Program 8.5 Description of Input Variables Screened for Relevance for Ames Housing Data Output 8.5a Summary of Input Variables Screened for Relevance for Ames Housing Data Output 8.5b ODS Output of Spearman Data Output 8.5c Spearman’s and Hoeffding’s D Correlation Data Sorted by Spearman’s Rank Output 8.5d Rank of Spearman’s Correlation by Rank of Hoeffding’s D Using the Empirical Logit to Detect Non-Linear Associations Program 8.6 Plot of Empirical Logit by Bsmt_Unf_SF Output 8.6a Value of Bsmt_Unf_SF and Bin Variables for the First Eight Houses in Ames Housing Output 8.6b Total Frequency, Number of Houses Earning a Bonus, and Average Bsmt_Unf_SF by Bin Output 8.6c Empirical Logit by the Variable Bsmt_Unf_SF Key Terms Chapter Quiz Chapter 9: Linear Regression Analysis Introduction Exploring the Relationship between Two Continuous Variables Exploring the Relationship between Two Continuous Variables Using a Scatter Plot Program 9.1 Scatter Plot of Sale Price by Above Ground Living Area Output 9.1 Scatter Plot of Sale Price and Above Ground Living Area Program 9.2 Scatter Plot of Sale Price and Age at Time of Sale Output 9.2 Scatter Plot of Sale Price and Age at Time of Sale Program 9.3 Scatter Plot of Sale Price and Square Footage Output 9.3 Scatter Plot of Sale Price and Square Footage Quantifying the Degree of Association between Two Continuous Variables Using Correlation Statistics Figure 9.1 Scatter Plot of Perfect Positive, Perfect Negative, and No Relationship Producing Correlation 
Coefficients Using the CORR Procedure Program 9.4 Correlation Coefficient and Descriptive Statistics for Ames Housing Output 9.4 Correlation Coefficients and Descriptive Statistics for Ames Housing Program 9.5 Correlation Coefficients with Sale Price for Ames Housing Output 9.5a Correlation Coefficients with Sale Price for Ames Housing Output 9.5b Scatter Plots for Sale Price with Potential Predictors Testing the Hypothesis for a Bivariate Linear Relationship Using the CORR Procedure Understanding Potential Misuses of the Correlation Coefficient Simple Linear Regression Fitting a Simple Linear Regression Model Using the REG Procedure Figure 9.2 Fitting the Line Closest to All Points Program 9.6 Linear Regression for Predicting Sale Price with Ground Living Area Output 9.6 Linear Regression Output for Predicting Saleprice with Ground Living Area Measures of Fit for the Linear Regression Model The Coefficient of Determination (R2) The Standard Error of the Regression (Se) Using Measures of Fit to Compare Models Table 9.1 Measures of Fit for Simple Linear Regression Hypothesis Testing for the Slope The t-Test for Slope The F-Test for Slope Table 9.2 Analysis of Variance (ANOVA) Table for Linear Regression Producing Confidence Intervals Program 9.7 Confidence Interval for Effect of Gr_Liv_Area on Sale Price Output 9.7 Confidence Interval for Effect of Gr_Liv_Area on SalePrice Multiple Linear Regression Fitting a Multiple Linear Regression Model Using the REG Procedure Program 9.8 Multiple Linear Regression for Predicting Sale Price with Six Predictors Output 9.8 Multiple Linear Regression for Predicting SalePrice with Six Predictors Measures of Fit for the Multiple Linear Regression Model Adjusted R-Square Output 9.9 Multiple Linear Regression for Predicting SalePrice with Five Predictors Table 9.3 Measures of Fit for Multiple Linear Regression Quantifying the Relative Impact of a Predictor Program 9.9 Measures of Relative Predictor Importance in Multiple Linear 
Regression Output 9.10 Measure of Relative Predictor Impact in Multiple Linear Regression Checking for Collinearity Using VIF, COLLIN, and COLLINOINT The Variance Inflation Factor (VIF) for Detecting Collinearity The Condition Index (C) for Detecting Collinearity Program 9.10 VIF and Condition Numbers for Detecting Collinearity Output 9.11 VIF and Condition Numbers for Detecting Collinearity Fitting a Simple Linear Regression Model Using the GLM Procedure Program 9.11 PROC GLM for Prediction Using One Categorical Variable Output 9.12a PROC GLM for Prediction Using One Categorical Variable Output 9.12b Tukey Procedure for Detecting Differences in Mean Sale Price Program 9.12 PROC REG for Prediction Using One Categorical Variable Output 9.13 PROC REG for Prediction Using One Categorical Variable Variable Selection Using the REG and GLMSELECT Procedures The REG Procedure for Variable Selection All Possible Subsets Program 9.13 Best Subsets Regression Models Ranked by Adjusted R-Square Output 9.14 Best Subsets Regression Models Ranked by Adjusted R-Square Program 9.14 Best Subsets Regression Models Ranked by Mallows’ Cp Output 9.15a Mallows’ Cp Plot for Variable Selection Output 9.15b Best Subsets Regression Models Ranked by Mallows’ Cp Backward Elimination Program 9.15 Backward Elimination for the Ames Housing Case Output 9.16a Backward Elimination Step 0 Output 9.16b Backward Elimination Step 1 Output 9.16c Backward Elimination Step 2 Output 9.16d Backward Elimination Step 7 Output 9.16e Summary of Backward Elimination Output 9.16f Plot of Adjusted R-Square by Backward Elimination Step Forward Selection Program 9.16 Forward Selection for the Ames Housing Case Output 9.17a Forward Selection Step 1 Output 9.17b Forward Selection Step 2 Output 9.17c Forward Selection Step 7 Output 9.17d Summary of Forward Selection Output 9.17e Plot of Adjusted R-Square by Forward Selection Step Stepwise Selection Program 9.17 Stepwise Selection for the Ames Housing Case Program 9.18 
Three Variable Selection Methods for the Ames Housing Case
The GLMSELECT Procedure for Variable Selection
Program 9.19 PROC GLMSELECT with Stepwise Selection for the Ames Housing Case
Output 9.18a PROC GLMSELECT for Stepwise Selection Step 1
Output 9.18b PROC GLMSELECT for Stepwise Selection Step 2
Output 9.18c Summary for Stepwise Selection in PROC GLMSELECT
Output 9.18d The Selected Model from Stepwise Selection in PROC GLMSELECT
Other Features of the GLMSELECT Procedure
Table 9.4 Default SLENTRY and SLSTAY Settings by Model Selection Method
Cautionary Note on Sequential Selection Methods
Assessing the Validity of Results Using Regression Diagnostics
The Assumptions of Linear Regression
Residual Analysis for Checking Assumptions
Figure 9.3 Fit Plot and Residual Plot for Illustrating a Linear Trend with Constant Variance
Figure 9.4 Residual Plot Illustrating a Curvilinear Trend
Figure 9.5 Residual Plot Illustrating Unequal Variance
Figure 9.6 Residual Plot Illustrating Autocorrelation
Program 9.20 Linear Regression Analysis Diagnostics Panel
Output 9.19a Linear Regression on Revenue with Diagnostics Panel
Output 9.19b Predicted Revenue and Residuals Using the Predictor AdExpense
Program 9.21 Linear Regression Analysis Using Transformed Ad Expense (LnAdExp)
Output 9.20 Linear Regression on Revenue Using Transformed Ad Expense (LnAdExp)
Program 9.22 Diagnostics for Multiple Linear Regression
Output 9.21a Multiple Linear Regression for Predicting SalePrice
Output 9.21b Residual by Predicted Plot and Q-Q Plot of Residuals for SalePrice
Output 9.21c Panel of Residual by Regressors for SalePrice
Studentized Residuals
Program 9.23 Residuals and Studentized Residuals by AdExpense for Saleprice
Output 9.22a Residual and Studentized Residuals by AdExpense for SalePrice
Output 9.22b Residuals and Studentized Residuals by AdExpense for Saleprice
Using Statistics to Identify Potential Influential Observations
Program 9.24 Comparing Regression Lines Based on Influence of Obs 15
Output 9.23 Comparing Regression Lines Based on Influence of Obs 15
Leverage (h_ii)
Discrepancy (RSTUDENT_i)
Influence
Program 9.25 Identifying Suspicious Observations Using Measures of Influence
Output 9.24a Linear Regression Output for SalePrice with Influential Observation
Output 9.24b Leverage by RStudent Plot
Output 9.24c Cook’s D and DFFITS Plots for Detecting Influence
Output 9.24d Deletion Statistics for Detecting Influence
Output 9.25 Influence Statistics Using the INFLUENCE Option
Program 9.26 DFBETA Plots for Assessing Local Influence
Output 9.26 DFBETA Plots for Assessing Local Influence
Program 9.27 Regression Diagnostics for the Ames Housing Case
Output 9.27a Influence Panels and Influential Observations for Ames Housing
Output 9.27b Observations Flagged as Influential for Ames Housing
Recommendations for Handling Influential Observations
Concluding Remarks
Key Terms
Chapter Quiz

Chapter 10: Logistic Regression Analysis
Introduction
The Logistic Regression Model
Development of the Logistic Regression Model
Figure 10.1 Scatter Plot of Gr_Liv_Area by Bonus
Program 10.1 Scatter Plot of Binned Living Area by Proportion of Successes
Output 10.1 Scatter Plot of Binned Living Area by Proportion of Successes
The Logit Transformation
Estimating the Logistic Regression Parameters
Syntax for the Logistic Regression Procedure
Program 10.2 Simple Logistic Regression
Output 10.2a Model Information and Response Profile for Simple Logistic Regression
Output 10.2b Model Convergence, Fit Statistics, and Testing Global Null
Output 10.2c Analysis of Maximum Likelihood Estimates
Estimating the Odds Ratio from the Parameter Estimates
Output 10.2d Odds Ratio Estimate for Gr_Liv_Area Based upon Default UNITS=1
Output 10.3 Odds Ratio Estimate for Gr_Liv_Area Based upon UNITS=100
Additional Measures of Fit
Output 10.2e Association of Predicted Probabilities and Observed Responses
Assumptions of Logistic Regression
Plots for Probabilities of an Event and for the Odds Ratios
Figure 10.2 Plot of Gr_Living Area by Probability for Bonus=1
Program 10.3 Odds Ratio with 95% Confidence Interval for Gr_Liv_Area (UNITS=100)
Output 10.4 Plot of Odds Ratio with 95% Confidence Interval for Gr_Liv_Area (UNIT=1)
Program 10.4 Odds Ratio with 95% Confidence Interval for Gr_Liv_Area (UNITS=100)
Output 10.5 Plot of Odds Ratio with 95% Confidence Interval for Gr_Liv_Area (UNITS=100)
Program 10.5 UNITS Statement and ODDSRATIO Statement
Logistic Regression with a Categorical Predictor
Effect Coding Parameterization
Program 10.6 Logistic Regression for One Categorical Predictor Using Effect Coding
Output 10.6 Logistic Regression for One Categorical Predictor Using Effect Coding
Reference Cell Coding Parameterization
Program 10.7 Logistic Regression for One Categorical Predictor Using Reference Coding
Output 10.7 Logistic Regression for One Categorical Predictor Using Reference Coding
Program 10.8 CLASS Statement with Dummy Coded Variable
The Multiple Logistic Regression Model
Multiple Logistic Regression by Example
Program 10.9 Multiple Logistic Regression for Ames Housing Using Reference Coding
Output 10.8a Class Level Information Using Reference Coding
Output 10.8b Fit Statistics and Global Null Test for Multiple Logistic Regression
Output 10.8c Test 3 Analysis of Effects for Multiple Logistic Regression
Output 10.8d Maximum Likelihood Estimates and Odds Ratios for Multiple Logistic Regression
Variable Selection
Backward Elimination
Program 10.10 Backward Elimination for Ames Housing
Output 10.9a Effects Eligible for Removal for Step 1 of Backward Elimination
Output 10.9b Effects Eligible for Removal for Step 2 of Backward Elimination
Output 10.9c Effects Eligible for Removal for Steps 3 through 5 of Backward Elimination
Output 10.9d Summary of Effects Removed in Backward Elimination
Forward Selection
Program 10.11 Forward Selection for Ames Housing
Output 10.10a Effects Eligible for Entry for Step 1 of Forward Selection
Output 10.10b Summary of Effects Entered in Forward Selection
Stepwise Selection
Program 10.12 Stepwise Selection for Ames Housing
Output 10.11a Effects Eligible for Entry for Step 1 of Stepwise Selection
Output 10.11b Effects Eligible for Removal After Step 1 of Stepwise Selection
Output 10.11c Effects Eligible for Entry for Step 2 of Stepwise Selection
Output 10.11d Summary of Effects Entered or Removed in Stepwise Selection
Table 10.1 Summary of Effects Entered or Removed in Stepwise Selection
Customized Options within the Sequential Methods
Output 10.12 Summary of Effects Removed in Backward Elimination Using the STOP= Option
Output 10.13 Summary of Effects Entered in Forward Selection Using START= Option
Best Subset Selection
Program 10.13 Score Chi-Square Statistics for the Best Subsets of Size 1 through 8
Output 10.14 Score Chi-Square Statistics for the Best Subsets of Size 1 through 8
Modeling Interaction
Figure 10.3 Mean Plots by Degree and Occupational Area
Program 10.14 Testing Main Effects and Interactions for Ames Housing
Output 10.15 Testing Main Effects and Interactions for Ames Housing
Output 10.16 Example of Failed Model Convergence
Program 10.15 Backward Model Selection for Ames Housing
Output 10.17a Step 0 of Backward Elimination for Main and Interactions Effects
Output 10.17b Interaction Effects Eligible for Removal for Step 1 of Backward Elimination
Output 10.17c Interaction Effects Eligible for Removal for Step 2 of Backward Elimination
Output 10.17d Effects Eligible for Removal for Step 3 of Backward Elimination
Output 10.17e Final Model Selected Using Backward Elimination
Program 10.16 Odds Ratios with Plots for Main Effects and Conditional Effects
Output 10.18a Odds Ratios with Plots for Main Effects and Conditional Effects
Output 10.18b Probabilities for High_Kitchen_Quality by Fullbath_2plus for Overall_Quality=1
Scoring New Data
The SCORE Statement with PROC LOGISTIC
Program 10.17 Predicted Class for New Observations Using the SCORE Statement in PROC LOGISTIC
Output 10.19 Predicted Class for New Observations Using the SCORE Statement in PROC LOGISTIC
Using the PLM Procedure to Call Score Code Created by PROC LOGISTIC
Program 10.18 Predicted Class for New Observations Using PROC PLM with the SCORE Statement
Output 10.20 Predicted Class for New Observations Using PROC PLM with the SCORE Statement
The CODE Statement within PROC LOGISTIC
Program 10.19 Predicted Class for New Observations Using PROC PLM with the SCORE Statement
Output 10.21 Predicted Class for New Observations Using PROC PLM with the SCORE Statement
Program 10.20 SAS Scoring Code Created by the PLM Procedure
The OUTMODEL and INMODEL Options with PROC LOGISTIC
Program 10.21 Model Saved as SAS Data Set Created by the OUTMODEL Option in PROC LOGISTIC
Output 10.22 Model Saved as SAS Data Set Created by the OUTMODEL Option in PROC LOGISTIC
Key Terms
Chapter Quiz

Chapter 11: Measure of Model Performance
Introduction
Preparation for the Modeling Phase
Honest Assessment of a Classifier
PROC SURVEYSELECT for Creating Training and Validation Data Sets
Program 11.1 Partitioning Ames Housing Data into Training and Validation Data Sets
Output 11.1a PROC FREQ on Bonus for Ames Housing Data
Output 11.1b PROC SURVEYSELECT Using Ames Housing Data
Output 11.1c PROC FREQ on Bonus for Ames Training and Validation Data
Log 11.1 Partial Log for PROC SURVEYSELECT Using Ames Housing Data
Recommendations for the Model Preparation Stage
Assessing Classifier Performance
Measures of Performance Using the Classification Table
Table 11.1 General Form of the Classification Table
The CTABLE Option for Producing Classification Results
Program 11.2 Classification Tables for Ames Training and Validation Data Sets
Output 11.2a Classification Table for Ames Training Data
Table 11.2 Classification Table for Ames Training Data
Output 11.2b Classification Table for Ames Validation Data
Assessing the Performance and Generalizability of a Classifier
The Effect of Cutoff Values on Sensitivity and Specificity Estimates
Output 11.3 Classification Table for Multiple Cutoff Values for Ames Training Data
Figure 11.1 Performance Measures by Cutoff Values for Ames Training Data
Program 11.3 Classification Table Using Cutoff=0.20 for Ames Validation Data
Output 11.4 Classification Table for Cutoff = 0.20 for Ames Validation Data
Measure of Performance Using the Receiver-Operator-Characteristic (ROC) Curve
Figure 11.2 ROC Curve for Ames Training Data
Producing an ROC Curve Using the SCORE Statement with the OUTROC Option
Program 11.4 ROC Curves for Ames Housing Training and Validation Data
Output 11.5a Training and Validation ROC Curves for Ames Housing Data
Output 11.5b ROC Information for Ames Validation Data
Model Comparison Using the ROC and ROCCONTRAST Statements
Program 11.5 Comparing Two Models Using Validation ROC Curves for Ames Housing
Output 11.6a ROC Curves for Two Models Applied to Ames Validation Data
Output 11.6b ROC Contrast Results for Two Models Applied to Ames Validation Data
Measures of Performance Using the Gains and Lift Charts
The Gains Chart
Program 11.6 Gains Information for Ames Validation Data
Output 11.7a Gains Information for Ames Validation Data
Output 11.7b Gains Chart for Ames Validation Data
The Lift Chart
Output 11.8 Lift Chart for Ames Validation Data
Adjustment to Performance Estimates When Oversampling Rare Events
The PEVENT Option for Defining Prior Probabilities
Program 11.7 Use of PEVENT Option to Define Prior Probabilities
Output 11.9a The Logistic Regression Model for Ames Training Data
Output 11.9b Classification Table for PEVENT = 0.02 and PEVENT = 0.4053
Table 11.3 Classification Table for Ames Housing Training Data Labeled for Bayes’ Theorem
Manual Adjustment of the Classification Matrix
Table 11.4 General Classification Table Adjusted for Oversampling
Scoring the Validation Data Using Adjusted Posterior Probabilities
Manually Adjusting Posterior Probabilities to Account for Oversampling
Program 11.8 Posterior Probabilities Manually Adjusted for Oversampling
Output 11.10 Classification Table for Ames Housing Validation Data Adjusted for Oversampling
Manually Adjusted Intercept Using the Offset
Program 11.9 Posterior Probabilities Using Manually Adjusted Intercept
Program 11.10 Adjusting the Model Intercept Using the OFFSET Option
Output 11.11 Logistic Regression Model for Ames Training with Intercept Adjusted for Oversampling
Automatically Adjusted Posterior Probabilities to Account for Oversampling
Program 11.11 Comparison of the Three Approaches to Adjusting for Oversampling
Output 11.12 Posterior Probabilities for Ames Validation Data Using Three Approaches
The Use of Decision Theory for Model Selection
Decision Cutoffs and Expected Profits for Model Selection
Table 11.6 Profit Matrix for Classification Decisions
Table 11.7 Profit Matrix for Ames Housing
Program 11.12 Classification Results and Profit Information for Ames Validation Data
Output 11.13a Classification Matrix for Ames Validation Data Based upon 0.10 Cutoff
Output 11.13b Average Expected Profit for Ames Validation Data Based upon 0.10 Cutoff
Output 11.13c Line Listing for Several Houses in the Ames Validation Data Set
Using Estimated Posterior Probabilities to Determine Cutoffs
Program 11.13 Average Profit for Ames Validation Data by Depth and Cutoff
Output 11.14a Average Profit for Ames Validation Data by Depth and Cutoff
Output 11.14b Maximum Average Profit for Ames Validation Data
Key Terms
Chapter Quiz

References