
Download the book: Data Science and Predictive Analytics. Biomedical and Health Applications using R

Book Details

Title: Data Science and Predictive Analytics. Biomedical and Health Applications using R

Edition:
Authors:
Series:
ISBN: 9783319723471
Publisher: Springer
Publication year: 2018
Number of pages: 848
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 at the user's request)
File size: 32 MB

Price (Toman): 44,000





If you would like the file of Data Science and Predictive Analytics. Biomedical and Health Applications using R converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and they will convert the file for you.

Please note that Data Science and Predictive Analytics. Biomedical and Health Applications using R is in its original language and has not been translated into Persian. The International Library website offers only original-language books and does not provide any books translated into or written in Persian.


About the book: Data Science and Predictive Analytics. Biomedical and Health Applications using R




Over the past decade, Big Data have become ubiquitous in all economic sectors, scientific disciplines, and human activities. They have led to striking technological advances, affecting all human experiences. Our ability to manage, understand, interrogate, and interpret such extremely large, multisource, heterogeneous, incomplete, multiscale, and incongruent data has not kept pace with the rapid increase in the volume, complexity, and proliferation of the deluge of digital information. There are three reasons for this shortfall. First, the volume of data is increasing much faster than the corresponding rise of our computational processing power (Kryder’s law > Moore’s law). Second, traditional discipline bounds inhibit expeditious progress. Third, our education and training activities have fallen behind the accelerated trend of scientific, information, and communication advances. There are very few rigorous instructional resources, interactive learning materials, and dynamic training environments that support active data science learning.

The textbook balances the mathematical foundations with dexterous demonstrations and examples of data, tools, modules, and workflows that serve as pillars for the urgently needed bridge to close the supply-and-demand gap in predictive analytic skills. Exposing the enormous opportunities presented by the tsunami of Big Data, this textbook aims to identify specific knowledge gaps, educational barriers, and workforce readiness deficiencies. Specifically, it focuses on the development of a transdisciplinary curriculum integrating modern computational methods, advanced data science techniques, innovative biomedical applications, and impactful health analytics. The content of this graduate-level textbook fills a substantial gap in integrating modern engineering concepts, computational algorithms, mathematical optimization, statistical computing, and biomedical inference. Big Data analytic techniques and predictive scientific methods demand broad transdisciplinary knowledge, appeal to an extremely wide spectrum of readers/learners, and provide incredible opportunities for engagement throughout the academy, industry, and regulatory and funding agencies. The two examples below demonstrate the powerful need for the scientific knowledge, computational abilities, interdisciplinary expertise, and modern technologies necessary to achieve desired outcomes (improving human health and optimizing future return on investment). This can only be achieved by appropriately trained teams of researchers who can develop robust decision support systems using modern techniques and effective end-to-end protocols, like the ones described in this textbook.

• A geriatric neurologist is examining a patient complaining of gait imbalance and posture instability. To determine whether the patient may suffer from Parkinson’s disease, the physician acquires clinical, cognitive, phenotypic, imaging, and genetics data (Big Data). Most clinics and healthcare centers are not equipped with skilled data analytic teams that can wrangle, harmonize, and interpret such complex datasets. A learner who completes a course of study using this textbook will have the competency and ability to manage the data, generate a protocol for deriving biomarkers, and provide an actionable decision support system. The results of this protocol will help the physician understand the entire patient dataset and assist in making a holistic, evidence-based, data-driven clinical diagnosis.

• To improve the return on investment for its shareholders, a healthcare manufacturer needs to forecast the demand for its product subject to environmental, demographic, economic, and bio-social sentiment data (Big Data). The organization’s data-analytics team is tasked with developing a protocol that identifies, aggregates, harmonizes, models, and analyzes these heterogeneous data elements to generate a trend forecast. This system needs to provide an automated, adaptive, scalable, and reliable prediction of the optimal investment, e.g., R&D allocation, that maximizes the company’s bottom line. A reader who completes a course of study using this textbook will be able to ingest the observed structured and unstructured data, mathematically represent the data as a computable object, and apply appropriate model-based and model-free prediction techniques. The results of these techniques may be used to forecast the expected relationship between the company’s investment, product supply, and general healthcare demand (providers and patients), and to estimate the return on the initial investments.
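
As a purely illustrative aside (not taken from the book), the short R sketch below contrasts the two kinds of prediction mentioned in the second example: a model-based forecast using ordinary least squares (base R's lm, in the spirit of the regression models of Chapter 10) and a model-free forecast using a simple k-nearest-neighbor average (loosely related to the lazy-learning methods of Chapter 7). The simulated demand data, the variable names econ_index and demand, and the knn_mean helper are all hypothetical assumptions made for this sketch, not material from the textbook.

# Minimal sketch: model-based vs. model-free prediction on simulated demand data.
# Everything here is illustrative; only base R functions are used.
set.seed(42)

econ_index <- runif(200, min = 0, max = 10)            # hypothetical economic index
demand     <- 5 + 2 * econ_index + rnorm(200, sd = 2)  # simulated product demand
train      <- data.frame(econ_index, demand)

new_obs <- data.frame(econ_index = c(2.5, 7.5))        # two scenarios to forecast

# Model-based prediction: ordinary least squares regression
fit_lm  <- lm(demand ~ econ_index, data = train)
pred_lm <- predict(fit_lm, newdata = new_obs)

# Model-free prediction: average the responses of the k nearest training points
knn_mean <- function(x0, x, y, k = 15) {
  idx <- order(abs(x - x0))[seq_len(k)]   # indices of the k closest training points
  mean(y[idx])
}
pred_knn <- sapply(new_obs$econ_index, knn_mean,
                   x = train$econ_index, y = train$demand)

print(rbind(model_based = pred_lm, model_free = pred_knn))

On such simple simulated data the two forecasts largely agree; the methodological questions the book addresses arise for the heterogeneous, high-dimensional biomedical and health data its case studies target.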



Table of Contents

Foreword
Preface
	Genesis
	Purpose
	Limitations/Prerequisites
	Scope of the Book
	Acknowledgements
DSPA Application and Use Disclaimer
	Biomedical, Biosocial, Environmental, and Health Disclaimer
Notations
Contents
Chapter 1: Motivation
	1.1 DSPA Mission and Objectives
	1.2 Examples of Driving Motivational Problems and Challenges
		1.2.1 Alzheimer’s Disease
		1.2.2 Parkinson’s Disease
		1.2.3 Drug and Substance Use
		1.2.4 Amyotrophic Lateral Sclerosis
		1.2.5 Normal Brain Visualization
		1.2.6 Neurodegeneration
		1.2.7 Genetic Forensics: 2013-2016 Ebola Outbreak
		1.2.8 Next Generation Sequence (NGS) Analysis
		1.2.9 Neuroimaging-Genetics
	1.3 Common Characteristics of Big (Biomedical and Health) Data
	1.4 Data Science
	1.5 Predictive Analytics
	1.6 High-Throughput Big Data Analytics
	1.7 Examples of Data Repositories, Archives, and Services
	1.8 DSPA Expectations
Chapter 2: Foundations of R
	2.1 Why Use R?
	2.2 Getting Started
		2.2.1 Install Basic Shell-Based R
		2.2.2 GUI Based R Invocation (RStudio)
		2.2.3 RStudio GUI Layout
		2.2.4 Some Notes
	2.3 Help
	2.4 Simple Wide-to-Long Data format Translation
	2.5 Data Generation
	2.6 Input/Output (I/O)
	2.7 Slicing and Extracting Data
	2.8 Variable Conversion
	2.9 Variable Information
	2.10 Data Selection and Manipulation
	2.11 Math Functions
	2.12 Matrix Operations
	2.13 Advanced Data Processing
	2.14 Strings
	2.15 Plotting
	2.16 QQ Normal Probability Plot
	2.17 Low-Level Plotting Commands
	2.18 Graphics Parameters
	2.19 Optimization and model Fitting
	2.20 Statistics
	2.21 Distributions
		2.21.1 Programming
	2.22 Data Simulation Primer
	2.23 Appendix
		2.23.1 HTML SOCR Data Import
		2.23.2 R Debugging
			Example
	2.24 Assignments: 2. R Foundations
		2.24.1 Confirm that You Have Installed R/RStudio
		2.24.2 Long-to-Wide Data Format Translation
		2.24.3 Data Frames
		2.24.4 Data Stratification
		2.24.5 Simulation
		2.24.6 Programming
	References
Chapter 3: Managing Data in R
	3.1 Saving and Loading R Data Structures
	3.2 Importing and Saving Data from CSV Files
	3.3 Exploring the Structure of Data
	3.4 Exploring Numeric Variables
	3.5 Measuring the Central Tendency: Mean, Median, Mode
	3.6 Measuring Spread: Quartiles and the Five-Number Summary
	3.7 Visualizing Numeric Variables: Boxplots
	3.8 Visualizing Numeric Variables: Histograms
	3.9 Understanding Numeric Data: Uniform and Normal Distributions
	3.10 Measuring Spread: Variance and Standard Deviation
	3.11 Exploring Categorical Variables
	3.12 Exploring Relationships Between Variables
	3.13 Missing Data
		3.13.1 Simulate Some Real Multivariate Data
		3.13.2 TBI Data Example
		3.13.3 Imputation via Expectation-Maximization
			Types of Missing Data
			General Idea of EM Algorithm
			EM-Based Imputation
			A Simple Manual Implementation of EM-Based Imputation
			Plotting Complete and Imputed Data
			Validation of EM-Imputation Using the Amelia R Package
				Comparison
				Density Plots
	3.14 Parsing Webpages and Visualizing Tabular HTML Data
	3.15 Cohort-Rebalancing (for Imbalanced Groups)
	3.16 Appendix
		3.16.1 Importing Data from SQL Databases
		3.16.2 R Code Fragments
	3.17 Assignments: 3. Managing Data in R
		3.17.1 Import, Plot, Summarize and Save Data
		3.17.2 Explore some Bivariate Relations in the Data
		3.17.3 Missing Data
		3.17.4 Surface Plots
		3.17.5 Unbalanced Designs
		3.17.6 Aggregate Analysis
	References
Chapter 4: Data Visualization
	4.1 Common Questions
	4.2 Classification of Visualization Methods
	4.3 Composition
		4.3.1 Histograms and Density Plots
		4.3.2 Pie Chart
		4.3.3 Heat Map
	4.4 Comparison
		4.4.1 Paired Scatter Plots
		4.4.2 Jitter Plot
		4.4.3 Bar Plots
		4.4.4 Trees and Graphs
		4.4.5 Correlation Plots
	4.5 Relationships
		4.5.1 Line Plots Using ggplot
		4.5.2 Density Plots
		4.5.3 Distributions
		4.5.4 2D Kernel Density and 3D Surface Plots
		4.5.5 Multiple 2D Image Surface Plots
		4.5.6 3D and 4D Visualizations
	4.6 Appendix
		4.6.1 Hands-on Activity (Health Behavior Risks)
		4.6.2 Additional ggplot Examples
			Housing Price Data
			Modeling the Home Price Index Data (Fig. 4.48)
			Map of the Neighborhoods of Los Angeles (LA)
			Latin Letter Frequency in Different Languages
	4.7 Assignments 4: Data Visualization
		4.7.1 Common Plots
		4.7.2 Trees and Graphs
		4.7.3 Exploratory Data Analytics (EDA)
	References
Chapter 5: Linear Algebra and Matrix Computing
	5.1 Matrices (Second Order Tensors)
		5.1.1 Create Matrices
		5.1.2 Adding Columns and Rows
	5.2 Matrix Subscripts
	5.3 Matrix Operations
		5.3.1 Addition
		5.3.2 Subtraction
		5.3.3 Multiplication
			Elementwise Multiplication
			Matrix Multiplication
		5.3.4 Element-wise Division
		5.3.5 Transpose
		5.3.6 Multiplicative Inverse
	5.4 Matrix Algebra Notation
		5.4.1 Linear Models
		5.4.2 Solving Systems of Equations
		5.4.3 The Identity Matrix
	5.5 Scalars, Vectors and Matrices
		5.5.1 Sample Statistics (Mean, Variance)
			Mean
			Variance
			Applications of Matrix Algebra: Linear Modeling
			Finding Function Extrema (Min/Max) Using Calculus
		5.5.2 Least Square Estimation
			The R lm Function
	5.6 Eigenvalues and Eigenvectors
	5.7 Other Important Functions
	5.8 Matrix Notation (Another View)
	5.9 Multivariate Linear Regression
	5.10 Sample Covariance Matrix
	5.11 Assignments: 5. Linear Algebra and Matrix Computing
		5.11.1 How Is Matrix Multiplication Defined?
		5.11.2 Scalar Versus Matrix Multiplication
		5.11.3 Matrix Equations
		5.11.4 Least Square Estimation
		5.11.5 Matrix Manipulation
		5.11.6 Matrix Transpose
		5.11.7 Sample Statistics
		5.11.8 Least Square Estimation
		5.11.9 Eigenvalues and Eigenvectors
	References
Chapter 6: Dimensionality Reduction
	6.1 Example: Reducing 2D to 1D
	6.2 Matrix Rotations
	6.3 Notation
	6.4 Summary (PCA vs. ICA vs. FA)
	6.5 Principal Component Analysis (PCA)
		6.5.1 Principal Components
	6.6 Independent Component Analysis (ICA)
	6.7 Factor Analysis (FA)
	6.8 Singular Value Decomposition (SVD)
	6.9 SVD Summary
	6.10 Case Study for Dimension Reduction (Parkinson’s Disease)
	6.11 Assignments: 6. Dimensionality Reduction
		6.11.1 Parkinson’s Disease Example
		6.11.2 Allometric Relations in Plants Example
			Load Data
			Dimensionality Reduction
	References
Chapter 7: Lazy Learning: Classification Using Nearest Neighbors
	7.1 Motivation
	7.2 The kNN Algorithm Overview
		7.2.1 Distance Function and Dummy Coding
		7.2.2 Ways to Determine k
		7.2.3 Rescaling of the Features
		7.2.4 Rescaling Formulas
	7.3 Case Study
		7.3.1 Step 1: Collecting Data
		7.3.2 Step 2: Exploring and Preparing the Data
		7.3.3 Normalizing Data
		7.3.4 Data Preparation: Creating Training and Testing Datasets
		7.3.5 Step 3: Training a Model On the Data
		7.3.6 Step 4: Evaluating Model Performance
		7.3.7 Step 5: Improving Model Performance
		7.3.8 Testing Alternative Values of k
		7.3.9 Quantitative Assessment (Tables 7.2 and 7.3)
	7.4 Assignments: 7. Lazy Learning: Classification Using Nearest Neighbors
		7.4.1 Traumatic Brain Injury (TBI)
		7.4.2 Parkinson’s Disease
		7.4.3 KNN Classification in a High Dimensional Space
		7.4.4 KNN Classification in a Lower Dimensional Space
	References
Chapter 8: Probabilistic Learning: Classification Using Naive Bayes
	8.1 Overview of the Naive Bayes Algorithm
	8.2 Assumptions
	8.3 Bayes Formula
	8.4 The Laplace Estimator
	8.5 Case Study: Head and Neck Cancer Medication
		8.5.1 Step 1: Collecting Data
		8.5.2 Step 2: Exploring and Preparing the Data
			Data Preparation: Processing Text Data for Analysis
			Data Preparation: Creating Training and Test Datasets
			Visualizing Text Data: Word Clouds
			Data Preparation: Creating Indicator Features for Frequent Words
		8.5.3 Step 3: Training a Model on the Data
		8.5.4 Step 4: Evaluating Model Performance
		8.5.5 Step 5: Improving Model Performance
		8.5.6 Step 6: Compare Naive Bayesian against LDA
	8.6 Practice Problem
	8.7 Assignments 8: Probabilistic Learning: Classification Using Naive Bayes
		8.7.1 Explain These Two Concepts
		8.7.2 Analyzing Textual Data
	References
Chapter 9: Decision Tree Divide and Conquer Classification
	9.1 Motivation
	9.2 Hands-on Example: Iris Data
	9.3 Decision Tree Overview
		9.3.1 Divide and Conquer
		9.3.2 Entropy
		9.3.3 Misclassification Error and Gini Index
		9.3.4 C5.0 Decision Tree Algorithm
		9.3.5 Pruning the Decision Tree
	9.4 Case Study 1: Quality of Life and Chronic Disease
		9.4.1 Step 1: Collecting Data
		9.4.2 Step 2: Exploring and Preparing the Data
			Data Preparation: Creating Random Training and Test Datasets
		9.4.3 Step 3: Training a Model On the Data
		9.4.4 Step 4: Evaluating Model Performance
		9.4.5 Step 5: Trial Option
		9.4.6 Loading the Misclassification Error Matrix
		9.4.7 Parameter Tuning
	9.5 Compare Different Impurity Indices
	9.6 Classification Rules
		9.6.1 Separate and Conquer
		9.6.2 The One Rule Algorithm
		9.6.3 The RIPPER Algorithm
	9.7 Case Study 2: QoL in Chronic Disease (Take 2)
		9.7.1 Step 3: Training a Model on the Data
		9.7.2 Step 4: Evaluating Model Performance
		9.7.3 Step 5: Alternative Model1
		9.7.4 Step 5: Alternative Model2
	9.8 Practice Problem
	9.9 Assignments 9: Decision Tree Divide and Conquer Classification
		9.9.1 Explain These Concepts
		9.9.2 Decision Tree Partitioning
	References
Chapter 10: Forecasting Numeric Data Using Regression Models
	10.1 Understanding Regression
		10.1.1 Simple Linear Regression
	10.2 Ordinary Least Squares Estimation
		10.2.1 Model Assumptions
		10.2.2 Correlations
		10.2.3 Multiple Linear Regression
	10.3 Case Study 1: Baseball Players
		10.3.1 Step 1: Collecting Data
		10.3.2 Step 2: Exploring and Preparing the Data
		10.3.3 Exploring Relationships Among Features: The Correlation Matrix
		10.3.4 Visualizing Relationships Among Features: The Scatterplot Matrix
		10.3.5 Step 3: Training a Model on the Data
		10.3.6 Step 4: Evaluating Model Performance
	10.4 Step 5: Improving Model Performance
		10.4.1 Model Specification: Adding Non-linear Relationships
		10.4.2 Transformation: Converting a Numeric Variable to a Binary Indicator
		10.4.3 Model Specification: Adding Interaction Effects
	10.5 Understanding Regression Trees and Model Trees
		10.5.1 Adding Regression to Trees
	10.6 Case Study 2: Baseball Players (Take 2)
		10.6.1 Step 2: Exploring and Preparing the Data
		10.6.2 Step 3: Training a Model On the Data
		10.6.3 Visualizing Decision Trees
		10.6.4 Step 4: Evaluating Model Performance
		10.6.5 Measuring Performance with Mean Absolute Error
		10.6.6 Step 5: Improving Model Performance
	10.7 Practice Problem: Heart Attack Data
	10.8 Assignments: 10. Forecasting Numeric Data Using Regression Models
	References
Chapter 11: Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
	11.1 Understanding Neural Networks
		11.1.1 From Biological to Artificial Neurons
		11.1.2 Activation Functions
		11.1.3 Network Topology
		11.1.4 The Direction of Information Travel
		11.1.5 The Number of Nodes in Each Layer
		11.1.6 Training Neural Networks with Backpropagation
	11.2 Case Study 1: Google Trends and the Stock Market: Regression
		11.2.1 Step 1: Collecting Data
			Variables
		11.2.2 Step 2: Exploring and Preparing the Data
		11.2.3 Step 3: Training a Model on the Data
		11.2.4 Step 4: Evaluating Model Performance
		11.2.5 Step 5: Improving Model Performance
		11.2.6 Step 6: Adding Additional Layers
	11.3 Simple NN Demo: Learning to Compute
	11.4 Case Study 2: Google Trends and the Stock Market - Classification
	11.5 Support Vector Machines (SVM)
		11.5.1 Classification with Hyperplanes
			Finding the Maximum Margin
			Linearly Separable Data
			Non-linearly Separable Data
			Using Kernels for Non-linear Spaces
	11.6 Case Study 3: Optical Character Recognition (OCR)
		11.6.1 Step 1: Prepare and Explore the Data
		11.6.2 Step 2: Training an SVM Model
		11.6.3 Step 3: Evaluating Model Performance
		11.6.4 Step 4: Improving Model Performance
	11.7 Case Study 4: Iris Flowers
		11.7.1 Step 1: Collecting Data
		11.7.2 Step 2: Exploring and Preparing the Data
		11.7.3 Step 3: Training a Model on the Data
		11.7.4 Step 4: Evaluating Model Performance
		11.7.5 Step 5: RBF Kernel Function
		11.7.6 Parameter Tuning
		11.7.7 Improving the Performance of Gaussian Kernels
	11.8 Practice
		11.8.1 Problem 1 Google Trends and the Stock Market
		11.8.2 Problem 2: Quality of Life and Chronic Disease
	11.9 Appendix
	11.10 Assignments: 11. Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines
		11.10.1 Learn and Predict a Power-Function
		11.10.2 Pediatric Schizophrenia Study
	References
Chapter 12: Apriori Association Rules Learning
	12.1 Association Rules
	12.2 The Apriori Algorithm for Association Rule Learning
	12.3 Measuring Rule Importance by Using Support and Confidence
	12.4 Building a Set of Rules with the Apriori Principle
	12.5 A Toy Example
	12.6 Case Study 1: Head and Neck Cancer Medications
		12.6.1 Step 1: Collecting Data
		12.6.2 Step 2: Exploring and Preparing the Data
			Visualizing Item Support: Item Frequency Plots
			Visualizing Transaction Data: Plotting the Sparse Matrix
		12.6.3 Step 3: Training a Model on the Data
		12.6.4 Step 4: Evaluating Model Performance
		12.6.5 Step 5: Improving Model Performance
			Sorting the Set of Association Rules
			Taking Subsets of Association Rules
			Saving Association Rules to a File or Data Frame
	12.7 Practice Problems: Groceries
	12.8 Summary
	12.9 Assignments: 12. Apriori Association Rules Learning
	References
Chapter 13: k-Means Clustering
	13.1 Clustering as a Machine Learning Task
	13.2 Silhouette Plots
	13.3 The k-Means Clustering Algorithm
		13.3.1 Using Distance to Assign and Update Clusters
		13.3.2 Choosing the Appropriate Number of Clusters
	13.4 Case Study 1: Divorce and Consequences on Young Adults
		13.4.1 Step 1: Collecting Data
			Variables
		13.4.2 Step 2: Exploring and Preparing the Data
		13.4.3 Step 3: Training a Model on the Data
		13.4.4 Step 4: Evaluating Model Performance
		13.4.5 Step 5: Usage of Cluster Information
	13.5 Model Improvement
		13.5.1 Tuning the Parameter k
	13.6 Case Study 2: Pediatric Trauma
		13.6.1 Step 1: Collecting Data
		13.6.2 Step 2: Exploring and Preparing the Data
		13.6.3 Step 3: Training a Model on the Data
		13.6.4 Step 4: Evaluating Model Performance
		13.6.5 Practice Problem: Youth Development
	13.7 Hierarchical Clustering
	13.8 Gaussian Mixture Models
	13.9 Summary
	13.10 Assignments: 13. k-Means Clustering
	References
Chapter 14: Model Performance Assessment
	14.1 Measuring the Performance of Classification Methods
	14.2 Evaluation Strategies
		14.2.1 Binary Outcomes
		14.2.2 Confusion Matrices
		14.2.3 Other Measures of Performance Beyond Accuracy
		14.2.4 The Kappa (κ) Statistic
			Summary of the Kappa Score for Calculating Prediction Accuracy
		14.2.5 Computation of Observed Accuracy and Expected Accuracy
		14.2.6 Sensitivity and Specificity
		14.2.7 Precision and Recall
		14.2.8 The F-Measure
	14.3 Visualizing Performance Tradeoffs (ROC Curve)
	14.4 Estimating Future Performance (Internal Statistical Validation)
		14.4.1 The Holdout Method
		14.4.2 Cross-Validation
		14.4.3 Bootstrap Sampling
	14.5 Assignment: 14. Evaluation of Model Performance
	References
Chapter 15: Improving Model Performance
	15.1 Improving Model Performance by Parameter Tuning
	15.2 Using caret for Automated Parameter Tuning
		15.2.1 Customizing the Tuning Process
		15.2.2 Improving Model Performance with Meta-learning
		15.2.3 Bagging
		15.2.4 Boosting
		15.2.5 Random Forests
			Training Random Forests
			Evaluating Random Forest Performance
		15.2.6 Adaptive Boosting
	15.3 Assignment: 15. Improving Model Performance
		15.3.1 Model Improvement Case Study
	References
Chapter 16: Specialized Machine Learning Topics
	16.1 Working with Specialized Data and Databases
		16.1.1 Data Format Conversion
		16.1.2 Querying Data in SQL Databases
		16.1.3 Real Random Number Generation
		16.1.4 Downloading the Complete Text of Web Pages
		16.1.5 Reading and Writing XML with the XML Package
		16.1.6 Web-Page Data Scraping
		16.1.7 Parsing JSON from Web APIs
		16.1.8 Reading and Writing Microsoft Excel Spreadsheets Using XLSX
	16.2 Working with Domain-Specific Data
		16.2.1 Working with Bioinformatics Data
		16.2.2 Visualizing Network Data
	16.3 Data Streaming
		16.3.1 Definition
		16.3.2 The stream Package
		16.3.3 Synthetic Example: Random Gaussian Stream
			k-Means Clustering
		16.3.4 Sources of Data Streams
			Static Structure Streams
			Concept Drift Streams
			Real Data Streams
		16.3.5 Printing, Plotting and Saving Streams
		16.3.6 Stream Animation
		16.3.7 Case-Study: SOCR Knee Pain Data
		16.3.8 Data Stream Clustering and Classification (DSC)
		16.3.9 Evaluation of Data Stream Clustering
	16.4 Optimization and Improving the Computational Performance
		16.4.1 Generalizing Tabular Data Structures with dplyr
		16.4.2 Making Data Frames Faster with Data.Table
		16.4.3 Creating Disk-Based Data Frames with ff
		16.4.4 Using Massive Matrices with bigmemory
	16.5 Parallel Computing
		16.5.1 Measuring Execution Time
		16.5.2 Parallel Processing with Multiple Cores
		16.5.3 Parallelization Using foreach and doParallel
		16.5.4 GPU Computing
	16.6 Deploying Optimized Learning Algorithms
		16.6.1 Building Bigger Regression Models with biglm
		16.6.2 Growing Bigger and Faster Random Forests with bigrf
		16.6.3 Training and Evaluation Models in Parallel with caret
	16.7 Practice Problem
	16.8 Assignment: 16. Specialized Machine Learning Topics
		16.8.1 Working with Website Data
		16.8.2 Network Data and Visualization
		16.8.3 Data Conversion and Parallel Computing
	References
Chapter 17: Variable/Feature Selection
	17.1 Feature Selection Methods
		17.1.1 Filtering Techniques
		17.1.2 Wrapper Methods
		17.1.3 Embedded Techniques
	17.2 Case Study: ALS
		17.2.1 Step 1: Collecting Data
		17.2.2 Step 2: Exploring and Preparing the Data
		17.2.3 Step 3: Training a Model on the Data
		17.2.4 Step 4: Evaluating Model Performance
			Comparing with RFE
			Comparing with Stepwise Feature Selection
	17.3 Practice Problem
	17.4 Assignment: 17. Variable/Feature Selection
		17.4.1 Wrapper Feature Selection
		17.4.2 Use the PPMI Dataset
	References
Chapter 18: Regularized Linear Modeling and Controlled Variable Selection
	18.1 Questions
	18.2 Matrix Notation
	18.3 Regularized Linear Modeling
		18.3.1 Ridge Regression
		18.3.2 Least Absolute Shrinkage and Selection Operator (LASSO) Regression
		18.3.3 Predictor Standardization
		18.3.4 Estimation Goals
	18.4 Linear Regression
		18.4.1 Drawbacks of Linear Regression
		18.4.2 Assessing Prediction Accuracy
		18.4.3 Estimating the Prediction Error
		18.4.4 Improving the Prediction Accuracy
		18.4.5 Variable Selection
	18.5 Regularization Framework
		18.5.1 Role of the Penalty Term
		18.5.2 Role of the Regularization Parameter
		18.5.3 LASSO
		18.5.4 General Regularization Framework
	18.6 Implementation of Regularization
		18.6.1 Example: Neuroimaging-Genetics Study of Parkinson’s Disease Dataset
		18.6.2 Computational Complexity
		18.6.3 LASSO and Ridge Solution Paths
		18.6.4 Choice of the Regularization Parameter
		18.6.5 Cross Validation Motivation
		18.6.6 n-Fold Cross Validation
		18.6.7 LASSO 10-Fold Cross Validation
		18.6.8 Stepwise OLS (Ordinary Least Squares)
		18.6.9 Final Models
		18.6.10 Model Performance
		18.6.11 Comparing Selected Features
		18.6.12 Summary
	18.7 Knock-off Filtering: Simulated Example
		18.7.1 Notes
	18.8 PD Neuroimaging-Genetics Case-Study
		18.8.1 Fetching, Cleaning and Preparing the Data
		18.8.2 Preparing the Response Vector
		18.8.3 False Discovery Rate (FDR)
			Graphical Interpretation of the Benjamini-Hochberg (BH) Method
			FDR Adjusting the p-Values
		18.8.4 Running the Knockoff Filter
	18.9 Assignment: 18. Regularized Linear Modeling and Knockoff Filtering
	References
Chapter 19: Big Longitudinal Data Analysis
	19.1 Time Series Analysis
		19.1.1 Step 1: Plot Time Series
		19.1.2 Step 2: Find Proper Parameter Values for ARIMA Model
		19.1.3 Check the Differencing Parameter
		19.1.4 Identifying the AR and MA Parameters
		19.1.5 Step 3: Build an ARIMA Model
		19.1.6 Step 4: Forecasting with ARIMA Model
	19.2 Structural Equation Modeling (SEM)-Latent Variables
		19.2.1 Foundations of SEM
		19.2.2 SEM Components
		19.2.3 Case Study - Parkinson’s Disease (PD)
			Step 1 - Collecting Data
			Step 2 - Exploring and Preparing the Data
			Step 3 - Fitting a Model on the Data
		19.2.4 Outputs of Lavaan SEM
	19.3 Longitudinal Data Analysis-Linear Mixed Models
		19.3.1 Mean Trend
		19.3.2 Modeling the Correlation
	19.4 GLMM/GEE Longitudinal Data Analysis
		19.4.1 GEE Versus GLMM
	19.5 Assignment: 19. Big Longitudinal Data Analysis
		19.5.1 Imaging Data
		19.5.2 Time Series Analysis
		19.5.3 Latent Variables Model
	References
Chapter 20: Natural Language Processing/Text Mining
	20.1 A Simple NLP/TM Example
		20.1.1 Define and Load the Unstructured-Text Documents
		20.1.2 Create a New VCorpus Object
		20.1.3 To-Lower Case Transformation
		20.1.4 Text Pre-processing
			Remove Stopwords
			Remove Punctuation
			Stemming: Removal of Plurals and Action Suffixes
		20.1.5 Bags of Words
		20.1.6 Document Term Matrix
	20.2 Case-Study: Job Ranking
		20.2.1 Step 1: Make a VCorpus Object
		20.2.2 Step 2: Clean the VCorpus Object
		20.2.3 Step 3: Build the Document Term Matrix
		20.2.4 Area Under the ROC Curve
	20.3 TF-IDF
		20.3.1 Term Frequency (TF)
		20.3.2 Inverse Document Frequency (IDF)
		20.3.3 TF-IDF
	20.4 Cosine Similarity
	20.5 Sentiment Analysis
		20.5.1 Data Preprocessing
		20.5.2 NLP/TM Analytics
		20.5.3 Prediction Optimization
	20.6 Assignment: 20. Natural Language Processing/Text Mining
		20.6.1 Mining Twitter Data
		20.6.2 Mining Cancer Clinical Notes
	References
Chapter 21: Prediction and Internal Statistical Cross Validation
	21.1 Forecasting Types and Assessment Approaches
	21.2 Overfitting
		21.2.1 Example (US Presidential Elections)
		21.2.2 Example (Google Flu Trends)
		21.2.3 Example (Autism)
	21.3 Internal Statistical Cross-Validation is an Iterative Process
	21.4 Example (Linear Regression)
		21.4.1 Cross-Validation Methods
		21.4.2 Exhaustive Cross-Validation
		21.4.3 Non-Exhaustive Cross-Validation
	21.5 Case-Studies
		21.5.1 Example 1: Prediction of Parkinson’s Disease Using Adaptive Boosting (AdaBoost)
		21.5.2 Example 2: Sleep Dataset
		21.5.3 Example 3: Model-Based (Linear Regression) Prediction Using the Attitude Dataset
		21.5.4 Example 4: Parkinson’s Data (ppmi_data)
	21.6 Summary of CV output
	21.7 Alternative Predictor Functions
		21.7.1 Logistic Regression
		21.7.2 Quadratic Discriminant Analysis (QDA)
		21.7.3 Foundation of LDA and QDA for Prediction, Dimensionality Reduction, and Forecasting
			LDA (Linear Discriminant Analysis)
			QDA (Quadratic Discriminant Analysis)
		21.7.4 Neural Networks
		21.7.5 SVM
		21.7.6 k-Nearest Neighbors Algorithm (k-NN)
		21.7.7 k-Means Clustering (k-MC)
		21.7.8 Spectral Clustering
			Iris Petal Data
			Spirals Data
			Income Data
	21.8 Compare the Results
	21.9 Assignment: 21. Prediction and Internal Statistical Cross-Validation
	References
Chapter 22: Function Optimization
	22.1 Free (Unconstrained) Optimization
		22.1.1 Example 1: Minimizing a Univariate Function (Inverse-CDF)
		22.1.2 Example 2: Minimizing a Bivariate Function
		22.1.3 Example 3: Using Simulated Annealing to Find the Maximum of an Oscillatory Function
	22.2 Constrained Optimization
		22.2.1 Equality Constraints
		22.2.2 Lagrange Multipliers
		22.2.3 Inequality Constrained Optimization
			Linear Programming (LP)
			Mixed Integer Linear Programming (MILP)
		22.2.4 Quadratic Programming (QP)
	22.3 General Non-linear Optimization
		22.3.1 Dual Problem Optimization
			Motivation
			Example 1: Linear Example
			Example 2: Quadratic Example
			Example 3: More Complex Non-linear Optimization
			Example 4: Another Linear Example
	22.4 Manual Versus Automated Lagrange Multiplier Optimization
	22.5 Data Denoising
	22.6 Assignment: 22. Function Optimization
		22.6.1 Unconstrained Optimization
		22.6.2 Linear Programming (LP)
		22.6.3 Mixed Integer Linear Programming (MILP)
		22.6.4 Quadratic Programming (QP)
		22.6.5 Complex Non-linear Optimization
		22.6.6 Data Denoising
	References
Chapter 23: Deep Learning, Neural Networks
	23.1 Deep Learning Training
		23.1.1 Perceptrons
	23.2 Biological Relevance
	23.3 Simple Neural Net Examples
		23.3.1 Exclusive OR (XOR) Operator
		23.3.2 NAND Operator
		23.3.3 Complex Networks Designed Using Simple Building Blocks
	23.4 Classification
		23.4.1 Sonar Data Example
		23.4.2 MXNet Notes
	23.5 Case-Studies
		23.5.1 ALS Regression Example
		23.5.2 Spirals 2D Data
		23.5.3 IBS Study
		23.5.4 Country QoL Ranking Data
		23.5.5 Handwritten Digits Classification
			Configuring the Neural Network
			Training
			Forecasting
			Examining the Network Structure Using LeNet
	23.6 Classifying Real-World Images
		23.6.1 Load the Pre-trained Model
		23.6.2 Load, Preprocess and Classify New Images - US Weather Pattern
		23.6.3 Lake Mapourika, New Zealand
		23.6.4 Beach Image
		23.6.5 Volcano
		23.6.6 Brain Surface
		23.6.7 Face Mask
	23.7 Assignment: 23. Deep Learning, Neural Networks
		23.7.1 Deep Learning Classification
		23.7.2 Deep Learning Regression
		23.7.3 Image Classification
	References
Summary
Glossary



