Edition:
Author: Alan Bernardo Palacio
Series:
ISBN: 183864721X, 9781838647216
Publisher: Packt Publishing
Year of publication: 2021
Number of pages: 414
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on user request)
File size: 18 MB
If you need the file for Distributed Data Systems with Azure Databricks: Create, deploy, and manage enterprise data pipelines converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.
Please note that Distributed Data Systems with Azure Databricks: Create, deploy, and manage enterprise data pipelines is the original-language edition and has not been translated into Persian. The International Library website provides original-language books only and does not offer any books translated into or written in Persian.
Cover; Title Page; Copyright and Credits; Contributors; Table of Contents; Preface

Section 1: Introducing Databricks

Chapter 1: Introduction to Azure Databricks
Technical requirements; Introducing Apache Spark; Introducing Azure Databricks; Examining the architecture of Databricks; Discovering core concepts and terminology; Interacting with the Azure Databricks workspace; Workspace assets; Workspace object operations; Using Azure Databricks notebooks; Creating and managing notebooks; Notebooks and clusters; Exploring data management; Databases and tables; Viewing databases and tables; Importing data; Creating a table; Table details; Exploring computation management; Displaying clusters; Starting a cluster; Terminating a cluster; Deleting a cluster; Cluster information; Cluster logs; Exploring authentication and authorization; Clustering access control; Folder permissions; Notebook permissions; MLflow Model permissions; Summary

Chapter 2: Creating an Azure Databricks Workspace
Technical requirements; Using the Azure portal UI; Accessing the Workspace UI; Configuring an Azure Databricks cluster; Creating a new notebook; Examining Azure Databricks authentication; Access control; Working with VNets in Azure Databricks; Virtual network requirements; Deploying to your own VNet; Azure Resource Manager templates; Creating an Azure Databricks workspace with an ARM template; Reviewing deployed resources; Cleaning up resources; Setting up the Azure Databricks CLI; Authentication through an access token; Authentication using an Azure AD token; Validating the installation; Workspace CLI; Using the CLI to explore the workplace; Clusters CLI; Jobs CLI; Groups API; The Databricks CLI from Azure Cloud Shell; Summary

Section 2: Data Pipelines with Databricks

Chapter 3: Creating ETL Operations with Azure Databricks
Technical requirements; Using ADLS Gen2; Setting up a basic ADLS Gen2 data lake; Uploading data to ADLS Gen2; Accessing ADLS Gen2 from Azure Databricks; Loading data from ADLS Gen2; Using S3 with Azure Databricks; Connecting to S3; Loading data into a Spark DataFrame; Using Azure Blob storage with Azure Databricks; Setting up Azure Blob storage; Uploading files and access keys; Setting up the connection to Azure Blob storage; Transforming and cleaning data; Spark data frames; Querying using SQL; Writing back table data to Azure Data Lake; Orchestrating jobs with Azure Databricks; ADF; Creating an ADF resource; Creating an ETL in ADF; Scheduling jobs with Azure Databricks; Scheduling a notebook as a job; Job logs; Summary

Chapter 5: Introducing Delta Engine
Technical requirements; Optimizing file management with Delta Engine; Merging small files using bin-packing; Skipping data; Using ZORDER clustering; Managing data recency; Understanding checkpoints; Automatically optimizing files with Delta Engine; Using caching to improve performance; Delta and Apache Spark caching; Caching a subset of the data; Configuring the Delta cache; Optimizing queries using DFP; Using DFP; Using Bloom filters; Understanding Bloom filters; Bloom filters in Azure Databricks; Creating a Bloom filter index; Optimizing join performance; Range join optimization; Enabling range join optimization; Skew join optimization; Relationships and columns; Summary

Chapter 6: Introducing Structured Streaming
Technical requirements; Structured Streaming model; Using the Structured Streaming API; Mapping, filtering, and running aggregations; Windowed aggregations on event time; Merging streaming and static data; Interactive queries; Using different sources available in Azure Databricks when dealing with continuous streams of data; Using a Delta table as a stream source; Azure Event Hubs; Auto Loader; Apache Kafka; Avro data; Data sinks; Recovering from query failures; Optimizing streaming queries; Triggering streaming query executions; Different kinds of triggers; Trigger examples; Visualizing data on streaming data frames; Example on Structured Streaming; Summary

Section 3: Machine and Deep Learning with Databricks

Chapter 7: Using Python Libraries in Azure Databricks
Technical requirements; Installing libraries in Azure Databricks; Workspace libraries; Cluster libraries; Notebook-scoped Python libraries; PySpark API; Main functionalities of PySpark; Operating with PySpark DataFrames; pandas Dataframe API (Koalas); Using the Koalas API; Using SQL in Koalas; Working with PySpark; Visualizing data; Bokeh; Matplotlib; Plotly; Summary

Chapter 8: Databricks Runtime for Machine Learning
Loading data; Reading data from DBFS; Reading CSV files; Feature engineering; Tokenizer; Binarizer; Polynomial expansion; StringIndexer; One-hot encoding; VectorIndexer; Normalizer; StandardScaler; Bucketizer; Element-wise product; Time-series data sources; Joining time-series data; Using the Koalas API; Handling missing values; Extracting features from text; TF-IDF; Word2vec; Training machine learning models on tabular data; Engineering the variables; Building the ML model; Registering the model in the MLflow Model Registry; Model serving; Summary

Chapter 9: Databricks Runtime for Deep Learning
Technical requirements; Loading data for deep learning; Using TFRecords for distributed learning; Structuring TFRecords files; Managing data using TFRecords; Automating schema inference; Using TFRecordDataset to load data; Using Petastorm for distributed learning; Introducing Petastorm; Generating a dataset; Reading a dataset; Using Petastorm to prepare data for deep learning; Data preprocessing and featurization; Featurization using a pre-trained model for transfer learning; Featurization using pandas UDFs; Applying featurization to the DataFrame of images; Summary

Chapter 10: Model Tracking and Tuning in Azure Databricks
Technical requirements; Tuning hyperparameters with AutoML; Automating model tracking with MLflow; Managing MLflow runs; Automating MLflow tracking with MLlib; Hyperparameter tuning with Hyperopt; Hyperopt concepts; Defining a search space; Applying best practices in Hyperopt; Optimizing model selection with scikit-learn, Hyperopt, and MLflow; Summary

Chapter 11: Managing and Serving Models with MLflow and MLeap
Technical requirements; Managing machine learning models; Using MLflow notebook experiments; Registering a model using the MLflow API; Transitioning a model stage; Model Registry example; Exporting and loading pipelines with MLeap; Serving models with MLflow; Scoring a model; Summary

Chapter 12: Distributed Deep Learning in Azure Databricks
Technical requirements; Distributed training for deep learning; The ring allreduce technique; Using the Horovod distributed learning library in Azure Databricks; Installing the horovod library; Using the horovod library; Training a model on a single node; Distributing training with HorovodRunner; Distributing hyperparameter tuning using Horovod and Hyperopt; Using the Spark TensorFlow Distributor package; Summary

Other Books You May Enjoy
Index