Edition: 1
Authors: Denis Bakhvalov
Series:
ISBN: 9798575614234
Publisher: easyperfect.net
Year of publication: 2020
Number of pages: 175
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 6 MB
If you need the file for Performance analysis and tuning on modern cpus converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.
Please note that Performance Analysis and Tuning on Modern CPUs is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not offer any books translated into or written in Persian.
Table Of Contents

1 Introduction
  1.1 Why Do We Still Need Performance Tuning?
  1.2 Who Needs Performance Tuning?
  1.3 What Is Performance Analysis?
  1.4 What is discussed in this book?
  1.5 What is not in this book?
  1.6 Chapter Summary

Part1. Performance analysis on a modern CPU

2 Measuring Performance
  2.1 Noise In Modern Systems
  2.2 Measuring Performance In Production
  2.3 Automated Detection of Performance Regressions
  2.4 Manual Performance Testing
  2.5 Software and Hardware Timers
  2.6 Microbenchmarks
  2.7 Chapter Summary
3 CPU Microarchitecture
  3.1 Instruction Set Architecture
  3.2 Pipelining
  3.3 Exploiting Instruction Level Parallelism (ILP)
    3.3.1 OOO Execution
    3.3.2 Superscalar Engines and VLIW
    3.3.3 Speculative Execution
  3.4 Exploiting Thread Level Parallelism
    3.4.1 Simultaneous Multithreading
  3.5 Memory Hierarchy
    3.5.1 Cache Hierarchy
      3.5.1.1 Placement of data within the cache.
      3.5.1.2 Finding data in the cache.
      3.5.1.3 Managing misses.
      3.5.1.4 Managing writes.
      3.5.1.5 Other cache optimization techniques.
    3.5.2 Main Memory
  3.6 Virtual Memory
  3.7 SIMD Multiprocessors
  3.8 Modern CPU design
    3.8.1 CPU Front-End
    3.8.2 CPU Back-End
  3.9 Performance Monitoring Unit
    3.9.1 Performance Monitoring Counters
4 Terminology and metrics in performance analysis
  4.1 Retired vs. Executed Instruction
  4.2 CPU Utilization
  4.3 CPI & IPC
  4.4 UOPs (micro-ops)
  4.5 Pipeline Slot
  4.6 Core vs. Reference Cycles
  4.7 Cache miss
  4.8 Mispredicted branch
5 Performance Analysis Approaches
  5.1 Code Instrumentation
  5.2 Tracing
  5.3 Workload Characterization
    5.3.1 Counting Performance Events
    5.3.2 Manual performance counters collection
    5.3.3 Multiplexing and scaling events
  5.4 Sampling
    5.4.1 User-Mode And Hardware Event-based Sampling
    5.4.2 Finding Hotspots
    5.4.3 Collecting Call Stacks
    5.4.4 Flame Graphs
  5.5 Roofline Performance Model
  5.6 Static Performance Analysis
    5.6.1 Static vs. Dynamic Analyzers
  5.7 Compiler Optimization Reports
  5.8 Chapter Summary
6 CPU Features For Performance Analysis
  6.1 Top-Down Microarchitecture Analysis
    6.1.1 TMA in Intel® VTune™ Profiler
    6.1.2 TMA in Linux Perf
    6.1.3 Step1: Identify the bottleneck
    6.1.4 Step2: Locate the place in the code
    6.1.5 Step3: Fix the issue
    6.1.6 Summary
  6.2 Last Branch Record
    6.2.1 Collecting LBR stacks
    6.2.2 Capture call graph
    6.2.3 Identify hot branches
    6.2.4 Analyze branch misprediction rate
    6.2.5 Precise timing of machine code
    6.2.6 Estimating branch outcome probability
    6.2.7 Other use cases
  6.3 Processor Event-Based Sampling
    6.3.1 Precise events
    6.3.2 Lower sampling overhead
    6.3.3 Analyzing memory accesses
  6.4 Intel Processor Traces
    6.4.1 Workflow
    6.4.2 Timing Packets
    6.4.3 Collecting and Decoding Traces
    6.4.4 Usages
    6.4.5 Disk Space and Decoding Time
  6.5 Chapter Summary

Part2. Source Code Tuning For CPU

7 CPU Front-End Optimizations
  7.1 Machine code layout
  7.2 Basic Block
  7.3 Basic block placement
  7.4 Basic block alignment
  7.5 Function splitting
  7.6 Function grouping
  7.7 Profile Guided Optimizations
  7.8 Optimizing for ITLB
  7.9 Chapter Summary
8 CPU Back-End Optimizations
  8.1 Memory Bound
    8.1.1 Cache-Friendly Data Structures
      8.1.1.1 Access data sequentially.
      8.1.1.2 Use appropriate containers.
      8.1.1.3 Packing the data.
      8.1.1.4 Aligning and padding.
      8.1.1.5 Dynamic memory allocation.
      8.1.1.6 Tune the code for memory hierarchy.
    8.1.2 Explicit Memory Prefetching
    8.1.3 Optimizing For DTLB
      8.1.3.1 Explicit Hugepages.
      8.1.3.2 Transparent Hugepages.
      8.1.3.3 Explicit vs. Transparent Hugepages.
  8.2 Core Bound
    8.2.1 Inlining Functions
    8.2.2 Loop Optimizations
      8.2.2.1 Low-level optimizations.
      8.2.2.2 High-level optimizations.
      8.2.2.3 Discovering loop optimization opportunities.
      8.2.2.4 Use Loop Optimization Frameworks
    8.2.3 Vectorization
      8.2.3.1 Compiler Autovectorization.
      8.2.3.2 Discovering vectorization opportunities.
      8.2.3.3 Vectorization is illegal.
      8.2.3.4 Vectorization is not beneficial.
      8.2.3.5 Loop vectorized but scalar version used.
      8.2.3.6 Loop vectorized in a suboptimal way.
      8.2.3.7 Use languages with explicit vectorization.
  8.3 Chapter Summary
9 Optimizing Bad Speculation
  9.1 Replace branches with lookup
  9.2 Replace branches with predication
  9.3 Chapter Summary
10 Other Tuning Areas
  10.1 Compile-Time Computations
  10.2 Compiler Intrinsics
  10.3 Cache Warming
  10.4 Detecting Slow FP Arithmetic
  10.5 System Tuning
11 Optimizing Multithreaded Applications
  11.1 Performance Scaling And Overhead
  11.2 Parallel Efficiency Metrics
    11.2.1 Effective CPU Utilization
    11.2.2 Thread Count
    11.2.3 Wait Time
    11.2.4 Spin Time
  11.3 Analysis With Intel VTune Profiler
    11.3.1 Find Expensive Locks
    11.3.2 Platform View
  11.4 Analysis with Linux Perf
    11.4.1 Find Expensive Locks
  11.5 Analysis with Coz
  11.6 Analysis with eBPF and GAPP
  11.7 Detecting Coherence Issues
    11.7.1 Cache Coherency Protocols
    11.7.2 True Sharing
    11.7.3 False Sharing
  11.8 Chapter Summary

Epilog
Glossary
References
Appendix A. Reducing Measurement Noise
Appendix B. The LLVM Vectorizer