Edition: 1
Author: Shreyas Subramanian
Series:
ISBN: 1394240724, 9781394240722
Publisher: Wiley
Year of publication: 2024
Number of pages: 221
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 11 MB
If you need Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications (Tech Today) converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert the file for you.
Note that Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications (Tech Today) is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not provide any books translated into or written in Persian.
Cover
Contents At A Glance
Title Page
Copyright Page
Dedication Page
About the Author
About the Technical Editor
Contents

Introduction
  GenAI Applications and Large Language Models
  Importance of Cost Optimization
  Challenges and Opportunities
  Micro Case Studies
  OpenAI: Leading the Way
  Hugging Face: Open-Source Community Building
  Bloomberg GPT: LLMs in Large Commercial Institutions
  Who Is This Book For?
  Summary

Chapter 1 Introduction
  Overview of GenAI Applications and Large Language Models
  The Rise of Large Language Models
  Neural Networks, Transformers, and Beyond
  GenAI vs. LLMs: What’s the Difference?
  The Three-Layer GenAI Application Stack
  The Infrastructure Layer
  The Model Layer
  The Application Layer
  Paths to Productionizing GenAI Applications
  Sample LLM-Powered Chat Application
  The Importance of Cost Optimization
  Cost Assessment of the Model Inference Component
  Cost Assessment of the Vector Database Component
  Benchmarking Setup and Results
  Other Factors to Consider
  Cost Assessment of the Large Language Model Component
  Summary

Chapter 2 Tuning Techniques for Cost Optimization
  Fine-Tuning and Customizability
  Basic Scaling Laws You Should Know
  Parameter-Efficient Fine-Tuning Methods
  Adapters Under the Hood
  Prompt Tuning
  Prefix Tuning
  P-tuning
  IA3
  Low-Rank Adaptation
  Cost and Performance Implications of PEFT Methods
  Summary

Chapter 3 Inference Techniques for Cost Optimization
  Introduction to Inference Techniques
  Prompt Engineering
  Impact of Prompt Engineering on Cost
  Estimating Costs for Other Models
  Clear and Direct Prompts
  Adding Qualifying Words for Brief Responses
  Breaking Down the Request
  Example of Using Claude for PII Removal
  Conclusion
  Providing Context
  Examples of Providing Context
  RAG and Long Context Models
  Recent Work Comparing RAG with Long Content Models
  Conclusion
  Context and Model Limitations
  Indicating a Desired Format
  Example of Formatted Extraction with Claude
  Trade-Off Between Verbosity and Clarity
  Caching with Vector Stores
  What Is a Vector Store?
  How to Implement Caching Using Vector Stores
  Conclusion
  Chains for Long Documents
  What Is Chaining?
  Implementing Chains
  Example Use Case
  Common Components
  Tools That Implement Chains
  Comparing Results
  Conclusion
  Summarization
  Summarization in the Context of Cost and Performance
  Efficiency in Data Processing
  Cost-Effective Storage
  Enhanced Downstream Applications
  Improved Cache Utilization
  Summarization as a Preprocessing Step
  Enhanced User Experience
  Conclusion
  Batch Prompting for Efficient Inference
  Batch Inference
  Experimental Results
  Using the accelerate Library
  Using the DeepSpeed Library
  Batch Prompting
  Example of Using Batch Prompting
  Model Optimization Methods
  Quantization
  Code Example
  Recent Advancements: GPTQ
  Parameter-Efficient Fine-Tuning Methods
  Recap of PEFT Methods
  Code Example
  Cost and Performance Implications
  Summary
  References

Chapter 4 Model Selection and Alternatives
  Introduction to Model Selection
  Motivating Example: The Tale of Two Models
  The Role of Compact and Nimble Models
  Examples of Successful Smaller Models
  Quantization for Powerful but Smaller Models
  Text Generation with Mistral 7B
  Zephyr 7B and Aligned Smaller Models
  CogVLM for Language-Vision Multimodality
  Prometheus for Fine-Grained Text Evaluation
  Orca 2 and Teaching Smaller Models to Reason
  Breaking Traditional Scaling Laws with Gemini and Phi
  Phi 1, 1.5, and 2 B Models
  Gemini Models
  Domain-Specific Models
  Step 1 - Training Your Own Tokenizer
  Step 2 - Training Your Own Domain-Specific Model
  More References for Fine-Tuning
  Evaluating Domain-Specific Models vs. Generic Models
  The Power of Prompting with General-Purpose Models
  Summary

Chapter 5 Infrastructure and Deployment Tuning Strategies
  Introduction to Tuning Strategies
  Hardware Utilization and Batch Tuning
  Memory Occupancy
  Strategies to Fit Larger Models in Memory
  KV Caching
  PagedAttention
  How Does PagedAttention Work?
  Comparisons, Limitations, and Cost Considerations
  AlphaServe
  How Does AlphaServe Work?
  Impact of Batching
  Cost and Performance Considerations
  S3: Scheduling Sequences with Speculation
  How Does S3 Work?
  Performance and Cost
  Streaming LLMs with Attention Sinks
  Fixed to Sliding Window Attention
  Extending the Context Length
  Working with Infinite Length Context
  How Does StreamingLLM Work?
  Performance and Results
  Cost Considerations
  Batch Size Tuning
  Frameworks for Deployment Configuration Testing
  Cloud-Native Inference Frameworks
  Deep Dive into Serving Stack Choices
  Batching Options
  Options in DJL Serving
  High-Level Guidance for Selecting Serving Parameters
  Automatically Finding Good Inference Configurations
  Creating a Generic Template
  Defining a HPO Space
  Searching the Space for Optimal Configurations
  Results of Inference HPO
  Inference Acceleration Tools
  TensorRT and GPU Acceleration Tools
  CPU Acceleration Tools
  Monitoring and Observability
  LLMOps and Monitoring
  Why Is Monitoring Important for LLMs?
  Monitoring and Updating Guardrails
  Summary

Conclusion
Index
EULA