Book download: Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications (Tech Today)

Book details

Edition: 1
Authors:
Series:
ISBN: 1394240724, 9781394240722
Publisher: Wiley
Year of publication: 2024
Number of pages: 221
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 11 MB

Book price (toman): 73,000





If you need the file for Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications (Tech Today) converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.

Note that Large Language Model-Based Solutions: How to Deliver Value with Cost-Effective Generative AI Applications (Tech Today) is the original-language edition and has not been translated into Persian. The International Library website offers only original-language books and does not provide any books translated into or written in Persian.


Book description (in the original language)



Table of contents

Cover
Contents At A Glance
Title Page
Copyright Page
Dedication Page
About the Author
About the Technical Editor
Contents
Introduction
	GenAI Applications and Large Language Models
	Importance of Cost Optimization
		Challenges and Opportunities
	Micro Case Studies
		OpenAI: Leading the Way
		Hugging Face: Open-Source Community Building
		Bloomberg GPT: LLMs in Large Commercial Institutions
	Who Is This Book For?
	Summary
Chapter 1 Introduction
	Overview of GenAI Applications and Large Language Models
		The Rise of Large Language Models
		Neural Networks, Transformers, and Beyond
		GenAI vs. LLMs: What’s the Difference?
		The Three-Layer GenAI Application Stack
			The Infrastructure Layer
			The Model Layer
			The Application Layer
	Paths to Productionizing GenAI Applications
		Sample LLM-Powered Chat Application
	The Importance of Cost Optimization
		Cost Assessment of the Model Inference Component
		Cost Assessment of the Vector Database Component
			Benchmarking Setup and Results
			Other Factors to Consider
		Cost Assessment of the Large Language Model Component
	Summary
Chapter 2 Tuning Techniques for Cost Optimization
	Fine-Tuning and Customizability
		Basic Scaling Laws You Should Know
	Parameter-Efficient Fine-Tuning Methods
		Adapters Under the Hood
			Prompt Tuning
			Prefix Tuning
			P-tuning
			IA3
		Low-Rank Adaptation
	Cost and Performance Implications of PEFT Methods
	Summary
Chapter 3 Inference Techniques for Cost Optimization
	Introduction to Inference Techniques
	Prompt Engineering
		Impact of Prompt Engineering on Cost
			Estimating Costs for Other Models
		Clear and Direct Prompts
			Adding Qualifying Words for Brief Responses
			Breaking Down the Request
			Example of Using Claude for PII Removal
			Conclusion
		Providing Context
			Examples of Providing Context
			RAG and Long Context Models
			Recent Work Comparing RAG with Long Content Models
			Conclusion
			Context and Model Limitations
		Indicating a Desired Format
			Example of Formatted Extraction with Claude
			Trade-Off Between Verbosity and Clarity
	Caching with Vector Stores
		What Is a Vector Store?
		How to Implement Caching Using Vector Stores
		Conclusion
	Chains for Long Documents
		What Is Chaining?
		Implementing Chains
			Example Use Case
			Common Components
			Tools That Implement Chains
			Comparing Results
			Conclusion
	Summarization
		Summarization in the Context of Cost and Performance
			Efficiency in Data Processing
			Cost-Effective Storage
			Enhanced Downstream Applications
			Improved Cache Utilization
			Summarization as a Preprocessing Step
			Enhanced User Experience
			Conclusion
	Batch Prompting for Efficient Inference
		Batch Inference
			Experimental Results
			Using the accelerate Library
			Using the DeepSpeed Library
		Batch Prompting
			Example of Using Batch Prompting
	Model Optimization Methods
		Quantization
		Code Example
		Recent Advancements: GPTQ
	Parameter-Efficient Fine-Tuning Methods
		Recap of PEFT Methods
		Code Example
	Cost and Performance Implications
	Summary
	References
Chapter 4 Model Selection and Alternatives
	Introduction to Model Selection
	Motivating Example: The Tale of Two Models
	The Role of Compact and Nimble Models
	Examples of Successful Smaller Models
		Quantization for Powerful but Smaller Models
		Text Generation with Mistral 7B
		Zephyr 7B and Aligned Smaller Models
		CogVLM for Language-Vision Multimodality
		Prometheus for Fine-Grained Text Evaluation
		Orca 2 and Teaching Smaller Models to Reason
		Breaking Traditional Scaling Laws with Gemini and Phi
		Phi 1, 1.5, and 2 B Models
		Gemini Models
	Domain-Specific Models
		Step 1 - Training Your Own Tokenizer
		Step 2 - Training Your Own Domain-Specific Model
			More References for Fine-Tuning
			Evaluating Domain-Specific Models vs. Generic Models
	The Power of Prompting with General-Purpose Models
	Summary
Chapter 5 Infrastructure and Deployment Tuning Strategies
	Introduction to Tuning Strategies
	Hardware Utilization and Batch Tuning
		Memory Occupancy
		Strategies to Fit Larger Models in Memory
		KV Caching
		PagedAttention
			How Does PagedAttention Work?
			Comparisons, Limitations, and Cost Considerations
		AlphaServe
			How Does AlphaServe Work?
			Impact of Batching
			Cost and Performance Considerations
		S3: Scheduling Sequences with Speculation
			How Does S3 Work?
			Performance and Cost
		Streaming LLMs with Attention Sinks
			Fixed to Sliding Window Attention
			Extending the Context Length
			Working with Infinite Length Context
			How Does StreamingLLM Work?
			Performance and Results
			Cost Considerations
		Batch Size Tuning
			Frameworks for Deployment Configuration Testing
			Cloud-Native Inference Frameworks
			Deep Dive into Serving Stack Choices
			Batching Options
			Options in DJL Serving
			High-Level Guidance for Selecting Serving Parameters
		Automatically Finding Good Inference Configurations
			Creating a Generic Template
			Defining a HPO Space
			Searching the Space for Optimal Configurations
			Results of Inference HPO
	Inference Acceleration Tools
		TensorRT and GPU Acceleration Tools
		CPU Acceleration Tools
	Monitoring and Observability
		LLMOps and Monitoring
			Why Is Monitoring Important for LLMs?
			Monitoring and Updating Guardrails
	Summary
Conclusion
Index
EULA



