Edition:
Authors: Gerhard Paaß, Sven Giesselbach
Series:
ISBN: 9783031231896, 9783031231902
Publisher: Springer
Publication year: 2023
Number of pages: 448
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 on request)
File size: 24 MB
If you would like the book Foundation Models for Natural Language Processing: Pre-trained Language Models Integrating Media converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can notify support and the file will be converted for you.
Please note that Foundation Models for Natural Language Processing: Pre-trained Language Models Integrating Media is the original English edition, not a Persian translation. The International Library website offers books in their original language only and does not provide any books translated into or written in Persian.
This open access book provides a comprehensive overview of the state of the art in research and applications of Foundation Models and is intended for readers familiar with basic Natural Language Processing (NLP) concepts. In recent years, a revolutionary new paradigm has been developed for training models for NLP. These models are first pre-trained on large collections of text documents to acquire general syntactic knowledge and semantic information. Then, they are fine-tuned for specific tasks, which they can often solve with superhuman accuracy. When the models are large enough, they can be instructed by prompts to solve new tasks without any fine-tuning. Moreover, they can be applied to a wide range of different media and problem domains, ranging from image and video processing to robot control learning. Because they provide a blueprint for solving many tasks in artificial intelligence, they have been called Foundation Models. After a brief introduction to basic NLP models, the main pre-trained language models (BERT, GPT, and the sequence-to-sequence Transformer) are described, as well as the concepts of self-attention and context-sensitive embeddings. Then, different approaches to improving these models are discussed, such as expanding the pre-training criteria, increasing the length of input texts, or including extra knowledge. An overview of the best-performing models for about twenty application areas is then presented, e.g., question answering, translation, story generation, dialog systems, generating images from text, etc. For each application area, the strengths and weaknesses of current models are discussed, and an outlook on further developments is given. In addition, links are provided to freely available program code. A concluding chapter summarizes the economic opportunities, mitigation of risks, and potential developments of AI.
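To make the two usage modes in the description concrete, here is a minimal sketch, assuming the Hugging Face transformers library and two common public checkpoints (distilbert-base-uncased-finetuned-sst-2-english and gpt2) chosen for illustration; it is not code from the book, only an example of the pre-train/fine-tune mode versus the prompting mode:

```python
# Minimal illustration (not from the book) of the two usage modes described
# above: (1) a pre-trained encoder fine-tuned for a specific task, and
# (2) an autoregressive model instructed by a prompt without fine-tuning.
# Assumes: pip install transformers torch; checkpoint names are public
# Hugging Face models picked for this sketch, not the authors' examples.
from transformers import pipeline

# (1) Fine-tuning mode: BERT-style models are pre-trained on large corpora
# and then fine-tuned on labeled data. Here we load a checkpoint already
# fine-tuned for sentiment classification on SST-2.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Foundation models can solve many NLP tasks."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# (2) Prompting mode: an autoregressive language model continues a few-shot
# prompt with no gradient updates. GPT-2 is small, so this is only a toy
# demonstration of the mechanism, not a reliable translator.
generator = pipeline("text-generation", model="gpt2")
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese =>"
)
print(generator(prompt, max_new_tokens=5)[0]["generated_text"])
```

The few-shot prompt follows the pattern popularized by GPT-3; with sufficiently large models, such prompts alone can elicit new tasks, which is the "without any fine-tuning" behavior the description refers to.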
Foreword
Preface
Acknowledgments
Contents
About the Authors
1 Introduction
  1.1 Scope of the Book
  1.2 Preprocessing of Text
  1.3 Vector Space Models and Document Classification
  1.4 Nonlinear Classifiers
  1.5 Generating Static Word Embeddings
  1.6 Recurrent Neural Networks
  1.7 Convolutional Neural Networks
  1.8 Summary
  References
2 Pre-trained Language Models
  2.1 BERT: Self-Attention and Contextual Embeddings
    2.1.1 BERT Input Embeddings and Self-Attention
      Self-Attention to Generate Contextual Embeddings
    2.1.2 Training BERT by Predicting Masked Tokens
    2.1.3 Fine-Tuning BERT to Downstream Tasks
    2.1.4 Visualizing Attentions and Embeddings
    2.1.5 Natural Language Understanding by BERT
      BERT's Performance on Other Fine-Tuning Tasks
    2.1.6 Computational Complexity
    2.1.7 Summary
  2.2 GPT: Autoregressive Language Models
    2.2.1 The Task of Autoregressive Language Models
    2.2.2 Training GPT by Predicting the Next Token
      Visualizing GPT Embeddings
    2.2.3 Generating a Sequence of Words
    2.2.4 The Advanced Language Model GPT-2
    2.2.5 Fine-Tuning GPT
    2.2.6 Summary
  2.3 Transformer: Sequence-to-Sequence Translation
    2.3.1 The Transformer Architecture
      Cross-Attention
    2.3.2 Decoding a Translation to Generate the Words
    2.3.3 Evaluation of a Translation
    2.3.4 Pre-trained Language Models and Foundation Models
      Available Implementations
    2.3.5 Summary
  2.4 Training and Assessment of Pre-trained Language Models
    2.4.1 Optimization of PLMs
      Basics of PLM Optimization
      Variants of Stochastic Gradient Descent
      Parallel Training for Large Models
    2.4.2 Regularization of Pre-trained Language Models
    2.4.3 Neural Architecture Search
    2.4.4 The Uncertainty of Model Predictions
      Bayesian Neural Networks
      Estimating Uncertainty by a Single Deterministic Model
      Representing the Predictive Distribution by Ensembles
    2.4.5 Explaining Model Predictions
      Linear Local Approximations
      Nonlinear Local Approximations
      Explanation by Retrieval
      Explanation by Generating a Chain of Thought
    2.4.6 Summary
  References
3 Improving Pre-trained Language Models
  3.1 Modifying Pre-training Objectives
    3.1.1 Autoencoders Similar to BERT
    3.1.2 Autoregressive Language Models Similar to GPT
    3.1.3 Transformer Encoder-Decoders
    3.1.4 Systematic Comparison of Transformer Variants
    3.1.5 Summary
  3.2 Capturing Longer Dependencies
    3.2.1 Sparse Attention Matrices
    3.2.2 Hashing and Low-Rank Approximations
    3.2.3 Comparisons of Transformers with Long Input Sequences
    3.2.4 Summary
  3.3 Multilingual Pre-trained Language Models
    3.3.1 Autoencoder Models
    3.3.2 Seq2seq Transformer Models
    3.3.3 Autoregressive Language Models
    3.3.4 Summary
  3.4 Additional Knowledge for Pre-trained Language Models
    3.4.1 Exploiting Knowledge Base Embeddings
    3.4.2 Pre-trained Language Models for Graph Learning
    3.4.3 Textual Encoding of Tables
    3.4.4 Textual Encoding of Knowledge Base Relations
    3.4.5 Enhancing Pre-trained Language Models by Retrieved Texts
    3.4.6 Summary
  3.5 Changing Model Size
    3.5.1 Larger Models Usually Have a Better Performance
    3.5.2 Mixture-of-Experts Models
    3.5.3 Parameter Compression and Reduction
    3.5.4 Low-Rank Factorization
    3.5.5 Knowledge Distillation
    3.5.6 Summary
  3.6 Fine-Tuning for Specific Applications
    3.6.1 Properties of Fine-Tuning
      Catastrophic Forgetting
      Fine-Tuning and Overfitting
    3.6.2 Fine-Tuning Variants
      Fine-Tuning in Two Stages
      Fine-Tuning for Multiple Tasks
      Meta-Learning to Accelerate Fine-Tuning
      Fine-Tuning a Frozen Model by Adapters
      Fine-Tuning GPT-3
    3.6.3 Creating Few-Shot Prompts
    3.6.4 Thought Chains for Few-Shot Learning of Reasoning
    3.6.5 Fine-Tuning Models to Execute Instructions
      InstructGPT Results
      Instruction Tuning with FLAN
    3.6.6 Generating Labeled Data by Foundation Models
    3.6.7 Summary
  References
4 Knowledge Acquired by Foundation Models
  4.1 Benchmark Collections
    4.1.1 The GLUE Benchmark Collection
    4.1.2 SuperGLUE: An Advanced Version of GLUE
    4.1.3 Text Completion Benchmarks
    4.1.4 Large Benchmark Collections
    4.1.5 Summary
  4.2 Evaluating Knowledge by Probing Classifiers
    4.2.1 BERT's Syntactic Knowledge
    4.2.2 Common Sense Knowledge
    4.2.3 Logical Consistency
      Improving Logical Consistency
    4.2.4 Summary
  4.3 Transferability and Reproducibility of Benchmarks
    4.3.1 Transferability of Benchmark Results
      Benchmarks May Not Test All Aspects
      Logical Reasoning by Correlation
    4.3.2 Reproducibility of Published Results in Natural Language Processing
      Available Implementations
    4.3.3 Summary
  References
5 Foundation Models for Information Extraction
  5.1 Text Classification
    5.1.1 Multiclass Classification with Exclusive Classes
    5.1.2 Multilabel Classification
    5.1.3 Few- and Zero-Shot Classification
      Available Implementations
    5.1.4 Summary
  5.2 Word Sense Disambiguation
    5.2.1 Sense Inventories
    5.2.2 Models
      Available Implementations
    5.2.3 Summary
  5.3 Named Entity Recognition
    5.3.1 Flat Named Entity Recognition
    5.3.2 Nested Named Entity Recognition
      Available Implementations
    5.3.3 Entity Linking
      Available Implementations
    5.3.4 Summary
  5.4 Relation Extraction
    5.4.1 Coreference Resolution
      Available Implementations
    5.4.2 Sentence-Level Relation Extraction
    5.4.3 Document-Level Relation Extraction
    5.4.4 Joint Entity and Relation Extraction
      Aspect-Based Sentiment Analysis
      Semantic Role Labeling
      Extracting Knowledge Graphs from Pre-trained PLMs
    5.4.5 Distant Supervision
    5.4.6 Relation Extraction Using Layout Information
      Available Implementations
    5.4.7 Summary
  References
6 Foundation Models for Text Generation
  6.1 Document Retrieval
    6.1.1 Dense Retrieval
    6.1.2 Measuring Text Retrieval Performance
    6.1.3 Cross-Encoders with BERT
    6.1.4 Using Token Embeddings for Retrieval
    6.1.5 Dense Passage Embeddings and Nearest Neighbor Search
      Available Implementations
    6.1.6 Summary
  6.2 Question Answering
    6.2.1 Question Answering Based on Training Data Knowledge
      Fine-Tuned Question Answering Models
      Question Answering with Few-Shot Language Models
    6.2.2 Question Answering Based on Retrieval
    6.2.3 Long-Form Question Answering Using Retrieval
      A Language Model with Integrated Retrieval
      Controlling a Search Engine by a Pre-trained Language Model
      Available Implementations
    6.2.4 Summary
  6.3 Neural Machine Translation
    6.3.1 Translation for a Single Language Pair
    6.3.2 Multilingual Translation
    6.3.3 Multilingual Question Answering
      Available Implementations
    6.3.4 Summary
  6.4 Text Summarization
    6.4.1 Shorter Documents
    6.4.2 Longer Documents
    6.4.3 Multi-Document Summarization
      Available Implementations
    6.4.4 Summary
  6.5 Text Generation
    6.5.1 Generating Text by Language Models
    6.5.2 Generating Text with a Given Style
      Style-Conditional Probabilities
      Prompt-Based Generation
    6.5.3 Transferring a Document to Another Text Style
      Style Transfer with Parallel Data
      Style Transfer without Parallel Data
      Style Transfer with Few-Shot Prompts
    6.5.4 Story Generation with a Given Plot
      Specify a Storyline by Keywords or Phrases
      Specify a Storyline by Sentences
      Other Control Strategies
    6.5.5 Generating Fake News
      Detecting Fake News
    6.5.6 Generating Computer Code
      Available Implementations
    6.5.7 Summary
  6.6 Dialog Systems
    6.6.1 Dialog Models as a Pipeline of Modules
    6.6.2 Advanced Dialog Models
    6.6.3 LaMDA and BlenderBot 3 Using Retrieval and Filters
    6.6.4 Limitations and Remedies of Dialog Systems
      Available Implementations
    6.6.5 Summary
  References
7 Foundation Models for Speech, Images, Videos, and Control
  7.1 Speech Recognition and Generation
    7.1.1 Basics of Automatic Speech Recognition
    7.1.2 Transformer-Based Speech Recognition
    7.1.3 Self-supervised Learning for Speech Recognition
      Available Implementations
    7.1.4 Text-to-Speech
      Available Implementations
    7.1.5 Speech-to-Speech Language Model
    7.1.6 Music Generation
      Available Implementations
    7.1.7 Summary
  7.2 Image Processing and Generation
    7.2.1 Basics of Image Processing
    7.2.2 Vision Transformer
    7.2.3 Image Generation
    7.2.4 Joint Processing of Text and Images
    7.2.5 Describing Images by Text
    7.2.6 Generating Images from Text
    7.2.7 Diffusion Models Restore an Image Destructed by Noise
    7.2.8 Multipurpose Models
      Available Implementations
    7.2.9 Summary
  7.3 Video Interpretation and Generation
    7.3.1 Basics of Video Processing
    7.3.2 Video Captioning
    7.3.3 Action Recognition in Videos
    7.3.4 Generating Videos from Text
      Available Implementations
    7.3.5 Summary
  7.4 Controlling Dynamic Systems
    7.4.1 The Decision Transformer
    7.4.2 The GATO Model for Text, Images and Control
      Available Implementations
    7.4.3 Summary
  7.5 Interpretation of DNA and Protein Sequences
    7.5.1 Summary
  References
8 Summary and Outlook
  8.1 Foundation Models Are a New Paradigm
    8.1.1 Pre-trained Language Models
    8.1.2 Jointly Processing Different Modalities by Foundation Models
    8.1.3 Performance Level of Foundation Models
      Capturing Knowledge Covered by Large Text Collections
      Information Extraction
      Text Processing and Text Generation
      Multimedia Processing
    8.1.4 Promising Economic Solutions
  8.2 Potential Harm from Foundation Models
    8.2.1 Unintentionally Generate Biased or False Statements
      Accidentally Generated False or Misleading Information
      Reducing Bias by Retrieval
      Filtering Biased Text
    8.2.2 Intentional Harm Caused by Foundation Models
      Fake Images Created by Foundation Models
      Surveillance and Censorship
    8.2.3 Overreliance or Treating a Foundation Model as Human
    8.2.4 Disclosure of Private Information
    8.2.5 Society, Access, and Environmental Harms
      Access to Foundation Models
      Energy Consumption of Foundation Models
      Foundation Models Can Cause Unemployment and Social Inequality
      Foundation Models Can Promote a Uniform World View and Culture
      A Legal Regulation of Foundation Models Is Necessary
  8.3 Advanced Artificial Intelligence Systems
    8.3.1 Can Foundation Models Generate Innovative Content?
    8.3.2 Grounding Language in the World
    8.3.3 Fast and Slow Thinking
    8.3.4 Planning Strategies
  References
Appendix A
  A.1 Sources and Copyright of Images Used in Graphics
Index