
2021-09-15 Wednesday - Book Review: Mastering Transformers

image source: packtpub.com

 

My post on LinkedIn, mentioning this review. 

My review posted to Amazon

Mastering Transformers: Build state-of-the-art models from scratch with advanced natural language processing techniques

Authors: Savaş Yıldırım and Meysam Asgari-Chenaghlu

Full Disclosure: Priyanka Mhatre (Digital Marketing Specialist, Packt) graciously invited me to review this book before its publication, and provided me with a PDF copy.

 

https://www.linkedin.com/posts/meysam-ac_machinelearning-language-python-activity-6827154631685091328-iw2g

The book covers all the important topics, from training BERT, GPT, and other Transformer models from scratch, to fine-tuning models on various tasks such as question answering, NER, classification, and zero-shot classification.

- Explore state-of-the-art NLP solutions with the Transformers library

- Train a language model in any language with any transformer architecture

- Fine-tune a pre-trained language model to perform several downstream tasks

- Select the right framework for the training, evaluation, and production of an end-to-end solution

- Get hands-on experience in using TensorBoard and Weights & Biases

- Visualize the internal representation of transformer models for interpretability
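
To give a flavor of the hands-on style the book encourages from its very first chapter, here is a minimal "hello world" sketch of my own (not code taken from the book), assuming the Hugging Face transformers package and a PyTorch backend are installed; the pipeline API downloads a default pre-trained model for each task:

# Minimal Hugging Face Transformers "hello world" (assumes: pip install transformers torch)
from transformers import pipeline

# Sentiment analysis with the default pre-trained model for the task
classifier = pipeline("sentiment-analysis")
print(classifier("Mastering Transformers is a remarkably practical book."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# Extractive question answering against a short context passage
qa = pipeline("question-answering")
print(qa(question="What can you build with this book?",
         context="The book shows how to build and fine-tune transformer models for NLP tasks."))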

Summary:

My one-word summary for this book: Fascinating.

A few other keywords that come to mind to describe this book: Foundational, Hands-on, Practical, Crisp, Concise, Depth & Breadth, Tremendous Value.

With the continued explosive growth of unstructured data collected by enterprises in texts and documents, the ability to analyze that data and derive meaningful information from it is more critical than ever, and will be the competitive advantage that distinguishes future winners from losers in the marketplace of solutions. This book is an investment in expanding your awareness of the techniques and capabilities that will help you navigate those challenges.

From the book: 

Transformer models have gained immense interest because of their effectiveness in all NLP tasks, from text classification to text generation… [and] effectively improve the performance of multilingual and multi-task NLP problems, as well as monolingual and single tasks.

This book is a practical guide to leveraging (and applying) some of the leading-edge concepts, algorithms, and libraries from the fields of Deep Learning (DL) and Natural Language Processing (NLP) to solve real-world problems – ranging from summarization to question answering.

In particular, this book serves as a gentle guided tour of some of the important advances that have occurred (and continue to occur) as NLP models gradually evolved toward the attention-based encoder-decoder Transformer architecture.

What I particularly liked:

The deep subject-matter experience and credentials of the authors (“Savaş Yıldırım graduated from the Istanbul Technical University Department of Computer Engineering and holds a Ph.D. degree in Natural Language Processing (NLP). Currently, he is an associate professor at the Istanbul Bilgi University, Turkey, and is a visiting researcher at the Ryerson University, Canada. He is a proactive lecturer and researcher with more than 20 years of experience teaching courses on machine learning, deep learning, and NLP.”,  and “Meysam Asgari-Chenaghlu is an AI manager at Carbon Consulting and is also a Ph.D. candidate at the University of Tabriz.”)

The companion “Code In Action” YouTube playlist for the book, and the GitHub repository with code examples.

The excellent quality/conciseness/crispness of the writing.

The extensive citation of relevant research papers – and references at the end of chapters.

The authors’ deep practical knowledge – and discussions – of the advantages and disadvantages of different approaches.

The exquisite balance between the need for technical depth in the details covered by a given chapter and the need to maintain a steady pace that keeps the reader educated and engaged. Some books go too deep, and some stay too shallow. This book is exceptionally well balanced at just the right depth.

The exceptional variety of examples covered.

The quality of the illustrations used to convey complex concepts – Figures 1.19, 3.2, 3.3, 7.8, and 9.3 are just a few examples of the many good diagrams.

Chapter-1’s focus on getting the reader immediately involved in executing a hello-world example with Transformers, its overview of RNNs, FFNNs, LSTMs, and CNNs, and its excellent overview of the developments in NLP over the last 10 years that led to the Transformer architecture.

Chapter-2’s guidance on installing the required software – and the suggestion of Google Colab as an alternative to Anaconda.

Chapter-2’s coverage of community-provided models, benchmarks, TensorFlow, PyTorch, and the Transformers library – and running a simple Transformer from scratch.

Chapter-3’s coverage of BERT – as well as ALBERT, RoBERTa, and ELECTRA.

Chapter-4’s coverage of autoregressive (AR) language models, GPT, BART, and natural language generation (NLG).

Chapter-5’s coverage of fine-tuning language models for text classification (e.g., for sentiment analysis, or multi-class classification).

Chapter-6’s coverage of NER and POS was of particular interest – given the effort that I had to expend last year doing my own deep-dive to prepare some recommendations for a client – I wish I had had this book then.

Chapter-7’s coverage of USE and SBERT, zero-shot learning with BART, and FLAIR (see the short zero-shot sketch at the end of this list).

Chapter-8’s discussion of efficient sparse transformers (Linformer and BigBird) – as well as the techniques of distillation, pruning, and quantization for producing efficient models from trained models. Chapter-8 alone may well be worth the price of the book.

Chapter-9’s coverage of multilingual and cross-lingual language model training (and pretraining). I found the discussion of “Cross-lingual similarity tasks” (see page 278) to be particularly interesting.

Chapter-10’s coverage of Locust for load testing, fastAPI, and TensorFlow Extended (TFX) – as well as serving solutions in CPU- and GPU-based environments.

Chapter-11’s coverage of visualization with exBERT and BertViz – as well as the discussion of tracking model training with TensorBoard and W&B.

The “Other Books You May Enjoy” section at the end of the book (“Getting Started with Google BERT” and “Mastering spaCy”).
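
As a concrete illustration of the zero-shot classification idea touched on under Chapter-7, here is a short sketch of my own (not code from the book) using the Hugging Face pipeline with an NLI-trained BART checkpoint; the facebook/bart-large-mnli model name is my assumption for illustration:

# Zero-shot classification sketch (assumes: pip install transformers torch)
from transformers import pipeline

# An NLI-trained BART model scores how well each candidate label is entailed by the text,
# so the labels never need to appear in any training data.
zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = zero_shot(
    "The central bank raised interest rates by half a percentage point.",
    candidate_labels=["economy", "sports", "technology"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label comes first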

Suggestions for the next edition:

The fonts used for the text in some figures (e.g., 3.8, 3.10, 3.12, 3.13, 3.14, 4.5, 4.6, 6.2, 6.7, 8.4, 8.6, 9.4, and 9.5) appear to be a bit fuzzy in the PDF version of the book. Compare those with the clarity of Figure 6.6.

 

Table of Contents:

Section 1: Introduction – Recent Developments in the Field, Installations, and Hello World Applications

1 - From Bag-of-Words to the Transformer

2 - A Hands-On Introduction to the Subject

Section 2: Transformer Models – From Autoencoding to Autoregressive Models

3 - Autoencoding Language Models

4 - Autoregressive and Other Language Models

5 - Fine-Tuning Language Models for Text Classification

6 - Fine-Tuning Language Models for Token Classification

7 - Text Representation

Section 3: Advanced Topics

8 - Working with Efficient Transformers

9 - Cross-Lingual and Multilingual Language Modeling

10 - Serving Transformer Models

11 - Attention Visualization and Experiment Tracking

 


Copyright

© 2001-2021 International Technology Ventures, Inc., All Rights Reserved.