Full Disclosure: Priyanka Mhatre (Digital Marketing Specialist, Packt) graciously invited me to review this book before its publication, and provided me with a PDF copy.
https://www.linkedin.com/posts/meysam-ac_machinelearning-language-python-activity-6827154631685091328-iw2g
“It covers all the important topics from training BERT, GPT, other Transformer models from scratch, fine-tuning models on various tasks such as question answering, NER, classification, zero-shot classification.”
- Explore state-of-the-art NLP solutions with the Transformers library
- Train a language model in any language with any transformer architecture
- Fine-tune a pre-trained language model to perform several downstream tasks
- Select the right framework for the training, evaluation, and production of an end-to-end solution
- Get hands-on experience in using TensorBoard and Weights & Biases
- Visualize the internal representation of transformer models for interpretability
Summary:
My one-word summary for this book: Fascinating.
A few other key words that come to mind to describe this book: Foundational, Hands-on, Practical, Crisp, Concise, Depth & Breadth, Tremendous Value.
With the accelerating explosion in the growth of unstructured data collected by enterprises in texts and documents, the ability to analyze that data and derive meaningful information from it is more critical than ever, and it will be the competitive advantage that distinguishes future winners from losers in the marketplace of solutions. This book is an investment in expanding your awareness of the techniques and capabilities that will help you navigate those challenges.
From the book:
“Transformer models have gained immense interest because of their effectiveness in all NLP tasks, from text classification to text generation….[and] effectively improve the performance of multilingual and multi-task NLP problems, as well as monolingual and single tasks.”
This book is a practical guide to leveraging (and applying) some of the leading-edge concepts, algorithms, and libraries from the fields of Deep Learning (DL) and Natural Language Processing (NLP) to solve real-world problems, ranging from summarization to question answering.
In particular, this book serves as a gentle guided tour of some of the important advances that have occurred (and continue to evolve) as NLP models gradually evolved toward the attention-based encoder-decoder Transformer architecture.
What I particularly liked:
The deep subject-matter experience and credentials of the authors (“Savaş Yıldırım graduated from the Istanbul Technical University Department of Computer Engineering and holds a Ph.D. degree in Natural Language Processing (NLP). Currently, he is an associate professor at the Istanbul Bilgi University, Turkey, and is a visiting researcher at the Ryerson University, Canada. He is a proactive lecturer and researcher with more than 20 years of experience teaching courses on machine learning, deep learning, and NLP.”, and “Meysam Asgari-Chenaghlu is an AI manager at Carbon Consulting and is also a Ph.D. candidate at the University of Tabriz.”).
The companion “Code In Action” YouTube channel playlist for the book, and the GitHub repository with code examples.
The excellent quality, conciseness, and crispness of the writing.
The extensive citation of relevant research papers, and the references at the end of chapters.
The authors’ deep practical knowledge, and discussions, of the advantages and disadvantages of different approaches.
The exquisite balance between the technical depth covered in a given chapter and the need to maintain a steady pace that keeps the reader learning and engaged. Some books go too deep, and some stay too shallow. This book is exceptionally well balanced, at just the right depth.
The exceptional variety of examples covered.
The quality of the illustrations used to convey complex concepts; Figures 1.19, 3.2, 3.3, 7.8, and 9.3 are just a few examples of the many good diagrams.
Chapter-1’s focus on getting the reader immediately involved in executing a hello-world example with Transformers. The overview of RNNs, FFNNs, LSTMs, and CNNs. An excellent overview of the developments in NLP over the last 10 years that led to the Transformer architecture.
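To give a flavor of what that hello-world moment looks like, here is my own minimal sketch using the Hugging Face pipeline API (not code from the book; the pipeline simply pulls a default sentiment-analysis checkpoint):

```python
# A minimal "hello world" with the Transformers library (my own sketch, not the book's code).
# pip install transformers torch
from transformers import pipeline

# Downloads a default sentiment-analysis checkpoint on first run.
classifier = pipeline("sentiment-analysis")

print(classifier("Mastering Transformers is a fascinating read."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```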
Chapter-2’s guidance on installing the required software, and the suggestion of Google Colab as an alternative to Anaconda.
Chapter-2’s coverage of community-provided models, benchmarks, TensorFlow, PyTorch, and the Transformers library, and running a simple Transformer from scratch.
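For readers following along, the setup the later chapters assume boils down to something like this quick environment check (my own sketch; the exact packages and versions used in the book may differ):

```python
# A quick check that the environment is ready (my own sketch, not from the book).
# In a terminal or Colab cell first: pip install transformers datasets torch
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)
print("GPU available:", torch.cuda.is_available())
```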
Chapter-3’s coverage of BERT, as well as ALBERT, RoBERTa, and ELECTRA.
Chapter-4’s coverage of autoregressive (AR) models, GPT, BART, and natural language generation (NLG).
Chapter-5’s coverage of fine-tuning language models for text classification (e.g., for sentiment analysis, or multi-class classification).
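As a rough illustration of the workflow that chapter walks through, here is my own compressed sketch of the standard Trainer-based recipe (not the book's code; the checkpoint, dataset, and hyperparameters are placeholders):

```python
# Fine-tuning a pre-trained model for text classification (my own sketch, not from the book).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"   # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# IMDB is used here only as an example sentiment dataset.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="clf-out", num_train_epochs=1,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=dataset["test"].select(range(500)))
trainer.train()
print(trainer.evaluate())
```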
Chapter-6’s coverage of NER and POS was of particular interest, given the effort that I had to expend last year doing my own deep-dive to prepare some recommendations for a client. I wish I had had this book then.
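For context, the ready-made route to NER today is very short. My own sketch below uses a pipeline with one commonly used public checkpoint (not necessarily the one the book uses):

```python
# Named Entity Recognition with a ready-made pipeline (my own sketch, not from the book).
from transformers import pipeline

# "dslim/bert-base-NER" is one popular public checkpoint; any NER model would work here.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

for entity in ner("Savaş Yıldırım teaches NLP at Istanbul Bilgi University."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```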
Chapter-7’s coverage of USE and SBERT, zero-shot learning with BART, and FLAIR.
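The zero-shot idea in particular is easy to try for yourself. This is my own sketch using the widely used BART-MNLI checkpoint, rather than anything specific from the book:

```python
# Zero-shot classification with BART fine-tuned on MNLI (my own sketch, not from the book).
from transformers import pipeline

zero_shot = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = zero_shot("The central bank raised interest rates again this quarter.",
                   candidate_labels=["economics", "sports", "technology"])
print(result["labels"][0], result["scores"][0])   # highest-scoring label first
```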
Chapter-8’s discussion of efficient sparse transformers (Linformer and BigBird), as well as the techniques of distillation, pruning, and quantization for making efficient models out of trained models. Chapter-8 may well be worth the price of the book, itself.
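Of those three techniques, post-training dynamic quantization is the cheapest to try. A minimal PyTorch sketch (my own, assuming a CPU-bound inference scenario and a placeholder checkpoint):

```python
# Post-training dynamic quantization of a transformer (my own sketch, not from the book).
import os
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Convert the Linear layers to int8 weights; this typically shrinks the model and
# speeds up CPU inference at a small accuracy cost.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

def size_mb(m, path="tmp_weights.pt"):
    torch.save(m.state_dict(), path)
    return os.path.getsize(path) / 1e6

print("original :", round(size_mb(model), 1), "MB")
print("quantized:", round(size_mb(quantized), 1), "MB")
```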
Chapter-9’s coverage of multilingual and cross-lingual language model training (and pretraining). I found the discussion of “Cross-lingual similarity tasks” (see p. 278) to be particularly interesting.
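Cross-lingual similarity is the kind of thing you can sanity-check in a few lines once you have a multilingual sentence encoder. My own sketch below uses the sentence-transformers library and one plausible multilingual checkpoint; the book's examples may use different models:

```python
# Cross-lingual sentence similarity (my own sketch, not from the book).
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = "The weather is beautiful today."
turkish = "Bugün hava çok güzel."   # roughly the same sentence in Turkish

embeddings = model.encode([english, turkish], convert_to_tensor=True)
print("cosine similarity:", util.cos_sim(embeddings[0], embeddings[1]).item())
```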
Chapter-10’s coverage of Locust for load testing, FastAPI, and TensorFlow Extended (TFX), as well as serving solutions in environments where a CPU or GPU is available.
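Serving a model behind a REST endpoint is the part many readers will reuse immediately. A bare-bones version looks roughly like this (my own sketch with FastAPI and a pipeline; the endpoint name and model are placeholders, not the book's code):

```python
# Serving a transformer model with FastAPI (my own sketch, not from the book).
# pip install fastapi uvicorn transformers torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline("sentiment-analysis")   # loaded once at startup

class Request(BaseModel):
    text: str

@app.post("/predict")
def predict(req: Request):
    return classifier(req.text)[0]

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```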
Chapter-11’s coverage of visualization with exBERT and BertViz, as well as the discussion on tracking model training with TensorBoard and W&B.
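For the curious, attention visualization with BertViz takes only a handful of lines in a Jupyter or Colab notebook. This is my own sketch (the sentence and checkpoint are arbitrary):

```python
# Visualizing attention heads with BertViz in a Jupyter/Colab notebook
# (my own sketch, not from the book).  pip install bertviz transformers torch
from bertviz import head_view
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

inputs = tokenizer("The cat sat on the mat because it was tired", return_tensors="pt")
outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_view(outputs.attentions, tokens)   # renders an interactive attention view
```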
The “Other Books You May Enjoy” section at the end of the book (“Getting Started with Google BERT” and “Mastering spaCy”).
Suggestions for the next edition:
The fonts used for the text in some figures (e.g., 3.8, 3.10, 3.12, 3.13, 3.14, 4.5, 4.6, 6.2, 6.7, 8.4, 8.6, 9.4, 9.5) appear to be a bit fuzzy in the PDF version of the book. Compare those with the clarity of Figure 6.6.
Table of Contents:
Section 1: Introduction – Recent Developments in the Field, Installations, and Hello World Applications
1 - From Bag-of-Words to the Transformer
2 - A Hands-On Introduction to the Subject
Section 2: Transformer Models – From Autoencoding to Autoregressive Models
3 - Autoencoding Language Models
4 - Autoregressive and Other Language Models
5 - Fine-Tuning Language Models for Text Classification
6 - Fine-Tuning Language Models for Token Classification
7 - Text Representation
Section 3: Advanced Topics
8 - Working with Efficient Transformers
9 - Cross-Lingual and Multilingual Language Modeling
10 - Serving Transformer Models
11 - Attention Visualization and Experiment Tracking