Saturday, July 07, 2018

2018-07-07 Saturday - fastText for Text Classification

I'm doing some focused reading this weekend to investigate the relative performance of Machine Learning frameworks leveraging GPU vs CPU implementations - and whether there are cases in which a distributed CPU approach may have an advantage over a GPU approach. 

This 2016 paper on text classification with fastText, by a Facebook AI Research (FAIR) team (Armand Joulin, Edouard Grave, Piotr Bojanowski, Tomas Mikolov), reports some startling results that may be of interest to others.

Bag of Tricks for Efficient Text Classification
https://arxiv.org/abs/1607.01759
"This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute."

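To make the idea behind the abstract a little more concrete, here is a minimal pure-Python sketch of the kind of model the paper describes: word and bigram features are hashed into a fixed number of buckets, their embeddings are averaged, and the average is fed to a linear softmax classifier. This is not the fastText implementation (which is multithreaded C++ and can use a hierarchical softmax); the names BUCKETS, DIM, fnv1a, features and BagOfNgramsClassifier, the hyperparameters, and the toy data are all my own assumptions for illustration.

```python
import math
import random

BUCKETS = 2048  # number of hashed feature slots (hypothetical size)
DIM = 16        # embedding dimension (hypothetical size)


def fnv1a(text):
    """Deterministic 32-bit FNV-1a hash of a string."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h = ((h ^ byte) * 16777619) & 0xFFFFFFFF
    return h


def features(text):
    """Hash every unigram and bigram of the text into a bucket id."""
    tokens = text.lower().split()
    grams = tokens + [a + " " + b for a, b in zip(tokens, tokens[1:])]
    return [fnv1a(g) % BUCKETS for g in grams]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class BagOfNgramsClassifier:
    """Averaged n-gram embeddings followed by a linear softmax layer."""

    def __init__(self, labels, seed=0):
        rng = random.Random(seed)
        self.labels = list(labels)
        self.emb = [[rng.uniform(-0.1, 0.1) for _ in range(DIM)]
                    for _ in range(BUCKETS)]
        self.out = [[0.0] * DIM for _ in self.labels]

    def _hidden(self, ids):
        if not ids:  # empty text: fall back to a zero vector
            return [0.0] * DIM
        return [sum(self.emb[i][d] for i in ids) / len(ids) for d in range(DIM)]

    def _probs(self, hidden):
        return softmax([sum(w * h for w, h in zip(row, hidden))
                        for row in self.out])

    def predict(self, text):
        probs = self._probs(self._hidden(features(text)))
        return self.labels[probs.index(max(probs))]

    def train(self, examples, epochs=200, lr=0.5):
        """Plain per-example SGD on the cross-entropy loss."""
        for _ in range(epochs):
            for text, label in examples:
                ids = features(text)
                hidden = self._hidden(ids)
                probs = self._probs(hidden)
                target = self.labels.index(label)
                # Gradient of the loss with respect to each class score.
                errs = [p - (1.0 if k == target else 0.0)
                        for k, p in enumerate(probs)]
                # Gradient with respect to the averaged hidden vector,
                # computed before the output layer is updated.
                grad_h = [sum(errs[k] * self.out[k][d] for k in range(len(errs)))
                          for d in range(DIM)]
                for k, err in enumerate(errs):
                    for d in range(DIM):
                        self.out[k][d] -= lr * err * hidden[d]
                for i in ids:
                    for d in range(DIM):
                        self.emb[i][d] -= lr * grad_h[d] / len(ids)


# Toy data, invented for this sketch.
examples = [
    ("great phone love it", "pos"),
    ("excellent battery great screen", "pos"),
    ("terrible phone hate it", "neg"),
    ("awful battery poor screen", "neg"),
]

model = BagOfNgramsClassifier(labels=["pos", "neg"])
model.train(examples)
for text, label in examples:
    print(text, "->", model.predict(text), "(expected", label + ")")
```

Because the model is just a hashed lookup, an average and one small matrix product, there is very little arithmetic per example, which is a plausible reason the approach does so well on an ordinary multicore CPU.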

https://fasttext.cc/
"FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices."

Implementing Deep Learning Methods and Feature Engineering for Text Data: FastText
https://www.kdnuggets.com/2018/05/implementing-deep-learning-methods-feature-engineering-text-data-fasttext.html
