Balancing Efficiency and Performance in NLP: A Cross-Comparison of Shallow Machine Learning and Large Language Models via AutoML

  1. Estevanell-Valladares, Ernesto L.
  2. Gutiérrez, Yoan
  3. Montoyo-Guijarro, Andrés
  4. Muñoz-Guillena, Rafael
  5. Almeida-Cruz, Yudivián
Journal: Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2024

Issue: 73

Pages: 221-233

Type: Article


Abstract

This study critically examines the resource efficiency and performance of Shallow Machine Learning (SML) methods versus Large Language Models (LLMs) in text classification tasks, exploring the balance between accuracy and environmental sustainability. We introduce a novel optimization strategy that, leveraging Automated Machine Learning (AutoML), prioritizes computational efficiency and ecological impact alongside traditional performance metrics. Our analysis reveals that while the pipelines we developed do not surpass state-of-the-art (SOTA) models in raw performance, they offer a significantly reduced carbon footprint. We discover optimal SML pipelines with competitive performance and up to 70 times lower carbon emissions than hybrid or fully LLM pipelines, such as standard BERT and DistilBERT variants. Similarly, we obtain hybrid pipelines (combining SML and LLMs) with 20% to 50% lower carbon emissions than fine-tuned alternatives and only a marginal decrease in performance. This research challenges the prevailing reliance on computationally intensive LLMs for NLP tasks and underscores the untapped potential of AutoML in shaping the next wave of environmentally conscious AI models.
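As a rough illustration of the efficiency-aware objective described above, the sketch below evaluates a shallow TF-IDF plus logistic regression text-classification pipeline while tracking its estimated emissions with CodeCarbon (Schmidt et al., 2021), then combines accuracy with a carbon penalty. The dataset, the `carbon_weight` parameter, and the penalized score are illustrative assumptions, not the authors' exact AutoML formulation.

```python
# Hypothetical sketch: score a shallow text-classification pipeline on both
# accuracy and estimated CO2 emissions. The weighting scheme below is an
# assumption for illustration, not the paper's exact objective.
from codecarbon import EmissionsTracker
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def efficiency_aware_score(pipeline, X_train, X_test, y_train, y_test,
                           carbon_weight=0.5):
    """Return a score that rewards accuracy and penalizes CO2 emissions (kg)."""
    tracker = EmissionsTracker(log_level="error")
    tracker.start()
    pipeline.fit(X_train, y_train)       # training dominates the emissions cost
    predictions = pipeline.predict(X_test)
    emissions_kg = tracker.stop()        # estimated kg CO2-equivalent
    accuracy = accuracy_score(y_test, predictions)
    # Simple penalized objective: higher accuracy and lower emissions are better.
    return accuracy - carbon_weight * emissions_kg, accuracy, emissions_kg


if __name__ == "__main__":
    # Illustrative dataset only; the paper's benchmarks may differ.
    data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
    X_train, X_test, y_train, y_test = train_test_split(
        data.data, data.target, test_size=0.2, random_state=0)
    shallow = Pipeline([
        ("tfidf", TfidfVectorizer(max_features=20000)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    score, acc, co2 = efficiency_aware_score(shallow, X_train, X_test, y_train, y_test)
    print(f"score={score:.4f} accuracy={acc:.4f} emissions_kg={co2:.6f}")
```

In an AutoML setting, a penalized score of this kind could serve as the fitness function that the search optimizes over candidate pipelines, SML and LLM alike.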

Bibliographic References

  • Anthony, L. F. W., B. Kanding, and R. Selvan. 2020. Carbontracker: Tracking and predicting the carbon footprint of training deep learning models. arXiv preprint arXiv:2007.03051.
  • Bannour, N., S. Ghannay, A. Névéol, and A.-L. Ligozat. 2021. Evaluating the carbon footprint of nlp methods: a survey and analysis of existing tools. In Proceedings of the second workshop on simple and efficient natural language processing, pages 11–21.
  • Brown, T., B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. 2020. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.
  • Chollet, F. 2018. Keras: The python deep learning library.
  • Chung, H. W., L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, E. Li, X. Wang, M. Dehghani, S. Brahma, A. Webson, S. S. Gu, Z. Dai, M. Suzgun, X. Chen, A. Chowdhery, S. Narang, G. Mishra, A. Yu, V. Zhao, Y. Huang, A. Dai, H. Yu, S. Petrov, E. H. Chi, J. Dean, J. Devlin, A. Roberts, D. Zhou, Q. V. Le, and J. Wei. 2022. Scaling instruction-finetuned language models.
  • Clark, K., M.-T. Luong, Q. V. Le, and C. D. Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555.
  • Conneau, A., K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale.
  • Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Dodge, J., T. Prewitt, R. Tachet des Combes, E. Odmark, R. Schwartz, E. Strubell, A. S. Luccioni, N. A. Smith, N. DeCario, and W. Buchanan. 2022. Measuring the carbon intensity of ai in cloud instances. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency, pages 1877–1894.
  • Estévez-Velarde, S., Y. Gutiérrez, Y. Almeida-Cruz, and A. Montoyo. 2021. General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution. Information Sciences, 543:58–71.
  • Estevez-Velarde, S., Y. Gutiérrez, A. Montoyo, and Y. Almeida-Cruz. 2019. Automl strategy based on grammatical evolution: A case study about knowledge discovery from text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4356–4365.
  • Estevez-Velarde, S., Y. Gutiérrez, A. Montoyo, and Y. A. Cruz. 2020. Automatic discovery of heterogeneous machine learning pipelines: An application to natural language processing. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3558–3568.
  • Faiz, A., S. Kaneda, R. Wang, R. Osi, P. Sharma, F. Chen, and L. Jiang. 2023. Llmcarbon: Modeling the end-to-end carbon footprint of large language models. arXiv preprint arXiv:2309.14393.
  • Fedus, W., B. Zoph, and N. Shazeer. 2022. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39.
  • Feurer, M., K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. 2020. Autosklearn 2.0: The next generation. arXiv preprint.
  • Floridi, L. and M. Chiriatti. 2020. Gpt-3: Its nature, scope, limits, and consequences. Minds and Machines, 30:681–694.
  • González-Carvajal, S. and E. C. Garrido-Merchán. 2020. Comparing bert against traditional machine learning text classification. arXiv preprint arXiv:2005.13012.
  • Gu, Y., L. Dong, F. Wei, and M. Huang. 2023. Minillm: Knowledge distillation of large language models. In The Twelfth International Conference on Learning Representations.
  • He, P., J. Gao, and W. Chen. 2023. Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding sharing.
  • He, P., X. Liu, J. Gao, and W. Chen. 2021. Deberta: Decoding-enhanced bert with disentangled attention. In International Conference on Learning Representations.
  • Holmes, G., A. Donkin, and I. H. Witten. 1994. Weka: a machine learning workbench. pages 357–361. IEEE.
  • Honnibal, M., I. Montani, S. Van Landeghem, and A. Boyd. 2020. spacy: Industrial-strength natural language processing in python.
  • Hutter, F., L. Kotthoff, and J. Vanschoren. 2019. Automated Machine Learning. Springer.
  • Jin, H., Q. Song, and X. Hu. 2019. Autokeras: An efficient neural architecture search system. pages 1946–1956. ACM.
  • Kaplan, J., S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.
  • Kotthoff, L., C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown. 2019. Autoweka: Automatic model selection and hyperparameter optimization in weka. Automated machine learning: methods, systems, challenges, pages 81–95.
  • Kowsari, K., K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, and D. Brown. 2019. Text classification algorithms: A survey. Information, 10(4):150.
  • Lan, Z., M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. CoRR, abs/1909.11942.
  • LeDell, E. and S. Poirier. 2020. H2o automl: Scalable automatic machine learning. In Proceedings of the AutoML Workshop at ICML, volume 2020.
  • Lepikhin, D., H. Lee, Y. Xu, D. Chen, O. Firat, Y. Huang, M. Krikun, N. Shazeer, and Z. Chen. 2020. Gshard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668.
  • Lin, Y., Y. Meng, X. Sun, Q. Han, K. Kuang, J. Li, and F. Wu. 2021. Bertgcn: Transductive text classification by combining gcn and bert. arXiv preprint arXiv:2105.05727.
  • Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach.
  • Loper, E. and S. Bird. 2002. Nltk: the natural language toolkit. arXiv preprint cs/0205028.
  • Maas, A. L., R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. 2011. Learning word vectors for sentiment analysis. In D. Lin, Y. Matsumoto, and R. Mihalcea, editors, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA, June. Association for Computational Linguistics.
  • Mohr, F., M. D. Wever, and E. Hüllermeier. 2018. Ml-plan: Automated machine learning via hierarchical planning. Machine Learning, 107(8):1495–1515.
  • Ng, S. Y., K. M. Lim, C. P. Lee, and J. Y. Lim. 2023. Sentiment analysis using distilbert. In 2023 IEEE 11th Conference on Systems, Process Control (ICSPC), pages 84–89.
  • Nori, H., N. King, S. M. McKinney, D. Carignan, and E. Horvitz. 2023. Capabilities of gpt-4 on medical challenge problems. arXiv preprint arXiv:2303.13375.
  • Olson, R. S. and J. H. Moore. 2019. Tpot: A tree-based pipeline optimization tool for automating machine learning. In Automated Machine Learning. Springer, pages 151–160.
  • OpenAI. 2023. Gpt-4 technical report. Technical report. arXiv:2303.08774.
  • Öztürk, E., F. Ferreira, H. Jomaa, L. Schmidt-Thieme, J. Grabocka, and F. Hutter. 2022. Zero-shot automl with pretrained models. In International Conference on Machine Learning, pages 17138–17155. PMLR.
  • Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12(85):2825–2830.
  • Pipalia, K., R. Bhadja, and M. Shukla. 2020. Comparative analysis of different transformer based architectures used in sentiment analysis. In 2020 9th International Conference System Modeling and Advancement in Research Trends (SMART), pages 411–415.
  • Qiu, X., T. Sun, Y. Xu, Y. Shao, N. Dai, and X. Huang. 2020. Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 63(10):1872–1897.
  • Radford, A., J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. 2019. Language models are unsupervised multitask learners.
  • Raffel, C., N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
  • Rehurek, R. and P. Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May. ELRA.
  • Rodola, G. 2020. Psutil documentation.
  • Romano, J. D., T. T. Le, W. Fu, and J. H. Moore. 2021. Tpot-nn: augmenting tree-based automated machine learning with neural network estimators. Genetic Programming and Evolvable Machines, 22:207–227.
  • Sanh, V., L. Debut, J. Chaumond, and T. Wolf. 2020. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter.
  • Schmidt, V., K. Goyal, A. Joshi, B. Feld, L. Conell, N. Laskaris, D. Blank, J. Wilson, S. Friedler, and S. Luccioni. 2021. Codecarbon: estimate and track carbon emissions from machine learning computing.
  • Schwartz, R., J. Dodge, N. A. Smith, and O. Etzioni. 2019. Green ai.
  • Sun, C., X. Qiu, Y. Xu, and X. Huang. 2019. How to fine-tune bert for text classification? In Chinese computational linguistics: 18th China national conference, CCL 2019, Kunming, China, October 18–20, 2019, proceedings 18, pages 194–206. Springer.
  • Sun, X., X. Li, J. Li, F. Wu, S. Guo, T. Zhang, and G. Wang. 2023. Text classification via large language models. arXiv preprint arXiv:2305.08377.
  • Thompson, N. C., K. Greenewald, K. Lee, and G. F. Manso. 2021. Deep learning’s diminishing returns: The cost of improvement is becoming unsustainable. IEEE Spectrum, 58(10):50–55.
  • Thornton, C., F. Hutter, H. H. Hoos, and K. Leyton-Brown. 2013. Auto-weka: combined selection and hyperparameter optimization of classification algorithms. pages 847–855. ACM.
  • Wang, X., C. Na, E. Strubell, S. Friedler, and S. Luccioni. 2023. Energy and carbon considerations of fine-tuning bert. arXiv preprint arXiv:2311.10267.
  • Yang, Z., Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems, 32.
  • Yun-tao, Z., G. Ling, and W. Yong-cheng. 2005. An improved tf-idf approach for text classification. Journal of Zhejiang University-Science A, 6(1):49–55.
  • Zaheer, M., G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, et al. 2020. Big bird: Transformers for longer sequences. Advances in neural information processing systems, 33:17283–17297.
  • Zhang, X., J. Zhao, and Y. LeCun. 2015. Character-level convolutional networks for text classification. Advances in neural information processing systems, 28.