Balancing Efficiency and Performance in NLP: A Cross-Comparison of Shallow Machine Learning and Large Language Models via AutoML

  1. Estevanell-Valladares, Ernesto L.
  2. Gutiérrez, Yoan
  3. Montoyo-Guijarro, Andrés
  4. Muñoz-Guillena, Rafael
  5. Almeida-Cruz, Yudivián
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2024

Issue: 73

Pages: 221-233

Type: Article

Other publications in: Procesamiento del lenguaje natural

Abstract

This study critically examines the resource efficiency and performance of Shallow Machine Learning (SML) methods versus Large Language Models (LLMs) on text classification tasks, exploring the trade-off between accuracy and environmental sustainability. It introduces a novel optimization strategy that prioritizes computational efficiency and ecological impact alongside traditional performance metrics by leveraging Automated Machine Learning (AutoML). The analysis reveals that, although the discovered pipelines do not surpass the strongest state-of-the-art (SOTA) models in raw performance, they substantially reduce the carbon footprint. We discover optimal SML pipelines with competitive performance and up to 70 times lower carbon emissions than hybrid or fully LLM-based pipelines, such as standard BERT and DistilBERT variants. Likewise, we obtain hybrid pipelines (combining SML and LLMs) with 20% to 50% lower carbon emissions than fine-tuned alternatives and only a marginal drop in performance. This research challenges the prevailing reliance on computationally intensive LLMs for NLP tasks and underscores the untapped potential of AutoML to shape the next wave of environmentally conscious AI models.
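To make the emissions-aware optimization objective concrete, the sketch below shows one way such a fitness function could be wired into an AutoML search: it scores a candidate text-classification pipeline by validation accuracy minus a weighted penalty on the CO2 measured while fitting it, using scikit-learn for a shallow candidate and CodeCarbon's EmissionsTracker (Schmidt et al., 2021, cited below) for the measurement. This is a minimal illustrative sketch, not the authors' implementation; the function eco_aware_score and the emission_penalty weight are hypothetical choices for illustration.

```python
# Illustrative sketch (not the paper's exact method): an "eco-aware" objective
# an AutoML search could optimise, rewarding accuracy and penalising measured
# emissions. Assumes scikit-learn and CodeCarbon are installed; the function
# name and the emission_penalty weight are hypothetical.
from codecarbon import EmissionsTracker
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline


def eco_aware_score(pipeline, X_train, y_train, X_val, y_val, emission_penalty=10.0):
    """Score a candidate pipeline by accuracy minus a weighted emissions term."""
    tracker = EmissionsTracker()  # measures energy use and estimates kg CO2eq
    tracker.start()
    try:
        pipeline.fit(X_train, y_train)
        predictions = pipeline.predict(X_val)
    finally:
        emissions_kg = tracker.stop()  # total kg CO2eq for fit + predict
    return accuracy_score(y_val, predictions) - emission_penalty * emissions_kg


# Example shallow candidate: TF-IDF features with a logistic regression classifier.
shallow_candidate = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=20000)),
    ("clf", LogisticRegression(max_iter=1000)),
])
```

An AutoML engine could evaluate such a score for every sampled candidate, whether shallow, hybrid, or LLM-based, and retain the pipelines that lie on the accuracy-emissions Pareto front rather than the single most accurate one.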

References

  • Anthony, L. F. W., B. Kanding, and R. Selvan. 2020. Carbontracker: Tracking and predicting the carbon footprint of training deep learning models. arXiv preprint arXiv:2007.03051.
  • Bannour, N., S. Ghannay, A. Névéol, and A.-L. Ligozat. 2021. Evaluating the carbon footprint of nlp methods: a survey and analysis of existing tools. In Proceedings of the second workshop on simple and efficient natural language processing, pages 11–21.
  • Brown, T., B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. 2020. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877–1901. Curran Associates, Inc.
  • Chollet, F. 2018. Keras: The python deep learning library.
  • Chung, H. W., L. Hou, S. Longpre, B. Zoph, Y. Tay, W. Fedus, E. Li, X. Wang, M. Dehghani, S. Brahma, A. Webson, S. S. Gu, Z. Dai, M. Suzgun, X. Chen, A. Chowdhery, S. Narang, G. Mishra, A. Yu, V. Zhao, Y. Huang, A. Dai, H. Yu, S. Petrov, E. H. Chi, J. Dean, J. Devlin, A. Roberts, D. Zhou, Q. V. Le, and J. Wei. 2022. Scaling instruction-finetuned language models.
  • Clark, K., M.-T. Luong, Q. V. Le, and C. D. Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555.
  • Conneau, A., K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov. 2020. Unsupervised crosslingual representation learning at scale.
  • Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Dodge, J., T. Prewitt, R. Tachet des Combes, E. Odmark, R. Schwartz, E. Strubell, A. S. Luccioni, N. A. Smith, N. DeCario, and W. Buchanan. 2022. Measuring the carbon intensity of ai in cloud instances. In Proceedings of the 2022 ACM conference on fairness, accountability, and transparency, pages 1877–1894.
  • Estévez-Velarde, S., Y. Gutiérrez, Y. Almeida-Cruz, and A. Montoyo. 2021. General-purpose hierarchical optimisation of machine learning pipelines with grammatical evolution. Information Sciences, 543:58–71.
  • Estevez-Velarde, S., Y. Gutiérrez, A. Montoyo, and Y. Almeida-Cruz. 2019. Automl strategy based on grammatical evolution: A case study about knowledge discovery from text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4356–4365.
  • Estevez-Velarde, S., Y. Gutiérrez, A. Montoyo, and Y. A. Cruz. 2020. Automatic discovery of heterogeneous machine learning pipelines: An application to natural language processing. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3558–3568.
  • Faiz, A., S. Kaneda, R. Wang, R. Osi, P. Sharma, F. Chen, and L. Jiang. 2023. Llmcarbon: Modeling the end-to-end carbon footprint of large language models. arXiv preprint arXiv:2309.14393.
  • Fedus, W., B. Zoph, and N. Shazeer. 2022. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120):1–39.
  • Feurer, M., K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter. 2020. Autosklearn 2.0: The next generation. arXiv: Learning.
  • Floridi, L. and M. Chiriatti. 2020. Gpt-3: Its nature, scope, limits, and consequences. Minds and Machines, 30:681–694.
  • González-Carvajal, S. and E. C. Garrido-Merchán. 2020. Comparing bert against traditional machine learning text classification. arXiv preprint arXiv:2005.13012.
  • Gu, Y., L. Dong, F. Wei, and M. Huang. 2023. Minillm: Knowledge distillation of large language models. In The Twelfth International Conference on Learning Representations.
  • He, P., J. Gao, and W. Chen. 2023. Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding sharing.
  • He, P., X. Liu, J. Gao, and W. Chen. 2021. Deberta: Decoding-enhanced bert with disentangled attention. In International Conference on Learning Representations.
  • Holmes, G., A. Donkin, and I. H. Witten. 1994. Weka: a machine learning workbench. pages 357–361. IEEE.
  • Honnibal, M., I. Montani, S. Van Landeghem, and A. Boyd. 2020. spacy: Industrial-strength natural language processing in python.
  • Hutter, F., L. Kotthoff, and J. Vanschoren. 2019. Automated Machine Learning. Springer.
  • Jin, H., Q. Song, and X. Hu. 2019. Autokeras: An efficient neural architecture search system. pages 1946–1956. ACM.
  • Kaplan, J., S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. 2020. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.
  • Kotthoff, L., C. Thornton, H. H. Hoos, F. Hutter, and K. Leyton-Brown. 2019. Autoweka: Automatic model selection and hyperparameter optimization in weka. Automated machine learning: methods, systems, challenges, pages 81–95.
  • Kowsari, K., K. Jafari Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, and D. Brown. 2019. Text classification algorithms: A survey. Information, 10(4):150.
  • Lan, Z., M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut. 2019. ALBERT: A lite BERT for self-supervised learning of language representations. CoRR, abs/1909.11942.
  • LeDell, E. and S. Poirier. 2020. H2o automl: Scalable automatic machine learning. In Proceedings of the AutoML Workshop at ICML, volume 2020.
  • Lepikhin, D., H. Lee, Y. Xu, D. Chen, O. Firat, Y. Huang, M. Krikun, N. Shazeer, and Z. Chen. 2020. Gshard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668.
  • Lin, Y., Y. Meng, X. Sun, Q. Han, K. Kuang, J. Li, and F. Wu. 2021. Bertgcn: Transductive text classification by combining gcn and bert. arXiv preprint arXiv:2105.05727.
  • Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach.
  • Loper, E. and S. Bird. 2002. Nltk: the natural language toolkit. arXiv preprint cs/0205028.
  • Maas, A. L., R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. 2011. Learning word vectors for sentiment analysis. In D. Lin, Y. Matsumoto, and R. Mihalcea, editors, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA, June. Association for Computational Linguistics.
  • Mohr, F., M. D. Wever, and E. Hüllermeier. 2018. Ml-plan: Automated machine learning via hierarchical planning. Machine Learning, 107(8):1495–1515.
  • Ng, S. Y., K. M. Lim, C. P. Lee, and J. Y. Lim. 2023. Sentiment analysis using distilbert. In 2023 IEEE 11th Conference on Systems, Process Control (ICSPC), pages 84–89.
  • Nori, H., N. King, S. M. McKinney, D. Carignan, and E. Horvitz. 2023. Capabilities of gpt-4 on medical challenge problems. arXiv preprint arXiv:2303.13375.
  • Olson, R. S. and J. H. Moore. 2019. Tpot: A tree-based pipeline optimization tool for automating machine learning. In Automated Machine Learning. Springer, pages 151–160.
  • OpenAI. 2023. Gpt-4 technical report. Technical report. arXiv:2303.08774.
  • Öztürk, E., F. Ferreira, H. Jomaa, L. Schmidt-Thieme, J. Grabocka, and F. Hutter. 2022. Zero-shot automl with pretrained models. In International Conference on Machine Learning, pages 17138–17155. PMLR.
  • Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12(85):2825–2830.
  • Pipalia, K., R. Bhadja, and M. Shukla. 2020. Comparative analysis of different transformer based architectures used in sentiment analysis. In 2020 9th International Conference System Modeling and Advancement in Research Trends (SMART), pages 411–415.
  • Qiu, X., T. Sun, Y. Xu, Y. Shao, N. Dai, and X. Huang. 2020. Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 63(10):1872–1897.
  • Radford, A., J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. 2019. Language models are unsupervised multitask learners.
  • Raffel, C., N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
  • Rehurek, R. and P. Sojka. 2010. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pages 45–50, Valletta, Malta, May. ELRA.
  • Rodola, G. 2020. Psutil documentation.
  • Romano, J. D., T. T. Le, W. Fu, and J. H. Moore. 2021. Tpot-nn: augmenting tree-based automated machine learning with neural network estimators. Genetic Programming and Evolvable Machines, 22:207–227.
  • Sanh, V., L. Debut, J. Chaumond, and T. Wolf. 2020. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter.
  • Schmidt, V., K. Goyal, A. Joshi, B. Feld, L. Conell, N. Laskaris, D. Blank, J. Wilson, S. Friedler, and S. Luccioni. 2021. Codecarbon: estimate and track carbon emissions from machine learning computing.
  • Schwartz, R., J. Dodge, N. A. Smith, and O. Etzioni. 2019. Green ai.
  • Sun, C., X. Qiu, Y. Xu, and X. Huang. 2019. How to fine-tune bert for text classification? In Chinese computational linguistics: 18th China national conference, CCL 2019, Kunming, China, October 18–20, 2019, proceedings 18, pages 194–206. Springer.
  • Sun, X., X. Li, J. Li, F. Wu, S. Guo, T. Zhang, and G. Wang. 2023. Text classification via large language models. arXiv preprint arXiv:2305.08377.
  • Thompson, N. C., K. Greenewald, K. Lee, and G. F. Manso. 2021. Deep learning’s diminishing returns: The cost of improvement is becoming unsustainable. IEEE Spectrum, 58(10):50–55.
  • Thornton, C., F. Hutter, H. H. Hoos, and K. Leyton-Brown. 2013. Auto-weka: combined selection and hyperparameter optimization of classification algorithms. pages 847–855. ACM.
  • Wang, X., C. Na, E. Strubell, S. Friedler, and S. Luccioni. 2023. Energy and carbon considerations of fine-tuning bert. arXiv preprint arXiv:2311.10267.
  • Yang, Z., Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. Advances in neural information processing systems, 32.
  • Yun-tao, Z., G. Ling, and W. Yong-cheng. 2005. An improved tf-idf approach for text classification. Journal of Zhejiang University-Science A, 6(1):49–55.
  • Zaheer, M., G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, et al. 2020. Big bird: Transformers for longer sequences. Advances in neural information processing systems, 33:17283–17297.
  • Zhang, X., J. Zhao, and Y. LeCun. 2015. Character-level convolutional networks for text classification. Advances in neural information processing systems, 28.