Sentiment Analysis of Multilingual Tweets Using Hybrid AI Models
Keywords:
multilingual sentiment analysis, hybrid AI, transformers, code-switching, lexical features, emoji sentiment, stacked ensembling, social media analytics
Abstract
Social media platforms produce vast, multilingual streams of short, noisy text where sentiment is often signaled through code-switching, slang, emojis, and cultural references. While large multilingual transformers (e.g., mBERT, XLM-R) have improved cross-lingual sentiment classification, performance still degrades on low-resource languages, code-switched text, and sarcasm. This manuscript presents a hybrid AI approach that combines (i) a strong multilingual transformer encoder, (ii) lightweight language-specific lexical/emoji features, (iii) code-switch and script-aware preprocessing, and (iv) stacked ensembling with a shallow meta-learner.
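The manuscript does not specify how its script-aware, code-switch, and emoji features are computed. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It reads each letter's script from its Unicode character name via the standard `unicodedata` module, and it defines code-switch intensity as the share of adjacent tokens whose script changes (a switch-point ratio). `EMOJI_PRIOR` is a hypothetical toy table rather than a published emoji lexicon. All function names are illustrative, and affective lexicon counts are omitted for brevity.

```python
import unicodedata
from typing import List, Optional

# Hypothetical toy emoji polarity priors in [-1, 1]; NOT the manuscript's values.
EMOJI_PRIOR = {"😀": 0.8, "😂": 0.6, "❤": 0.7, "😡": -0.8, "😢": -0.6}

# Unicode-name prefixes for the scripts of the five study languages.
SCRIPTS = ("LATIN", "DEVANAGARI", "ARABIC", "BENGALI")


def token_script(token: str) -> Optional[str]:
    """Return the majority script among a token's letters, or None if it has no letters."""
    counts = {}
    for ch in token:
        if not ch.isalpha():  # skips digits, punctuation, emoji, combining vowel signs
            continue
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                counts[script] = counts.get(script, 0) + 1
                break
    return max(counts, key=counts.get) if counts else None


def code_switch_intensity(text: str) -> float:
    """Share of adjacent script-bearing tokens whose scripts differ (0.0 = single script)."""
    scripts = [s for s in map(token_script, text.split()) if s is not None]
    if len(scripts) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(scripts, scripts[1:]))
    return switches / (len(scripts) - 1)


def handcrafted_features(text: str) -> List[float]:
    """Feature vector intended to be concatenated with a sentence-level transformer embedding."""
    priors = [EMOJI_PRIOR[ch] for ch in text if ch in EMOJI_PRIOR]
    return [
        code_switch_intensity(text),
        sum(priors) / len(priors) if priors else 0.0,  # mean emoji prior
        float(text.count("!")),  # exclamation marks
        float(text.count("?")),  # question marks
    ]


print(handcrafted_features("yaar this movie was बहुत अच्छा 😀!"))  # [0.2, 0.8, 1.0, 0.0]
```

Because this score is script-based, romanized code-switching (for example, Hindi written in Latin script, as with "yaar" above) registers as a single script; capturing it would require a token-level language identifier.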
Using a balanced corpus of 200k tweets across five languages—English, Hindi, Spanish, Arabic, and Bengali—with three sentiment classes (positive, negative, neutral), we benchmark baselines (TF–IDF + SVM; mBERT; XLM-R) against two hybrid variants. Our proposed model fuses sentence-level transformer embeddings with affective lexicon counts, emoji sentiment priors, punctuation patterns, and a code-switch intensity score, then feeds them to a gradient-boosting meta-classifier on top of a fine-tuned transformer head. In simulated experiments with stratified splits (70/10/20), the proposed hybrid improves average macro-F1 by 4.0 points over a fine-tuned XLM-R baseline, with significant gains (p < .05) for code-switched and emoji-heavy tweets. Error analysis shows reduced confusion between neutral vs. mildly positive and improved robustness to script mixing (Latin–Devanagari). We discuss model design, training regime, and statistical validation, and we highlight implications for multilingual customer analytics, public-health monitoring, and civic sentiment tracking.
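The abstract reports macro-F1 on stratified 70/10/20 splits but does not describe the exact procedure. A minimal, standard-library-only sketch of that evaluation protocol follows. It assumes per-class shuffling with a fixed seed; the seed and function names are hypothetical. Macro-F1 is computed as the unweighted mean of per-class F1, so each of the three sentiment classes counts equally regardless of its frequency. On this 0 to 1 scale, a gain of 4.0 points corresponds to a difference of 0.04.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[str], ratios=(0.7, 0.1, 0.2), seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Split example indices into train/dev/test, preserving each class's share."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[int] = []
    dev: List[int] = []
    test: List[int] = []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(len(idxs) * ratios[0])
        n_dev = round(len(idxs) * ratios[1])
        train += idxs[:n_train]
        dev += idxs[n_train:n_train + n_dev]
        test += idxs[n_train + n_dev:]  # remainder (about ratios[2]) goes to test
    return train, dev, test


def macro_f1(gold: Sequence[str], pred: Sequence[str], classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN).

    A class absent from both gold and predictions scores 0.
    """
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(classes)


labels = ["positive"] * 100 + ["negative"] * 100 + ["neutral"] * 100
train, dev, test = stratified_split(labels)
print(len(train), len(dev), len(test))  # 210 30 60

gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "neutral", "neutral", "positive"]
print(round(100 * macro_f1(gold, pred, ["positive", "negative", "neutral"]), 1))  # 55.6
```

In the toy example, the missed "negative" tweet drives that class's F1 to 0, which pulls macro-F1 well below the 75% accuracy. This sensitivity to minority-class errors is why macro-F1 is the usual headline metric for multi-class sentiment.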
License
Copyright (c) 2026. The journal retains copyright of all published articles, ensuring that authors have control over their work while allowing wide dissemination.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Articles are published under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), allowing others to distribute, remix, adapt, and build upon the work for non-commercial purposes while crediting the original author.
