Adversarial Attacks on a Text Sentiment Analysis Model
Subject areas: multimedia processing, communication systems, intelligent systems
Sahar Mokarrami Sefidab 1, Seyed Abolghasem Mirroshandel 2, Hamidreza Ahmadifar 3, Mahdi Mokarrami 4
1 - Faculty of Engineering, University of Guilan, Rasht, Iran
2 - Assistant Professor, Faculty of Engineering, University of Guilan, Rasht, Iran
3 - Assistant Professor, Faculty of Engineering, University of Guilan, Rasht, Iran
4 - Payame Noor University, Rasht Branch, Rasht, Iran
Keywords: sentiment analysis, gradient of the loss function, textual attacks, natural language processing, adversarial examples
Abstract:
Deep neural networks achieve high accuracy and performance on a wide range of problems, but they are vulnerable to adversarial examples. These malicious samples are generated to deceive trained models and to probe the vulnerability of neural network models. In the text domain, few successful methods have been proposed for crafting such samples. In this research, a strong method based on the gradient of the model's loss function is presented for generating textual adversarial examples, and it is shown that by replacing a small number of the words in the original samples with the words that have the greatest negative impact on the classifier's decision, new samples similar to the originals can be produced that fool a word-level sentiment analysis classifier. Finally, using these samples, the accuracy of two pre-trained classifier models was evaluated. With only minor manipulation of the input samples, the method used in this research reduced classification accuracy from 86% to less than 10%.
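To make the word-level idea above concrete, the following is a minimal sketch, not the authors' implementation, of the gradient signal that drives such an attack: the gradient of the loss with respect to each word's embedding indicates how strongly that word pushes the classifier's decision, so the highest-scoring positions are natural candidates for replacement. The toy vocabulary size, embedding dimension, sequence length, and classifier are illustrative assumptions.

```python
# Minimal sketch of gradient-based word saliency; the toy sizes and model are
# illustrative assumptions, not values or architectures from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, LEN, CLASSES = 1000, 64, 8, 2

embedding = nn.Embedding(VOCAB, DIM)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(DIM * LEN, CLASSES))

def word_saliency(token_ids: torch.Tensor, label: int) -> torch.Tensor:
    """Score each input word by the norm of the loss gradient at its embedding."""
    emb = embedding(token_ids).detach().requires_grad_(True)       # (1, LEN, DIM)
    loss = F.cross_entropy(classifier(emb), torch.tensor([label]))
    grad, = torch.autograd.grad(loss, emb)                          # d(loss)/d(embedding)
    return grad.norm(dim=-1).squeeze(0)                             # higher = more influential

tokens = torch.randint(0, VOCAB, (1, LEN))
print(word_saliency(tokens, label=1))   # positions worth replacing first
```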
Background and Purpose: Recently, researchers have shown that deep learning models, despite their high accuracy, can be fooled by small manipulations of their input samples. Such manipulation produces new samples called adversarial examples. These samples are so similar to the original ones that humans cannot distinguish them from the originals, and therefore cannot remove them from the dataset before the model makes its predictions in order to prevent model errors. A variety of studies have addressed generating malicious samples and injecting them into a model; among these, generating text samples has its own difficulties due to the discrete nature of text. In this research, we tried to reach the highest level of vulnerability with the least manipulation of the input data, and by testing the proposed method we were able to bring the accuracy of CNN and LSTM models down to less than 10%.
Methods: To craft malicious samples, a word that can increase the error of the classifier's prediction is first selected from the vocabulary as a candidate replacement using a first-order Taylor expansion. Then, taking into account the importance of each word, measured by the loss computed for its corresponding candidate word, we propose an ordering for the substitutions. Finally, the words are replaced in that order until the output of the model changes.
Results: Evaluating the presented method on two sentiment analysis models, LSTM and CNN, shows that it is very effective: with a small number of replacements it reduces the accuracy of both models to less than 10%, which indicates the success of the proposed method compared to several similar methods.
Conclusion: As mentioned, much of the attention of both academia and industry is on building systems with deep learning methods, so their security is also important, and it is essential to increase the robustness of these models against adversarial examples. In this research, a method requiring minimal manipulation was presented for producing textual adversarial examples. In the future, it should be possible to use various natural-text generation methods to produce samples that, in addition to being superficially similar to the original sample, are also coherent in terms of content.
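Read as pseudocode, the Methods paragraph amounts to the loop sketched below, again under an assumed toy setup rather than the paper's CNN or LSTM models: every vocabulary word at every position is scored with a first-order Taylor approximation of the loss increase it would cause, the positions are ordered by their best score, and words are substituted in that order until the predicted label changes.

```python
# Hedged sketch of the described attack loop; the toy model is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, LEN, CLASSES = 1000, 64, 8, 2
embedding = nn.Embedding(VOCAB, DIM)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(DIM * LEN, CLASSES))

def taylor_scores(token_ids: torch.Tensor, label: int) -> torch.Tensor:
    """score[p, w] ~ loss increase from putting vocabulary word w at position p,
    using the first-order Taylor term grad_p . (e_w - e_p)."""
    emb = embedding(token_ids).detach().requires_grad_(True)        # (1, LEN, DIM)
    loss = F.cross_entropy(classifier(emb), torch.tensor([label]))
    grad, = torch.autograd.grad(loss, emb)                           # (1, LEN, DIM)
    vocab_emb = embedding.weight.detach()                            # (VOCAB, DIM)
    return (torch.einsum("ld,vd->lv", grad[0], vocab_emb)
            - (grad[0] * emb[0].detach()).sum(-1, keepdim=True))     # (LEN, VOCAB)

def attack(token_ids: torch.Tensor, label: int, max_swaps: int = 3) -> torch.Tensor:
    adv = token_ids.clone()
    scores = taylor_scores(adv, label)
    best_word = scores.argmax(dim=1)                                 # best substitute per position
    order = scores.max(dim=1).values.argsort(descending=True)        # most damaging positions first
    for pos in order[:max_swaps]:
        adv[0, pos] = best_word[pos]                                 # replace one word
        if classifier(embedding(adv)).argmax(dim=1).item() != label:
            break                                                    # prediction flipped: attack done
    return adv

tokens = torch.randint(0, VOCAB, (1, LEN))
print(attack(tokens, label=1))
```

Because the ordering comes from a single backward pass, the loop needs no further gradient computations; a stricter variant could re-score the remaining positions after each substitution.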