Adversarial Attacks on a Text Sentiment Analysis Model
Subject areas: multimedia processing, communication systems, intelligent systems
Sahar Mokarrami Sefidab 1, Seyed Abolghasem Mirroshandel 2, Hamidreza Ahmadifar 3, Mahdi Mokarrami 4
1 - Faculty of Engineering, University of Guilan, Rasht, Iran
2 - Assistant Professor, Faculty of Engineering, University of Guilan, Rasht, Iran
3 - Assistant Professor, Faculty of Engineering, University of Guilan, Rasht, Iran
4 - Payame Noor University, Rasht Branch, Rasht, Iran
Keywords: sentiment analysis, gradient of the loss function, textual attacks, natural language processing, adversarial examples
Abstract:
Deep neural networks achieve high accuracy and performance on a wide range of problems, but they are vulnerable to adversarial examples. These malicious samples are generated to deceive trained models and to probe the vulnerability of neural network models. In the text domain, few successful methods have been proposed for crafting such samples. In this research, a strong method based on the gradient of the model's loss function is presented for generating textual adversarial examples, and it is shown that by replacing a small number of the words in the original samples with the words that have the greatest negative impact on the classifier's decision, new samples similar to the originals can be produced that fool a word-level sentiment analysis classifier. Finally, using these samples, the accuracy of two pre-trained classifier models was evaluated. With only minor manipulation of the input samples, the method used in this research reduced classification accuracy from 86% to less than 10%.
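To make the word-level idea above concrete, the following is a minimal sketch, not the authors' implementation, of the gradient signal that drives such an attack: the gradient of the loss with respect to each word's embedding indicates how strongly that word pushes the classifier's decision, so the highest-scoring positions are natural candidates for replacement. The toy vocabulary size, embedding dimension, sequence length, and classifier are illustrative assumptions.

```python
# Minimal sketch of gradient-based word saliency; the toy sizes and model are
# illustrative assumptions, not values or architectures from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, LEN, CLASSES = 1000, 64, 8, 2

embedding = nn.Embedding(VOCAB, DIM)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(DIM * LEN, CLASSES))

def word_saliency(token_ids: torch.Tensor, label: int) -> torch.Tensor:
    """Score each input word by the norm of the loss gradient at its embedding."""
    emb = embedding(token_ids).detach().requires_grad_(True)       # (1, LEN, DIM)
    loss = F.cross_entropy(classifier(emb), torch.tensor([label]))
    grad, = torch.autograd.grad(loss, emb)                          # d(loss)/d(embedding)
    return grad.norm(dim=-1).squeeze(0)                             # higher = more influential

tokens = torch.randint(0, VOCAB, (1, LEN))
print(word_saliency(tokens, label=1))   # positions worth replacing first
```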
Background and Purpose: Recently, researchers have shown that deep learning models, despite their high accuracy, can be fooled by small manipulations of their input samples. Such manipulation produces new samples called adversarial examples. These samples are so similar to the original ones that humans cannot distinguish them from the originals, and therefore cannot remove them from the dataset before the model makes its predictions in order to prevent model errors. A variety of studies have addressed generating malicious samples and injecting them into a model; among these, generating text samples has its own difficulties due to the discrete nature of text. In this research, we tried to reach the highest level of vulnerability with the least manipulation of the input data, and by testing the proposed method we were able to bring the accuracy of CNN and LSTM models down to less than 10%.
Methods: To craft malicious samples, a word that can increase the error of the classifier's prediction is first selected from the vocabulary as a candidate replacement using a first-order Taylor expansion. Then, taking into account the importance of each word, measured by the loss computed for its corresponding candidate word, we propose an ordering for the substitutions. Finally, the words are replaced in that order until the output of the model changes.
Results: Evaluating the presented method on two sentiment analysis models, LSTM and CNN, shows that it is very effective: with a small number of replacements it reduces the accuracy of both models to less than 10%, which indicates the success of the proposed method compared to several similar methods.
Conclusion: As mentioned, much of the attention of both academia and industry is on building systems with deep learning methods, so their security is also important, and it is essential to increase the robustness of these models against adversarial examples. In this research, a method requiring minimal manipulation was presented for producing textual adversarial examples. In the future, it should be possible to use various natural-text generation methods to produce samples that, in addition to being superficially similar to the original sample, are also coherent in terms of content.
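Read as pseudocode, the Methods paragraph amounts to the loop sketched below, again under an assumed toy setup rather than the paper's CNN or LSTM models: every vocabulary word at every position is scored with a first-order Taylor approximation of the loss increase it would cause, the positions are ordered by their best score, and words are substituted in that order until the predicted label changes.

```python
# Hedged sketch of the described attack loop; the toy model is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, LEN, CLASSES = 1000, 64, 8, 2
embedding = nn.Embedding(VOCAB, DIM)
classifier = nn.Sequential(nn.Flatten(), nn.Linear(DIM * LEN, CLASSES))

def taylor_scores(token_ids: torch.Tensor, label: int) -> torch.Tensor:
    """score[p, w] ~ loss increase from putting vocabulary word w at position p,
    using the first-order Taylor term grad_p . (e_w - e_p)."""
    emb = embedding(token_ids).detach().requires_grad_(True)        # (1, LEN, DIM)
    loss = F.cross_entropy(classifier(emb), torch.tensor([label]))
    grad, = torch.autograd.grad(loss, emb)                           # (1, LEN, DIM)
    vocab_emb = embedding.weight.detach()                            # (VOCAB, DIM)
    return (torch.einsum("ld,vd->lv", grad[0], vocab_emb)
            - (grad[0] * emb[0].detach()).sum(-1, keepdim=True))     # (LEN, VOCAB)

def attack(token_ids: torch.Tensor, label: int, max_swaps: int = 3) -> torch.Tensor:
    adv = token_ids.clone()
    scores = taylor_scores(adv, label)
    best_word = scores.argmax(dim=1)                                 # best substitute per position
    order = scores.max(dim=1).values.argsort(descending=True)        # most damaging positions first
    for pos in order[:max_swaps]:
        adv[0, pos] = best_word[pos]                                 # replace one word
        if classifier(embedding(adv)).argmax(dim=1).item() != label:
            break                                                    # prediction flipped: attack done
    return adv

tokens = torch.randint(0, VOCAB, (1, LEN))
print(attack(tokens, label=1))
```

Because the ordering comes from a single backward pass, the loop needs no further gradient computations; a stricter variant could re-score the remaining positions after each substitution.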