SPARSE AND TRANSFERABLE UNIVERSAL SINGULAR VECTORS ATTACK

Abstract

Mounting concerns about the safety and robustness of neural networks call for a deeper understanding of model vulnerability and for research on adversarial attacks. Motivated by this, we propose a novel universal attack that is highly efficient in terms of transferability. In contrast to the existing (p, q)-singular vectors approach, we focus on finding sparse singular vectors of the Jacobian matrices of hidden layers by employing the truncated power iteration method. We find that using the resulting vectors as adversarial perturbations effectively attacks both the original model and models with entirely different architectures, highlighting the importance of the sparsity constraint for attack transferability. Moreover, we achieve results comparable to dense baselines while damaging less than 1% of pixels and using only 256 samples to fit the perturbation. Our algorithm also admits a higher attack magnitude without affecting the human ability to solve the task, and damaging 5% of pixels attains more than a 50% fooling rate on average across models. Finally, our findings demonstrate the vulnerability of state-of-the-art models to universal sparse attacks and highlight the importance of developing robust machine learning systems.
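
To make the description above more concrete, below is a minimal illustrative sketch (not the authors' released code) of computing a k-sparse direction for a hidden-layer Jacobian with truncated power iteration, done matrix-free via forward- and reverse-mode autodiff. It covers only the simple l2 case (power iteration on J^T J) rather than the full (p, q)-singular vector scheme; the feature extractor hidden, the sparsity level k, and the iteration count are illustrative assumptions.

import torch
from torch.func import jvp, vjp

def truncated_power_iteration(hidden, batch, k, n_iter=20):
    # Sketch: find a k-sparse direction eps that approximately maximizes
    # sum_i ||J_i eps||_2^2, where J_i is the Jacobian of `hidden` at sample i.
    eps = torch.randn_like(batch[0]).flatten()
    eps = eps / eps.norm()
    for _ in range(n_iter):
        acc = torch.zeros_like(eps)
        for x in batch:
            # Hidden-layer map as a function of a single flattened input.
            f = lambda z: hidden(z.view_as(x).unsqueeze(0)).flatten()
            _, Jv = jvp(f, (x.flatten(),), (eps,))   # J_i @ eps (forward mode)
            _, pullback = vjp(f, x.flatten())
            acc = acc + pullback(Jv)[0]              # J_i^T @ (J_i @ eps) (reverse mode)
        # Truncation step: keep only the k largest-magnitude coordinates.
        idx = acc.abs().topk(k).indices
        eps = torch.zeros_like(acc)
        eps[idx] = acc[idx]
        eps = eps / eps.norm()
    return eps.view_as(batch[0])

# Hypothetical usage: hidden could be the first few blocks of a classifier,
# batch a stack of 256 images, and k a small fraction of the input coordinates.
# eps = truncated_power_iteration(hidden, images[:256], k=int(0.01 * images[0].numel()))

The resulting direction can then be rescaled to the desired attack magnitude and added to every input; the truncation step is what keeps the perturbation confined to a small fraction of the input coordinates.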

Keywords

About the authors

K. Kuvshinova

Sber AI Lab; Skolkovo Institute of Science and Technology

Corresponding author.
Email: kseniia.kuvshinova@skoltech.ru
Moscow; Moscow

O. Tsymboi

Sber AI Lab; Moscow Institute of Physics and Technology

Email: tsimboy.oa@phystech.edu
Moscow; Moscow

I. Oseledets

Artificial Intelligence Research Institute (AIRI); Skolkovo Institute of Science and Technology

Email: oseledets@airi.net
Moscow; Moscow

© Russian Academy of Sciences, 2025
