SPARSE AND TRANSFERABLE UNIVERSAL SINGULAR VECTORS ATTACK
- Authors: Kuvshinova K.1,2, Tsymboi O.1,3, Oseledets I.4,2
Affiliations:
- Sber AI Lab
- Skolkovo Institute of Science and Technology
- Moscow Institute of Physics and Technology
- Artificial Intelligence Research Institute (AIRI)
- Issue: Vol. 65, No. 3 (2025)
- Pages: 275–293
- Section: Optimal control
- URL: https://journal-vniispk.ru/0044-4669/article/view/293539
- DOI: https://doi.org/10.31857/S0044466925030044
- EDN: https://elibrary.ru/HRSLBS
- ID: 293539
Abstract
Mounting concerns about neural networks’ safety and robustness call for a deeper understanding of models’ vulnerability and for research into adversarial attacks. Motivated by this, we propose a novel universal attack that is highly efficient in terms of transferability. In contrast to the existing (p, q)-singular vectors approach, we focus on finding sparse singular vectors of Jacobian matrices of the hidden layers by employing the truncated power iteration method. We discovered that using the resulting vectors as adversarial perturbations can effectively attack the original model and models with entirely different architectures, highlighting the importance of the sparsity constraint for attack transferability. Moreover, we achieve results comparable to dense baselines while damaging less than 1% of pixels and utilizing only 256 samples for perturbation fitting. Our algorithm also admits a higher attack magnitude without affecting the human ability to solve the task, and damaging 5% of pixels attains more than a 50% fooling rate on average across models. Finally, our findings demonstrate the vulnerability of state-of-the-art models to universal sparse attacks and highlight the importance of developing robust machine learning systems.
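The abstract outlines the core computational step: a truncated power iteration over the hidden-layer Jacobian that alternates Jacobian-vector products with a hard-thresholding (truncation) step enforcing sparsity. The sketch below is a minimal illustration of this idea under stated assumptions, not the authors' implementation: the backbone (ResNet-50), the point at which the network is cut, the random batch, and the values of k, eps, and the iteration count are all illustrative, and the plain 2-norm is used in place of the paper's (p, q)-norm formulation.

```python
# Minimal sketch (assumptions noted above): truncated power iteration for a k-sparse
# singular vector of the stacked hidden-layer Jacobian, shared across a batch.
import torch
import torchvision.models as models

torch.manual_seed(0)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()
# Take an intermediate ("hidden") layer as the attacked feature map (illustrative cut).
feature_extractor = torch.nn.Sequential(*list(model.children())[:6])

def hidden_features(delta, batch):
    # Universal perturbation: the same delta is broadcast over the whole batch.
    return feature_extractor(batch + delta).flatten(1)

def sparse_singular_vector(batch, k=500, n_iter=20):
    """k-sparse direction v approximately maximizing ||J v||_2, where J stacks the
    Jacobians of the hidden layer w.r.t. the shared input perturbation."""
    v = torch.randn(1, *batch.shape[1:])
    v /= v.norm()
    zero = torch.zeros_like(v)
    for _ in range(n_iter):
        _, u = torch.autograd.functional.jvp(lambda d: hidden_features(d, batch), zero, v)  # u = J v
        _, w = torch.autograd.functional.vjp(lambda d: hidden_features(d, batch), zero, u)  # w = J^T J v
        # Truncation step: keep only the k largest-magnitude entries, enforcing sparsity.
        flat = w.flatten()
        mask = torch.zeros_like(flat)
        mask[flat.abs().topk(k).indices] = 1.0
        v = (flat * mask).reshape(v.shape)
        v /= v.norm()
    return v

# Illustrative usage on random data (the paper fits perturbations on 256 ImageNet samples).
batch = torch.rand(8, 3, 224, 224)
v = sparse_singular_vector(batch, k=500, n_iter=10)
eps = 0.1  # attack magnitude, illustrative
x_adv = (batch + eps * v / v.abs().max()).clamp(0.0, 1.0)
print("non-zero entries in perturbation:", int((v != 0).sum()))
```

The truncation after each matrix-vector step is what distinguishes this from plain power iteration: it is the mechanism the abstract credits for keeping the perturbation confined to a small fraction of pixels while preserving its fooling ability.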
Keywords
About the authors
K. Kuvshinova
Sber AI Lab; Skolkovo Institute of Science and Technology
Corresponding author.
Email: kseniia.kuvshinova@skoltech.ru
Moscow; Moscow
O. Tsymboi
Sber AI Lab; Moscow Institute of Physics and Technology
Email: tsimboy.oa@phystech.edu
Moscow; Moscow
I. Oseledets
Artificial Intelligence Research Institute (AIRI); Skolkovo Institute of Science and Technology
Email: oseledets@airi.net
Moscow; Moscow
References
- Dosovitskiy A., Beyer L., Kolesnikov A., Weissenborn D., Zhai X., Unterthiner T., Dehghani M., Minderer M., Heigold G., Gelly S., Uszkoreit J., Houlsby N. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
- Touvron H., Lavril T., Izacard G., Martinet X., Lachaux M.A., Lacroix T., Rozière B., Goyal N., Hambro E., Azhar F., Rodriguez A., Joulin A., Grave E., Lample G. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
- Chung H.W., Hou L., Longpre S., Zoph B., Tay Y., Fedus W., Li Y., Wang X., Dehghani M., Brahma S., et al. Scaling instruction-finetuned language models. arXiv preprint arXiv:2210.11416.
- Roy N., Posner I., Barfoot T., Beaudoin P., Bengio Y., Bohg J., Brock O., Depatie I., Fox D., Koditschek D., et al. From machine learning to robotics: Challenges and opportunities for embodied intelligence. arXiv preprint arXiv:2110.15245.
- Baevski A., Zhou H., Mohamed A., Auli M. wav2vec 2.0: A framework for self-supervised learning of speech representations. arXiv preprint arXiv:2006.11477.
- Szegedy C., Zaremba W., Sutskever I., Bruna J., Erhan D., Goodfellow I., Fergus R. Intriguing properties of neural networks. In International Conference on Learning Representations.
- Goodfellow I.J., Shlens J., Szegedy C. Explaining and Harnessing Adversarial Examples. arXiv preprint arXiv:1412.6572.
- Moosavi-Dezfooli S.M., Fawzi A., Frossard P. DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574–2582.
- Khrulkov V., Oseledets I. Art of singular vectors and universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8562–8570.
- Boyd D.W. The Power Method for lp Norms. Linear Algebra and its Applications, vol. 9 (1974), pp. 95–101.
- Song Y., Shu R., Kushman N., Ermon S. Constructing Unrestricted Adversarial Examples with Generative Models. In Advances in Neural Information Processing Systems (NeurIPS). vol. 31.
- Brown T.B., Carlini N., Zhang C., et al. Unrestricted Adversarial Examples. arXiv preprint arXiv:1809.08352.
- Hu C., Shi W. Adversarial Color Film: Effective Physical-World Attack to DNNs. arXiv preprint arXiv:2209.02430.
- Li J., Schmidt F., Kolter J.Z. Adversarial camera stickers: A physical camera-based attack on deep learning systems. In International Conference on Machine Learning. PMLR, pp. 3896–3904.
- Pautov M., Melnikov G., Kaziakhmedov E., et al. On Adversarial Patches: Real-World Attack on ArcFace-100 Face Recognition System. In 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON). IEEE, pp. 391–396.
- Kaziakhmedov E., Kireev K., Melnikov G., et al. Real-World Attack on MTCNN Face Detection System. In 2019 International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON). IEEE, pp. 422–427.
- Croce F., Hein M. Sparse and Imperceivable Adversarial Attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4724–4732.
- Modas A., Moosavi-Dezfooli S.M., Frossard P. SparseFool: a few pixels make a big difference. arXiv preprint arXiv:1811.02248.
- Dong X., Chen D., Bao J., Qin C., Yuan L., Zhang W., Yu N., Chen D. Greedyfool: Distortion-Aware Sparse Adversarial Attack. In Advances in Neural Information Processing Systems, vol. 33, pp. 11226–11236.
- Papernot N., McDaniel P., Goodfellow I. Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.
- Papernot N., McDaniel P., Goodfellow I., Jha S., Celik Z.B., Swami A. Practical Black-Box Attacks against Machine Learning. In Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security. Association for Computing Machinery, pp. 506–519.
- Liu S., Deng W. Very Deep Convolutional Neural Network Based Image Classification Using Small Training Sample Size. In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp. 730–734.
- Croce F., Andriushchenko M., Singh N.D., Flammarion N., Hein M. Sparse-RS: A Versatile Framework for Query-Efficient Sparse Black-Box Adversarial Attacks. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 6 (2022), pp. 6437–6445.
- He Z., Wang W., Dong J., Tan T. Transferable sparse adversarial attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14963–14972.
- Hayes J., Danezis G. Learning universal adversarial perturbations with generative models. In 2018 IEEE Security and Privacy Workshops (SPW). IEEE, pp. 43–49.
- Mopuri K.R., Ojha U., Garg U., Babu R.V. Nag: Network for adversary generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 742–751.
- Deng J., Dong W., Socher R., Li L., Li K., Fei-Fei L. ImageNet: A Large-Scale Hierarchical Image Database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255.
- Yuan X.T., Zhang T. Truncated Power Method for Sparse Eigenvalue Problems. Journal of Machine Learning Research, vol. 14, no. 4.
- Huang G., Liu Z., Van Der Maaten L., Weinberger K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4700–4708.
- Tan M., Le Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning (ICML), pp. 6105–6114.
- Szegedy C., Liu W., Jia Y., et al. Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9.
- He K., Zhang X., Ren S., Sun J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
- Zagoruyko S., Komodakis N. Wide Residual Networks. In British Machine Vision Conference (BMVC). BMVA Press.
- Touvron H., Cord M., Douze M., Massa F., Sablayrolles A., Jegou H. Training Data-Efficient Image Transformers & Distillation Through Attention. In International Conference on Machine Learning, pp. 10347–10357.
- Ansel J., Yang E., He H., Gimelshein N., Jain A., Voznesensky M., Bao B., Bell P., Berard D., Burovski E., et al. PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. vol. 2, pp. 929–947.
- Wolf T., Debut L., Sanh V., Chaumond J., Delangue C., Moi A., Cistac P., Rault T., Louf R., Funtowicz M., et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45.
- Naseer M., Khan S., Hayat M., Khan F.S., Porikli F. A self-supervised approach for adversarial robustness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 262–271.
- Shafahi A., Najibi M., Xu Z., Dickerson J., Davis L.S., Goldstein T. Universal adversarial training. In Proceedings of the AAAI Conference on Artificial Intelligence. vol. 34, pp. 5636–5643.
- Co K.T., Munoz-Gonzalez L., de Maupeou S., Lupu E.C. Procedural Noise Adversarial Examples for Black-Box Attacks on Deep Convolutional Networks. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. Association for Computing Machinery, pp. 275–289.
- Peng A., Deng K., Zeng H., Wu K., Yu W. Detecting Adversarial Examples via Classification Difference of a Robust Surrogate Model. In International Conference on Neural Information Processing. Springer, pp. 558–570.
- Lukasik J., Gavrikov P., Keuper J., Keuper M. Improving Native CNN Robustness with Filter Frequency Regularization. Transactions on Machine Learning Research.
- Wu B., Xu C., Dai X., Wan A., Zhang P., Yan Z., Tomizuka M., Gonzalez J., Keutzer K., Vajda P. Visual Transformers: Token-Based Image Representation and Processing for Computer Vision. arXiv preprint arXiv:2006.03677.
- Moosavi-Dezfooli S.M., Fawzi A., Frossard P. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2574–2582.
- Carlini N., Wagner D. Towards Evaluating the Robustness of Neural Networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
- Madry A., Makelov A., Schmidt L., Tsipras D., Vladu A. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
- Wong E., Schmidt F., Kolter Z. Wasserstein adversarial examples via projected sinkhorn iterations. In International Conference on Machine Learning. PMLR, pp. 6808–6817.
- Wong E., Kolter Z. Learning perturbation sets for robust machine learning. arXiv preprint arXiv:2007.08450.
- Guo Y., Yan Z., Zhang C. Subspace attack: Exploiting promising subspaces for query-efficient black-box attacks. In Advances in Neural Information Processing Systems, vol. 32.
- Cheng M., Le T., Chen P.Y., Yi J., Zhang H., Hsieh C.J. Query-efficient hard-label black-box attack: An optimization-based approach. arXiv preprint arXiv:1807.04457.
- Wang L., Zhang H., Yi J., Hsieh C.J., Jiang Y. Spanning attack: Reinforce black-box attacks with unlabeled data. Machine Learning, vol. 109 (2020), pp. 2349–2368.
- Guo C., Gardner J., You Y., Wilson A.G., Weinberger K.Q. Simple black-box adversarial attacks. In International Conference on Machine Learning. PMLR, pp. 2484–2493.
- Andriushchenko M., Croce F., Flammarion N., Hein M. Square attack: A query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision. Springer, pp. 484–501.
- Chen J., Jordan M.I., Wainwright M.J. Hopskipjumpattack: A query-efficient decision-based attack. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, pp. 1277–1294.
- Papernot N., McDaniel P., Jha S., Fredrikson M., Celik Z.B., Swami A. The limitations of deep learning in adversarial settings. In Proceedings of the 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, pp. 372–387.
- Su J., Vargas D.V., Sakurai K. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation, vol. 23, no. 5 (2019), pp. 828–841.
- Chen P.Y., Sharma Y., Zhang H., Yi J., Hsieh C.J. EAD: Elastic-net attacks to deep neural networks via adversarial examples. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32.
- Xu K., Liu S., Zhao P., Chen P.Y., Zhang H., Fan Q., Erdogmus D., Wang Y., Lin X. Structured Adversarial Attack: Towards General Implementation and Better Interpretability. In International Conference on Learning Representations.
- Dong X., Chen D., Bao J., Qin C., Yuan L., Zhang W., Yu N., Chen D. Greedyfool: Distortion-aware sparse adversarial attack. In Advances in Neural Information Processing Systems, vol. 33, pp. 11226–11236.
- Zhu M., Chen T., Wang Z. Sparse and imperceptible adversarial attack via a homotopy algorithm. In Proceedings of the International Conference on Machine Learning. PMLR, pp. 12868–12877.
- Moosavi-Dezfooli S.M., Fawzi A., Fawzi O., Frossard P. Universal Adversarial Perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- Tsymboi O., Malaev D., Petrovskii A., Oseledets I. Layerwise universal adversarial attack on NLP models. In Findings of the Association for Computational Linguistics: ACL 2023, pp. 129–143.
- Poursaeed O., Katsman I., Gao B., Belongie S.F. Generative adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4422–4431.
- Chen Z., Li B., Wu S., Jiang K., Ding S., Zhang W. Content-based Unrestricted Adversarial Attack. arXiv preprint arXiv:2305.10665.
- Zhou Z., Hu S., Zhao R., Wang Q., Zhang L.Y., Hou J., Jin H. Downstream-Agnostic Adversarial Examples. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4345–4355.
- Zhou Z., Hu S., Li M., Zhang H., Zhang Y., Jin H. Advclip: Downstream-Agnostic Adversarial Examples in Multimodal Contrastive Learning. In Proceedings of the 31st ACM International Conference on Multimedia, pp. 6311–6320.
- Ban Y., Dong Y. Pre-Trained Adversarial Perturbations. In Advances in Neural Information Processing Systems, vol. 35 (2022), pp. 1196–1209.
- Zhang C., Benz P., Imtiaz T., Kweon I.S. Understanding adversarial examples from the mutual influence of images and perturbations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14521–14530.
- Li M., Yang Y., Wei K., Yang X., Huang H. Learning universal adversarial perturbation by adversarial example. In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 1350–1358.
- Boyd S., Parikh N., Chu E., Peleato B., Eckstein J., et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning, vol. 3, no. 1 (2011), pp. 1–122.
- Co K.T., Munoz-Gonzalez L., Kanthan L., Glocker B., Lupu E.C. Universal adversarial robustness of texture and shape-biased models. In 2021 IEEE International Conference on Image Processing (ICIP). IEEE, pp. 799–803.
- Zhang Z., Liu P., Wang X., Xu C. Improving adversarial transferability with scheduled step size and dual example. arXiv preprint arXiv:2301.12968.
- Devaguptapu C., Agarwal D., Mittal G., Gopalani P., Balasubramanian V.N. On adversarial robustness: A neural architecture search perspective. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 152–161.
- Wu B., Chen J., Cai D., He X., Gu Q. Do wider neural networks really help adversarial robustness? Advances in Neural Information Processing Systems, vol. 34 (2021), pp. 7054–7067.
- Zhu Z., Liu F., Chrysos G., Cevher V. Robustness in deep learning: The good (width), the bad (depth), and the ugly (initialization). Advances in neural information processing systems, vol. 35 (2022), pp. 36094–36107.
- Huang H., Wang Y., Erfani S., Gu Q., Bailey J., Ma X. Exploring architectural ingredients of adversarially robust deep neural networks. Advances in Neural Information Processing Systems, vol. 34 (2021), pp. 5545–5559.
Supplementary files
