Measuring similarity between Karel programs using character and word n-grams

G. Sidorov; M. Ibarra Romero; I. Markov; R. Guzman-Cabrera; L. Chanona-Hernández; F. Velásquez

doi:10.1134/S0361768817010066

Authors: Sidorov G.¹, Ibarra Romero M.¹, Markov I.¹, Guzman-Cabrera R.², Chanona-Hernández L.³, Velásquez F.⁴
Affiliations:
1. Instituto Politécnico Nacional (IPN)
2. Engineering Division
3. Instituto Politécnico Nacional
4. Polytechnic University of Queretaro
Issue: Vol. 43, No. 1 (2017)
Pages: 47-50
Section: Article
URL: https://journal-vniispk.ru/0361-7688/article/view/176478
DOI: https://doi.org/10.1134/S0361768817010066
ID: 176478

Abstract

We present a method for measuring similarity between source code programs. We approach the task from a machine learning perspective, using character and word n-grams as features and comparing several machine learning algorithms. We also examine the contribution of latent semantic analysis (LSA) to this task. To evaluate the proposed approach, we developed a corpus of around 10,000 programs written in the Karel programming language that solve 100 different tasks. The results show that the highest classification accuracy is achieved with a Support Vector Machines (SVM) classifier, with LSA applied and word trigrams selected as features.

Keywords

machine learning, similarity, Karel programming language, character n-grams, word n-grams, SVM, LSA
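
To make the feature types named in the abstract concrete, the following is a minimal sketch of character and word n-gram extraction. It is not the authors' pipeline: the abstract describes classification with SVM and LSA, whereas this sketch swaps in plain cosine similarity over n-gram counts purely for illustration. The Karel-like snippets, whitespace tokenization, and all function names are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n):
    """Return all overlapping character n-grams of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping n-grams of whitespace-separated tokens."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(p, q, n=3):
    """Cosine similarity of two programs over word n-gram counts.

    Assumption: n=3 mirrors the word trigrams mentioned in the abstract;
    the similarity measure itself is an illustrative stand-in.
    """
    return cosine(Counter(word_ngrams(p, n)), Counter(word_ngrams(q, n)))


# Hypothetical Karel-like snippets, made up for illustration only.
prog_a = "move ; move ; turnleft ; putbeeper ;"
prog_b = "move ; move ; turnleft ; pickbeeper ;"
prog_c = "turnleft ; turnleft ; turnleft ; move ;"

if __name__ == "__main__":
    print(char_ngrams("move", 3))
    print(word_ngrams("move ; turnleft ;", 3))
    print(similarity(prog_a, prog_b), similarity(prog_a, prog_c))
```

In this toy setting, two programs that differ in a single instruction share most of their word trigrams and therefore score higher than two programs with different instruction sequences. In a classification setup such as the one the abstract describes, the same n-gram counts would serve as feature vectors for a classifier rather than being compared directly.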