Abstract
Text mining is an important field in information retrieval; it organizes the large number of text
documents available on the internet to facilitate retrieval and increase efficiency. Text
classification is the task of automatically assigning a category to new or unseen documents
based on the content of the documents themselves. In text classification, text preprocessing is a
fundamental step toward obtaining better results, and Arabic text processing depends on
stemming algorithms to achieve high accuracy. This research compares two stemming
algorithms, a stem-based approach (the Snowball light stemmer) and a root-based approach
(the Shereen Khoja stemmer), using three similarity measures: Euclidean distance, cosine
similarity, and Pearson correlation distance. The research uses an Arabic Wikipedia dataset
and TF-IDF as the weighting scheme to construct a vector space model that represents the
weights of the selected text features. As evaluation measures, the research applies overall
accuracy, average recall, average precision, and average F1 measure to assess the results of
the classified text
documents. The document collection is divided into training and test documents in three
experiments, using training/test splits of 85%/15%, 80%/20%, and 90%/10%, respectively.
The results show that the overall accuracy of the Shereen Khoja stemmer is better than that of
the Snowball stemmer in all experiments, except for cosine similarity in the first experiment
and Euclidean distance in the third experiment, where the Snowball stemmer achieves better
accuracy.
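The stem-based (light) approach compared in this study strips affixes from a word instead of reducing it to a root. The following is only a toy sketch of that idea, assuming a small hand-picked affix list: `PREFIXES`, `SUFFIXES`, `MIN_STEM_LEN`, and `light_stem` are hypothetical names, the rules are not the Snowball Arabic algorithm, and no root extraction of the kind the Khoja stemmer performs is attempted.

```python
# Toy Arabic light stemmer: strips at most one prefix and one suffix.
# HYPOTHETICAL affix lists chosen for illustration only; this is NOT the
# Snowball Arabic algorithm and does no root extraction (unlike Khoja).

PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")  # checked longest first
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")
MIN_STEM_LEN = 2  # never strip an affix if fewer than 2 letters would remain


def light_stem(word: str) -> str:
    """Remove one leading prefix and one trailing suffix, if present."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[: -len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ["والكتاب", "المدرسة", "كتابات"]:
        print(w, "->", light_stem(w))
```

Because light stemming only trims affixes, related forms such as "كتابات" and "والكتاب" collapse to the same stem "كتاب", while a root-based stemmer would go further and map many more surface forms onto a shared three-letter root.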
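The abstract uses TF-IDF to weight features in a vector space model. A minimal sketch of that weighting, assuming the common variant weight(t, d) = tf(t, d) × log(N / df(t)) with raw term counts; the abstract does not state which TF-IDF variant the study used, and `tfidf_vectors` is a hypothetical helper.

```python
import math
from collections import Counter

# Minimal TF-IDF sketch: weight(t, d) = tf(t, d) * log(N / df(t)).
# ASSUMPTION: raw term counts and natural-log IDF; the study's exact
# variant (normalisation, smoothing) is not given in the abstract.


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per tokenised document)."""
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        vectors.append([tf[term] * idf[term] for term in vocab])
    return vocab, vectors


if __name__ == "__main__":
    docs = [["كتاب", "علم"], ["كتاب", "رياضة"], ["رياضة", "كرة", "كرة"]]
    vocab, vectors = tfidf_vectors(docs)
    print(vocab)
    for v in vectors:
        print([round(x, 3) for x in v])
```

A term that occurs in every document gets an IDF of log(1) = 0, so it contributes nothing when documents are compared; rarer terms receive larger weights.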
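The three similarity measures named in the abstract can be computed directly on TF-IDF vectors. This sketch assumes "Pearson correlation distance" means 1 − r, and it wraps the measures in a 1-nearest-neighbour `classify` function purely as an illustration; the abstract does not specify the classifier, and all function names here are hypothetical.

```python
import math

# Euclidean distance, cosine similarity, and Pearson correlation distance
# for two equal-length vectors.
# ASSUMPTIONS: Pearson distance = 1 - r; classification is sketched as
# 1-nearest-neighbour, which the abstract does not confirm.


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0  # treat an all-zero vector as dissimilar


def pearson_distance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    r = cov / (sx * sy) if sx and sy else 0.0
    return 1.0 - r  # ranges from 0 (r = 1) to 2 (r = -1)


def classify(query, train_vectors, train_labels, measure="cosine"):
    """Return the label of the closest training vector (1-NN)."""
    if measure == "cosine":
        # Higher similarity is better, so negate it to get a score to minimise.
        scores = [-cosine_similarity(query, v) for v in train_vectors]
    elif measure == "euclidean":
        scores = [euclidean_distance(query, v) for v in train_vectors]
    elif measure == "pearson":
        scores = [pearson_distance(query, v) for v in train_vectors]
    else:
        raise ValueError(f"unknown measure: {measure}")
    best = min(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]
```

Cosine similarity ignores vector length, Euclidean distance does not, and Pearson distance additionally centres each vector on its mean, which is one reason the same stemmer can rank differently under different measures.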
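The three experiments differ only in the training/test ratio (85%/15%, 80%/20%, 90%/10%). A minimal sketch of such a split, assuming a plain shuffled, non-stratified split with a fixed seed; the abstract does not say how documents were assigned, and `train_test_split` and `SPLITS` are hypothetical names.

```python
import random

# Shuffle-and-cut split for the three training/test ratios in the abstract.
# ASSUMPTION: random, non-stratified split with a fixed seed for repeatability.

SPLITS = {"experiment 1": 0.85, "experiment 2": 0.80, "experiment 3": 0.90}


def train_test_split(docs, train_fraction, seed=0):
    shuffled = list(docs)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)  # round() absorbs float noise
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(100)]
    for name, frac in SPLITS.items():
        train, test = train_test_split(docs, frac)
        print(name, len(train), len(test))
```

A stratified split, which keeps each category's proportion equal in both sets, would be a reasonable alternative when categories are unevenly sized.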
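The evaluation measures in the abstract (overall accuracy, average recall, average precision, average F1) can be computed from true and predicted labels. This sketch assumes "average" means the unweighted mean over categories (macro averaging) and that average F1 is the mean of per-category F1 scores; the abstract does not define the averaging, and `evaluate` is a hypothetical helper.

```python
from collections import Counter

# Overall accuracy plus macro-averaged precision, recall, and F1.
# ASSUMPTION: macro averaging over all categories seen in y_true or y_pred;
# a category with no predictions or no instances contributes 0.


def evaluate(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / pred_count[label] if pred_count[label] else 0.0
        r = tp[label] / true_count[label] if true_count[label] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    k = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "avg_precision": sum(precisions) / k,
        "avg_recall": sum(recalls) / k,
        "avg_f1": sum(f1s) / k,
    }


if __name__ == "__main__":
    print(evaluate(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Macro averaging gives every category equal weight regardless of size, so a poorly classified small category lowers the averages more than it lowers overall accuracy.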
Ali, M. (2021). A Comparative Study for Two Stemming Algorithms for Arabic Wikipedia Documents Classification Based on Similarity Measures. Afribary. Retrieved from https://track.afribary.com/works/a-comparative-study-for-two-stemming-algorithms-for-arabic-wikipedia-documents-classification-based-on-similarity-measures