ABSTRACT
The validity and quality of a data analysis depend largely on the accuracy
and completeness of the data matrix. Missing values are an unavoidable problem
in almost every research study and, if not handled properly,
may lead to misleading and biased conclusions. This study sought to
investigate the efficacy and accuracy of the convergence of five imputation
algorithms: expectation maximization (EM), multiple imputation by chained
equations (MICE), k-nearest neighbors (KNN), mean substitution (MS) and
regression substitution (RS) in estimating and replacing missing values in the cross-sectional
World Population Data Sheet under the MCAR and MAR assumptions. This
thesis used Little’s test to assess whether the missing values in a given data matrix
are MCAR rather than MAR. A multiple linear regression model was first fitted to the
complete World Population Data Sheet data; thereafter, missing values
were artificially introduced into the complete data sets at rates of 5%, 10%, 20%, 30%
and 40% under the two missing-data mechanisms (MCAR and MAR). The imputation
algorithms were assessed and compared
using the average coefficient difference (ACD) of the multiple linear regression (MLR)
model, the mean absolute difference (MAD) and the coefficient of determination (R²).
The study suggests that, when data in the cross-sectional World Population Data
Sheet are missing completely at random (MCAR) and normally distributed,
regression substitution is the best approach. The MICE algorithm was found to be
comparatively the best method for replacing missing values under the missing at random (MAR) assumption.
Since this thesis concentrates on missing data imputation in a cross-sectional
dataset, it is recommended that future studies consider categorical and longitudinal
data.
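
The evaluation design described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it runs on synthetic data rather than the World Population Data Sheet, covers only mean substitution and regression substitution (EM, MICE and KNN are omitted), and every function name is hypothetical. The definitions used for MAD (mean absolute difference between true and imputed values at the deleted positions) and ACD (mean absolute difference between regression coefficients fitted on the complete and on the imputed data) are assumptions, since the abstract does not state the formulas.

```python
import numpy as np

def make_mcar_mask(n_rows, rate, rng):
    """MCAR: every row has the same probability `rate` of being deleted."""
    return rng.random(n_rows) < rate

def make_mar_mask(driver, rate, rng):
    """MAR: deletion probability depends on a fully observed column `driver`."""
    z = (driver - driver.mean()) / driver.std()
    w = 1.0 / (1.0 + np.exp(-z))                 # logistic weight in (0, 1)
    p = np.clip(w * rate / w.mean(), 0.0, 1.0)   # rescale so the average is about `rate`
    return rng.random(driver.shape[0]) < p

def mean_substitution(x, mask):
    """Fill deleted entries with the mean of the observed entries."""
    filled = x.copy()
    filled[mask] = x[~mask].mean()
    return filled

def regression_substitution(x, mask, predictors):
    """Fill deleted entries with least-squares predictions from observed predictors."""
    design = np.column_stack([np.ones(len(x)), predictors])
    beta, *_ = np.linalg.lstsq(design[~mask], x[~mask], rcond=None)
    filled = x.copy()
    filled[mask] = design[mask] @ beta
    return filled

def ols_coefficients(y, X):
    """Intercept and slopes of a multiple linear regression of y on X."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def mean_absolute_difference(true, imputed, mask):
    # Assumed definition: average |true - imputed| over the deleted positions.
    return float(np.mean(np.abs(true[mask] - imputed[mask])))

def average_coefficient_difference(beta_full, beta_imputed):
    # Assumed definition: average |coefficient difference| across all coefficients.
    return float(np.mean(np.abs(beta_full - beta_imputed)))

# Synthetic stand-in for a complete cross-sectional data set (an assumption).
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)                          # always observed
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=n)    # receives the missing values
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(scale=0.3, size=n)

beta_full = ols_coefficients(y, np.column_stack([x1, x2]))

results = {}
for rate in (0.05, 0.10, 0.20, 0.30, 0.40):
    masks = {"MCAR": make_mcar_mask(n, rate, rng),
             "MAR": make_mar_mask(x1, rate, rng)}
    for mechanism, mask in masks.items():
        imputations = {"MS": mean_substitution(x2, mask),
                       "RS": regression_substitution(x2, mask, x1)}
        for method, filled in imputations.items():
            beta = ols_coefficients(y, np.column_stack([x1, filled]))
            results[(mechanism, rate, method)] = (
                mean_absolute_difference(x2, filled, mask),
                average_coefficient_difference(beta_full, beta),
            )

for key in sorted(results):
    mad, acd = results[key]
    print(key, f"MAD={mad:.3f}", f"ACD={acd:.3f}")
```

Each entry of `results` holds the (MAD, ACD) pair for one combination of mechanism, missingness rate and imputation method, so the methods can be compared in the way the abstract describes. The numbers this produces describe the synthetic data only and say nothing about the thesis's findings.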
GYIMAH, O (2021). Statistical Assessment Of Imputation Algorithms For Estimation Of Missing Values In Cross Sectional Data. Afribary. Retrieved from https://track.afribary.com/works/statistical-assessment-of-imputation-algorithms-for-estimation-of-missing-values-in-cross-sectional-data