A novel prediction algorithm for multivariate data sets

  • Pinki Sagar Computer Science and Technology, Manav Rachna University, Haryana, India
  • Prinima Gupta Computer Science and Technology, Manav Rachna University, Haryana, India
  • Rohit Tanwar School of Computer Science, University of Petroleum & Energy Studies, Dehradun, Uttarakhand, India
Keywords: Coefficient of determination, Mean square error, Actual means, Multiple Linear Regression (MLR), Root Mean Square Error (RMSE), Mean Square Error (MSE)


Regression analysis is a statistical technique most commonly used for forecasting. Data sets are growing very large because of the continuous transactions of today's fast-paced world, which makes the data difficult to manage and interpret. Not all independent variables can be considered for prediction, because maintaining them in the data set is costly. This paper implements a novel prediction algorithm that emphasizes extracting the most effective independent variables from the many variables in the data set. Variables are selected on the basis of Mean Square Error (MSE) and the coefficient of determination r2p; the final prediction equation is then framed on the basis of deviation from the actual mean. The algorithm is statistical, and its predictions are evaluated on four parameters: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and residuals. It has been implemented for a multivariate data set with low maintenance and preprocessing costs and with lower root mean square error and residuals. The proposed algorithm can also be used for one-dimensional, two-dimensional, frequent stream, time-series, and continuous data. Its impact is to improve forecasting accuracy and minimize the average error rate.
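The abstract names its selection criteria and evaluation measures but gives no formulas. The sketch below uses the standard textbook definitions of MSE, RMSE, MAE, MAPE, residuals, and the coefficient of determination. It also shows one plausible way to rank candidate variables by the MSE and r² of a single-variable least-squares fit. It is not the authors' algorithm: the function names, the `rank_predictors` helper, and the ranking rule (lowest MSE first) are assumptions made for illustration only.

```python
import math

def residuals(actual, predicted):
    # Residual = observed value minus predicted value.
    return [a - p for a, p in zip(actual, predicted)]

def mse(actual, predicted):
    res = residuals(actual, predicted)
    return sum(r * r for r in res) / len(res)

def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))

def mae(actual, predicted):
    res = residuals(actual, predicted)
    return sum(abs(r) for r in res) / len(res)

def mape(actual, predicted):
    # Expressed in percent; undefined if any actual value is zero.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def r_squared(actual, predicted):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def fit_simple(x, y):
    # Ordinary least squares for y = b0 + b1 * x.
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

def rank_predictors(columns, y):
    # Hypothetical helper: score each candidate variable by the MSE and r^2
    # of its own one-variable fit, then sort by ascending MSE.
    scores = []
    for name, x in columns.items():
        b0, b1 = fit_simple(x, y)
        pred = [b0 + b1 * xi for xi in x]
        scores.append((name, mse(y, pred), r_squared(y, pred)))
    return sorted(scores, key=lambda s: s[1])
```

Under these assumptions, calling `rank_predictors({"x1": [...], "x2": [...]}, y)` returns `(name, mse, r_squared)` tuples with the lowest-error variable first. A cutoff on either score could then decide which variables enter the final model.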





How to Cite
Sagar, P., Gupta, P., & Tanwar, R. (2021). A novel prediction algorithm for multivariate data sets. Decision Making: Applications in Management and Engineering, 4(2), 225-240. https://doi.org/10.31181/dmame210402215s