Journal of Applied Science and Engineering

Published by Tamkang University Press





Shenghui Zhao1,2, Haibao Chen1,2, Ruibin Zhao1,2, Yuyan Zhao1,2 and Guilin Chen1,2

1School of Computer and Information Engineering, Chuzhou University, Chuzhou 239000, P.R. China
2Anhui Center for Collaborative Innovation in Geographical Information Integration and Application, Chuzhou 239000, P.R. China


Received: November 27, 2015
Accepted: May 5, 2016
Publication Date: December 1, 2016



To guarantee cloud service quality, a service should be able to dynamically predict changes in data processing requests. Existing prediction methods for the cloud mostly focus on the amount of computing resources a service requires. In a cloud computing environment for big data processing, however, predicting computing resources alone is not enough: when a virtual machine is created far from the data, transferring the data to that virtual machine for processing takes time. To solve this problem, we propose a data-centered prediction method using a Bayes classifier, which predicts the data type or location based on the data resources needed by the service request. We carry out experiments with the Google cluster trace, and the experimental results show that our method performs better than existing methods. For example, our method improves load prediction accuracy by 45–60% compared with other state-of-the-art methods, namely the final state-based method, the simple moving average method, the linear weighted moving average method, the exponential moving average method, and the prior probability-based method.

Keywords: Big Data, Bayes Classifier, Data-centered Prediction Method, Google Cluster Trace
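
The baseline predictors named in the abstract can be illustrated with a minimal Python sketch. It uses the textbook definitions of each method; the function names, the `window` and `alpha` parameters, and the sample load values are assumptions made for illustration and are not taken from the paper, whose exact formulations and settings may differ.

```python
from typing import Sequence


def final_state(history: Sequence[float]) -> float:
    """Predict that the next value equals the most recent observation."""
    return history[-1]


def simple_moving_average(history: Sequence[float], window: int) -> float:
    """Predict the unweighted mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def linear_weighted_moving_average(history: Sequence[float], window: int) -> float:
    """Predict a weighted mean in which newer observations get linearly larger weights."""
    recent = history[-window:]
    weights = range(1, len(recent) + 1)  # oldest sample gets weight 1
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)


def exponential_moving_average(history: Sequence[float], alpha: float) -> float:
    """Predict with exponential smoothing; `alpha` in (0, 1] weights the newest sample."""
    estimate = history[0]
    for x in history[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate


# Hypothetical load samples, not data from the Google cluster trace.
load = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "final_state": final_state(load),
    "sma": simple_moving_average(load, window=2),
    "lwma": linear_weighted_moving_average(load, window=3),
    "ema": exponential_moving_average(load, alpha=0.5),
}
```

All four baselines extrapolate from the recent numeric history of a single load series, whereas the proposed method, as the abstract describes it, classifies the data type or location that a service request will need.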

