Designing a Reading Material Recommendation System for EFL Learners

doi:10.6180/jase.2014.17.4.04

Computer Science and Information Engineering

[Figure: Document preprocessing, feature extraction, and classifier accuracy estimation procedure.]

Chin-Hwa Kuo¹ and Chen-Chung Chi¹

¹Department of Computer Science and Information Engineering (CSIE), Tamkang University, Tamsui, Taiwan 251, R.O.C.

Received: May 14, 2014
Accepted: October 24, 2014
Publication Date: December 1, 2014

ABSTRACT

For many learners of English as a foreign language (EFL), reading English articles is an effective way to improve reading comprehension. This study designed an article recommendation system that identifies articles of suitable difficulty for EFL learners. The system design is based on the vocabulary sets of the General English Proficiency Test (GEPT). Using text mining and classification techniques, the system compares the difficulty levels of articles found on news Web sites and in textbooks, as well as articles written by EFL high school students. Learners’ current language proficiency levels were assessed in order to create the learning environment described in Krashen’s second language acquisition theory. The document classification verification results indicated that the reading material recommendation system, which uses GEPT vocabulary sets as the foundation of article feature extraction, can effectively classify the vocabulary difficulty of articles at various difficulty levels. Additionally, articles that matched learners’ language levels, as determined by the evaluation results, were used as reading materials for learning purposes.

Keywords: Article Recommendation System, Cosine Similarity, Document Readability, Second Language Acquisition
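
The abstract names GEPT vocabulary sets as the basis for feature extraction and lists cosine similarity among the keywords, but it does not give the exact procedure. The following is a minimal sketch of one way such a comparison could work, not the authors' implementation. The tiny word lists in LEVEL_VOCAB, the level names, the functions level_profile and rank_articles, and the idea of representing a learner as a vector of level proportions are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical miniature word lists standing in for GEPT vocabulary sets.
# The real lists are far larger and are not reproduced here.
LEVEL_VOCAB = {
    "elementary": {"cat", "dog", "run", "eat", "school", "happy"},
    "intermediate": {"environment", "improve", "experience", "develop"},
    "high_intermediate": {"acquisition", "comprehension", "proficiency"},
}
LEVELS = list(LEVEL_VOCAB)


def level_profile(text):
    """Return, per level, the share of recognized tokens found in that level's list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for level in LEVELS:
            if token in LEVEL_VOCAB[level]:
                counts[level] += 1
                break  # count each token once, at the first level that lists it
    total = sum(counts.values())
    return [counts[level] / total if total else 0.0 for level in LEVELS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_articles(learner_profile, articles):
    """Sort (score, title) pairs so the article closest to the learner profile comes first."""
    scored = [
        (cosine_similarity(learner_profile, level_profile(text)), title)
        for title, text in articles.items()
    ]
    return sorted(scored, reverse=True)


# Usage: an assumed learner whose known vocabulary is mostly elementary level.
articles = {
    "pets": "The happy dog and the cat run to school and eat.",
    "sla": "Language acquisition supports comprehension and proficiency.",
}
learner = [0.8, 0.2, 0.0]
print(rank_articles(learner, articles))
```

In this sketch, words outside every list are ignored, so the profile reflects only the distribution of recognized vocabulary across levels. A fuller system would also need lemmatization and a trained classifier, which the abstract mentions but which are beyond this illustration.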
