ISBN: 978-981-11-0008-6 DOI: 10.18178/wcse.2016.06.132
Hadoop Local Tasks Scheduling Optimization Algorithm Based on Logistic Regression Model
Abstract— When a TaskTracker has multiple local tasks available, the default scheduler executes them
one after another in the order in which they are found, which is inefficient. To optimize local task
scheduling, this paper presents a Hadoop local task scheduling optimization algorithm based on a
logistic regression model. First, feature vectors related to the local tasks are selected and defined.
Then a logistic regression model is trained on these vectors to obtain the weight of each feature vector,
which determines task priority, and the model is updated continually according to the overload rules. The
experimental results show that the proposed algorithm improves map task data locality while also
reducing job running time.
Index Terms— Hadoop, MapReduce, local tasks scheduling, task priority, overload rules, Logistic regression
model.
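
The general idea in the abstract, scoring each local task with a logistic regression model and running the highest-priority task first, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not list the features or define the overload rules, so the two-element feature vectors, the starting weights, the learning rate, and the use of a 0/1 label as a stand-in for an overload signal are all assumptions made only for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def priority(weights, bias, features):
    """Priority of one task: logistic function of the weighted feature sum."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def rank_local_tasks(weights, bias, tasks):
    """Return task ids sorted from highest to lowest priority."""
    return sorted(tasks, key=lambda t: priority(weights, bias, tasks[t]), reverse=True)


def sgd_update(weights, bias, features, label, lr=0.1):
    """One stochastic gradient step on the logistic loss.

    `label` is a hypothetical feedback signal (1 = the task ran well,
    0 = it overloaded the node); the paper's overload rules are not
    specified in the abstract.
    """
    err = priority(weights, bias, features) - label
    new_weights = [w - lr * err * x for w, x in zip(weights, features)]
    return new_weights, bias - lr * err


# Hypothetical feature vectors for three local map tasks.
tasks = {
    "map_0": [0.2, 0.9],
    "map_1": [0.8, 0.1],
    "map_2": [0.5, 0.5],
}
weights, bias = [1.0, -1.0], 0.0  # assumed starting model

order = rank_local_tasks(weights, bias, tasks)

# Feed back an assumed "overloaded" outcome for the top task and update the model.
before = priority(weights, bias, tasks[order[0]])
weights, bias = sgd_update(weights, bias, tasks[order[0]], label=0)
after = priority(weights, bias, tasks[order[0]])
```

In this sketch a negative outcome lowers the priority that the updated model assigns to similar tasks, which is one simple way a model could be "updated constantly" from runtime feedback.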
Shuai Renjun, Shen Yang, Pan Jing, Dong Yanan
School of Computer Science and Technology, Nanjing Technology University, CHINA
Chen Ping
Nanjing Health Information Center, CHINA
Cite: Shuai Renjun, Shen Yang, Chen Ping, Pan Jing, Dong Yanan, "Hadoop Local Tasks Scheduling Optimization Algorithm Based on Logistic Regression Model," Proceedings of 2016 6th International Workshop on Computer Science and Engineering, pp. 738-742, Tokyo, 17-19 June, 2016.