Academic Seminar: Characterization of MapReduce Applications on Private and Public Cloud Platforms

Published: 2014-10-08

Name and Website:

Fan Zhang

http://www.mit.edu/~f_zhang

http://space.mit.edu/people/zhang-fan


Time: 14:00, Oct. 8

Venue: Room 414, SEIEE-3

 

Title:

Characterization of MapReduce Applications on Private and Public Cloud Platforms

Abstract:

The MapReduce programming model is a widely accepted solution to the rapid growth of big-data processing demands. Various MapReduce applications with very large volumes of input data can run on an elastic compute cloud composed of many distributed computing instances. A public cloud provider, such as Amazon EC2, offers a spectrum of cloud resources at varying costs. Cloud users typically rent these elastic resources as virtual machines (VMs) under a pay-as-you-go model to gain access to large-scale capacity. However, different applications scale differently depending on their type, their behavior, and how effectively they use the available resources.

 

In this work, we attempt to characterize how MapReduce performance is affected by increased compute resources for a variety of application types. These applications span data-intensive and compute-intensive benchmarks. Our major findings are as follows: (1) the execution times of MapReduce applications follow a power-law distribution; (2) for map-intensive applications, the power-law scalability starts from a small cluster size; and (3) for reduce-intensive applications, the power-law scalability starts from a larger cluster size.
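As an illustration of finding (1), the sketch below fits a power law to execution times measured at several cluster sizes. It assumes that "power law" here means T(n) ≈ a · n^b for cluster size n; the abstract does not give the exact functional form, so this is an interpretation rather than the speaker's method. The function names and the measurements are hypothetical, and the data are synthetic, generated only to exercise the code.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = log(a) + b * log(n). Returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_time(a, b, n):
    """Execution time predicted by the fitted model for cluster size n."""
    return a * n ** b


# Synthetic (hypothetical) measurements: seconds at 2, 4, 8 and 16 nodes,
# generated from an assumed exponent of -0.8. These are not results from the talk.
cluster_sizes = [2, 4, 8, 16]
exec_times = [1000.0 * n ** -0.8 for n in cluster_sizes]

a, b = fit_power_law(cluster_sizes, exec_times)
print(f"fitted a={a:.1f}, b={b:.2f}")
print(f"extrapolated time at 32 nodes: {predict_time(a, b, 32):.1f} s")
```

With real measurements, the fit would presumably be restricted to cluster sizes at or above the point where power-law behavior begins, which findings (2) and (3) say differs between map-intensive and reduce-intensive applications.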

 

Our research has also developed an in-depth understanding of MapReduce application performance and analyzed the impact of scaling input datasets. While we might expect that "embarrassingly parallel" MapReduce jobs should scale linearly with input dataset size, our results show that execution time sometimes increases nonlinearly. These results show that our execution-time analysis distinguishes four typical application behaviors when scaling input datasets.
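One simple way to check whether a job scales linearly with input size is to compare how fast execution time grows against how fast the input grows between consecutive runs. The sketch below is a minimal illustration of that idea, not the analysis used in the talk; the function names, the 10% tolerance, and the sample numbers are all hypothetical.

```python
def scaling_ratios(sizes, times):
    """For each consecutive pair of runs, return (time growth) / (size growth).

    A ratio near 1.0 means time grew in proportion to the input size.
    """
    ratios = []
    for i in range(1, len(sizes)):
        size_growth = sizes[i] / sizes[i - 1]
        time_growth = times[i] / times[i - 1]
        ratios.append(time_growth / size_growth)
    return ratios


def is_roughly_linear(sizes, times, tolerance=0.1):
    """True if every consecutive ratio is within `tolerance` of 1.0."""
    return all(abs(r - 1.0) <= tolerance for r in scaling_ratios(sizes, times))


# Hypothetical measurements (input size in GB, execution time in seconds).
input_gb = [1, 2, 4, 8]
linear_times = [10.0, 20.0, 40.0, 80.0]
superlinear_times = [10.0, 22.0, 50.0, 120.0]

print(is_roughly_linear(input_gb, linear_times))
print(is_roughly_linear(input_gb, superlinear_times))
```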

 

Our characterization work will aid users in choosing appropriate computing resources, both virtual and physical, from small-scale experimental test runs. These approaches will predict performance speedups or slowdowns for MapReduce applications when scaling the infrastructure or the input datasets.

 

Photo and Bio:

 

 

Dr. Fan Zhang is currently a visiting scientist with the MIT Kavli Institute for Astrophysics and Space Research (MKI), jointly appointed by IBM Watson System Group as a Senior Software Engineer. He has also been a visiting associate professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, since Jan 2014. He received his Ph.D. from the Department of Control Science and Engineering, Tsinghua University, in Jan 2012. From 2013 to 2014, he was a postdoctoral associate with MKI. From 2011 to 2013, he was a research scientist at the Cloud Computing Laboratory, Carnegie Mellon University.

 

An IEEE Senior Member, he received an Education Faculty Award from Amazon Web Services (2014), an Honorarium Research Funding Award from the University of Chicago and Argonne National Laboratory (2013), a Meritorious Service Award from IEEE Transactions on Services Computing (2013), and two IBM Ph.D. Fellowship Awards (2010 and 2011). His research interests include gravitational-wave big-data analysis, simulation-based optimization, cloud computing, and novel programming models for streaming data applications on elastic cloud platforms.

Contact us: webmaster@cs.sjtu.edu.cn

Copyright @ 2013 Department of Computer Science and Engineering, Shanghai Jiao Tong University