Name and Website:
Fan Zhang
http://www.mit.edu/~f_zhang
http://space.mit.edu/people/zhang-fan
Time: 14:00, Oct. 8
Venue: Room 414, SEIEE-3
Title:
Characterization of MapReduce Applications on Private and Public Cloud Platforms
Abstract:
The MapReduce programming model is a widely accepted solution to the rapid
growth of big-data processing demands. Various MapReduce applications with a
very large volume of input data can run on an elastic compute cloud composed of
many distributed computing instances. A public cloud provider, such as Amazon
EC2, offers a spectrum of cloud resources with varying costs. Cloud users
typically rent these elastic cloud resources as virtual machines (VMs) in a
pay-as-you-go model to gain access to large-scale cloud resources. However,
different applications scale differently depending on their type, their
behavior, and how effectively they use the available resources.
In this work, we attempt to characterize how MapReduce performance is affected by
increased compute resources for a variety of application types. These
applications span data- and compute-intensive benchmarks. Our major
findings are as follows: (1) the execution times of MapReduce applications
follow a power-law distribution; (2) for map-intensive applications, the
power-law scalability starts from a small cluster size; and (3) for
reduce-intensive applications, the power-law scalability starts from a larger
cluster size.
Our research has also developed an in-depth understanding of MapReduce
application performance and analyzed the impact of scaling input datasets.
While we might expect that "embarrassingly parallel" MapReduce jobs
should scale linearly with input dataset size, our results show that execution
time sometimes increases nonlinearly. Our execution-time analysis
distinguishes four typical application behaviors when scaling input
datasets.
Our characterization work will aid users in choosing appropriate
computing resources, both virtual and physical, from small-scale experimental
test runs. These approaches will predict performance speedups or slowdowns for
MapReduce applications when scaling the infrastructure or the input datasets.
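As a rough illustration of the kind of extrapolation described above, the sketch below fits a power law T(n) = a * n^b to execution times measured on small clusters and uses it to estimate the execution time on a larger cluster. The measurements are synthetic, and the function names and the least-squares fit in log-log space are assumptions made for illustration; they are not taken from the talk and do not represent the speaker's method or results.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of T = a * n**b in log-log space; returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    # Slope of the regression line log T = log a + b * log n.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_time(a, b, n):
    """Execution time predicted by the fitted power law for cluster size n."""
    return a * n ** b


# Hypothetical small-scale test runs: cluster sizes and execution times (s).
# The times are generated from T = 1000 * n**-0.8, not measured.
cluster_sizes = [2, 4, 8, 16]
exec_times = [1000.0 * n ** -0.8 for n in cluster_sizes]

a, b = fit_power_law(cluster_sizes, exec_times)
predicted_32 = predict_time(a, b, 32)
print(f"a = {a:.1f}, b = {b:.2f}, predicted T(32) = {predicted_32:.1f} s")
```

With real measurements the fit would not be exact, and according to the findings above the power-law regime may only begin at a certain cluster size, so points below that size would need to be excluded before fitting.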
Photo and Bio:
Dr. Fan Zhang is currently a visiting
scientist with the MIT Kavli Institute for Astrophysics and Space Research (MKI),
jointly appointed by IBM Watson System Group as a Senior Software Engineer. He has
also been a visiting associate professor at the Shenzhen Institutes of
Advanced Technology, Chinese Academy of Sciences, since January 2014. He received
his Ph.D. from the Department of Control Science and Engineering, Tsinghua
University, in January 2012. From 2013 to 2014, he was a postdoctoral associate
with MKI. From 2011 to 2013, he was a research scientist at the Cloud Computing
Laboratory, Carnegie Mellon University.
An IEEE Senior Member, he received an Education Faculty Award from Amazon Web
Services (2014), an Honorarium Research Funding Award from the University of
Chicago and Argonne National Laboratory (2013), a Meritorious Service Award
(2013) from IEEE Transactions on Services Computing, and two IBM Ph.D. Fellowship
Awards (2010 and 2011). His research interests include gravitational-wave big-data
analysis, simulation-based optimization, cloud computing, and novel programming
models for streaming data applications on elastic cloud platforms.