Interpreting the Data: Parallel Analysis with Sawzall. Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan. Google, Inc. Scientific Programming Journal Special Issue. Presented by Alexey. Cue Sawzall, a new language that Google use to write distributed, parallel data-processing programs for use on their clusters.
|Published (Last):||20 March 2014|
The paper is well written, with plenty of examples. It comes from Google, which is known for its capacity for massive computation over data, and it describes a tool the company uses to solve day-to-day problems.
Sawzall is a statically typed language for processing very large amounts of data on multiple machines. It generally breaks the calculation into two phases: the first phase analyzes each record individually, and the second phase aggregates the results.
The calculation is divided into pieces and distributed across machines, keeping computation near the data. It runs on top of existing Google infrastructure.
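As a rough illustration of the two phases and of how the work can be split into pieces, here is a minimal Python sketch. It is not Sawzall and not Google's implementation; it only mirrors the paper's introductory example, which emits a count, a total, and a sum of squares for each numeric record. The function names `analyze_record`, `process_chunk`, and `merge`, and the in-memory lists standing in for data spread across machines, are assumptions made for illustration.

```python
from collections import Counter

def analyze_record(x):
    """Phase 1: look at one record in isolation and emit values to named tables."""
    yield "count", 1
    yield "total", x
    yield "sum_of_squares", x * x

def process_chunk(records):
    """Run phase 1 over one chunk, as a single worker might, keeping partial sums."""
    partial = Counter()
    for record in records:
        for table, value in analyze_record(record):
            partial[table] += value
    return partial

def merge(partials):
    """Phase 2: combine the partial sums from all workers into the final tables."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds values key by key
    return result

# Hypothetical input: three chunks standing in for data held on three machines.
chunks = [[1.0, 2.0], [3.0], [4.0, 5.0]]
tables = merge(process_chunk(c) for c in chunks)
mean = tables["total"] / tables["count"]
```

Because addition is commutative and associative, the chunks can be processed in any order and merged in any grouping, which is what makes it possible to spread the work across many machines.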
Reading Paper — Interpreting the Data: Parallel Analysis in Sawzall – Bipin Upadhyaya
Protocol Buffers are used to describe the format of permanent records stored on disk. Software called the Workqueue handles the scheduling of jobs to run on a cluster of machines. The paper gives a detailed overview of the Sawzall programming language, with examples. The benchmark test cases are all CPU-bound.
However, the authors describe the applications of this language as mostly I/O-bound. It would have made sense to include some I/O-bound examples and still show the performance advantage of Sawzall.
Sawzall is also a level of abstraction above MapReduce, but it still appears to be somewhat more restrictive than Pig Latin. A Sawzall program has a fairly rigid structure consisting of a filtering phase (the map step) followed by an aggregation phase (the reduce step).
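The rigid structure can be sketched in Python as an analogy. The sketch assumes each record is a plain dict with hypothetical `country` and `bytes` fields, and the names `filter_record` and `aggregate` are made up for illustration. The filter sees one record at a time and can only emit key/value pairs (the map step), while a sum table indexed by key accumulates them (the reduce step).

```python
from collections import defaultdict

def filter_record(record):
    """Map step: inspect a single record and emit zero or more (key, value) pairs."""
    if record["bytes"] > 0:  # a record is dropped simply by emitting nothing
        yield record["country"], record["bytes"]

def aggregate(emitted):
    """Reduce step: a sum table indexed by key."""
    table = defaultdict(int)
    for key, value in emitted:
        table[key] += value
    return dict(table)

# Hypothetical log records.
records = [
    {"country": "US", "bytes": 120},
    {"country": "DE", "bytes": 0},
    {"country": "US", "bytes": 30},
    {"country": "NP", "bytes": 75},
]
emitted = (pair for r in records for pair in filter_record(r))
bytes_by_country = aggregate(emitted)
```

The filter has no way to look at other records or carry state from one record to the next, which is the kind of restriction that makes the model less flexible than a general dataflow language.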
One concern is that, with terabytes of data being processed, errors can easily occur.