Big Data Management: Concepts, Techniques and Challenges

Meng Xiaofeng 1, Ci Xiang 1

Author affiliation:

1 School of Information, Renmin University of China, Beijing 100872


Abstract
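Emerging services such as cloud computing, the Internet of Things, and social networks are driving the variety and volume of data in human society to grow at an unprecedented rate, and the era of big data has arrived. Data is shifting from a simple object of processing into a fundamental resource, and how to better manage and exploit big data has become a topic of widespread concern. The sheer scale of big data poses major challenges for data storage, data management, and data analysis, and a transformation in how data is managed is both brewing and already under way. This paper analyzes the basic concepts of big data and briefly compares its major applications. On this basis, it describes the basic framework for big data processing and analyzes the role that cloud computing technology plays in data management in the big data era. Finally, it summarizes the new challenges that the big data era brings.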


Language: Chinese

Funding: National Natural Science Foundation of China (61070055, 91024032, 91124001, 60833005)

Keywords: big data; data analysis; cloud computing

