A tidied-up and refreshed index of the Big Data papers and blog posts I have read and taken notes on
Discretized Streams: discretized processing of streaming data
Spark - A Fault-Tolerant Abstraction for In-Memory Cluster Computing
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
Sparrow - Distributed, Low Latency Scheduling
The Log: What every software engineer should know about real-time data's unifying abstraction
Kafka: a Distributed Messaging System for Log Processing
Apache Samza - Reliable Stream Processing atop Apache Kafka and Hadoop YARN
Bigtable: A Distributed Storage System for Structured Data
Dremel - Interactive Analysis of Web-Scale Datasets
Chubby - lock service for loosely-coupled distributed systems
Megastore - Providing Scalable, Highly Available Storage for Interactive Services
Consistency
Why Vector Clocks are Easy or Hard?
Indexing techniques
Data models
NoSQL Data Modeling Techniques
Systems
Dynamo: Amazon’s Highly Available Key-value Store
Cassandra - A Decentralized Structured Storage System
YARN - Yet Another Resource Negotiator
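Duplicate detection and clustering over massive document collections -- the Locality-Sensitive Hashing algorithm
Synchronous vs. asynchronous, blocking vs. non-blocking, Reactor and Proactor
Original post: http://www.cnblogs.com/fxjwind/p/3535054.html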