Big Data. Seifedine Kadry


       Replication—Replication is the process of placing the same set of data over multiple nodes. Replication can be performed using a peer‐to‐peer model or a master‐slave model.

       Sharding—Sharding is the process of placing different sets of data on different nodes.

       Sharding and Replication—Sharding and replication can either be used alone or together.

      2.2.1 Sharding

      Sharding is the process of partitioning a very large data set into smaller, more manageable chunks called shards. The shards are distributed across multiple machines called nodes. No two shards of the same file are stored on the same node: each shard occupies its own node, and together the shards spread across the nodes constitute the complete data set.


      Figure 2.6b shows an example of how a data block is split into shards across multiple nodes. A data set with employee details is split into four small blocks (shard A, shard B, shard C, and shard D), which are stored on four different nodes (node A, node B, node C, and node D). Sharding improves the fault tolerance of the system because the failure of a node affects only the block of data stored on that node.
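      The employee example can be illustrated with a minimal sketch. The text does not say how records are assigned to shards, so the modulo rule used here is an assumption, and the record fields and function names are hypothetical.

```python
# Hypothetical employee records; the field names are illustrative only.
EMPLOYEES = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Boris"},
    {"id": 3, "name": "Chen"},
    {"id": 4, "name": "Dana"},
    {"id": 5, "name": "Emil"},
    {"id": 6, "name": "Fatima"},
    {"id": 7, "name": "Goran"},
    {"id": 8, "name": "Hana"},
]

NODES = ["node A", "node B", "node C", "node D"]


def shard_for(employee_id, nodes):
    """Pick a node by taking the record key modulo the node count (assumed rule)."""
    return nodes[employee_id % len(nodes)]


def build_shards(records, nodes):
    """Group the records into one shard per node."""
    shards = {node: [] for node in nodes}
    for record in records:
        shards[shard_for(record["id"], nodes)].append(record)
    return shards


def surviving_records(shards, failed_node):
    """Return the records that remain reachable after a single node fails."""
    return [
        record
        for node, shard in shards.items()
        if node != failed_node
        for record in shard
    ]


shards = build_shards(EMPLOYEES, NODES)
remaining = surviving_records(shards, "node B")
```

      In this sketch the failure of node B makes only the records in its shard unreachable; the shards on the other three nodes are unaffected. Without replication, however, the records on the failed node stay unavailable until the node recovers.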

      2.2.2 Data Replication


      2.2.2.1 Master‐Slave Model

      2.2.2.2 Peer‐to‐Peer Model


      In the peer‐to‐peer model, the workload or task is partitioned among the nodes. The nodes both consume and donate resources. Resources such as disk storage space, memory, bandwidth, and processing power are shared among the nodes.

      The reliability of this type of configuration is improved through replication. Replication is the process of placing the same data on multiple nodes to avoid a single point of failure. In addition, the nodes connected in a peer‐to‐peer configuration can be geographically distributed across the globe.
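      The way replication removes a single point of failure can be sketched as follows. This is a simplified illustration, not a real protocol: the Peer class, the connect and read helpers, and the key format are all hypothetical, and the sketch ignores conflicting writes and the catch-up a failed peer would need after recovery.

```python
class Peer:
    """A node that keeps a full copy of the data and can accept writes."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # this peer's copy of the data
        self.alive = True  # set to False to simulate a failure
        self.peers = []    # the other peers in the configuration

    def write(self, key, value):
        """Apply a write locally, then copy it to every reachable peer."""
        self.store[key] = value
        for peer in self.peers:
            if peer.alive:
                peer.store[key] = value


def connect(peers):
    """Make every peer aware of all the others (no master node)."""
    for peer in peers:
        peer.peers = [other for other in peers if other is not peer]


def read(peers, key):
    """Read the value from the first live peer that holds the key."""
    for peer in peers:
        if peer.alive and key in peer.store:
            return peer.store[key]
    raise KeyError(key)


cluster = [Peer("peer 1"), Peer("peer 2"), Peer("peer 3")]
connect(cluster)

cluster[0].write("emp:1", "Asha")  # any peer can accept the write
cluster[0].alive = False           # the peer that took the write fails
value = read(cluster, "emp:1")     # the data is still served by another peer
```

      Because every peer holds the same data, the read succeeds even though the peer that accepted the write is down.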

      2.2.3 Sharding and Replication
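
      As a rough sketch of how the two techniques might be used together, each shard can be stored on more than one node. The replication factor of 2 and the "next node, wrapping around" placement rule below are assumptions made for illustration; all function names are hypothetical.

```python
NODES = ["node A", "node B", "node C", "node D"]
REPLICAS = 2  # assumed replication factor: each shard is stored on two nodes


def placement(shard_index, nodes, replicas):
    """Return the nodes holding one shard: a starting node plus the next ones, wrapping around."""
    return [nodes[(shard_index + i) % len(nodes)] for i in range(replicas)]


def build_layout(num_shards, nodes, replicas):
    """Map every shard index to the list of nodes that store a copy of it."""
    return {s: placement(s, nodes, replicas) for s in range(num_shards)}


def available_shards(layout, failed_nodes):
    """Return the shards that still have at least one copy on a live node."""
    return [
        shard
        for shard, holders in layout.items()
        if any(node not in failed_nodes for node in holders)
    ]


layout = build_layout(4, NODES, REPLICAS)
after_one_failure = available_shards(layout, {"node B"})
after_two_failures = available_shards(layout, {"node A", "node B"})
```

      With this layout, every shard remains available after any single node fails. A shard becomes unavailable only when all the nodes holding its copies fail, as happens to shard 0 when both node A and node B are down.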
