Big Data. Seifedine Kadry

Cassandra, OrientDB, SimpleDB, RethinkDB, Accumulo, ArangoDB, BerkeleyDB, Oracle, MarkLogic, Bigtable, FlockDB

      2.4.3 NewSQL Databases

      NewSQL databases combine scalable performance similar to that of NoSQL systems with the ACID properties of a traditional database management system. VoltDB, NuoDB, Clustrix, MemSQL, and TokuDB are some examples of NewSQL databases.

      NewSQL databases are distributed, horizontally scalable, and fault tolerant. They support the relational data model and are organized into three layers: the administrative layer, the transactional layer, and the storage layer. They operate in a shared‐nothing architecture, use the relational data model for storage, and offer SQL‐compliant syntax, which makes the transition from a traditional RDBMS to a highly scalable system easy.
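      To make "shared nothing" concrete, here is a minimal Python sketch, not taken from any of the products named in this section, in which each hypothetical `Node` owns its own rows and a hypothetical `SharedNothingTable` routes every row to exactly one node by hashing its primary key. No node reads another node's storage, so adding nodes adds both storage and processing capacity. The class names and the modulo hash placement are illustrative assumptions.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Node:
    """A single node that owns its rows; no storage is shared with other nodes."""
    name: str
    rows: dict = field(default_factory=dict)


class SharedNothingTable:
    """Hypothetical table whose rows are hash-partitioned by primary key."""

    def __init__(self, node_count: int):
        self.nodes = [Node(f"node{i}") for i in range(node_count)]

    def owner(self, key: str) -> Node:
        # zlib.crc32 is stable across processes, unlike the built-in hash() on str
        return self.nodes[zlib.crc32(key.encode("utf-8")) % len(self.nodes)]

    def insert(self, key: str, row: dict) -> None:
        # Each write touches exactly one node
        self.owner(key).rows[key] = row

    def get(self, key: str):
        # A point lookup is routed to the single owning node
        return self.owner(key).rows.get(key)


table = SharedNothingTable(node_count=3)
for i in range(9):
    table.insert(f"order-{i}", {"id": i, "amount": 10 * i})

print(table.get("order-4"))                 # {'id': 4, 'amount': 40}
print([len(n.rows) for n in table.nodes])   # how many rows each node holds
```

      Modulo placement keeps the sketch short, but it remaps most keys whenever the node count changes, so real systems typically use range partitioning or consistent hashing instead. The sketch also omits the cross‐node transaction coordination that NewSQL systems provide.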

      The applications targeted by NewSQL systems are those that execute the same queries repeatedly with different inputs and run a large number of transactions. Some commercial NewSQL products are described briefly below.
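      The workload described above, with the same statements repeated with different inputs inside many short transactions, can be sketched with Python's built‐in `sqlite3` module. SQLite is not a NewSQL system and runs on a single machine; it is used here only as a stand‐in to show parameterized statements and ACID transactions. The `accounts` table and the `transfer` function are hypothetical.

```python
import sqlite3

# SQLite stands in for a NewSQL engine here; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:  # commits on success, rolls back if an exception is raised
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [(i, 100) for i in range(1, 6)])

# The same two parameterized statements are reused for every transfer.
CREDIT = "UPDATE accounts SET balance = balance + ? WHERE id = ?"
DEBIT = "UPDATE accounts SET balance = balance - ? WHERE id = ?"


def transfer(src: int, dst: int, amount: int) -> None:
    """Move an amount in one transaction: both updates commit, or neither does."""
    with conn:
        conn.execute(CREDIT, (amount, dst))
        conn.execute(DEBIT, (amount, src))  # violates the CHECK if src would go negative


# Many small transactions: same queries, different inputs.
for src, dst, amount in [(1, 2, 10), (3, 4, 25), (5, 1, 5)]:
    transfer(src, dst, amount)

try:
    transfer(2, 3, 1000)  # account 2 cannot cover this amount
except sqlite3.IntegrityError:
    pass  # the credit to account 3 was rolled back along with the failed debit

print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [(1, 95), (2, 110), (3, 75), (4, 125), (5, 95)]
```

      The failed transfer leaves every balance unchanged, which is the atomicity that NewSQL systems aim to preserve while distributing the same kind of workload across many nodes.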

      2.4.3.1 Clustrix

      Clustrix is a high‐performance, fault‐tolerant, distributed database used in applications with massive transactional volume.

      2.4.3.2 NuoDB

      NuoDB is a cloud‐based, scale‐out, fault‐tolerant, distributed database. It supports both batch and real‐time SQL queries.

      2.4.3.3 VoltDB

      VoltDB is a scale‐out, in‐memory, high‐performance, fault‐tolerant, distributed database. It is used to make real‐time decisions that maximize business value.

      2.4.3.4 MemSQL

      MemSQL is a high‐performance, in‐memory, fault‐tolerant, distributed database. It is known for its very fast performance and is used for real‐time analytics.

      Scalability is the ability of a system to meet increasing demand for storage capacity. A scalable system delivers increased performance and efficiency as resources are added. With the advent of the big data era, there is an imperative need to scale data storage platforms so that they can store petabytes of data. Storage platforms can be scaled in two ways:

       Scaling‐up (vertical scalability)

       Scaling‐out (horizontal scalability)
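      The difference between the two approaches can be illustrated with a little arithmetic. In this sketch, which uses made‐up capacity figures, scaling up succeeds only while one machine can hold all the data, whereas scaling out keeps adding nodes, multiplied by an optional replication factor. The function names and all numbers are hypothetical.

```python
import math
from typing import Optional


def scale_up_nodes(required_tb: float, largest_machine_tb: float) -> Optional[int]:
    """Vertical scaling: one bigger machine. Returns None once no single machine is big enough."""
    return 1 if required_tb <= largest_machine_tb else None


def scale_out_nodes(required_tb: float, per_node_tb: float, replicas: int = 1) -> int:
    """Horizontal scaling: enough commodity nodes to hold every copy of the data."""
    return math.ceil(required_tb * replicas / per_node_tb)


# Hypothetical figures: 1 PB (1000 TB) of data, a 500 TB ceiling for one machine,
# and 50 TB per commodity node.
print(scale_up_nodes(1000, largest_machine_tb=500))        # None
print(scale_out_nodes(1000, per_node_tb=50))               # 20
print(scale_out_nodes(1000, per_node_tb=50, replicas=3))   # 60
```

      The point is the shape rather than the numbers: vertical scaling hits a hard ceiling at the largest available machine, while horizontal scaling grows by adding nodes.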

      Chapter 2 Refresher

      1 The set of loosely connected computers is called _____.
         a) LAN
         b) WAN
         c) Workstation
         d) Cluster
         Answer: d
         Explanation: In a computer cluster, all the participating computers work together on a particular task.

      2 Cluster computing is classified into:
         a) High‐availability cluster
         b) Load‐balancing cluster
         c) Both a and b
         d) None of the above
         Answer: c

      3 The computer cluster architecture emerged as a result of ____.
         a) ISA
         b) Workstation
         c) Supercomputers
         d) Distributed systems
         Answer: d
         Explanation: A distributed system is a computer system spread out over a geographic area.

      4 A cluster adopts a _______ mechanism to eliminate service interruptions.
         a) Sharding
         b) Replication
         c) Failover
         d) Partition
         Answer: c

      5 _______ is the process of switching to a redundant node upon the abnormal termination or failure of a previously active node.
         a) Sharding
         b) Replication
         c) Failover
         d) Partition
         Answer: c

      6 _______ adds more storage resources and CPU to increase capacity.
         a) Horizontal scaling
         b) Vertical scaling
         c) Partition
         d) All of the mentioned
         Answer: b
         Explanation: Vertical scaling (scaling up) adds more CPU, memory, and storage to an existing server to increase its capacity.

      7 _______ is the process of copying the same data blocks across multiple nodes.
         a) Replication
         b) Partition
         c) Sharding
         d) None of the above
         Answer: a
         Explanation: Replication is the process of copying the same data blocks across multiple nodes to overcome the loss of data when a node crashes.

      8 _______ is the process of dividing the data set and distributing the data over multiple servers.
         a) Vertical
         b) Sharding
         c) Partition
         d) All of the mentioned
         Answer: b
         Explanation: Sharding is the process of partitioning very large data sets into smaller, easily manageable chunks called shards.

      9 A sharded cluster is _______ to provide high availability.
         a) Replicated
         b) Partitioned
         c) Clustered
         d) None of the above
         Answer: a
         Explanation: Replication makes the system fault tolerant: because the data is redundant across the nodes, it is not lost when an individual node fails.

      10 NoSQL databases exhibit ______ properties.
         a) ACID
         b) BASE
         c) Both a and b
         d) None of the above
         Answer: b
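      Questions 5, 7, 8, and 9 describe failover, replication, and sharding. The minimal Python sketch below ties them together under simplifying assumptions: keys are assigned to shards by hashing, every write is copied to all live replicas of a shard, and when the active replica fails, the next live replica takes over. The class names are hypothetical, and resynchronizing a recovered replica is not modeled.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)


class Shard:
    """One shard stored on several replicas; the first live replica acts as primary."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]

    def primary(self) -> Replica:
        # Failover: if the current primary is down, the next live replica takes over
        for r in self.replicas:
            if r.alive:
                return r
        raise RuntimeError("all replicas of this shard are down")

    def write(self, key, value):
        # Replication: copy the same record to every live replica
        for r in self.replicas:
            if r.alive:
                r.data[key] = value

    def read(self, key):
        return self.primary().data.get(key)


class ShardedCluster:
    def __init__(self, shard_count: int, replication_factor: int):
        self.shards = [Shard([f"s{s}r{r}" for r in range(replication_factor)])
                       for s in range(shard_count)]

    def shard_for(self, key: str) -> Shard:
        # Sharding: each key belongs to exactly one shard
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        self.shard_for(key).write(key, value)

    def get(self, key):
        return self.shard_for(key).read(key)


cluster = ShardedCluster(shard_count=2, replication_factor=3)
cluster.put("user:42", {"name": "Ada"})

shard = cluster.shard_for("user:42")
shard.primary().alive = False      # simulate a crash of the active node
print(shard.primary().name)        # a surviving replica now serves requests
print(cluster.get("user:42"))      # {'name': 'Ada'}: the data survived the failure
```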

      1 What is a distributed file system?
         A distributed file system is an application that stores files across cluster nodes and allows clients to access them from the cluster. Although the files are physically distributed across the nodes, they logically appear to the client as if they resided on its local machine.

      2 What is failover?
         Failover is the process of switching to a redundant node upon the
