Big data is transforming the way organizations collect data
and use decision-making tools, giving them access to information that improves productivity,
innovation and competitiveness. So it's no surprise that it is high on the agenda for many
IT organizations. However, implementing big data comes with some challenges.
Big data initiatives lead to large, elastic databases deployed horizontally on clusters that can span thousands of servers. The network that connects these servers, and its design, are key. A poorly designed network can hold back the launch and expansion of big data initiatives and reduce the value they could provide. Here are four questions you need to ask yourself (and your network equipment manufacturer).
1) How do I ensure the reliability of data?
The Hadoop Distributed File System (HDFS) places each block of data on a node in one rack, then places two replicas on different nodes in another rack. Because HDFS is aware of the topology, the replicas end up on nodes connected to different switches. No additional intelligence needs to be built into the switches.
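The placement rule described above can be illustrated with a small simulation. This is a simplified sketch, not Hadoop's actual implementation: the rack layout, the node names and the `place_replicas` function are all hypothetical, and it assumes three replicas with the first one kept on the writing node.

```python
import random

def place_replicas(racks, writer, rng=random):
    """Pick three nodes following a simplified rack-aware policy.

    racks:  dict mapping a rack name to its list of node names (hypothetical layout).
    writer: the node that is writing the block; it keeps the first replica.
    """
    # Find the rack that contains the writing node.
    local_rack = next(r for r, nodes in racks.items() if writer in nodes)
    # The second and third replicas go to two distinct nodes in one other rack,
    # so that rack must have at least two nodes.
    remote_candidates = [r for r, nodes in racks.items()
                         if r != local_rack and len(nodes) >= 2]
    if not remote_candidates:
        raise ValueError("need a second rack with at least two nodes")
    remote_rack = rng.choice(remote_candidates)
    second, third = rng.sample(racks[remote_rack], 2)
    return [writer, second, third]

# Hypothetical three-rack cluster.
racks = {
    "rack1": ["n1", "n2", "n3"],
    "rack2": ["n4", "n5", "n6"],
    "rack3": ["n7", "n8"],
}
replicas = place_replicas(racks, "n2")
print(replicas)
```

Losing a single node, or the whole rack behind one switch, still leaves at least one copy reachable, which is why the switches themselves need no knowledge of the replicas.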
In case of failure, HDFS relies on the network to maintain reliability. However, data replication can take hours. For example, it takes about 66 hours to transfer 30 TB of data over a 1GbE network at line rate; once network latency and the actual performance of the network are taken into account, it can take much longer. If the network does not have enough bandwidth, the risk is that a DataNode loses connectivity with the NameNode, which compromises HDFS reliability.
With a 10Gbps infrastructure, replicating a 3 TB disk takes roughly 40 minutes at line rate, compared with nearly 7 hours over 1GbE. Moreover, network performance is far superior (see below), ensuring the bandwidth needed for a reliable connection between the DataNodes and the NameNode.
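Replication times like these can be estimated from the data size and the link speed alone. The sketch below is a back-of-the-envelope calculation, assuming decimal units (1 TB = 10^12 bytes) and a link running at full line rate; the `efficiency` parameter is a hypothetical knob for protocol overhead and contention, not a measured value.

```python
def transfer_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Ideal time to move size_bytes over a link, in seconds.

    efficiency is the usable fraction of the line rate (1.0 = no overhead).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)

TB = 10**12    # decimal terabyte, in bytes
GBPS = 10**9   # gigabit per second, in bits per second

for size_tb, link_gbps in [(30, 1), (30, 10), (3, 1), (3, 10)]:
    seconds = transfer_seconds(size_tb * TB, link_gbps * GBPS)
    print(f"{size_tb} TB over {link_gbps} Gbps: {seconds / 3600:.1f} h")
```

Lowering `efficiency` below 1.0 shows how quickly real-world overhead stretches these ideal figures.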
2) How can I ensure that performance will be good enough?
Traditional multi-tier networks are poorly suited to the performance needs of multi-rack big data applications. The latency of a typical top-of-rack switch may be only one microsecond, but the latency of the distribution and core switches is significantly greater.
The problem is compounded by the oversubscription introduced at the distribution and core layers. A top-of-rack switch may operate with an oversubscription ratio of 3:1. But if we add an oversubscription ratio of 4:1 at the distribution switch, the overall oversubscription becomes 12:1. As a result, the bandwidth actually available for replication to a data node configured with a 20 Gbps connection is less than 2 Gbps.
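Oversubscription ratios multiply across tiers, which a few lines of arithmetic make concrete. This is a minimal sketch: the function names are illustrative, and it assumes the worst case where every port at every tier is sending at full rate.

```python
from math import prod

def overall_oversubscription(ratios):
    """Combined oversubscription of several tiers (ratios multiply)."""
    return prod(ratios)

def effective_bandwidth_gbps(nic_gbps, ratios):
    """Worst-case bandwidth left to one server when every tier is saturated."""
    return nic_gbps / overall_oversubscription(ratios)

# 3:1 at the top-of-rack switch, 4:1 at the distribution switch.
tiers = [3, 4]
print(overall_oversubscription(tiers))
print(round(effective_bandwidth_gbps(20, tiers), 2))
```

Removing a tier from the `tiers` list shows directly how much bandwidth each extra layer of switching costs.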
In addition, big data applications cannot tolerate packet loss when data is moved within the data center. On an Ethernet network, the usual answer to this problem is Data Center Bridging (DCB). But not all switches currently on the market support it.
A suitable infrastructure should avoid intermediate switches so that each server is only one hop from any other, which significantly reduces latency. Eliminating intermediate switches also substantially reduces oversubscription. The result is lower latency and higher throughput for the traffic.
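The effect of removing intermediate switches can be sketched by adding up per-switch latency along a path. The 1 microsecond top-of-rack figure comes from the text above; the distribution and core values below are hypothetical placeholders chosen only to illustrate the calculation, not measurements of any real equipment.

```python
def path_latency_us(switch_latencies_us):
    """Total switching latency along a path, in microseconds."""
    return sum(switch_latencies_us)

TOR_US = 1.0    # top-of-rack switch, figure quoted in the article
DIST_US = 5.0   # distribution switch: hypothetical placeholder value
CORE_US = 5.0   # core switch: hypothetical placeholder value

# Worst-case path between two racks in a three-tier network:
# up through distribution to the core, then back down.
three_tier_path = [TOR_US, DIST_US, CORE_US, DIST_US, TOR_US]

# Design where every server is a single hop from any other.
single_hop_path = [TOR_US]

print(len(three_tier_path), "switches:", path_latency_us(three_tier_path), "us")
print(len(single_hop_path), "switch:", path_latency_us(single_hop_path), "us")
```

Whatever the real per-switch figures are, the single-hop path crosses one switch instead of five, so its latency no longer grows with the depth of the hierarchy.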
3) How can I make sure my big data project can scale?
Elasticity is essential for big data initiatives. However, as the network grows, the constraints specific to traditional infrastructure introduce more latency and more oversubscription.
But we must also consider the issue of data collection. A big data initiative means collecting structured and unstructured data in real time. The network must therefore support access to storage networks, ideally direct access, which some network architectures do not allow.
Choosing an architecture that scales elastically at Layers 2 and 3, and that can be optimized for FCoE, iSCSI and NFS storage, ensures you do not have to worry about your growing needs.
4) How can I administer it easily?
Traditional multi-tier networks are inherently complex to administer. The number of intermediate switches increases with the number of servers and racks, so both the administration and the maintenance of the network become more complex.
A true fabric, based on a single operating system, behaves as a single converged Ethernet switch and can be administered as such. Provisioning, administration and maintenance are greatly simplified.
Ask the right questions
There is no doubt that, once they begin to materialize, big data initiatives will yield the information needed to steer new activities and bring out new ways of doing business. But to ensure that the network does not become an obstacle, organizations must make sure they have asked the right questions about its design.