Does big data Hadoop need RAID5?
Does big data Hadoop need RAID5? Word count is one of the simplest programs and best illustrates the MapReduce idea; it can be called the "Hello World" of MapReduce. The complete code for this program can be found in the "src/examples" directory of the Hadoop installation package.
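As a quick illustration, here is a minimal sketch of running the bundled word count example from the Hadoop installation directory; the jar path shown is the usual location in Hadoop 2.x releases, and the /input and /output HDFS paths are placeholders rather than paths taken from this article:

# Run the bundled WordCount example against an input directory already in HDFS
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output

# Print the word counts produced by the reducer
bin/hadoop fs -cat /output/part-r-00000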

Exploring big data: Do you want to take the postgraduate entrance examination? What is your reason for taking it? Perhaps you will find the answer yourself ... for peace of mind.

The RAID5 data problem cannot be solved that way. It is like a new disk that needs to be repartitioned and formatted in Disk Management, but in that case the data written in RAID mode will be gone, because data written under RAID is striped across the disks, and even the file system on any single disk is scattered and incomplete. At best, one disk holds relatively complete file system metadata, which means you can see the files on another computer, but opening those files will certainly fail. You might be lucky enough to find a text document that opens, but it will likely be under 4KB: roughly speaking, a file can only survive intact if it is smaller than the RAID stripe size divided by the number of RAID disks, and it also depends on the file system cluster size. In most cases you will not be able to read this disk on another computer at all. Having said that, I hope this gives you a clearer picture of the general situation.

Big data tells you whether to become a civil servant, and this concern is normal. The examination time is 120 minutes, and excluding the time for filling in the answer sheet, the average time per question is only a little over 50 seconds. As candidates, we must first concentrate our strengths and complete all the questions we are able to do, and do them well, to ensure a high accuracy rate.

What is the principle of RAID5, and how does RAID5 data recovery work?

This kind of problem is more complicated, because the disk layout of a server is complicated. Simply put, RAID5 needs at least three hard disks, which should be of the same model and capacity. If the server fails, mark the position of each hard disk in the chassis, since that information is needed for later data recovery; an array sent for recovery has typically lost at least two disks. Therefore, if it is broken, do not keep operating it; protect the scene and find professional data recovery personnel to recover the data. Generally speaking, the data can be recovered; we recommend Xi'an Wang Jun Data and other professional data recovery institutions. If only a single disk in the server is broken, there is great hope of recovery.

Does big data need Hadoop? Yes. At present there is no technology that can replace Hadoop.

Use the hdfs command-line tools (hdfs dfs / hadoop fs) to view files on HDFS, or use the built-in Hadoop web manager to view them. Starting from Hadoop 0.23, Hadoop has also provided a set of REST-style interfaces (WebHDFS) to browse and manipulate data on HDFS over HTTP.
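For example, a minimal sketch, assuming a NameNode running on node1 with the default web port 50070 and a directory /input2 that already exists; the host, port, and paths are placeholders:

# List and view files with the HDFS command line
bin/hdfs dfs -ls /input2
bin/hdfs dfs -cat /input2/file1.txt

# The same listing through the WebHDFS REST interface
curl -i "http://node1:50070/webhdfs/v1/input2?op=LISTSTATUS"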

Big data tells you whether to get a driver's license during college. Generally speaking, very few employers require a driver's license, so strictly for job hunting there is no need to take the test. But most people, especially boys, will eventually want to drive themselves. Therefore, the best time to take the driving test is during college: study for one or two months over a winter or summer vacation and you can basically get the license. If you do not get your driver's license while at school, you will probably not have much time for the test later. So if you can get a driver's license during college, try to get it; if you cannot, do not push yourself too hard.

Of course, there is another consideration: you need enough extracurricular credits before you graduate. If you are short on extracurricular credits, passing the driving test can count as two extracurricular credits toward graduation, so it is best to take the driving test during college.

Big Data: Introduction to Hadoop

One: What is big data?

(1.) Big data refers to data that cannot be captured, managed, and processed by conventional software tools within a reasonable time. In short, the volume of data is so large that it cannot be handled with conventional tools such as relational databases and data warehouses. What order of magnitude counts as "big" here? For example, Alibaba processes about 20PB (20,971,520GB) of data every day.

2. The characteristics of big data:

(1.) Huge. According to the current development trend, the volume of big data has reached PB level or even EB level.

(2) There are various data types of big data, mainly unstructured data, such as online magazines, audio, video, pictures, geographical location information, transaction data, social data and so on.

(3) Low value density. Valuable data accounts for only a small part of the total data. For example, in a video, only a few seconds of information is valuable.

(4) Generated quickly and requiring fast processing. This is the most significant feature distinguishing big data from traditional data mining.

3. There are a number of processing systems that can handle big data:

Hadoop (open source)

Spark (open source)

Storm (open source)

MongoDB (open source)

IBM PureData (commercial)

Oracle Exadata Database Machine (commercial)

SAP HANA (commercial)

Teradata Aster Data (commercial)

EMC Greenplum (commercial)

HP Vertica (commercial)

Note: Only Hadoop is introduced here.

Two: Hadoop architecture

The origin of Hadoop:

Hadoop originated from the three papers on GFS (Google File System), MapReduce, and BigTable published by Google in 2003 and 2004, and was created by Doug Cutting. Hadoop is now a top-level project of the Apache Foundation.

Hadoop is a made-up name: Doug Cutting named it after his son's yellow toy elephant.

The core of Hadoop:

(1.) HDFS and MapReduce are the two cores of Hadoop. HDFS provides the underlying support for distributed storage, enabling high-speed parallel reads and writes and large-capacity storage expansion.

(2.) MapReduce provides the underlying support for distributed task processing, enabling high-speed partitioned processing of data.

3. Hadoop subprojects:

(1.)HDFS: Distributed file system, the cornerstone of the whole Hadoop system.

(2) MapReduce/YARN: parallel programming model. YARN is the second-generation MapReduce framework. Since Hadoop version 0.23.0, MapReduce has been rebuilt and is usually called MapReduce V2; the old MapReduce is also called MapReduce V1.

(3.) Hive: a data warehouse built on Hadoop, which provides a SQL-like query language to query data stored in Hadoop (see the usage sketches after this list).

(4.) HBase: the Hadoop database, a distributed, column-oriented database derived from Google's BigTable paper, mainly used for random access and real-time reading and writing of big data.

(5.) ZooKeeper: a coordination service designed for distributed applications, which mainly provides synchronization, configuration management, grouping, and naming services for users, reducing the coordination tasks that distributed applications would otherwise have to handle themselves.

There are many other projects, which will not be explained here.
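To make items (3.), (4.), and (5.) above more concrete, here are minimal usage sketches; all table, column family, and znode names are made up for illustration, and the commands assume the corresponding services are already installed and running:

# Hive: run a SQL-like query from the shell (the "logs" table is hypothetical)
hive -e "SELECT level, COUNT(*) FROM logs GROUP BY level;"

# HBase: start the HBase shell, then run these commands inside it
hbase shell
create 'user_profile', 'info'
put 'user_profile', 'row1', 'info:name', 'Alice'
get 'user_profile', 'row1'

# ZooKeeper: connect with the command-line client, then create and read a znode
zkCli.sh -server node1:2181
create /myapp "config-v1"
get /myapp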

Three: Install the Hadoop runtime environment

Create users:

(1.) Create a Hadoop user group and enter the command:

groupadd hadoop

(2) Create the user hduser and enter the command:

useradd -g hadoop hduser

(3) Set the password for hduser and enter the command:

passwd hduser

Enter the password twice as prompted.

(4) Add sudo permissions for hduser and enter the commands:

# Modify permissions

chmod 777 /etc/sudoers

# Edit sudoers

gedit /etc/sudoers

# Restore default permissions

chmod 440 /etc/sudoers

First change the permissions on the sudoers file, then find the line "root ALL=(ALL) ALL" in the text editor and add the line "hduser ALL=(ALL) ALL" below it to add hduser to sudoers. Remember to restore the default permissions after saving, otherwise the sudo command will not be allowed.
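After the edit, the relevant part of /etc/sudoers should look roughly like the following sketch (the exact formatting of the root line may differ on your distribution):

root    ALL=(ALL)       ALL
hduser  ALL=(ALL)       ALL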

(5) After the setup is completed, restart the virtual machine and enter the command:

sudo reboot

Switch to hduser login after reboot.

Install JDK

(1.) Download jdk-7u67-linux-x64.rpm and enter the download directory.

(2) Run the installation command:

sudo rpm -ivh jdk-7u67-linux-x64.rpm

When finished, check the installation path and enter the command:

rpm -ql jdk

Remember this path.

(3) Configure environment variables and enter commands:

sudo gedit /etc/profile

Open the profile and add the following at the bottom of the file

export JAVA_HOME=/usr/java/jdk1.7.0_67

export CLASSPATH=$JAVA_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$PATH

Close the file after saving, and then enter the command to make the environment variable take effect:

source /etc/profile

(4) Verify the JDK and enter the command:

java -version

If the correct version appears, the installation is successful.

Configure password-free login for local SSH:

(1.) Use ssh-keygen to generate private key and public key files, and enter the command:

ssh-keygen -t rsa

(2) The private key stays on this machine, and the public key is sent to the other hosts (for now, localhost). Enter the command:

ssh-copy-id localhost

(3) Log in with the public key and enter the command:

ssh localhost

Configure password-free SSH login for the other hosts:

(1.) Clone the virtual machine twice. Right-click the virtual machine in the left column of VMware and select Manage - Clone from the pop-up shortcut menu. For the clone type, select "Create a full clone", click "Next", and continue until it finishes.

(2) Start each of the three virtual machines and use ifconfig to query each host's IP address.

(3) Modify the host name and host file of each host.

Step 1: Modify the host name. Enter the command on each host:

sudo gedit /etc/sysconfig/network

Step 2: Modify the host file:

sudo gedit /etc/hosts

Step 3: Set the IP addresses of the three virtual machines.

The IP of node1, the first virtual machine, is 192.168.1.130.

The IP of node2, the second virtual machine, is 192.168.1.131.

The IP of node3, the third virtual machine, is 192.168.1.132.
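With these addresses, the /etc/hosts file on each of the three machines would look roughly like the following sketch; adjust the IPs to whatever ifconfig actually reports on your hosts:

127.0.0.1   localhost
192.168.1.130   node1
192.168.1.131   node2
192.168.1.132   node3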

(4) Because the key pair has already been generated on node1, all you need to do now is enter the commands on node1:

ssh-copy-id node2

ssh-copy-id node3

In this way, the public key of node1 is distributed to node2 and node3.

(5) Test SSH, and enter the commands on node1:

ssh node2

# Log out
exit

ssh node3

exit

Four: Hadoop fully distributed installation

1. Hadoop has three modes of operation:

(1.) Stand-alone mode: Hadoop is regarded as an independent Java process, running in non-distributed mode without configuration.

(2) Pseudo-distributed: a cluster with only one node, namely a master (master node, master server) and a slave (slave node, slave server). You can use different java processes on this node to simulate various nodes in the distributed system.

(3) Fully distributed: Hadoop runs across multiple real nodes; different systems will divide the node roles in different ways.

2. Install Hadoop

(1.) Get the Hadoop package hadoop-2.6.0.tar.gz. After the download is complete, you can share it into the virtual machine through a VMware Tools shared folder, or send it to node1 with an Xftp tool. On node1, extract the package into the /home/hduser directory. Enter the commands:

# Enter the home directory, i.e. /home/hduser

cd ~

tar -zxvf hadoop-2.6.0.tar.gz

(2) Rename the hadoop directory and enter the command:

mv hadoop-2.6.0 hadoop

(3) Configure the Hadoop environment variables and enter the command:

sudo gedit /etc/profile

Add the following script to the configuration file:

#hadoop

export HADOOP_HOME=/home/hduser/hadoop

export PATH=$HADOOP_HOME/bin:$PATH

Save and close, and finally enter the command to make the configuration take effect.

source /etc/profile

Note: node2 and node3 should be configured in the same way.

3. Configure Hadoop

(1.) The hadoop-env.sh file is used to specify the JDK path. Enter the commands:

[hduser@node1 ~]$ cd ~/hadoop/etc/hadoop

[hduser@node1 hadoop]$ gedit hadoop-env.sh

Then add the following line to specify the JDK path:

export JAVA_HOME=/usr/java/jdk1.7.0_67

(2) The yarn-env.sh file likewise specifies the JDK path. Open it and add the same line:

export JAVA_HOME=/usr/java/jdk1.7.0_67

(4) core-site.xml: This file is the Hadoop global configuration. Open it and add the configuration properties inside the <configuration> element, as shown below:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hduser/hadoop/tmp</value>
</property>
</configuration>

These are the two most commonly used properties: fs.defaultFS is the default path prefix that clients use when connecting to HDFS, and 9000 is the working port of HDFS. If hadoop.tmp.dir is not specified, data is saved to the system's default temporary directory /tmp.

(5.) hdfs-site.xml: This file is the HDFS configuration. Open it and add the configuration properties inside the <configuration> element.

(6.) mapred-site.xml: This file is the MapReduce configuration. It can be copied from the template file mapred-site.xml.template; open it and add the properties inside the <configuration> element.

(7.) yarn-site.xml: If the yarn framework is configured in mapred-site.xml, the yarn framework will use the configuration in this file. Open it and add the configuration properties inside the <configuration> element.

(8) Copy these configuration files to node2 and node3. Enter the following commands:

scp -r /home/hduser/hadoop/etc/hadoop/ hduser@node2:/home/hduser/hadoop/etc/

scp -r /home/hduser/hadoop/etc/hadoop/ hduser@node3:/home/hduser/hadoop/etc/

4. Verification: let's verify whether Hadoop is configured correctly.

(1.) Format the NameNode on the master host (node1). Enter the commands:

[hduser@node1 ~]$ cd ~/hadoop

[hduser@node1 hadoop]$ bin/hdfs namenode -format

(2) Turn off the system firewall on node1, node2, and node3 and restart the virtual machines. Enter the commands:

service iptables stop

sudo chkconfig iptables off

reboot

(3.) Enter the following to prepare to start HDFS:

[hduser@node1 ~]$ cd ~/hadoop

(4.) Start all services:

[hduser@node1 hadoop]$ sbin/start-all.sh

(5.) View the cluster status:

[hduser@node1 hadoop]$ bin/hdfs dfsadmin -report

(6.) View the running status of HDFS in a browser: node1:50070

(7.) Stop Hadoop. Enter the command:

[hduser@node1 hadoop]$ sbin/stop-all.sh

Five: Hadoop-related shell operations

(1.) In the operating system, create file1.txt and file2.txt in the /home/hduser/file directory; you can create them through the graphical interface.

file1.txt content: Hello World hi HADOOP

file2.txt content: Hello World hi CHINA

(2.) Start HDFS and create the directory /input2:

[hduser@node1 hadoop]$ bin/hadoop fs -mkdir /input2

(3.) Put file1.txt and file2.txt into HDFS:

[hduser@node1 hadoop]$ bin/hadoop fs -put ~/file/file*.txt /input2/

(4.) [hduser@
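For reference, here is a minimal sketch of the properties that are typically placed in the three files from steps (5.) through (7.) above; the values shown (replication factor, directories, hostnames) are assumptions for a small cluster like this one, not settings taken from the original article:

<!-- hdfs-site.xml: replication factor and storage directories (assumed values) -->
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/hadoop/dfs/data</value>
</property>
</configuration>

<!-- mapred-site.xml: run MapReduce on the YARN framework -->
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

<!-- yarn-site.xml: point the NodeManagers at the ResourceManager on node1 -->
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>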

I wonder what RAID5 data recovery costs? I went to the Aite data recovery agency for recovery before, and it cost less than 2,000 yuan. It seems to depend on what the problem is; my problem was fairly complicated, and being able to solve it for 2,000 yuan was quite unexpected.