Installing Apache Hive in Ubuntu

2 December 2018

In this blog post we will install Apache Hive on an Ubuntu machine (Ubuntu 16.04.5 LTS, GNU/Linux 4.4.0-36-generic x86_64). Once the installation is complete, we will run queries in Hive Query Language (HQL) to verify it.

Figure: Ubuntu Version

Prerequisites for Hive Installation

Before installing Hive, we need to make sure that both Java and Hadoop are installed and configured on the cluster.

Install Java

First, update Ubuntu with the latest software and patches, if any are available.

sudo apt-get update && sudo apt-get -y dist-upgrade

Use the command below to install the OpenJDK version of Java.

sudo apt-get -y install openjdk-8-jdk-headless

Install Apache Hive

Download and Uncompress Hive

First, download the latest available Hive installation archive from the mirror site.

cd /tmp
sudo wget https://www-eu.apache.org/dist/hive/stable-2/apache-hive-2.3.4-bin.tar.gz
Figure: Downloading Hive
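
Before extracting the archive, it can be worth checking that the download is intact. Apache releases are normally published with a checksum file next to the archive. The following is a minimal sketch, assuming you have copied the expected SHA-256 value from that file; verify_sha256 is a hypothetical helper name, not part of any tool.

```shell
# verify_sha256 (hypothetical helper): succeeds only if the SHA-256 digest
# of the file matches the expected hex string.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Usage (the expected value is a placeholder you must fill in yourself):
#   verify_sha256 /tmp/apache-hive-2.3.4-bin.tar.gz "<expected digest>" \
#       && echo "checksum OK" || echo "checksum MISMATCH"
```

If the check reports a mismatch, download the archive again before continuing.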

Once the file is downloaded, uncompress the tar file and move it to the installation location.

tar -xvf apache-hive-2.3.4-bin.tar.gz
mv apache-hive-2.3.4-bin /usr/local/hive

Change Permission to installation directory

If you want to run Hive as a user other than root, you need to change the ownership of the Hive directory to that user and give it the proper permissions.

In my case, Apache Hive is being installed for the user hduser at /usr/local/hive.

## Give 755 permission to the folder
chmod -R 755 /usr/local/hive

## Change ownership
chown -R hduser /usr/local/hive

Skip this step if you will run Hive as the same user that performed the installation.
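
To confirm that the ownership and permission changes took effect, we can inspect the directory. This is a small sketch that relies on GNU stat as shipped with Ubuntu; owner_and_mode is a hypothetical helper name.

```shell
# owner_and_mode (hypothetical helper): print "<owner> <octal mode>" for a path.
# Uses the GNU stat format option -c, which is available on Ubuntu.
owner_and_mode() {
    stat -c '%U %a' "$1"
}

dir=/usr/local/hive
if [ -d "$dir" ]; then
    owner_and_mode "$dir"
else
    echo "$dir does not exist yet"
fi
```

For the setup described in this post, the output should show hduser as the owner and 755 as the mode.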

Set the HIVE_HOME in system Path

We have now moved the Hive installation files to /usr/local/hive. To run Hive from any directory on this machine, we need to add this path to the system PATH.

On Debian-based systems, .bashrc is a shell script that Bash runs whenever it is started interactively. It initializes an interactive shell session.

Use a text editor such as vim or nano to open and edit the file.

nano ~/.bashrc

Add the Hive home path to the .bashrc file as shown below.

#HIVE Path
export HIVE_HOME=/usr/local/hive
export HIVE_CONF_DIR=/usr/local/hive/conf
export PATH=$HIVE_HOME/bin:$PATH

Now, to make the Hive path available, reload the .bashrc file using the source command.

source ~/.bashrc
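
If you prefer to script the .bashrc change instead of editing the file by hand, the following sketch appends a line only when it is not already present, so running it twice does not create duplicates. The helper name append_once is hypothetical.

```shell
# append_once (hypothetical helper): append a line to a file unless an
# identical line is already there.
append_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage, followed by "source ~/.bashrc" to reload the file:
#   append_once 'export HIVE_HOME=/usr/local/hive' ~/.bashrc
#   append_once 'export HIVE_CONF_DIR=/usr/local/hive/conf' ~/.bashrc
#   append_once 'export PATH=$HIVE_HOME/bin:$PATH' ~/.bashrc
```

After reloading, echo $HIVE_HOME should print /usr/local/hive.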

Check Hadoop and Java Path in .bashrc

Before running Hive, we need to make sure that Apache Hadoop and Java are set up in the path and running properly. The .bashrc file should contain entries like the following.

#HADOOP VARIABLES START
export HADOOP_HOME="/usr/local/hadoop"
export PATH="$HADOOP_HOME/bin:$PATH"
export PATH="$HADOOP_HOME/sbin:$PATH"
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export YARN_HOME="$HADOOP_HOME"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HADOOP_HOME/lib/native"
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
#HADOOP VARIABLES END

# Path of the openjdk-8 package installed above; adjust it if you use a different JDK
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
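
The right JAVA_HOME value depends on which JDK is installed, so it is safer to check it than to copy it. The sketch below derives a candidate from the resolved location of the java binary, assuming java is already on the PATH; java_home_from_bin is a hypothetical helper name.

```shell
# java_home_from_bin (hypothetical helper): derive a JAVA_HOME candidate from
# the resolved path of the java binary by stripping /bin/java and, if present,
# a trailing /jre.
java_home_from_bin() {
    path=$1
    path=${path%/bin/java}
    path=${path%/jre}
    printf '%s\n' "$path"
}

if command -v java >/dev/null 2>&1; then
    # readlink -f follows the /usr/bin/java symlink chain to the real binary.
    java_home_from_bin "$(readlink -f "$(command -v java)")"
else
    echo "java not found on PATH"
fi
```

Compare the printed directory with the JAVA_HOME line in .bashrc and correct the entry if they differ.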

Now use the jps and hadoop version commands to check whether Apache Hadoop is running.

Figure: Check Hadoop and Hive Version
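
The same check can be scripted so that a missing tool is reported clearly instead of failing with "command not found". This is a minimal sketch; missing_cmds is a hypothetical helper name.

```shell
# missing_cmds (hypothetical helper): print each given command name that
# cannot be found on PATH, one per line.
missing_cmds() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

missing=$(missing_cmds java hadoop hdfs jps)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
else
    hadoop version   # prints the installed Hadoop version
    jps              # lists the running Java processes
fi
```

On a single-node setup, the jps output would typically include the Hadoop daemons such as NameNode and DataNode once the cluster has been started.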

Create Hive Warehouse directory and initialize Derby

Let’s create the directory in the Hadoop Distributed File System (HDFS) where Hive can store its data.

hdfs dfs -mkdir -p /user/hive/warehouse

Now give the warehouse directory the proper permissions.

hdfs dfs -chmod 755 /user/hive/warehouse
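
As an alternative, the Hive Getting Started guide creates both /tmp and /user/hive/warehouse in HDFS and grants group write access on them. The sketch below follows that variant and adds a dry-run mode so the commands can be reviewed before they touch HDFS. The helper names run and setup_hive_dirs, and the DRY_RUN variable, are assumptions made for this example.

```shell
# run (hypothetical helper): print the command when DRY_RUN is set,
# otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# setup_hive_dirs (hypothetical helper): create the HDFS directories Hive
# uses and make them group-writable.
setup_hive_dirs() {
    for dir in /tmp /user/hive/warehouse; do
        run hdfs dfs -mkdir -p "$dir"
        run hdfs dfs -chmod g+w "$dir"
    done
}

# Usage:
#   DRY_RUN=1; export DRY_RUN; setup_hive_dirs   # only print the commands
#   unset DRY_RUN; setup_hive_dirs               # actually run them
```

Whether 755 or group write access is appropriate depends on which users need to write to the warehouse, so treat this as one option and not a requirement.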

Now let’s tell Hive which database to use for its metastore, where it keeps its schema definitions. The command below initializes the metastore schema in an embedded Derby database. The metastore database can also be configured in the Hive configuration file hive-site.xml.

$HIVE_HOME/bin/schematool -initSchema -dbType derby
Figure: Initialize Derby database
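
With the default embedded Derby configuration, the metastore is created as a metastore_db directory inside the directory where the command is run, and initializing it a second time in the same place fails. The sketch below guards against that; metastore_present is a hypothetical helper name, and the check assumes the default Derby location.

```shell
# metastore_present (hypothetical helper): succeeds if a Derby metastore_db
# directory already exists in the given directory (default: current directory).
metastore_present() {
    [ -d "${1:-.}/metastore_db" ]
}

if metastore_present; then
    echo "metastore_db already exists here; skipping -initSchema"
elif command -v schematool >/dev/null 2>&1; then
    schematool -initSchema -dbType derby
else
    echo "schematool not found on PATH; is \$HIVE_HOME/bin in PATH?"
fi
```

Because the Derby database lives in the working directory, start the hive shell from the same directory in which the schema was initialized.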

Run Hive Queries (Hive Query Language)

Start the Hive Shell

hive
Figure: Hive Shell

Create Database in Hive

We will create a new database named niten_test and display all existing databases using the SHOW DATABASES command.

CREATE DATABASE IF NOT EXISTS niten_test;

SHOW DATABASES;
Figure: Create Hive Database

Create Hive Table

We have just created our own database, which we can use to create a table.

So switch to the database you just created.

USE niten_test;

Now create a table inside this database with the fields below.

CREATE TABLE IF NOT EXISTS niten_table(
  id INT,
  first_name STRING,
  last_name STRING,
  website STRING);
Figure: Create Hive Table

Once the table is successfully created, we can display the tables and the schema of the table.

show tables;

desc niten_table;
Figure: Show and Describe Hive Table

Insert Records into Hive Tables


INSERT INTO TABLE niten_table VALUES(1,'Nitendra','Gautam','nitendragautam.com');
Figure: Insert Record Hive Table

Display the record

SELECT * FROM niten_table;
Figure: Display Record
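
The statements used in this post can also be run non-interactively, which is handy for repeating the verification later. The sketch below writes them to a script file and passes it to the Hive CLI with its -f option; write_hql is a hypothetical helper name.

```shell
# write_hql (hypothetical helper): write the HQL statements from this post
# to the given file.
write_hql() {
    cat > "$1" <<'EOF'
CREATE DATABASE IF NOT EXISTS niten_test;
USE niten_test;
CREATE TABLE IF NOT EXISTS niten_table(
  id INT,
  first_name STRING,
  last_name STRING,
  website STRING);
INSERT INTO TABLE niten_table VALUES(1,'Nitendra','Gautam','nitendragautam.com');
SELECT * FROM niten_table;
EOF
}

script=$(mktemp)
write_hql "$script"
if command -v hive >/dev/null 2>&1; then
    hive -f "$script"   # run every statement in the file, then exit
else
    echo "hive not found on PATH; script left at $script"
fi
```

Note that running the script more than once inserts the same record again, because the INSERT statement is not guarded the way the CREATE statements are.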

To conclude, we have installed and validated Apache Hive on an Ubuntu server.
