DW: Common Commands of HiveQL
HiveQL, the SQL-like language for querying data in Apache Hive, empowers users to interact with structured data in a distributed computing environment. Below are some frequently used commands:
Data Definition Language (DDL) Commands
1. CREATE TABLE
Creates a new table in Hive.
1 |
|
2. ALTER TABLE
Modifies an existing table by adding, renaming, or dropping columns.
1 |
|
3. DROP TABLE
Deletes a table from the database.
1 |
|
Data Manipulation Language (DML) Commands
1. INSERT INTO
Adds new rows of data into a table.
1 |
|
2. SELECT
Retrieves data from one or more tables based on specified conditions.
1 |
|
3. UPDATE
Modifies existing data in a table.
1 |
|
4. DELETE
Removes rows from a table based on specified conditions.
1 |
|
5. JOIN
Combines rows from two or more tables based on a related column between them.
Inner Join
Retrieves rows that have matching values in both tables.
1 |
|
Left Join (or Left Outer Join)
Retrieves all rows from the left table and matching rows from the right table. If there’s no match, NULL values are returned for the right table.
1 |
|
Right Join (or Right Outer Join)
Retrieves all rows from the right table and matching rows from the left table. If there’s no match, NULL values are returned for the left table.
1 |
|
Full Join (or Full Outer Join)
Retrieves all rows when there is a match in either the left or right table. If there’s no match, NULL values are returned for the unmatched side.
1 |
|
6. UNION
Combines the results of two or more SELECT statements into a single result set.
Additional Commands
1. SHOW DATABASES
Lists all databases in the Hive environment.
1 |
|
2. SHOW PARTITIONS
Displays partition information for a specified table.
1 |
|
3. SET
Configures Hive parameters by setting values.
1 |
|
4. QUIT
Exits the Hive environment.
1 |
|
These detailed HiveQL commands provide a comprehensive understanding of data manipulation techniques, particularly JOIN operations like Inner Join, Left Join, Right Join, and Full Join. Utilizing these operations in Apache Hive enables users to handle complex data relationships and merge data sets efficiently in distributed environments.