Hive is a data warehousing framework built on top of Hadoop. One of the core features of Hive is the ability to store and query data using tables. In Hive, there are three types of tables: managed tables, external tables, and temporary tables. In this section, we will discuss each of these table types in detail.
Managed Tables
Managed tables, also known as internal tables, are tables where the data and metadata are managed by Hive. In other words, when a managed table is created in Hive, the table schema and data are stored in a default directory managed by Hive. Managed tables are the default table type in Hive, and they are used to store data in a structured format for efficient querying.
The following code snippet demonstrates how to create a managed table in Hive:
CREATE TABLE my_table (
id INT,
name STRING,
age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
In the above example, we create a managed table called my_table
with three columns: id
, name
, and age
. The ROW FORMAT DELIMITED
and FIELDS TERMINATED BY ','
statements specify that the data is stored in text format and is delimited by a comma.
External Tables
External tables are tables where the data is managed outside of Hive. When an external table is created in Hive, the table schema is stored in the Hive metastore, but the data is stored in an external location. This location can be a file system like HDFS, or it can be an external storage system like Amazon S3.
The following code snippet demonstrates how to create an external table in Hive:
CREATE EXTERNAL TABLE my_external_table (
id INT,
name STRING,
age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/hive/my_external_table';
In the above example, we create an external table called my_external_table
with three columns: id
, name
, and age
. The ROW FORMAT DELIMITED
and FIELDS TERMINATED BY ','
statements specify that the data is stored in text format and is delimited by a comma. The LOCATION
statement specifies the external location where the data is stored.
One of the benefits of using external tables is that it allows for sharing of data between different Hive tables or other systems. It also provides more flexibility for managing data, as it can be moved or deleted without affecting the metadata in the Hive metastore.
Temporary Tables
Temporary tables are tables that are created for a specific session and are deleted at the end of the session. Temporary tables can be used to hold intermediate results during a complex data processing job. They can be created and dropped on the fly, and they do not persist beyond the session.
The following code snippet demonstrates how to create a temporary table in Hive:
CREATE TEMPORARY TABLE my_temp_table (
id INT,
name STRING,
age INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
In the above example, we create a temporary table called my_temp_table
with three columns: id
, name
, and age
. The ROW FORMAT DELIMITED
and FIELDS TERMINATED BY ','
statements specify that the data is stored in text format and is delimited by a comma.
Conclusion
In this article, we discussed the different types of tables in Hive. Managed tables are tables where the data and metadata are managed by Hive, external tables are tables where the data is managed outside of Hive, and temporary tables are tables that are created for a specific session and are deleted at the end of the session. Managed tables are the default table type in Hive and are used to store data in a structured format for efficient querying. External tables allow for sharing of data between different Hive tables or other systems and provide more flexibility for managing data. Temporary tables can be used to hold intermediate results during a complex data processing job and do not persist beyond the session. Hive's support for different table types allows for flexible and powerful data processing capabilities in data warehousing and big data analysis.