What do you like best?
Hive makes it easy for users to store bulk data in tabular form.
Its query language, HiveQL, is very similar to SQL, so anyone used to a traditional database system can work with it easily.
Because of this, people do not have to learn a new language and can still adapt to the Big Data culture.
It also has features like partitioning and bucketing, which help segregate the data.
Data can be loaded into Hive directly from HDFS using CSV files in the matching format, or from HBase by creating a table that points to the HBase table, which provides a link within Hadoop.
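As a rough illustration of how partitioning and bucketing segregate data, the sketch below groups hypothetical log rows the way a table partitioned by date and bucketed on a user ID would lay them out: one directory per partition value, and one bucket file per hash value modulo the bucket count. The column names, the bucket count of 4, and the use of `key % num_buckets` as the hash are all assumptions for illustration; Hive's actual hash function differs between versions.

```python
from collections import defaultdict

# Hypothetical log rows: (log_date, user_id, message). Names are illustrative only.
rows = [
    ("2023-01-01", 101, "login"),
    ("2023-01-01", 205, "logout"),
    ("2023-01-02", 101, "purchase"),
    ("2023-01-02", 307, "login"),
]

NUM_BUCKETS = 4  # assumed bucket count for this sketch


def bucket_for(key: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Simplified stand-in for Hive's bucketing hash: integer key modulo bucket count.
    return key % num_buckets


def layout(rows):
    # Map partition directory -> bucket number -> rows stored in that bucket.
    tree = defaultdict(lambda: defaultdict(list))
    for log_date, user_id, message in rows:
        partition = f"log_date={log_date}"  # one directory per partition value
        tree[partition][bucket_for(user_id)].append((user_id, message))
    return tree


if __name__ == "__main__":
    for partition, buckets in sorted(layout(rows).items()):
        for b, items in sorted(buckets.items()):
            print(partition, f"bucket_{b}", items)
```

A query filtering on one date only needs to read that partition's directory, and rows with the same user ID always land in the same bucket, which is what makes the data easier to segregate.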
What do you dislike?
Even for a small amount of data, it runs a MapReduce job, which takes time, so it is not efficient for small datasets.
Hive has no concept of a primary key, so we can end up with redundant entries.
Also, older versions did not support update and delete at all, and even in the newer version, using the update and delete commands degrades the tool's performance.
Recommendations to others considering the product:
It suits storing bulk data in tabular form where there is no need for a primary key, or where receiving redundant data will not cause a problem.
What problems are you solving with the product? What benefits have you realized?
We are using Hive to store logs of the data generated in our business.
We will then use these logs for reconciliation, which helps us keep track of the data.