What do you like best?
1) Speed- HBase works well with MapReduce programs that can load petabytes of data into HBase in parallel, which speeds up our work.
2) NoSQL database- Because HBase is a NoSQL database, we can store data as key-value pairs within column families. This makes querying faster, and a column can hold multiple values.
3) Storage- HBase can store both metadata and content, which reduces the number of separate systems we need to run.
4) Oozie scheduler- The Hadoop ecosystem around HBase includes the Oozie scheduler, which can schedule jobs using a specified cron expression. Oozie also has its own UI where you can see the status of a job and check logs in case of failures, which is very helpful.
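To make the cron-based scheduling concrete, here is a minimal sketch of an Oozie coordinator definition, assuming a hypothetical workflow path and app name (cron-style frequency expressions are supported in Oozie coordinators from version 4.1 onward):

```xml
<!-- Hypothetical example: runs the workflow at 02:00 UTC every day -->
<coordinator-app name="daily-hbase-load" frequency="0 2 * * *"
                 start="2020-01-01T02:00Z" end="2021-01-01T02:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.5">
  <action>
    <workflow>
      <!-- app-path points at the workflow definition stored in HDFS -->
      <app-path>${nameNode}/apps/hbase-load-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The job's status and logs for each scheduled run then show up in the Oozie UI mentioned above.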
5) The Hadoop ecosystem also includes a distributed file system (HDFS) that partitions data across different data nodes, which helps in storing huge files. I have used it to store HFiles in HDFS and then load them into HBase using a MapReduce program in Java.
6) Phoenix supports SQL on HBase. We use it to query HBase, which speeds up table scans.
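As a sketch of what querying HBase through Phoenix looks like (table and column names here are hypothetical, not from our actual setup):

```sql
-- A Phoenix view mapped onto an existing HBase table
CREATE VIEW "content_table" (
    "pk" VARCHAR PRIMARY KEY,       -- maps to the HBase row key
    "cf"."doc_size" UNSIGNED_LONG   -- column family "cf", qualifier "doc_size"
);

-- Phoenix compiles this into parallel server-side HBase scans,
-- rather than a full table scan driven from the client
SELECT "pk", "cf"."doc_size"
FROM "content_table"
WHERE "pk" LIKE 'batch2020%';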
7) We also use checksums to verify whether data loaded into HBase was corrupted, which helps with tracking and reconciliation of data.
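The checksum-based reconciliation idea can be sketched in plain Java: compute a checksum on the source record, recompute it on the copy read back from HBase, and flag any mismatch. The class and record layout below are hypothetical, for illustration only:

```java
import java.util.zip.CRC32;

// Hypothetical reconciliation helper: compare a checksum computed on the
// source record with one computed after the record is read back from HBase.
public class RecordChecksum {

    // CRC32 is cheap and detects corruption in transit well;
    // a cryptographic hash like SHA-256 could be swapped in if needed.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] source = "row1|cf:content|hello".getBytes();
        byte[] loaded = "row1|cf:content|hello".getBytes(); // as read back from HBase
        System.out.println(crc32(source) == crc32(loaded)
                ? "record ok" : "record corrupted");
    }
}
```

Running the checksums per record (or per batch) makes it easy to track exactly which loads need to be replayed.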
What do you dislike?
1) Because HBase is a NoSQL database, querying with joins is not possible.
2) When we map Phoenix to an HBase table, we are unable to create a column anywhere other than in HBase and then update it; only the other direction is possible.
3) The Oozie scheduler is restrictive: jobs can be run only by YARN users and can pick up files only from HDFS.
Recommendations to others considering the product:
1) HBase- a NoSQL database that helps store big data and also supports querying.
2) Supports parallel processing using MapReduce, which speeds up our work.
3) Phoenix runs SQL on HBase, which makes table scans faster.
4) The surrounding Hadoop ecosystem provides a scheduler, Oozie.
What problems are you solving with the product? What benefits have you realized?
1) HBase has helped us load petabytes of data using MapReduce programs in Java. We were able to load both metadata and content into HBase.
2) We are able to apply retention policies on records by scanning multiple tables using Phoenix.
3) We have used Kerberization to enhance the security of our modules across the Hadoop ecosystem as well.