Druid is amazingly fast and has built-in connectors for most popular data sources.
It supports a variety of dashboards, which makes Druid a perfect choice for any real-time streaming application. Review collected by and hosted on G2.com.
Druid natively queries in JSON, which is hard for a SQL user to pick up.
Roll-up queries are not dynamic. For example, rolling up from a specific time on one day to a specific time on another day might not be possible.
The web GUI is also not very user-friendly for a business user.
It lacks an operations-friendly cluster manager console.
Druid needs a dedicated server and cannot utilize existing Hadoop resources.
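To illustrate the JSON-versus-SQL learning curve mentioned in this review, here is a minimal sketch that builds the same hourly aggregation both ways. The datasource `events` and the metric column `count` are hypothetical names, and the payloads are only constructed here, not sent; in a real cluster they would be POSTed to a Broker's `/druid/v2` (native) and `/druid/v2/sql` (SQL) endpoints.

```python
import json

# Hypothetical datasource and column names, chosen only for illustration.
DATASOURCE = "events"
METRIC = "count"


def native_timeseries_query(start: str, end: str) -> dict:
    """Build a native (JSON) timeseries query that sums METRIC per hour."""
    return {
        "queryType": "timeseries",
        "dataSource": DATASOURCE,
        "granularity": "hour",
        "intervals": [f"{start}/{end}"],  # ISO 8601 interval: start/end
        "aggregations": [
            {"type": "longSum", "name": "total", "fieldName": METRIC}
        ],
    }


def sql_query(start: str, end: str) -> dict:
    """Build the roughly equivalent Druid SQL request body."""
    sql = (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS \"hour\", "
        f'SUM("{METRIC}") AS total '
        f'FROM "{DATASOURCE}" '
        f"WHERE __time >= TIME_PARSE('{start}') "
        f"AND __time < TIME_PARSE('{end}') "
        "GROUP BY 1"
    )
    return {"query": sql}


if __name__ == "__main__":
    start, end = "2024-01-01T00:00:00", "2024-01-02T00:00:00"
    print(json.dumps(native_timeseries_query(start, end), indent=2))
    print(sql_query(start, end)["query"])
```

The native form spells out the query type, granularity, and each aggregator as nested JSON objects, which is the part a SQL user has to learn from scratch.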
Druid is best for low-latency analytics, as it combines the best qualities of a column store and inverted indexing. With columnar storage, Druid can minimize I/O costs for analytical queries.
It supports OLTP and OLAP.
Real-Time Aggregation.
Batch & Real-Time Ingestion
1. No fault tolerance on the query execution path. For example, a single query may be processed on hundreds of historical nodes, yet the query execution path lacks any fault tolerance.
2. Straggling sub-queries on the historical nodes take a lot of time.
3. Backfilling takes a lot of time. That is understandable, since rewriting an old segment is slow, so I wouldn't consider it a drawback.
4. Because Druid Brokers need to keep a view of the whole cluster in memory, they require significantly more memory and also cause a lot of JVM GC pauses.
5. Large queries can saturate the processing capacity of the entire historical layer for up to tens of seconds.
The community behind Druid and its docs are great. The scale at which Druid can ingest and query data is impressive.
Only recent versions have support for joins between data sources. Some log messages could be more verbose.
It supports horizontal scalability excellently. The deep storage functionality improves data resilience and makes it easy to add a new node. Since the data is partitioned by time out of the box, time-based queries perform exceedingly well. It can ingest a large amount of data very quickly. It has multiple plugins to cover your needs, and it integrates with many cloud infrastructures out of the box.
It needs better features to accommodate multi-tenancy. Updates to existing data are currently supported only by rebuilding the corresponding time segment entirely from the true source; instead, it should support tenant-ID-based updates. Same-day updates are a little tricky and need to be ironed out.
One of the places we use it is to calculate demographic-based suppression of data, and it is slow in that particular scenario.
Apache Druid works very well if you need basic aggregations across immutable time series data. It has some really useful approximations, such as HyperLogLog for fast cardinality estimation, that converge to exact counts for small datasets. It also now supports Druid SQL as a query language, which doesn't have the steep learning curve that the native Druid query language requires.
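The HyperLogLog behaviour described here (a fast estimate that is close to exact on small inputs) can be sketched in a few lines. This is a textbook version with 1,024 registers and the linear-counting correction for small cardinalities from the original HyperLogLog paper, not Druid's own implementation; the class and function names are hypothetical.

```python
import hashlib
import math
from typing import Iterable


class HyperLogLog:
    """Textbook HyperLogLog sketch (illustrative, not Druid's code)."""

    def __init__(self, p: int = 10) -> None:
        self.p = p                     # number of index bits
        self.m = 1 << p                # number of registers (1024 for p=10)
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        # Take 64 bits of a SHA-1 digest as the hash of the value.
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant, m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting, which is
        # why small datasets come out very close to the exact count.
        if raw <= 2.5 * self.m and zeros > 0:
            return self.m * math.log(self.m / zeros)
        return raw


def approx_distinct(values: Iterable[str], p: int = 10) -> float:
    """Estimate the number of distinct strings in values."""
    hll = HyperLogLog(p)
    for v in values:
        hll.add(v)
    return hll.estimate()


if __name__ == "__main__":
    print(approx_distinct(f"user-{i}" for i in range(100)))
    print(approx_distinct(f"user-{i}" for i in range(50_000)))
```

The sketch uses a fixed amount of memory regardless of input size; the trade-off is a relative error of a few percent once the cardinality grows well beyond the register count.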
Apache Druid becomes hard to use and very inefficient when your data is 1) updated 2) ingested out of order (based on timestamp) or 3) requires joins. Unfortunately this greatly limits the number of use-cases that Druid readily supports. Tooling can be built around it to support things like out of order ingestion but it makes Druid very inefficient.
Druid also has inherent bottlenecks in its design: each cluster can have only one coordinator and one overlord. We found that this made it impossible to scale a single cluster out to meet our needs.
Real-time ingestion and querying capability
Sub-second query performance
Time Series based datastore
Slice N Dice support
Data Compression
Inability to support nested data
Partial Join Support
Initial setup to bring it up for the first time
It is a column-oriented, open-source distributed data store. It is awesome at ingesting massive amounts of event-driven data and provides low-latency queries on that data.
Limitations with auto-scaling (scaling the Druid servers up and down based on demand).
Druid returns query results very quickly, and libraries like pydruid help increase its usability.
The errors are not very intuitive. For instance, if more than one dimension has high cardinality and the query times out, the error does not hint at that.
Horizontally scalable
Support for the Druid Kafka indexer task to ingest data directly from Kafka
Support for schemaless data sources
Once the metadata is corrupted, it is very difficult to recover.
I hopped on to Druid very early and have been using it since early 2018. The potential it unlocks, with all its easy-to-use, built-in capabilities for looking at data from different analytics perspectives, is amazing. All the options for flexible filters, approximate algorithms, exact calculations, etc. make our lives a lot simpler.
Because we started in its early days, we had our challenges working with Druid, but it is evolving fast and enabling much more new functionality.
Easy integration with existing frameworks; a good fit for real-time analytics that needs to be performant.
The major drawback of this solution is that with commodity deep storage (Amazon S3) and network, the majority of queries in our use case would run for tens of seconds instead of the current 0 to 3 seconds. I think decoupling storage and compute is the future, including for time series databases.
- easy integration with other third-party open-source and proprietary software
- easy to set up and maintain
- good community
- sometimes data needs to be reindexed if it's too large.
- provides approximate numbers, and sometimes exact counts are required. However, this is by design.
The ability to power real-time dashboards well, and of course that it's open source (so I can skip messy accounting approvals).
Sometimes filtering using HiveQL can cause bugs and unexpected errors to pop up. I have also heard of indexing issues which sometimes occur.
Out-of-the-box integration with Kafka, AWS S3, and HDFS. Data visibility is almost instantaneous.
The ability to modify its configuration could pose a serious security threat.
Creating a personalized protocol would also mean that new bugs are introduced, so we would need more debugging effort.
Low-latency querying, and ease of loading and retrieving data.
Inefficient bulk data extraction. I would love to use Spark or other big data tools for bulk data extraction and processing.