Dear Emmanuel,
Many thanks for the kind comments about TIMi!
I particularly like your last sentence, where you compare the speed of Databricks (Scala code) with that of TIMi. This kind of objective, numeric comparison is useful for many readers.
May I add to your review that you wrote to me (in a private email) that you tried for 3 months with Databricks to get the results that you eventually got in 18 hours with Anatella/TIMi? ...And, from what I understood, these 18 hours include the time required to install TIMi/Anatella on your PC, learn Anatella, design the different Anatella flows and run them!
Also, the comparison in your review is not really a fair one, because it pits TIMi running on a simple PC against Scala running on the entire Databricks cluster in Azure of the "National Bank of Belgium" (and that's a big cluster that has been running for a few years already). On that note, I think it would be interesting to know the RAM consumption of each tool. My guess is: around 4GB of RAM for TIMi and 200GB on each node for Databricks! 😁😁
Also, if you are interested in a more in-depth speed comparison between Databricks and TIMi, you can have a look at this GitHub repository, where we ran these 2 tools against a standard academic benchmark (TPC-H). (Spoiler: TIMi running on 1 server is faster than Databricks running on more than 100 nodes):
https://github.com/Kranf99/TPC-H-Benchmarck-Anatella-Spark
I think that your contribution here will help many other open-minded data scientists (like yourself) to better choose the right tool for their job! Many thanks for that from the whole data science community! 👍👍 You are doing the right thing.
...I won't conclude now that the best tool for any analytical job or any Big Data job is TIMi. ...But it's getting really damn close! 😄 😜