# Best Machine Learning Data Catalog Software

  *By [Shalaka Joshi](https://research.g2.com/insights/author/shalaka-joshi)*

   Machine learning data catalogs allow companies to categorize, access, interpret, and collaborate around company data across multiple data sources, while maintaining a high level of governance and access management. Artificial intelligence is key to many features of machine learning data catalogs, enabling functionality such as machine learning recommendations, natural language querying, and dynamic data masking for enhanced security purposes.
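
As an illustration of the dynamic data masking mentioned above, the sketch below masks sensitive columns at read time based on the caller's role. It is a minimal sketch, not any vendor's implementation: the column names, the role names, and the `mask_value` and `read_row` helpers are all hypothetical.

```python
# Hypothetical sensitive-column tags and privileged roles, for illustration only.
SENSITIVE_COLUMNS = {"email", "ssn"}
UNMASKED_ROLES = {"admin"}


def mask_value(value: str) -> str:
    """Hide all but the last two characters of a value."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def read_row(row: dict[str, str], role: str) -> dict[str, str]:
    """Return a copy of the row, masking sensitive columns at read time
    unless the caller's role is allowed to see raw values."""
    if role in UNMASKED_ROLES:
        return dict(row)
    return {
        col: mask_value(val) if col in SENSITIVE_COLUMNS else val
        for col, val in row.items()
    }


row = {"name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789"}
print(read_row(row, role="analyst"))  # sensitive columns are masked
print(read_row(row, role="admin"))    # raw values are returned
```

The stored data is never altered; masking is applied per request, which is what makes it "dynamic" in this sketch.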

Companies can use machine learning data catalogs to maintain data sets in a single location so that searching for and discovering data is simple for everyday business users and analysts alike. Users can comment on, share, and recommend data sets so colleagues immediately understand what they are querying. Additionally, IT administrators can set up user provisioning so that unauthorized employees cannot access sensitive data.

Machine learning data catalogs are most frequently implemented by companies that have multiple data sources, are searching for one source of truth, and are attempting to scale data usage company-wide. These products are generally administered by IT departments, which maintain organization and security, but the data can be accessed by data scientists, analysts, and the average business user alike. The data can then be transformed, modeled, and visualized either directly in the machine learning data catalog or through an integration with [business intelligence software](https://www.g2.com/categories/business-intelligence).

Not all machine learning data catalogs provide data preparation capabilities; those that do not may require an integration with a [business intelligence platform](https://www.g2.com/categories/business-intelligence-platforms). Additionally, these tools differ from [master data management software](https://www.g2.com/categories/master-data-management-mdm) due to their enhanced governance, collaboration, and machine learning functionality.

To qualify for inclusion in the Machine Learning Data Catalog category, a product must:

- Organize and consolidate data from all company sources in a single repository
- Provide user access management for security and data governance purposes
- Allow business users to search and access the data from within the catalog
- Offer collaboration features around data sets, including categorizing, commenting, and sharing
- Give intelligent recommendations based on machine learning for quicker access to relevant data 
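
The criteria above can be read as a small feature checklist. The sketch below walks through them with a toy in-memory catalog; it is only an illustration under stated assumptions. The `Dataset` and `Catalog` classes and their methods are hypothetical names, and a simple search-hit count stands in for the machine learning recommendations a real product would provide.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    source: str  # originating system, e.g. a warehouse or CRM
    tags: set[str] = field(default_factory=set)
    allowed_roles: set[str] = field(default_factory=set)
    comments: list[str] = field(default_factory=list)


class Catalog:
    def __init__(self) -> None:
        self._datasets: dict[str, Dataset] = {}
        self._hits: Counter[str] = Counter()  # search-hit counts per dataset

    def register(self, ds: Dataset) -> None:
        """Consolidate a dataset from any source into the single repository."""
        self._datasets[ds.name] = ds

    def _visible(self, role: str) -> list[Dataset]:
        """Access management: only datasets the role may see."""
        return [d for d in self._datasets.values() if role in d.allowed_roles]

    def search(self, term: str, role: str) -> list[str]:
        """Search names and tags, restricted to what the role may access."""
        term = term.lower()
        found = [
            d.name for d in self._visible(role)
            if term in d.name.lower() or any(term in t.lower() for t in d.tags)
        ]
        for name in found:
            self._hits[name] += 1
        return sorted(found)

    def comment(self, name: str, role: str, text: str) -> None:
        """Collaboration: attach a comment if the role can access the dataset."""
        ds = self._datasets[name]
        if role not in ds.allowed_roles:
            raise PermissionError(f"{role} cannot access {name}")
        ds.comments.append(text)

    def recommend(self, role: str, limit: int = 3) -> list[str]:
        """Stand-in for ML recommendations: most-searched accessible datasets."""
        visible = {d.name for d in self._visible(role)}
        ranked = [n for n, _ in self._hits.most_common() if n in visible]
        return ranked[:limit]


catalog = Catalog()
catalog.register(Dataset("orders", "warehouse", {"sales"}, {"analyst", "admin"}))
catalog.register(Dataset("payroll", "hr_db", {"salary"}, {"admin"}))
catalog.register(Dataset("sales_leads", "crm", {"sales"}, {"analyst", "admin"}))

print(catalog.search("sales", role="analyst"))
catalog.comment("orders", "analyst", "Refreshed nightly.")
print(catalog.recommend(role="analyst"))
```

Every read path goes through `_visible`, so search results and recommendations never expose a dataset the role is not provisioned for.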





## Category Overview

**Total Products under this Category:** 89


## Trust & Credibility Stats

**Why You Can Trust G2's Software Rankings:**

- 30 Analysts and Data Experts
- 1,700+ Authentic Reviews
- 89+ Products
- Unbiased Rankings

G2's software rankings are built on verified user reviews, rigorous moderation, and a consistent research methodology maintained by a team of analysts and data experts. Each product is measured using the same transparent criteria, with no paid placement or vendor influence. While reviews reflect real user experiences, which can be subjective, they offer valuable insight into how software performs in the hands of professionals. Together, these inputs power the G2 Score, a standardized way to compare tools within every category.


## Best Machine Learning Data Catalog Software At A Glance

- **Leader:** [Atlan](https://www.g2.com/products/atlan/reviews)
- **Highest Performer:** [decube](https://www.g2.com/products/decube/reviews)
- **Easiest to Use:** [AWS Glue](https://www.g2.com/products/aws-glue/reviews)
- **Top Trending:** [Atlan](https://www.g2.com/products/atlan/reviews)
- **Best Free Software:** [Alation](https://www.g2.com/products/alation/reviews)



## Top-Rated Products (Ranked by G2 Score)
### 1. [Atlan](https://www.g2.com/products/atlan/reviews)
  Atlan is the context layer for enterprise AI. It continuously reads your warehouses, databases, pipelines, BI tools, and business systems to reverse construct an enterprise data graph that captures assets, lineage, entities, metrics, policies, and relationships. On top of that graph, it enriches and curates machine-readable semantics — descriptions, popular joins, KPI and metric definitions, ontologies, and business rules — and organizes them into governed, versioned context repos: bounded bundles of context that reflect how your company defines key concepts and makes decisions. These context repos are then exposed through open interfaces (SQL, APIs, SDKs, OSI/MCP-style protocols) so that agents, copilots, and AI applications can call the same trusted context in real time, rather than each team hard-coding its own logic. Human-on-the-loop governance workflows for conflict resolution, deprecation, feedback, and certification keep that context trustworthy as the business, data, and models evolve.


  **Average Rating:** 4.5/5.0
  **Total Reviews:** 125

**User Satisfaction Scores:**

- **Ease of Use:** 9.0/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 9.1/10 (Category avg: 8.5/10)
- **Metadata Management:** 9.3/10 (Category avg: 8.4/10)
- **Data Lineage:** 9.3/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Atlan](https://www.g2.com/sellers/atlan)
- **Year Founded:** 2019
- **HQ Location:** New York, US
- **Twitter:** @AtlanHQ (9,720 Twitter followers)
- **LinkedIn® Page:** https://in.linkedin.com/company/atlan-hq (580 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Financial Services, Computer Software
  - **Company Size:** 53% Mid-Market, 40% Enterprise


#### Pros & Cons

**Pros:**

- Ease of Use (18 reviews)
- User Interface (12 reviews)
- Features (11 reviews)
- Data Lineage (10 reviews)
- Easy Setup (10 reviews)

**Cons:**

- Learning Curve (5 reviews)
- Limited Functionality (5 reviews)
- User Interface Issues (5 reviews)
- Difficult Learning (4 reviews)
- Integration Issues (4 reviews)

### 2. [Alation](https://www.g2.com/products/alation/reviews)
  Alation is the data intelligence company. Founded in 2012 and headquartered in Redwood City, California—with global offices in London and Sydney—Alation serves more than 650 enterprise customers across 34 industries. The company pioneered the modern data catalog by combining machine learning with human insight to connect people with questions to people with answers. Today, more than 40% of the Fortune 100 rely on Alation to power data and AI initiatives at scale. Alation’s platform unifies cataloging, governance, and data quality with new AI-native capabilities built on one essential foundation: metadata. Metadata provides the context AI models lack, delivering outputs that are accurate, explainable, and trustworthy. With capabilities like Agent Studio, CDE Manager, and Data Quality Agent, organizations can build agents that understand their unique definitions, rules, and quality standards. Embedded readiness checks and continuous evaluation ensure every AI workflow is grounded in the right metadata context, making enterprise AI reliable enough for real production use.


  **Average Rating:** 4.4/5.0
  **Total Reviews:** 89

**User Satisfaction Scores:**

- **Ease of Use:** 8.3/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.7/10 (Category avg: 8.5/10)
- **Metadata Management:** 7.9/10 (Category avg: 8.4/10)
- **Data Lineage:** 7.2/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Alation](https://www.g2.com/sellers/alation)
- **Company Website:** https://alation.com
- **Year Founded:** 2012
- **HQ Location:** Redwood City, CA
- **Twitter:** @Alation (3,573 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/3231829/ (624 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Information Technology and Services, Financial Services
  - **Company Size:** 57% Enterprise, 27% Mid-Market


#### Pros & Cons

**Pros:**

- Ease of Use (16 reviews)
- Data Discovery (10 reviews)
- User Experience (10 reviews)
- Data Cataloging (9 reviews)
- User Interface (9 reviews)

**Cons:**

- Slow Performance (8 reviews)
- Missing Features (6 reviews)
- Limited Functionality (4 reviews)
- Lineage Limitations (4 reviews)
- User Interface Issues (4 reviews)

### 3. [AWS Glue](https://www.g2.com/products/aws-glue/reviews)
  AWS Glue is a serverless data integration service that makes it easier for analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning, and application development. You can discover and connect to 70+ diverse data sources, manage your data in a centralized data catalog, and visually create, run, and monitor ETL pipelines to load data into your data lakes. You can immediately search and query catalogued data using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.


  **Average Rating:** 4.3/5.0
  **Total Reviews:** 191

**User Satisfaction Scores:**

- **Ease of Use:** 8.4/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.9/10 (Category avg: 8.5/10)
- **Metadata Management:** 8.6/10 (Category avg: 8.4/10)
- **Data Lineage:** 8.7/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Amazon Web Services (AWS)](https://www.g2.com/sellers/amazon-web-services-aws-3e93cc28-2e9b-4961-b258-c6ce0feec7dd)
- **Year Founded:** 2006
- **HQ Location:** Seattle, WA
- **Twitter:** @awscloud (2,223,984 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/amazon-web-services/ (156,424 employees on LinkedIn®)
- **Ownership:** NASDAQ: AMZN

**Reviewer Demographics:**
  - **Who Uses This:** Data Engineer, Software Engineer
  - **Top Industries:** Information Technology and Services, Computer Software
  - **Company Size:** 48% Enterprise, 29% Mid-Market


#### Pros & Cons

**Pros:**

- Ease of Use (6 reviews)
- Data Integration (3 reviews)
- ETL Solutions (3 reviews)
- Features (3 reviews)
- Simple (3 reviews)

**Cons:**

- Slow Performance (3 reviews)
- Debugging Difficulty (2 reviews)
- Difficult Debugging (2 reviews)
- Performance Issues (2 reviews)
- Time-Consuming (2 reviews)

### 4. [Google Cloud Data Catalog](https://www.g2.com/products/google-cloud-data-catalog/reviews)
  A fully managed and highly scalable data discovery and metadata management service.


  **Average Rating:** 4.4/5.0
  **Total Reviews:** 25

**User Satisfaction Scores:**

- **Ease of Use:** 8.7/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.5/10 (Category avg: 8.5/10)
- **Metadata Management:** 9.1/10 (Category avg: 8.4/10)
- **Data Lineage:** 7.8/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Google](https://www.g2.com/sellers/google)
- **Year Founded:** 1998
- **HQ Location:** Mountain View, CA
- **Twitter:** @google (31,885,216 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/1441/ (336,169 employees on LinkedIn®)
- **Ownership:** NASDAQ:GOOG

**Reviewer Demographics:**
  - **Top Industries:** Computer Software
  - **Company Size:** 46% Small-Business, 29% Mid-Market


### 5. [Appen](https://www.g2.com/products/appen/reviews)
  Appen collects and labels images, text, speech, audio, video, and other data to create training data used to build and continuously improve the world’s most innovative artificial intelligence systems. We offer a state-of-the-art, licensable data annotation platform to annotate training data for use cases in computer vision and natural language processing. Our platform enhances accuracy and efficiency through our Smart Labeling and Pre-Labeling features, which use machine learning to ease human annotation. You choose the level of service and security you want for data collection and annotation, from white-glove managed service to flexible self-service. Our expertise includes having a global crowd of over 1 million skilled contractors who speak over 235 languages and dialects, in over 70,000 locations and 170 countries, and the industry’s most advanced AI-assisted data annotation platform. Our reliable training data gives leaders in technology, automotive, financial services, retail, healthcare, and governments the confidence to deploy world-class AI products. Founded in 1996, Appen has customers and offices globally.


  **Average Rating:** 4.2/5.0
  **Total Reviews:** 32

**User Satisfaction Scores:**

- **Ease of Use:** 8.2/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.2/10 (Category avg: 8.5/10)
- **Metadata Management:** 8.0/10 (Category avg: 8.4/10)
- **Data Lineage:** 7.8/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Appen](https://www.g2.com/sellers/appen)
- **Year Founded:** 1996
- **HQ Location:** Kirkland, Washington, United States
- **LinkedIn® Page:** https://www.linkedin.com/company/appen (19,630 employees on LinkedIn®)
- **Ownership:** ASX:APX
- **Total Revenue (USD mm):** $244,900

**Reviewer Demographics:**
  - **Top Industries:** Information Technology and Services
  - **Company Size:** 56% Small-Business, 26% Enterprise


#### Pros & Cons

**Pros:**

- Useful (2 reviews)
- Ease of Use (1 review)
- Flexibility (1 review)

**Cons:**

- Work Interruptions (3 reviews)
- Low Compensation (2 reviews)
- Complexity (1 review)
- Connectivity Issues (1 review)
- User Interface Issues (1 review)

### 6. [decube](https://www.g2.com/products/decube/reviews)
  Decube is a Context Layer platform specifically designed for the AI era, providing organizations with the ability to give their data meaning, memory, and trust. This innovative system integrates various components such as metadata management, automated lineage tracking, data quality assurance, and observability to create a comprehensive real-time map of data dynamics. By understanding how data operates, flows, and its reliability, Decube empowers enterprises to make informed decisions and effectively manage AI workloads.

  Targeted primarily at enterprises that rely heavily on data-driven decision-making, Decube addresses a critical challenge faced by many organizations: the lack of contextual understanding of their data. In an age where data is abundant, the real issue lies in the ability to interpret and utilize that data effectively. Decube provides a connected understanding of the entire data ecosystem, which helps eliminate blind spots and enhances governance. This contextual awareness is essential for organizations looking to leverage AI technologies and ensure that their models, dashboards, and agents operate with greater intelligence and safety.

  Key features of Decube include its robust metadata management capabilities, which allow users to track and manage data lineage effortlessly. This feature ensures that organizations can trace the origins and transformations of their data, thereby enhancing transparency and accountability. Additionally, Decube’s focus on data quality means that users can trust the information they are working with, reducing the risk of errors in critical decision-making processes. The observability aspect of the platform further enables organizations to monitor data flows in real-time, ensuring that any issues can be identified and addressed promptly.

  The benefits of using Decube extend beyond mere data management. By providing a living, interconnected understanding of data, Decube enhances the overall operational confidence of organizations. This platform not only strengthens governance but also facilitates smarter decision-making by ensuring that all data-driven models are built on a foundation of reliable and contextualized information. As businesses increasingly depend on trustworthy data and AI-ready infrastructure, Decube stands out as a vital tool that equips them with the necessary context to navigate the complexities of the modern data landscape.


  **Average Rating:** 4.6/5.0
  **Total Reviews:** 24

**User Satisfaction Scores:**

- **Ease of Use:** 9.4/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 9.7/10 (Category avg: 8.5/10)
- **Metadata Management:** 9.7/10 (Category avg: 8.4/10)
- **Data Lineage:** 9.6/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Decube Data](https://www.g2.com/sellers/decube-data)
- **Company Website:** https://decube.io
- **Year Founded:** 2022
- **HQ Location:** Kuala Lumpur
- **Twitter:** @decube_data (114 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/decube-data/ (44 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Information Technology and Services
  - **Company Size:** 38% Mid-Market, 33% Small-Business


#### Pros & Cons

**Pros:**

- User Interface (8 reviews)
- Ease of Use (7 reviews)
- Features (7 reviews)
- Data Quality (6 reviews)
- Insights (6 reviews)

**Cons:**

- Limited Functionality (3 reviews)
- Complex Setup (2 reviews)
- Limited Features (2 reviews)
- Missing Features (2 reviews)
- Poor Customer Support (2 reviews)

### 7. [Collibra](https://www.g2.com/products/collibra/reviews)
  Try Collibra for free at Collibra.com/tour. Collibra is for organizations with complex data challenges, hybrid data ecosystems, and big ambitions for data and AI. We help organizations that are trying to accelerate data and AI use cases while ensuring compliance, but are struggling with fragmented governance and visibility across the whole hybrid data ecosystem. Collibra unifies governance for data and AI across every system, data source, and user to create safe autonomy and a foundation for scaling AI and data use cases. With Collibra, you can accelerate all your data and AI use cases, safely and with well-understood data. That’s Data Confidence.


  **Average Rating:** 4.2/5.0
  **Total Reviews:** 99

**User Satisfaction Scores:**

- **Ease of Use:** 8.0/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.3/10 (Category avg: 8.5/10)
- **Metadata Management:** 8.0/10 (Category avg: 8.4/10)
- **Data Lineage:** 8.0/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Collibra](https://www.g2.com/sellers/collibra)
- **Company Website:** https://www.collibra.com
- **Year Founded:** 2008
- **HQ Location:** New York, New York
- **Twitter:** @collibra (5,735 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/288365/ (1,082 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Financial Services, Banking
  - **Company Size:** 71% Enterprise, 19% Mid-Market


#### Pros & Cons

**Pros:**

- Features (14 reviews)
- Ease of Use (13 reviews)
- Data Management (12 reviews)
- Data Governance (9 reviews)
- Integrations (9 reviews)

**Cons:**

- Limited Functionality (8 reviews)
- Complexity Issues (7 reviews)
- Complexity (6 reviews)
- Improvement Needed (6 reviews)
- Complex Setup (5 reviews)

### 8. [Cloudera Data Platform](https://www.g2.com/products/cloudera-cloudera-data-platform/reviews)
  At Cloudera, we believe data can make what is impossible today, possible tomorrow. We deliver an enterprise data cloud for any data, anywhere, from the Edge to AI. We enable people to transform vast amounts of complex data into clear and actionable insights to enhance their businesses and exceed their expectations. Cloudera is leading hospitals to better cancer cures, securing financial institutions against fraud and cyber-crime, and helping humans arrive on Mars and beyond. Powered by the relentless innovation of the open-source community, Cloudera advances digital transformation for the world’s largest enterprises.


  **Average Rating:** 4.1/5.0
  **Total Reviews:** 131

**User Satisfaction Scores:**

- **Ease of Use:** 8.3/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.9/10 (Category avg: 8.5/10)
- **Metadata Management:** 9.1/10 (Category avg: 8.4/10)
- **Data Lineage:** 8.8/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Cloudera](https://www.g2.com/sellers/cloudera)
- **Year Founded:** 2008
- **HQ Location:** Palo Alto, CA
- **Twitter:** @cloudera (106,618 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/229433/ (3,387 employees on LinkedIn®)
- **Phone:** 888-789-1488

**Reviewer Demographics:**
  - **Who Uses This:** Data Engineer, Software Engineer
  - **Top Industries:** Information Technology and Services, Computer Software
  - **Company Size:** 42% Enterprise, 32% Small-Business


### 9. [Select Star](https://www.g2.com/products/select-star/reviews)
  Select Star is a modern data governance platform that helps organizations manage and understand their data at scale, enabling AI, analytics, and self-service across the business. It automatically catalogs datasets, traces end-to-end lineage, and builds a shared business glossary and semantic layer, so teams can confidently work with trusted data. With a user-friendly data portal and built-in automation, Select Star supports use cases including data democratization, data governance, semantic layers, and cloud data migrations, serving as a foundational layer for enterprise AI and data initiatives.


  **Average Rating:** 4.5/5.0
  **Total Reviews:** 55

**User Satisfaction Scores:**

- **Ease of Use:** 8.9/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.2/10 (Category avg: 8.5/10)
- **Metadata Management:** 8.7/10 (Category avg: 8.4/10)
- **Data Lineage:** 8.9/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Select Star](https://www.g2.com/sellers/select-star)
- **Year Founded:** 2020
- **HQ Location:** San Francisco, CA
- **Twitter:** @selectstarhq (391 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/selectstarhq/ (20 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Information Technology and Services, Real Estate
  - **Company Size:** 51% Mid-Market, 38% Enterprise


#### Pros & Cons

**Pros:**

- Ease of Use (10 reviews)
- Data Lineage (9 reviews)
- User Interface (7 reviews)
- Data Discovery (5 reviews)
- Data Cataloging (4 reviews)

**Cons:**

- Limited Functionality (2 reviews)
- Lineage Limitations (2 reviews)
- Complex Setup (1 review)
- Difficult Learning (1 review)
- Expertise Required (1 review)

### 10. [Secoda](https://www.g2.com/products/secoda/reviews)
  Secoda is an AI-powered data governance platform designed to help organizations explore, understand, and utilize their data effectively. By providing a comprehensive platform that connects to 75+ data sources, pipelines, warehouses, and visualization tools, Secoda aims to create a unified source of truth for businesses. This functionality is particularly valuable for organizations looking to enhance their self-serve analytics, streamline operations, and improve decision-making.

  Targeted at data teams, business stakeholders, and organizations of all sizes, Secoda serves as an essential tool for those who need to manage and interpret large volumes of data. Its user-friendly interface ensures that individuals with varying levels of technical expertise can leverage the platform to gain actionable insights. Companies such as Vanta, Cardinal Health, ID.me, and Dialpad have adopted Secoda to monitor the health of their data ecosystems, enhance the efficiency of their data teams, and scale AI readiness.

  One of Secoda’s core advantages is its ability to unify data cataloging, enterprise governance, and observability into a single, streamlined platform. This consolidation not only reduces the overhead of managing multiple tools but also powers Secoda AI with rich, connected context, enabling teams to focus on insights instead of infrastructure.

  Secoda automates key data management tasks including documentation, tagging, glossary term creation, and policy creation. This automation enables users to quickly discover and access relevant data and insights without extensive manual effort. By streamlining these processes, Secoda not only saves valuable time but also empowers teams to make confident, data-driven decisions based on current, well-organized information, ultimately driving better business outcomes.

  Overall, Secoda stands out in the data management landscape by offering a comprehensive, AI-driven solution that caters to the needs of both technical and non-technical users. Its ability to create a single source of truth, coupled with its integration of multiple functionalities into one platform, positions it as a valuable asset for organizations aiming to harness the full potential of their data.


  **Average Rating:** 4.5/5.0
  **Total Reviews:** 55

**User Satisfaction Scores:**

- **Ease of Use:** 8.2/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 9.3/10 (Category avg: 8.5/10)
- **Metadata Management:** 9.5/10 (Category avg: 8.4/10)
- **Data Lineage:** 8.9/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Secoda](https://www.g2.com/sellers/secoda)
- **Year Founded:** 2021
- **HQ Location:** Toronto, CA
- **Twitter:** @SecodaHQ (934 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/secodahq/about (21 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Computer Software, Financial Services
  - **Company Size:** 65% Mid-Market, 18% Small-Business


#### Pros & Cons

**Pros:**

- Ease of Use (31 reviews)
- Features (25 reviews)
- Customer Support (21 reviews)
- Data Lineage (19 reviews)
- Integrations (16 reviews)

**Cons:**

- Bug Issues (11 reviews)
- Bugs (11 reviews)
- Technical Issues (9 reviews)
- Learning Curve (5 reviews)
- Missing Features (5 reviews)

### 11. [IBM InfoSphere Information Governance Catalog](https://www.g2.com/products/ibm-infosphere-information-governance-catalog/reviews)
  IBM® Information Governance Catalog is an interactive, web-based tool that allows users to explore, understand and analyze information. Users can create, manage and share a common business language, document and enact policies and rules and track the usage and consumption of data within a lineage report providing trusted information for compliance and insights. Learn More: https://ibm.co/2xmfLsK


  **Average Rating:** 4.0/5.0
  **Total Reviews:** 16

**User Satisfaction Scores:**

- **Ease of Use:** 7.6/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [IBM](https://www.g2.com/sellers/ibm)
- **Year Founded:** 1911
- **HQ Location:** Armonk, NY
- **Twitter:** @IBM (709,023 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/1009/ (324,553 employees on LinkedIn®)
- **Ownership:** SWX:IBM

**Reviewer Demographics:**
  - **Company Size:** 53% Enterprise, 26% Mid-Market


### 12. [Coalesce Catalog (formerly CastorDoc)](https://www.g2.com/products/castor-doc/reviews)
  Coalesce Catalog is a collaborative, automated data discovery & catalog tool. We believe that data people spend way too much time trying to find and understand their data. Coalesce Catalog redesigns how data people collaborate. It provides a single source of truth to reference and document all the knowledge related to data within your company. If you are looking for a table related to your customers, just look for it as you would in Google, and Coalesce Catalog provides you with all the context you will need for your analysis. Inspired by internal tools developed by Uber, Airbnb, Lyft, and Spotify, Coalesce Catalog has developed a plug-and-play solution that deploys in minutes to drive value for companies of all sizes. Discover and catalog your data today with Coalesce Catalog.


  **Average Rating:** 4.7/5.0
  **Total Reviews:** 63

**User Satisfaction Scores:**

- **Ease of Use:** 9.6/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 9.9/10 (Category avg: 8.5/10)
- **Metadata Management:** 9.9/10 (Category avg: 8.4/10)
- **Data Lineage:** 9.9/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Coalesce](https://www.g2.com/sellers/coalesce)
- **Company Website:** https://coalesce.io/
- **Year Founded:** 2020
- **HQ Location:** San Francisco, CA
- **LinkedIn® Page:** https://www.linkedin.com/company/coalesceio/ (127 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Information Technology and Services, Financial Services
  - **Company Size:** 59% Mid-Market, 27% Enterprise


#### Pros & Cons

**Pros:**

- Ease of Use (3 reviews)
- Collaboration (2 reviews)
- Connectivity (2 reviews)
- Data Lineage (2 reviews)
- Useful (2 reviews)

**Cons:**

- Connector Issues (1 review)
- Integration Issues (1 review)
- Limitations (1 review)

### 13. [data.world](https://www.g2.com/products/data-world/reviews)
  data.world is the most-adopted data catalog and governance platform on the market. Built on a unique knowledge graph foundation, data.world seamlessly integrates with your existing systems. We set the standard for swift, people-centric governance. We don't just manage data; we unlock its potential, paving the way for responsible AI adoption and data-driven decision-making at scale. data.world is a Certified B Corporation and public benefit corporation and home to the world’s largest collaborative open data community with more than two million members, including ninety percent of the Fortune 500.


  **Average Rating:** 4.2/5.0
  **Total Reviews:** 12

**User Satisfaction Scores:**

- **Ease of Use:** 8.8/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 9.2/10 (Category avg: 8.5/10)
- **Metadata Management :** 8.8/10 (Category avg: 8.4/10)
- **Data Lineage:** 9.3/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [data.world](https://www.g2.com/sellers/data-world)
- **Year Founded:** 2016
- **HQ Location:** Austin, Texas
- **Twitter:** @datadotworld (5,515 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/data.world/ (107 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Company Size:** 67% Small-Business, 25% Mid-Market


#### Pros & Cons

**Pros:**

- Analytics (1 reviews)
- Data Discovery (1 reviews)
- Data Management (1 reviews)
- Data Visualization (1 reviews)
- Ease of Use (1 reviews)

**Cons:**

- Poor Customer Support (1 reviews)
- Poor Support Services (1 reviews)

### 14. [Sifflet](https://www.g2.com/products/sifflet/reviews)
  Sifflet is a business-aware data observability platform that moves data teams from reactive firefighting to proactive decision intelligence. Powered by an intelligent system of AI agents (Sentinel, Sage, and Forge), Sifflet autonomously detects anomalies, diagnoses root causes, and suggests code resolutions. By enriching technical alerts with full-stack lineage and downstream business usage, Sifflet allows data engineers and leaders to prioritize incidents based on business risk rather than technical severity. Trusted by industry leaders like Carrefour or Penguin Random House, Sifflet bridges the gap between data quality and business impact, ensuring your data is always safe for executive decisions and AI consumption. Learn more at siffletdata.com.


  **Average Rating:** 4.4/5.0
  **Total Reviews:** 45

**User Satisfaction Scores:**

- **Ease of Use:** 8.5/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.3/10 (Category avg: 8.5/10)


**Seller Details:**

- **Seller:** [Sifflet](https://www.g2.com/sellers/sifflet)
- **Company Website:** https://www.siffletdata.com/
- **Year Founded:** 2021
- **HQ Location:** Paris, Ile-de-France
- **Twitter:** @Siffletdata (393 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/sifflet/ (48 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Computer Software
  - **Company Size:** 78% Mid-Market, 24% Enterprise


#### Pros & Cons

**Pros:**

- Efficiency Improvement (37 reviews)
- Ease of Use (36 reviews)
- Monitoring (36 reviews)
- Data Lineage (32 reviews)
- Alerting System (31 reviews)

**Cons:**

- Limited Customization (17 reviews)
- Complex Setup (11 reviews)
- Alert Management (10 reviews)
- Limited Integration (10 reviews)
- Lineage Issues (10 reviews)

### 15. [IBM watsonx.data intelligence](https://www.g2.com/products/ibm-watsonx-data-intelligence/reviews)
  IBM watsonx.data intelligence revolutionizes the way organizations curate, manage, and utilize data by leveraging the power of AI to simplify data delivery across hybrid ecosystems. IBM watsonx.data intelligence is a comprehensive solution that integrates capabilities such as data governance (formerly IBM Knowledge Catalog), data lineage (formerly IBM Manta Data Lineage), data sharing, and data quality management. It empowers organizations to discover, trust, and access meaningful data, providing consumers with reliable data products. Explore Demo Library - https://www.ibm.com/products/watsonx-data-intelligence/demo-library Start your free trial - https://dataplatform.cloud.ibm.com/registration/stepone?context=df&apps=all&uucid=1227cc9e37cb9292&preselect\_region=true


  **Average Rating:** 4.2/5.0
  **Total Reviews:** 24

**User Satisfaction Scores:**

- **Ease of Use:** 8.4/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 7.5/10 (Category avg: 8.5/10)
- **Metadata Management :** 7.5/10 (Category avg: 8.4/10)
- **Data Lineage:** 8.3/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [IBM](https://www.g2.com/sellers/ibm)
- **Year Founded:** 1911
- **HQ Location:** Armonk, NY
- **Twitter:** @IBM (709,023 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/1009/ (324,553 employees on LinkedIn®)
- **Ownership:** SWX:IBM

**Reviewer Demographics:**
  - **Company Size:** 38% Small-Business, 34% Enterprise


#### Pros & Cons

**Pros:**

- Automation (3 reviews)
- Data Lineage (3 reviews)
- Data Quality (2 reviews)
- Ease of Use (2 reviews)
- Efficiency (2 reviews)

**Cons:**

- Complex Implementation (3 reviews)
- Complexity (2 reviews)
- Expensive (2 reviews)
- Expertise Required (2 reviews)
- Extra Costs (2 reviews)

### 16. [Oracle Enterprise Metadata Management](https://www.g2.com/products/oracle-enterprise-metadata-management/reviews)
  Oracle Enterprise Metadata Management (OEMM) is a comprehensive metadata management platform. OEMM can harvest and catalog metadata from virtually any metadata provider, including relational, Hadoop, ETL, BI, data modeling, and many more.


  **Average Rating:** 3.7/5.0
  **Total Reviews:** 16

**User Satisfaction Scores:**

- **Ease of Use:** 5.6/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 5.7/10 (Category avg: 8.5/10)
- **Metadata Management :** 6.0/10 (Category avg: 8.4/10)
- **Data Lineage:** 5.7/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Oracle](https://www.g2.com/sellers/oracle)
- **Year Founded:** 1977
- **HQ Location:** Austin, TX
- **Twitter:** @Oracle (827,310 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/1028/ (199,301 employees on LinkedIn®)
- **Ownership:** NYSE:ORCL

**Reviewer Demographics:**
  - **Company Size:** 44% Enterprise, 38% Small-Business


### 17. [Common Voice dataset](https://www.g2.com/products/common-voice-dataset/reviews)
  Each entry in the dataset consists of a unique MP3 and corresponding text file. Many of the 1,368 recorded hours in the dataset also include demographic metadata like age, sex, and accent that can help train the accuracy of speech recognition engines. The dataset currently consists of 1,087 validated hours in 18 languages, but we're always adding more voices and languages.


  **Average Rating:** 4.5/5.0
  **Total Reviews:** 11

**User Satisfaction Scores:**

- **Ease of Use:** 8.2/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 6.8/10 (Category avg: 8.5/10)
- **Metadata Management :** 8.2/10 (Category avg: 8.4/10)
- **Data Lineage:** 6.8/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Mozilla](https://www.g2.com/sellers/mozilla)
- **Year Founded:** 2005
- **HQ Location:** San Francisco, CA
- **Twitter:** @mozilla (262,146 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/13948/ (1,749 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Company Size:** 64% Small-Business, 27% Mid-Market


### 18. [Informatica Enterprise Data Catalog](https://www.g2.com/products/informatica-enterprise-data-catalog/reviews)
  A machine-learning-based data catalog that lets organizations classify and organize data assets across cloud, on-premises, and big data environments. It aims to maximize the value and reuse of data across the enterprise.


  **Average Rating:** 4.3/5.0
  **Total Reviews:** 19

**User Satisfaction Scores:**

- **Ease of Use:** 7.8/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 7.7/10 (Category avg: 8.5/10)
- **Metadata Management :** 8.0/10 (Category avg: 8.4/10)
- **Data Lineage:** 8.3/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Informatica](https://www.g2.com/sellers/informatica)
- **Year Founded:** 1993
- **HQ Location:** Redwood City, CA
- **Twitter:** @Informatica (99,880 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/3858/ (5,337 employees on LinkedIn®)
- **Ownership:** NYSE: INFA

**Reviewer Demographics:**
  - **Top Industries:** Information Technology and Services, Computer Software
  - **Company Size:** 53% Enterprise, 26% Mid-Market


### 19. [Coginiti](https://www.g2.com/products/coginiti/reviews)
  Coginiti is a SQL-first collaborative data operations platform that empowers teams to build, publish, and consume quality data products, streamlining the data analytics lifecycle from inception to insights. Integrating with the widest variety of data platforms and tools, Coginiti enables analysts, engineers, and data scientists to collaborate in real-time, breaking down silos and fostering innovation. Its intuitive interface simplifies managing complex data workflows, ensuring governance and consistency across projects. Key features include real-time collaboration, flexible data modeling, data quality testing, data lineage visualization, native scheduling, powerful APIs, and an AI assistant. Coginiti facilitates a seamless transition from data preparation to actionable intelligence. It’s not just about refining your data strategy or scaling your analytics capabilities; it’s about empowering your organization to harness the full potential of data for informed decision-making. Discover the power of Coginiti and transform your data operations. Coginiti offers products for individual analysts, data teams, and enterprises.


  **Average Rating:** 4.5/5.0
  **Total Reviews:** 29

**User Satisfaction Scores:**

- **Ease of Use:** 9.4/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.9/10 (Category avg: 8.5/10)
- **Metadata Management :** 8.8/10 (Category avg: 8.4/10)
- **Data Lineage:** 8.7/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Coginiti Corp](https://www.g2.com/sellers/coginiti-corp)
- **Year Founded:** 2020
- **HQ Location:** Atlanta, GA
- **Twitter:** @coginiti (70 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/coginiti (33 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Company Size:** 66% Enterprise, 28% Mid-Market


### 20. [BMC AMI Data](https://www.g2.com/products/bmc-ami-data/reviews)
  BMC AMI Data is a portfolio of intelligent data management and performance optimization solutions for IBM Z environments. It helps enterprises optimize, protect, and modernize mission-critical mainframe data, including Db2, IMS, and VSAM, while reducing cost, risk, and operational complexity. The solution automates data maintenance, analyzes system behavior, and provides predictive insights to reduce CPU usage, minimize operational risk, and keep critical workloads running without disruption. By modernizing how mainframe data is managed, BMC AMI Data enables enterprises to control data growth, optimize costs, and support high-volume, always-on business applications.


  **Average Rating:** 4.3/5.0
  **Total Reviews:** 24

**User Satisfaction Scores:**

- **Ease of Use:** 8.2/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [BMC Software](https://www.g2.com/sellers/bmc-software)
- **Company Website:** https://www.bmc.com
- **Year Founded:** 1980
- **HQ Location:** Houston, TX
- **Twitter:** @BMCSoftware (48,048 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/1597/ (9,008 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Computer Software
  - **Company Size:** 50% Small-Business, 25% Mid-Market


#### Pros & Cons

**Pros:**

- Analytics (1 reviews)
- Automation (1 reviews)
- Ease of Use (1 reviews)
- Easy Integrations (1 reviews)
- Features (1 reviews)

**Cons:**

- Expensive (1 reviews)
- Installation Difficulty (1 reviews)
- Learning Curve (1 reviews)
- Limited Compatibility (1 reviews)
- Limited Customization (1 reviews)

### 21. [DataHub](https://www.g2.com/products/datahub/reviews)
  DataHub is an event-driven AI and Data Context Platform designed to unify discovery, governance, and observability across an organization’s entire data estate. Unlike traditional data catalogs, DataHub Cloud offers real-time updates, automatic policy enforcement, and seamless integration with over 100 data sources. This ensures that organizations can maintain data quality, compliance, and AI-readiness at scale, addressing the complexities of modern data management. Targeted at data teams, governance professionals, and AI practitioners, DataHub serves a diverse audience that includes data engineers, analysts, data stewards, and compliance officers. The platform is particularly beneficial for organizations that require a centralized source of truth for all metadata across various environments, such as data warehouses, lakes, business intelligence platforms, machine learning systems, and AI agents. By consolidating data management processes, DataHub enhances collaboration and efficiency within data teams, enabling them to work more effectively. One of the standout features of DataHub is its automated data lineage tracking, which operates down to the column level. This capability allows teams to quickly assess the impact of any upstream changes, facilitating faster debugging of quality issues and helping to avert costly incidents before they escalate to production. Additionally, the platform employs AI-powered functionalities to manage repetitive tasks associated with metadata, such as documentation generation, intelligent glossary classification, and sensitive data tagging. This automation empowers data professionals to concentrate on higher-value activities, thereby increasing overall productivity. For data governance and compliance teams, DataHub offers robust tools for continuous policy enforcement, role-based access controls, and personally identifiable information (PII) detection. 
The platform is designed to support regulatory standards such as GDPR, HIPAA, and PCI, all while minimizing manual oversight. This ensures that organizations can maintain compliance without the burden of extensive manual processes. Furthermore, for AI and ML teams, DataHub provides the reliable data context essential for developing trustworthy AI agents and models, fostering innovation and improving outcomes. With backing from prominent investors like Bessemer Venture Partners, LinkedIn, and 8VC, DataHub has gained the trust of leading organizations, including Netflix, Visa, Slack, and Pinterest. This widespread adoption underscores the platform's effectiveness in transforming data operations and enhancing the overall data management landscape. For more information, visit datahub.com.


  **Average Rating:** 4.4/5.0
  **Total Reviews:** 8

**User Satisfaction Scores:**

- **Ease of Use:** 8.5/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [DataHub](https://www.g2.com/sellers/datahub)
- **Company Website:** https://datahub.com/
- **Year Founded:** 2013
- **HQ Location:** Palo Alto, California
- **Twitter:** @DataHubCloud (676 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/datahub-cloud/ (18 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Company Size:** 63% Mid-Market, 25% Enterprise


#### Pros & Cons

**Pros:**

- Ease of Use (3 reviews)
- Connectivity (2 reviews)
- Open Source (2 reviews)
- Accuracy (1 reviews)
- Affordable (1 reviews)

**Cons:**

- Integration Issues (2 reviews)
- Dependency Issues (1 reviews)
- Difficult Interface (1 reviews)
- Lack of Features (1 reviews)
- Large Data Management (1 reviews)

### 22. [ServiceNow Workflow Data Fabric](https://www.g2.com/products/servicenow-workflow-data-fabric/reviews)
  Workflow Data Fabric is the AI-ready data foundation of the ServiceNow AI Platform. It connects to any data (structured, unstructured, and streaming), contextualizes it with business meaning and governance, and controls it with lineage and policies so employees and AI agents can confidently act on real-time information to prevent disruptions, resolve requests faster, and optimize operations, all on one platform. How Workflow Data Fabric turns data into instant action: **Connect:** Unify data from systems like Salesforce, SAP, Workday, data lakes, and event streams in real time without duplication or fragile point-to-point integrations. With Zero Copy Connectors, Stream Connect, External Content Connectors, and Integration Hub, WDF simplifies architecture and cuts integration cost and time. **Contextualize:** Give data business meaning and make it trustworthy with an active Data Catalog, embedded governance, and lineage. Use Knowledge Graph to map relationships (e.g., customers, assets, orders) so AI agents and workflows understand context and make accurate decisions in the flow of work. **Control:** Apply policies, permissions, and compliance guards across connected sources so the right people and AI agents access the right data, at the right time, with full auditability and traceability, with no more shadow copies or opaque pipelines.


  **Average Rating:** 4.3/5.0
  **Total Reviews:** 103

**User Satisfaction Scores:**

- **Ease of Use:** 8.0/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.3/10 (Category avg: 8.5/10)
- **Metadata Management :** 5.0/10 (Category avg: 8.4/10)
- **Data Lineage:** 7.1/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [ServiceNow](https://www.g2.com/sellers/servicenow)
- **Company Website:** https://www.servicenow.com/
- **Year Founded:** 2004
- **HQ Location:** Santa Clara, CA
- **Twitter:** @servicenow (54,113 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/29352/ (32,701 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Who Uses This:** Software Engineer
  - **Top Industries:** Information Technology and Services, Computer Software
  - **Company Size:** 44% Enterprise, 30% Mid-Market


#### Pros & Cons

**Pros:**

- Ease of Use (37 reviews)
- Integrations (34 reviews)
- Automation (30 reviews)
- Efficiency Improvement (26 reviews)
- Data Management (25 reviews)

**Cons:**

- Complex Setup (23 reviews)
- Difficult Setup (17 reviews)
- Expensive (15 reviews)
- Slow Performance (14 reviews)
- Complexity (13 reviews)

### 23. [Talend Data Catalog](https://www.g2.com/products/talend-data-catalog/reviews)
  Data Catalog automatically crawls, profiles, organizes, links, and enriches all your metadata. Up to 80% of the information associated with the data is documented automatically and kept up-to-date through smart relationships and machine learning, continually delivering the most meaningful data to the user.


  **Average Rating:** 4.2/5.0
  **Total Reviews:** 12

**User Satisfaction Scores:**

- **Ease of Use:** 8.0/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 6.7/10 (Category avg: 8.5/10)
- **Metadata Management :** 9.4/10 (Category avg: 8.4/10)
- **Data Lineage:** 9.4/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Qlik](https://www.g2.com/sellers/qlik)
- **Year Founded:** 1993
- **HQ Location:** Radnor, PA
- **Twitter:** @qlik (64,285 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/10162/ (4,529 employees on LinkedIn®)
- **Phone:** 1 (888) 994-9854

**Reviewer Demographics:**
  - **Company Size:** 42% Mid-Market, 33% Enterprise


#### Pros & Cons

**Pros:**

- Data Cataloging (1 reviews)
- Data Discovery (1 reviews)
- Ease of Use (1 reviews)
- Intuitive (1 reviews)
- Intuitive Use (1 reviews)

**Cons:**

- Interface Complexity (1 reviews)
- Poor Interface Design (1 reviews)
- Poor UI Design (1 reviews)
- User Interface Issues (1 reviews)
- UX Design (1 reviews)

### 24. [Zeenea](https://www.g2.com/products/zeenea/reviews)
  Zeenea is the Data Discovery Platform built for everyone to find, trust, and unlock the value of enterprise data. The cloud platform features two modern user experiences: Zeenea Studio is the application designed for data experts to save time managing, documenting, and governing data with maximum automation; while Zeenea Explorer enables business users to gain productivity by finding the data assets they need across all enterprise information. Zeenea’s built-in scanners and APIs enable organizations to automatically collect, consolidate, and link metadata from their data ecosystem. With a powerful knowledge graph and smart search engine, data teams can activate all enterprise metadata through a single source of truth. Zeenea helps dozens of organizations worldwide democratize data, including BPCE Group, Club Med, Generali, Renault, Société Générale, Solactive and Stellantis. Zeenea's SOC 2 Type II-certified solutions include a Data Catalog, a Business Glossary, Data Lineage, Data Quality, Data Governance, Data Stewardship, Data Privacy, Regulatory Compliance, Cloud Transformation.


  **Average Rating:** 4.4/5.0
  **Total Reviews:** 12

**User Satisfaction Scores:**

- **Ease of Use:** 8.3/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 8.3/10 (Category avg: 8.5/10)
- **Metadata Management :** 8.8/10 (Category avg: 8.4/10)
- **Data Lineage:** 7.5/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [Zeenea](https://www.g2.com/sellers/zeenea)
- **Year Founded:** 2017
- **HQ Location:** Paris, Île-de-France
- **Twitter:** @ZeeneaSoftware (251 Twitter followers)
- **LinkedIn® Page:** http://www.linkedin.com/company/zeenea (26 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Company Size:** 50% Mid-Market, 25% Enterprise


### 25. [DataGalaxy](https://www.g2.com/products/datagalaxy/reviews)
  Founded in France and rapidly expanding across Europe and the United States, DataGalaxy is trusted by over 200 global enterprises, including Dior, Airbus, and SwissLife. The company is committed to driving data culture and literacy by helping organizations deliver metadata to the agents and value to the people. The platform emphasizes the importance of metadata, ensuring that all stakeholders have access to the necessary context and information to make informed decisions. The platform features two primary products: DataGalaxy Catalog and DataGalaxy Portfolio. DataGalaxy Catalog serves as a comprehensive metadata repository, providing users with the context needed to build trust in their data assets while ensuring compliance with relevant regulations. This centralized hub allows organizations to manage their metadata efficiently, making it easier for teams to find, understand, and leverage data for strategic initiatives. On the other hand, DataGalaxy Portfolio acts as a value management tool that tracks the ROI impact of data and AI initiatives on business performance. It enables organizations to track and demonstrate the value created from their data investments, fostering alignment from C-level executives all the way through to business stakeholders. By visualizing the outcomes of data-driven projects, DataGalaxy Portfolio helps organizations prioritize their efforts and allocate resources effectively, ensuring that data initiatives are aligned with business objectives. Targeted towards enterprises looking to enhance their data governance and management practices, DataGalaxy is particularly beneficial for organizations operating in complex environments where data is abundant but underutilized. By integrating data governance with business strategy, DataGalaxy stands out in its category as a solution that not only addresses the technical aspects of data management but also emphasizes the human element of data utilization. 
This holistic approach ensures that organizations can maximize the value of their data assets while fostering collaboration across teams, ultimately driving better business outcomes.


  **Average Rating:** 4.8/5.0
  **Total Reviews:** 62

**User Satisfaction Scores:**

- **Ease of Use:** 9.5/10 (Category avg: 8.6/10)
- **Business and Data Glossary:** 10.0/10 (Category avg: 8.5/10)
- **Metadata Management :** 9.6/10 (Category avg: 8.4/10)
- **Data Lineage:** 9.6/10 (Category avg: 8.6/10)


**Seller Details:**

- **Seller:** [DataGalaxy](https://www.g2.com/sellers/datagalaxy)
- **Company Website:** https://www.datagalaxy.com
- **Year Founded:** 2015
- **HQ Location:** Lyon, Rhone-Alpes
- **Twitter:** @DataGalaxy (866 Twitter followers)
- **LinkedIn® Page:** https://www.linkedin.com/company/datagalaxy/ (98 employees on LinkedIn®)

**Reviewer Demographics:**
  - **Top Industries:** Insurance, Banking
  - **Company Size:** 55% Enterprise, 42% Mid-Market


#### Pros & Cons

**Pros:**

- Ease of Use (10 reviews)
- Integrations (6 reviews)
- User Interface (5 reviews)
- Collaboration (4 reviews)
- Automation (3 reviews)

**Cons:**

- Limited Functionality (3 reviews)
- User Interface Issues (3 reviews)
- Missing Features (2 reviews)
- Product Immaturity (2 reviews)
- User Difficulty (2 reviews)



## Parent Category

[IT Infrastructure Software](https://www.g2.com/categories/it-infrastructure)



## Related Categories

- [Data Governance Tools](https://www.g2.com/categories/data-governance-tools)
- [DataOps Platforms](https://www.g2.com/categories/dataops-platforms)
- [Active Metadata Management Software](https://www.g2.com/categories/active-metadata-management)



---

## Buyer Guide

### What You Should Know About Machine Learning Data Catalog Software

### What is a Machine Learning Data Catalog?

A machine learning data catalog (MLDC) is an automated data catalog that carries out tasks like crawling metadata, cataloging, and classifying personally identifiable information (PII). Machine learning data catalogs organize the dataset inventory using metadata.

Data catalogs help companies know where their data is stored, which reduces the time it takes to find data and makes it easily accessible for analytics. They are inventories of assets like tables, schemas, files, and charts, and they help solve a company's data discovery, quality, and governance challenges.

### What does MLDC Stand For?

MLDC is an acronym for Machine Learning Data Catalog.

### What are the Common Features of Machine Learning Data Catalogs?

Machine learning data catalogs simplify the manual functions of a data catalog. A data catalog is an essential part of the data management strategy of any organization. Some of the features of machine learning data catalogs are:

**Data ingestion and discovery:** Machine learning data catalogs must have prebuilt adapters to connect to different company systems like applications, databases, files, and external APIs. These adapters help in discovering metadata from systems. Metadata can be table names, attribute names, and constraints. The feature helps build native connectivity like integrations for data sources, business intelligence (BI) solutions, and data science tools.
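As a rough illustration of metadata discovery, the sketch below harvests table names, column names, types, and constraints from a database. It uses Python's built-in `sqlite3` module as a stand-in for a real adapter; the `harvest_metadata` function and the `customers` table are hypothetical names chosen for this example, not part of any product listed here.

```python
import sqlite3

def harvest_metadata(conn):
    """Return {table: [column metadata]} for every user table in a SQLite database."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {
                "name": col[1],
                "type": col[2],
                "not_null": bool(col[3]),
                "primary_key": bool(col[5]),
            }
            for col in columns
        ]
    return catalog

# Hypothetical source system with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
catalog = harvest_metadata(conn)
print(catalog)
```

A production adapter would do the same kind of crawl against each source's own system catalog, then push the results into the catalog's metadata store.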

**Business glossary:** Although a good amount of data is stored in the repository, it is also essential for the users to understand what the stored data means. The glossary feature links this data to business terms, giving it more meaning.

**Automated data labeling:** Data labeling is a prerequisite for supervised machine learning algorithms. Automated data labeling can be more consistent than manual labeling because it reduces human error. Data labeling usually involves annotators identifying objects in images to build quality artificial intelligence (AI) training data. Automated labeling eases the challenges posed by tedious annotation cycles.

**Data lineage:** Data lineage tracks who changed the data and when, where, and why the change was made. It is a part of metadata management. MLDCs automate the data lineage process. Data lineage helps determine when new or changed data requires retraining machine learning models. MLDCs usually parse query logs from data lakes and other data sources automatically to create a data lineage map.
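A minimal sketch of how a lineage map could be derived from query logs is shown below. It assumes a simplified log of SQL strings and uses naive regular expressions; the `build_lineage` function, the table names, and the log format are all hypothetical, and a real catalog would rely on a proper SQL parser instead.

```python
import re
from collections import defaultdict

# Naive patterns: the table being written to, and the tables being read from.
TARGET = re.compile(r"\b(?:insert\s+into|create\s+table)\s+([\w.]+)", re.IGNORECASE)
SOURCE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)

def build_lineage(query_log):
    """Map each written table to the sorted list of tables it was derived from."""
    lineage = defaultdict(set)
    for query in query_log:
        target = TARGET.search(query)
        if not target:
            continue  # plain SELECTs write nothing, so they add no lineage edge
        for source in SOURCE.findall(query):
            lineage[target.group(1)].add(source)
    return {table: sorted(sources) for table, sources in lineage.items()}

# Hypothetical query log entries.
query_log = [
    "CREATE TABLE staging.orders AS SELECT * FROM raw.orders",
    "INSERT INTO mart.revenue SELECT o.id, c.region "
    "FROM staging.orders o JOIN raw.customers c ON o.cid = c.id",
    "SELECT count(*) FROM mart.revenue",
]
lineage = build_lineage(query_log)
print(lineage)
```

Walking this map upstream from a table shows which sources feed it, which is the information needed to decide whether a changed source should trigger model retraining.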

**Data quality monitoring and anomaly detection:** Data quality monitoring helps users understand if the data came from a trusted source. The machine learning data catalog also has a feature to identify sudden changes in data using machine learning algorithms. The users are immediately alerted to any changes or anomalies that are detected.
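To make the idea of flagging sudden changes concrete, here is a small sketch that uses a z-score threshold on daily row counts. This is a simple statistical stand-in, not the machine learning approach any particular product uses; the `is_anomalous` function, the threshold of 3.0, and the sample counts are assumptions for illustration.

```python
from statistics import mean, stdev

def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough history to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

# Hypothetical row counts from the last five daily loads of one table.
daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_890]

print(is_anomalous(daily_row_counts, 10_100))  # a typical day
print(is_anomalous(daily_row_counts, 4_000))   # a sudden drop in volume
```

In practice, the same check could run on other profiled metrics, such as null ratios or distinct counts, with an alert sent whenever it returns `True`.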

**Semantic search for data sets:** Machine learning data catalogs provide users with visual and intuitive searches like search engines. Almost every user in any organization is a data user, but not everyone can use SQL queries to use data. The semantic search feature makes it easier for all users to discover data sets.

**Compliance capabilities:** This feature ensures that sensitive data is not exposed and that the user can trust the data. It further helps keep data governance policies in place and strengthen data management in the organization. Data stewards can identify low-quality data and restrict access to sensitive data, thus helping comply with regulations such as the General Data Protection Regulation (GDPR).

**Data profiling:** Data profiling helps check the data from the data source and collects information about it. This process helps in knowing data quality issues much better, thus making the data management process more efficient.
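The sketch below shows the kind of summary a profiling step might collect for a single column. The `profile_column` function and the sample `ages` values are hypothetical; real profilers typically compute many more statistics and run directly against the data source.

```python
def profile_column(values):
    """Collect basic profile statistics for one column, treating None as a missing value."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_ratio": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
    }

# Hypothetical column with two missing values and one duplicate.
ages = [34, None, 29, 34, None, 51]
profile = profile_column(ages)
print(profile)
```

A high null ratio or an unexpected minimum or maximum in such a profile is often the first sign of a data quality issue.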

### What are the Benefits of Machine Learning Data Catalogs?

A machine learning data catalog provides several benefits to different types of users in the organization. These include:

**Ease in data curation:** Data curation is a process of collecting, organizing, labeling, and cleaning data. Machine learning data catalogs validate metadata and organize insights into correct repositories using machine learning algorithms.

**Ease of search:** Because of semantic search, it becomes easier for non-technical users to search and discover data for use since they do not have to use SQL queries every time to access data.

**Ease in data collaboration:** Machine learning data catalogs make siloed data easier to find and bring together, which helps users collaborate on, use, and share data sets.

### Who Uses Machine Learning Data Catalogs?

Machine learning data catalogs centralize metadata for various data assets. By organizing the metadata, MLDCs help organizations to govern data access.

**Data analysts:** Data analysts use MLDC to discover, classify, and manipulate data for their analytics processes. They can also discover AI or machine learning models, understand how they work, and import them into their BI tools. Data catalogs help data analysts make companies into self-service organizations. Self-service analytics is important for any organization that wants to be driven by insights. Machine learning data catalogs help the users know the means to find, understand, and trust data.

**Marketers:** Marketing teams use the machine learning data catalog more commercially. They obtain insights for making better decisions using data catalogs.

**Data scientists:** Data scientists usually publish their models for reuse. Data scientists always look for one platform that centralizes data for different projects.

### Challenges with Machine Learning Data Catalogs

Although machine learning data catalogs help solve major challenges in traditional data catalogs like data discovery and data lineage, MLDCs also come with challenges.

**Scalability:** Not every MLDC can handle very large metadata volumes, and some catalogs suffer performance problems or break down when overloaded with metadata. Data was once stored mainly in a company's mainframe data center, but with today's big data volumes, machine learning data catalogs must keep track of data in both cloud platforms and data lakes.

**Fragmentation in evaluating a product:** If a data catalog is too bulky, it causes fragmentation in the user's journey of evaluating a product. Too much data makes users use too many tools, thus breaking a seamless experience into fragments.

### How to Buy Machine Learning Data Catalogs

#### Requirements Gathering (RFI/RFP) for Machine Learning Data Catalogs

Machine learning data catalogs offer many features to help users identify usable data. A buyer can choose the right MLDC software based on the organization's needs. RFIs and RFPs help the organization gather information on pricing, product features, and guidelines.

#### Compare Machine Learning Data Catalog Products

**Create a long list**

The first step is to identify all the possible players in the space. This makes it possible to evaluate vendors on price, product features, and customer service.

**Create a short list**

After evaluating the potential vendors, the company can narrow the list to those who check all their boxes.

**Conduct demos**

Demos help buyers understand the product as a whole. A team of IT professionals and data scientists should join these demos to understand the product's functionality, while the marketing team can join to assess how the software would be used in its projects.

#### Selection of Machine Learning Data Catalogs

**Choose a selection team**

A team of marketing professionals, data scientists, and IT professionals can raise any questions about the MLDC product with the vendors. A data scientist would be more interested in the technical features of the software. A marketing manager would want to know how the marketing team could use the MLDC in its projects. An IT professional would want to understand the software installation procedure.

**Negotiation**

Once the vendor quotes a price, negotiations begin. The final price is set based on the cost of similar products available in the market and the extent to which the product can solve the buyer's challenges.

**Final decision**

The final decision is based on agreements between the vendor and the buyer.




