Best Machine Learning Data Catalog Software

Machine learning data catalogs allow companies to categorize, access, interpret, and collaborate around company data across multiple data sources, while maintaining a high level of governance and access management. Artificial intelligence is key to many features of machine learning data catalogs, enabling functionality such as machine learning recommendations, natural language querying, and dynamic data masking for enhanced security purposes.

Companies can use machine learning data catalogs to maintain data sets in a single location so that searching for and discovering data is simple for everyday business users and analysts alike. Users can comment on, share, and recommend data sets so colleagues immediately understand what they are querying. Additionally, IT administrators can set up user provisioning to ensure that unauthorized employees cannot access sensitive data.

Machine learning data catalogs are most frequently implemented by companies that have multiple data sources, are searching for one source of truth, and are attempting to scale data usage company-wide. These products are generally administered by IT departments, which maintain organization and security, but the data can be accessed by data scientists, analysts, and everyday business users. The data can then be transformed, modeled, and visualized either directly in the machine learning data catalog or through an integration with business intelligence software.

Not all machine learning data catalogs provide data preparation capabilities; those that do not may require an integration with a business intelligence platform. Additionally, these tools differ from master data management software due to their enhanced governance, collaboration, and machine learning functionality.

To qualify for inclusion in the Machine Learning Data Catalog category, a product must:

  • Organize and consolidate data from all company sources in a single repository
  • Provide user access management for security and data governance purposes
  • Allow business users to search and access the data from within the catalog
  • Offer collaboration features around data sets, including categorizing, commenting, and sharing
  • Give intelligent recommendations based on machine learning for quicker access to relevant data
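As a rough illustration of the first four criteria, the sketch below models a tiny in-memory catalog with role-based access, search, and commenting. It is only a toy under stated assumptions: the names `DatasetEntry`, `Catalog`, `allowed_roles`, and the example roles are hypothetical and do not correspond to any product listed on this page, and machine-learning-based recommendations are not modeled at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DatasetEntry:
    """One catalog record. All field names here are hypothetical."""
    name: str
    source: str
    tags: Set[str] = field(default_factory=set)
    allowed_roles: Set[str] = field(default_factory=set)
    comments: List[str] = field(default_factory=list)


class Catalog:
    """Single repository of dataset entries with role-based access."""

    def __init__(self) -> None:
        self._entries: Dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Consolidate entries from any source under one index.
        self._entries[entry.name] = entry

    def search(self, term: str, role: str) -> List[str]:
        """Return names of entries that match `term` and that `role` may access."""
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if role in e.allowed_roles
            and (term in e.name.lower() or term in {t.lower() for t in e.tags})
        )

    def comment(self, name: str, role: str, text: str) -> None:
        """Attach a comment to an entry, refusing roles without access."""
        entry = self._entries[name]
        if role not in entry.allowed_roles:
            raise PermissionError(f"{role} may not access {name}")
        entry.comments.append(text)


catalog = Catalog()
catalog.register(DatasetEntry("sales_2023", "warehouse", {"revenue"}, {"analyst"}))
catalog.register(DatasetEntry("payroll", "hr_db", {"salary"}, {"hr_admin"}))
print(catalog.search("sales", "analyst"))
```

In this sketch, an analyst searching for "payroll" gets an empty result because the entry is restricted to a different role, which is the kind of access management the criteria describe.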

Machine Learning Data Catalog reviews by real, verified users. Find unbiased ratings on user satisfaction, features, and price based on the most reviews available anywhere.

Compare Machine Learning Data Catalog Software

G2 takes pride in showing unbiased ratings on user satisfaction. G2 does not allow paid placement in any of its ratings.
Results: 21

    IBM Watson® Knowledge Catalog is a unified data catalog that can help your data users quickly find, curate, categorize and share data, analytical models and their relationships with other members of your organization. It serves as a single source of truth for data engineers, data stewards, data scientists and business analysts to shop for data they can trust. With active policy management, it helps your organization protect and govern data, so it’s ready for AI at scale. Learn More: https://ibm

    Aginity transforms the way world-leading companies compete on analytics. Aginity Amp software creates, catalogs and manages all analytics (analytic logic and data) as assets.

    Oracle Enterprise Metadata Management (OEMM) is a comprehensive metadata management platform. OEMM can harvest and catalog metadata from virtually any metadata provider, including relational, Hadoop, ETL, BI, data modeling, and many more.

    Alation is a data catalog designed to empower analysts to search, query & collaborate on data to gain faster, more accurate insights.

    Data Steward Studio (DSS) is a DataPlane Service that empowers users to understand, secure, and govern data across enterprise data lakes.

    Intel(R) Machine Learning Scaling Library (Intel(R) MLSL) is a library providing an efficient implementation of communication patterns used in deep learning.

    Cloudera Navigator is a complete data governance solution for Hadoop, offering critical capabilities such as data discovery, continuous optimization, audit, lineage, metadata management, and policy enforcement. As part of Cloudera Enterprise, Cloudera Navigator enables high-performance agile analytics, supports continuous data architecture optimization, and helps meet regulatory compliance requirements.

    Each entry in the dataset consists of a unique MP3 and a corresponding text file. Many of the 1,368 recorded hours in the dataset also include demographic metadata such as age, sex, and accent that can help improve the accuracy of speech recognition engines. The dataset currently consists of 1,087 validated hours in 18 languages, but we're always adding more voices and languages.

    Unifi is a single data interface for the enterprise.

    Altair Knowledge Hub is an enterprise data prep solution that empowers individuals and organizations to intelligently tap into more data to drive faster insight and better value. Knowledge Hub provides clear lineage, evidence of integrity, and organizational governance controls as well as cross-team sharing and collaboration in a centralized marketplace where users can publish their output to any analytics or reporting platform.

    A Semantic Layer for the Enterprise. Enabling Connected Data Access and Analytics on Demand. Anzo Smart Data Lake (ASDL) connects to both internal and external data sources, including cloud or on-premise Hadoop based data lakes to rapidly ingest and catalog large volumes of structured and unstructured data through horizontally scaled, automated Extract, Transform and Load (ETL) processes that can be mapped to establish a Semantic Layer of business meaning.

    Appen is a global leader in the development of high-quality, human-annotated datasets for machine learning and artificial intelligence. Appen brings over 20 years of experience capturing and enriching a wide variety of data types including speech, text, image and video.

    signal processing, machine learning, and AI to solve real-world business challenges including in financial services

    Collibra Data Governance Center is an enterprise-wide data governance solution that puts people and processes first, automating data governance and management to quickly and securely deliver trusted data to the business users who need it.

    Data3Sixty facilitates answers to fundamental questions about data, such as source, use, meaning, ownership, and quality through a robust suite of governance solutions, including business glossary, data dictionary, data catalog, data lineage, and metadata management. Customizable dashboards and zero-code workflows ensure users can quickly and easily leverage data to maximum advantage.

    A machine-learning-based data catalog that lets you classify and organize data assets across cloud, on-premises, and big data environments. It provides maximum value and reuse of data across your enterprise.

    Immuta is the fastest way for algorithm-driven enterprises to accelerate the development and control of machine learning and advanced analytics. The company's hyperscale data management platform provides data scientists with rapid, personalized data access to dramatically improve the creation, deployment and auditability of machine learning and AI.

    A machine-learning-based data catalog that allows you to classify and organize data assets across cloud, on-premises, and big data environments. It provides maximum value and reuse of data across the enterprise.

    Reltio Cloud delivers enterprise data-driven applications together with a modern data management Platform as a Service (PaaS), guiding customers to take the right actions, based on the right insights, to achieve the right results.

    Data Catalog automatically crawls, profiles, organizes, links, and enriches all your metadata. Up to 80% of the information associated with the data is documented automatically and kept up-to-date through smart relationships and machine learning, continually delivering the most meaningful data to the user.

    Waterline Data Fingerprinting works by analyzing the data values in each data set and profiling the data. Waterline Data then uses that information to create a fingerprint for each column of data—using machine learning to intelligently and automatically tag and match data fingerprints to glossary terms and populate the data catalog. Users can then refine matched terms, and remaining unmatched terms, through crowdsourcing.
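The general idea of profiling a column's values and matching the resulting fingerprint to glossary terms can be sketched roughly as follows. This is not Waterline Data's method: it swaps the machine learning step for simple regular-expression rules, and the glossary terms, patterns, `fingerprint`, `suggest_tag`, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, List, Optional

# Hypothetical glossary: term -> pattern that values of that kind should match.
GLOSSARY_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_zip_code": re.compile(r"\d{5}(-\d{4})?"),
    "iso_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def fingerprint(values: List[str]) -> Dict[str, float]:
    """Profile a column: fraction of non-empty values fully matching each pattern."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return {term: 0.0 for term in GLOSSARY_PATTERNS}
    return {
        term: sum(1 for v in non_empty if pattern.fullmatch(v)) / len(non_empty)
        for term, pattern in GLOSSARY_PATTERNS.items()
    }


def suggest_tag(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching glossary term, or None if no score reaches the threshold."""
    scores = fingerprint(values)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


print(suggest_tag(["a@example.com", "b@example.org", ""]))
print(suggest_tag(["12345", "90210-1234", "abc"]))
```

Columns for which `suggest_tag` returns `None` correspond to the unmatched case, which a real catalog would leave for users to tag by hand.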
