Top 5 Machine Learning Data Catalog Software SaaS Companies in May 2026

As of May 2026, there are 5 SaaS companies in Machine Learning Data Catalog Software. Together they have combined revenues of $109.7M, employ 549 people, and have raised $1.1B in funding. A combined customer count is not reported.

Machine Learning Data Catalog Software is a category of tools for managing, discovering, and using data in machine learning projects. These solutions support real-time data discovery by automating the cataloging of datasets, so organizations can organize and manage their data assets. Data scientists and machine learning engineers can then locate relevant datasets quickly, which accelerates model development.

Typical features include automated metadata ingestion, lineage tracking, and advanced search powered by machine learning algorithms. These features make datasets easier to evaluate and improve collaboration across teams, because stakeholders can consult data documentation and understand the provenance of their data. Common buyer personas include data scientists, machine learning engineers, data governance professionals, and IT managers, all of whom need efficient ways to manage and use large volumes of data for analytical and operational purposes.

Companies: 5
Revenue: $109.7M
Funding: $1.1B
Employees: 549

Top Machine Learning Data Catalog Software Companies

Showing 1 of 5 companies ranked by annual revenue.

1. SambaNova Systems

Palo Alto, California, United States

SambaNova is the leading Enterprise AI company that delivers a full-stack infrastructure from silicon to software, specializing in machine learning and big data analytics platforms.

Revenue: $100M
Customers: -
Year founded: 2017
Funding: $982M
Team size: 417
Growth: -

Inclusion Criteria

- Must offer automated metadata management to simplify data organization
- Should provide advanced search functionalities to enable quick data discovery
- Must include lineage tracking to visualize data flow and relationships
- Should facilitate collaboration among teams by offering clear documentation and accessibility
- Must cater specifically to machine learning use cases, not just general data management
- Should integrate seamlessly with existing data tools and platforms used by the organization
- Not just a data storage solution; must actively support data discovery and utilization features