Top 5 Machine Learning Data Catalog Software SaaS Companies in May 2026

As of May 2026, there are 5 SaaS companies in Machine Learning Data Catalog Software. Together they generate $109.7M in revenue, employ 549 people, and have raised $1.1B in funding. A combined customer count is not reported.

Machine Learning Data Catalog Software serves as a specialized framework designed to enhance the management, discovery, and utilization of data specifically for machine learning projects. These solutions facilitate real-time data discovery by automating the cataloging of datasets, enabling organizations to effectively organize and manage their data assets. In doing so, they allow data scientists and machine learning engineers to locate relevant datasets quickly, thereby accelerating the development of machine learning models. Typical features of Machine Learning Data Catalog Software include automated metadata ingestion, lineage tracking, and advanced search capabilities powered by machine learning algorithms. This facilitates easier dataset evaluation and improves collaboration across teams, as stakeholders can access effective data documentation and understand the provenance of their data. Common buyer personas include data scientists, machine learning engineers, data governance professionals, and IT managers, all of whom seek efficient ways to manage and utilize large volumes of data for analytical and operational purposes.
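
To make the features named above concrete, here is a minimal in-memory sketch of a catalog that stores dataset metadata, supports keyword search, and walks lineage upstream. It does not reflect any listed vendor's product or API; the `DatasetEntry` and `Catalog` classes, their field names, and the sample datasets are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Metadata record for one dataset (hypothetical schema)."""
    name: str
    description: str
    tags: set[str] = field(default_factory=set)
    parents: list[str] = field(default_factory=list)  # names of upstream datasets


class Catalog:
    """Toy catalog: metadata registry, keyword search, lineage lookup."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # A real system would ingest this metadata automatically.
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[str]:
        # Case-insensitive substring match over name, description, and tags.
        term = term.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in t.lower() for t in e.tags)
        )

    def lineage(self, name: str) -> list[str]:
        # Breadth-first walk over parents: nearest ancestors come first.
        seen: list[str] = []
        queue = list(self._entries[name].parents)
        while queue:
            current = queue.pop(0)
            if current in seen:
                continue
            seen.append(current)
            if current in self._entries:
                queue.extend(self._entries[current].parents)
        return seen


# Hypothetical sample data.
catalog = Catalog()
catalog.register(DatasetEntry("raw_clicks", "Raw clickstream events", {"web"}))
catalog.register(
    DatasetEntry("sessions", "Sessionized clicks", {"web"}, ["raw_clicks"])
)
catalog.register(
    DatasetEntry(
        "churn_features",
        "Training features for churn model",
        {"ml", "training"},
        ["sessions"],
    )
)

print(catalog.search("training"))
print(catalog.lineage("churn_features"))
```

Production catalogs typically replace the substring match with ranked or ML-assisted search and persist lineage in a graph store; the sketch only shows how the three features relate to one shared metadata record.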

Companies: 5
Revenue: $109.7M
Funding: $1.1B
Employees: 549

Sorting: Highest -> Lowest

Top Machine Learning Data Catalog Software Companies

Showing 1 of 5 companies ranked by annual revenue.

1. Deepstributed

Wrocław, Poland

Deepstributed is an app that makes it unimaginably easy to cope with three ML processes that usually make life harder.

1. Finding GPU resources. First, we help you connect your GPUs to our app. Yes, you can manage your experiments from Deepstributed even on your own computing resources. Then, if you want to run more, you can reach for the community GPUs or GPU-as-a-Software resources.
2. Configuring them. Next, we let you specify which resources and preconfigured runtime frameworks you want to use for a given experiment. PyTorch, TensorFlow, Caffe? Yes, we have them all. And many more.
3. Running experiments smoothly. Finally, you upload your code and data and run the experiment on the cheapest resources possible, with no configuration effort. Nice, huh?

Revenue: $119.3K
Customers: -
Year founded: 2018
Funding: -
Team size: 1
Growth: -

Inclusion Criteria

- Must offer automated metadata management to simplify data organization
- Should provide advanced search functionalities to enable quick data discovery
- Must include lineage tracking to visualize data flow and relationships
- Should facilitate collaboration among teams by offering clear documentation and accessibility
- Must cater specifically to machine learning use cases, not just general data management
- Should integrate seamlessly with existing data tools and platforms used by the organization
- Not just a data storage solution; must actively support data discovery and utilization features