The Five Qualities of Collaborative Data Ecosystems

Over the past decade, organizations have invested heavily in the culture, tools, and processes needed to create value from data through data science and machine learning within the enterprise. But this transformation does not stop at corporate boundaries.
Published 7 September 2022

In this article, we take a closer look at the five qualities of collaborative data ecosystems and show how today’s business capabilities need to transform.

Strategy and Culture

Collaborative data ecosystems (CDEs) are emerging, but they cannot be built on the strategies or former value propositions of individual organizations. They require a transformation from an inward to an outward orientation. By creating a shared value proposition, you incorporate the strengths and capabilities of all participants in the ecosystem.

A collaborative data ecosystem is an alignment of business goals, data, and technology among two or more participants to collectively create value that is greater than each could create individually.

Another important aspect is the cultural shift, as the mindset required in ecosystems differs considerably from operating only internally. In the article “Are You Ready to Become An Ecosystem Player?” by BCG, the authors identified four shifts in mindset that companies must make if they want to succeed in the new era:

From efficiency to imagination: One of the strengths of an ecosystem is that it can address challenges and enable value propositions that no individual company could achieve alone. Finding such opportunities requires counterfactual thinking and imagination.

From inward focus to outward focus: As the focus of value creation and innovation moves from the company to the ecosystem, firms must look beyond their own boundaries to explore business opportunities and secure the required resources and capabilities.

From capturing value to creating value: Instead of asking “How can we make money?”, successful ecosystems start with the question “How can we create value together?”. They understand that if they collaborate effectively, every partner will be better off.

From hierarchical control to collaboration: The ecosystem must be built on trust. This requires consistently demonstrating competence (the ability to deliver on its promise), fairness (in treating all stakeholders of the ecosystem), and transparency (providing true, reliable, and unambiguous information that enables stakeholders to monitor behavior and results).

As with every cultural change, this takes time, but it is rewarding for those who initiate it.


In the context of collaborative data ecosystems, the following capabilities are required of the business stakeholders who are responsible for creating value.

Prioritize and align on the most valuable use cases: Collaborating and sharing data or insights in ecosystems allows you to develop a new, unique, and exclusive portfolio of data products and applications that deliver the highest value, massively extending the scope of potential use cases. Which use cases were previously impossible to realize, perhaps because the datasets were too small, the risk was too high, or other roadblocks existed that no single company could overcome alone?

Deliver business value with data and analytics or machine learning: Building and deploying data science applications involves risks and challenges. Not every participant in a collaborative data ecosystem has the capability to operationalize machine learning (ML) and bring it to market. The beauty of collaborative data ecosystems is that this isn’t required of everyone: if one participant (usually the orchestrator or data consumer) is well versed in operationalization, the other partners can access its capabilities and contribute by sharing data, insights, or other resources.

Focus on innovation over monetization: Data science is highly experimental, as data and models are in continuous feedback loops. All business stakeholders need to keep in mind that iteration, realignment on use cases, and adapting to market dynamics are part of the nature of CDEs. That is why business models and metrics should be set accordingly: BCG suggests focusing on establishing the value proposition for customers before putting too much emphasis on monetization, and to grow the market before distributing the value created.

Data and Technology

Challenges in collaborative research and data sharing are manifold: valuable data is siloed, inaccessible, incomplete, or non-standardized. To foster innovation and create business value, the ecosystem must be simple to enter and contribute to, scalable, and based on trust. Emerging technologies have reached market maturity and enable new capabilities within ecosystems, such as more granular data sharing and privacy-enhancing technologies (PETs).

Federated Analytics and Federated Learning: Both approaches allow participants to derive granular insights or train ML models on distributed data. This removes the need to centralize the data: models can learn from more data than any single party holds, improving machine learning performance while respecting data ownership and privacy.
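The core idea can be sketched as a minimal federated averaging (FedAvg) loop: each participant trains on its private data, and only model weights are exchanged and averaged. The linear model, learning rate, and data partitioning below are illustrative assumptions, not a production protocol.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One participant trains on its private data; raw data never leaves."""
    w = weights.copy()
    for _ in range(epochs):
        preds = X @ w
        grad = X.T @ (preds - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

def federated_average(global_w, partitions, rounds=10):
    """FedAvg: average locally trained weights, weighted by dataset size."""
    for _ in range(rounds):
        sizes = [len(y) for _, y in partitions]
        total = sum(sizes)
        local_ws = [local_update(global_w, X, y) for X, y in partitions]
        global_w = sum((n / total) * w for n, w in zip(sizes, local_ws))
    return global_w
```

In practice, the exchanged updates are themselves protected (e.g. with secure aggregation or differential privacy), since raw gradients can still leak information.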

Federated Architecture: By building and managing cloud-based platforms that orchestrate data curation and analytics across a federation of distributed data partners, new partners can be onboarded much more quickly without adding technical complexity and overhead.

Data Interoperability and Data Management: Common Data Models must be established to enable collaboration between all parties. The CDE should also offer standardized data curation capabilities (transform, query, log, validate) and an end-to-end data modification audit trail.
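At its simplest, a common data model is an agreed schema that every incoming record is validated against before it enters the shared curation pipeline. The field names and types below are hypothetical, purely for illustration:

```python
# Hypothetical shared schema agreed by all CDE participants.
COMMON_MODEL = {
    "patient_id": str,
    "measurement": float,
    "unit": str,
}

def validate(record, model=COMMON_MODEL):
    """Return a list of violations; an empty list means the record conforms."""
    errors = []
    for field, expected_type in model.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors
```

Real CDEs would typically express such a model in a standardized schema language and log each validation result into the audit trail mentioned above.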

Analytics and Machine Learning: Closing the loop across the entire data and ML lifecycle in cross-organizational tech stacks is paramount, from data capture to the operationalization of models. This requires that different data science tools, packages, and languages work together seamlessly so that insights can be derived from data with advanced analytics, AI, and ML at scale.

Security, privacy, and IP protection: There is no universal privacy-preserving data sharing mechanism. Being confident that private and proprietary information or assets are secure can only be achieved by the modular application and orchestration of PETs, combined with access controls and additional security layers.

Operating Model

Most successful ecosystems today are built on digital platforms and rely on the efficient exchange and analysis of copious amounts of data. This requires new capabilities and cross-organizational operating models that work for all participants of the CDE.

Define responsibilities, ways of working, and organizational structures: The depth of the collaboration is defined not only by data interoperability or the integration of tech stacks. Making sense of data requires deep domain knowledge, so a CDE must enable ways to securely co-create and co-innovate without violating trade secrets or IP.

Federated and continuous development practices: Like DevOps and DataOps, MLOps seeks to increase automation and improve the quality of production models while meeting business and regulatory requirements. A limitation of these practices is that they cover internal processes only. Federated MLOps is the capability that drives consistency and governability between collaborating companies and improves productivity with integrated workflows across data science tools, infrastructure, and privacy-enhancing technologies.

Trust and Governance

According to BCG, trust-related issues are a major cause of business ecosystem failure, and they play an even larger role in CDEs. Without trust, ecosystems cannot mature, and collaboration on sensitive data and AI places the highest demands on data privacy and IP protection. Many companies have developed responsible AI principles, but very few have changed how they operate internally.

Traceability and Auditability: Ensuring trust between all participants of a CDE requires the adoption of a common set of principles. A notable example of a transparent framework is the Five Safes (safe people, safe projects, safe settings, safe data, safe outputs), which is being used in the development and assessment of Trusted Research Environments (TREs). TREs are becoming the architectural backbone and gold standard for analyzing and working with health data in many organizations.

Security, Data Privacy, and IP protection: As mentioned before, privacy-enhancing technologies play a crucial role in enabling CDEs. Approaches such as Fully Homomorphic Encryption (FHE), synthetic data, federated learning, differential privacy, and functional encryption make it possible for organizations to reap the benefits of data-sharing without sacrificing privacy or IP.
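As a small illustration of one of these techniques, the Laplace mechanism releases an aggregate statistic with calibrated noise so that no individual record can be inferred from the result. The clipping bounds and epsilon below are illustrative assumptions:

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon):
    """Differentially private mean via the Laplace mechanism.

    Each value is clipped to [lower, upper]; the sensitivity of the mean
    is then (upper - lower) / n, and Laplace noise with scale
    sensitivity / epsilon yields epsilon-differential privacy.
    """
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise
```

A smaller epsilon means stronger privacy but noisier results; choosing that trade-off, and combining such mechanisms with FHE, federated learning, or synthetic data, is exactly the kind of modular PET orchestration described above.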

Platforms for Collaborative Data Ecosystems

Profound change towards co-creation, co-innovation, and fully functional data ecosystems is a long-term effort. But one thing is certain: those who embrace it today will gain a significant competitive advantage in the future. By starting with a small-scale initiative, you get a head start and can then sustain this advantage by scaling up use cases to their full potential.
