Mappings as first-class citizens
Over the years, mappings have been a recurring topic in our conversations with our partners and communities. For example, our earliest issues and feature requests with the word "mapping" date back to 2017/2018. And our partners consistently identified support for mappings as a priority feature in all of our open roadmapping sessions.
That's why I'm very happy to share that we have reached a new milestone this month: mappings are now first-class citizens in Semantic Treehouse.
Why did it take so long?
Supporting mappings has been a long-held wish, but it was seldom the highest priority. The exception for us has been our work on e-invoicing standards. In that domain, mappings between the European semantic data model (EN16931) and common syntaxes like UBL and UN/CEFACT CII became part of the mandatory European norm. This meant that mapping information needed to be documented, published, and presented in a web-friendly and user-friendly way. Also, practitioners working with e-invoices were used to thinking in terms of UBL or UN/CEFACT CII fields, so helping them relate these familiar structures to their abstract counterparts in the semantic data model was important for adoption and understanding.
As a result, years ago we created a kind of implicit support for mappings based on this need. Here's an example, with the semantic data model on the left and the UBL invoice syntax on the right:
The tree-based mapping view works well for e-invoicing syntax bindings, where mappings are relatively fixed and the highlighting clearly shows "it's over here!"
I think this exception proved the rule: mappings matter most when they bridge the gap between abstract semantic models and the concrete formats people actually work with.
What are mappings, exactly?
Let's clarify what we mean by "mappings". We follow definitions from the semantic interoperability literature, particularly the work of Jérôme Euzenat and Pavel Shvaiko and CROW's mapping whitepaper:
A mapping is the oriented, or directed, version of an alignment: it maps the entities of one data model to at most one entity of another data model. A mapping can be seen as a collection of mapping rules all oriented in the same direction, from one specification to another, with each element of the source appearing at most once.
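To make this definition concrete, here is a minimal Python sketch of a directed mapping. The class names `MappingRule` and `Mapping` and the `note` field are hypothetical and say nothing about how Semantic Treehouse stores mappings internally; the example rules use two EN16931 business terms and their UBL paths purely for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MappingRule:
    """One directed rule: a source element maps to exactly one target element."""
    source: str
    target: str
    note: str = ""  # hypothetical free-text justification for the rule


@dataclass
class Mapping:
    """A directed mapping from a source specification to a target specification.

    Rules are keyed by source element, which enforces the "at most once"
    property: a source element can appear in at most one rule.
    """
    source_spec: str
    target_spec: str
    rules: dict[str, MappingRule] = field(default_factory=dict)

    def add_rule(self, source: str, target: str, note: str = "") -> None:
        if source in self.rules:
            existing = self.rules[source].target
            raise ValueError(f"{source!r} is already mapped to {existing!r}")
        self.rules[source] = MappingRule(source, target, note)

    def target_of(self, source: str) -> str | None:
        """Return the target for a source element, or None if it is unmapped."""
        rule = self.rules.get(source)
        return rule.target if rule is not None else None


# Illustrative rules from the EN16931 semantic model to the UBL invoice syntax.
en16931_to_ubl = Mapping("EN16931", "UBL Invoice")
en16931_to_ubl.add_rule("BT-1 Invoice number", "/Invoice/cbc:ID")
en16931_to_ubl.add_rule("BT-2 Invoice issue date", "/Invoice/cbc:IssueDate")
print(en16931_to_ubl.target_of("BT-1 Invoice number"))  # /Invoice/cbc:ID
```

The constraint applies only to the source side: two source elements may still share a target, which is why the reverse direction of a mapping is not automatically a mapping itself.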
Mappings help you answer questions like:
- "How does my organization's internal data model relate to the industry standard?"
- "Which fields in format A correspond to which elements in format B?"
- "How can I transform data from one representation to another?"
These questions are fundamental to achieving interoperability. We see this with our Dutch partners in sector initiatives implementing BOMOS. These are usually mature networks with high trust, shared business contexts, and familiar legal frameworks. But mappings become even more critical in geographically distributed or less mature networks, like many European Data Spaces. Where convergence to a single shared model is further away or less likely, explicit mapping support can be a catalyst for practical interoperability.
What does "first-class citizen" mean?
Basically, this means that mappings are no longer hidden attachments to the data models but actively managed objects that are just as important as the data models themselves.
In software design, a first-class citizen is an entity that supports all the operations generally available to other entities in that system. For Semantic Treehouse, this means that mappings have become a full specification type with all the capabilities you'd expect:
- Issue tracking: create and link issues to specific mapping rules, just like you can with message model elements
- Version control: maintain different versions of mapping specifications with a version history
- Project integration: mappings appear in the specification index and belong to projects, so that they're discoverable and manageable
- Access control: role-based permissions apply to mappings, including group-level access control
- Import/Export: mappings can be imported and exported in standard formats for portability and reuse
Why now?
Two developments converged to make this the right moment for fully integrating mapping support:
1. Backend refactor complete
Our recent backend refactor has streamlined how we handle different specification types. Where implementing mapping support would previously have been a complex, multi-month project, it became straightforward and fast. What's exciting (at least to me!) is that this means any extension, addition, or change to STH is now less costly and time-consuming. This lets us set higher goals! We can be more ambitious about what we commit to in projects and more responsive to our communities' needs (we're already looking at process specifications next).
2. AI and semantic interoperability
Recent developments in AI have created new possibilities for mapping creation and curation. We want to explore what AI can mean for reducing the manual effort of creating and maintaining mappings.
Since Semantic Treehouse serves as our living lab, we leverage internal R&D projects to explore emerging technologies and test their practical value. Through our recent HAVOC research project we learned that reliable AI mapping requires a 'human-in-the-loop' workflow. This insight led us to develop this new mapping infrastructure in STH, because we needed an environment where AI can suggest mappings, but where experts can validate and manage them with some confidence.
Real-world impact: Digital Product Passports in CIRPASS-2
The practical value of mapping specifications became immediately clear in our work with the CIRPASS-2 project, a Horizon Europe initiative developing Digital Product Passport solutions across multiple industries.
The scenario is common in European standardization work: industry pilots develop practical, bottom-up data models based on their specific needs, while parallel efforts create top-level, cross-sectoral ontologies. The challenge is bridging these two worlds.
In the textile DPP pilot, partners including Kezzler, Atma.io (Avery Dennison), and TripleR collaboratively developed a shared data model containing approximately 15 key data attributes. This started as an Excel sheet, which is a common starting point for this kind of project consortium. We imported it into Semantic Treehouse as a formal message model specification.
The next step was mapping their model to the official CIRPASS-2 core DPP ontology. When we demonstrated the mapping functionality during a follow-up meeting, they got it right away. They confirmed that visualizing the connection between their practical Excel models and the abstract ontology was exactly the missing link they needed to make the two converge.
What we showed looked something like this:
The mapping rules are now explicit (on the left), with source and target specifications side-by-side and detailed mapping information in the center.
Both viewing approaches remain available in STH because they serve different needs. The simpler view, which does not list the mapping rules explicitly, works well for fixed syntax bindings like e-invoicing, since there the mapping is established and not subject to much discussion. The new view better supports scenarios where mappings are being developed collaboratively and require more detailed justification and discussion.
With mapping specifications in STH, these consortia can now:
- Connect industry-specific models with cross-sectoral ontologies
- Share mappings among pilot partners and other stakeholders
- Version and maintain mappings as both industry models and EU standards evolve
- Generate transformation specifications for actual data conversion and testing
- Show how sector standards align with regulatory requirements
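The points above about transformation and testing can be illustrated with a small sketch that applies a directed mapping to a flat record. It assumes the mapping rules have been reduced to a plain dictionary from source field names to target terms; the field names and the `ex:` prefix are hypothetical placeholders, not actual CIRPASS-2 pilot or ontology terms, and the `transform` function is not an STH feature.

```python
from __future__ import annotations


def transform(
    record: dict[str, object], rules: dict[str, str]
) -> tuple[dict[str, object], list[str]]:
    """Rename the fields of `record` according to the directed mapping `rules`.

    Returns the converted record and the list of source fields without a rule.
    """
    converted: dict[str, object] = {}
    unmapped: list[str] = []
    for source_field, value in record.items():
        target = rules.get(source_field)
        if target is None:
            unmapped.append(source_field)  # report gaps instead of dropping them silently
        elif target in converted:
            # Two source fields share a target: flag it rather than overwrite a value.
            raise ValueError(f"multiple source fields map to {target!r}")
        else:
            converted[target] = value
    return converted, unmapped


# Hypothetical spreadsheet columns mapped to hypothetical ontology terms.
rules = {
    "product_name": "ex:productName",
    "country_of_origin": "ex:countryOfOrigin",
}
record = {"product_name": "T-shirt", "country_of_origin": "PT", "care_label": "30C"}
converted, gaps = transform(record, rules)
print(converted)  # {'ex:productName': 'T-shirt', 'ex:countryOfOrigin': 'PT'}
print(gaps)  # ['care_label']
```

Returning the unmapped fields is a deliberate choice: when testing a mapping, the gaps are often the most useful output.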
While CIRPASS-2 is an EU project, the same 'bottom-up meets top-down' challenge exists for national and sectoral initiatives, where software vendors need to map industry standards to their own (API) implementations. We call these "implementation profiles". More about that soon.
What's next?
For our Dutch sector communities using STH, like Ketenstandaard, SETU and SUTC, this means better support for one of the most-requested features. For our European partners in data space initiatives, this underscores STH's position as a mature vocabulary hub implementation that is capable of handling the heterogeneity typical of federated data spaces.
Of course, the work is far from done. As with any new capability in our living lab, we're already identifying improvements and refinements. We work as an agile team and have to balance consultancy, research, and platform development in a project-based organisation (TNO). This means features are developed further where our projects and communities need them most. But we've spotted several UX improvements that we'll be working through in any case.
We're eager to see how communities will use mapping specifications to solve their interoperability challenges, and we welcome your feedback as we continue developing this functionality.