Common Data Model (CDM)

At the core of the platform lies a common data model that enables interoperability between the different components. To interact with the platform, a component has to understand at least a subset of the CDM. The model describes all the data commonly handled in the platform and therefore covers at least taxonomic names and concepts; literature references; authors; (type) specimens; structured descriptive data; and species-related content of any kind, such as economic use or conservation status. Nearly all of this data has already been described by existing or upcoming TDWG standards. Unfortunately, there are still major gaps in compatibility between these standards, so a new integrated data model has to be developed in order to yield results quickly.

The model is being developed in UML. Java classes that have XML bindings and carry persistence annotations (JPA) are derived from it. An XML Schema incarnation of the CDM will be used to validate data exchange and is therefore the normative format that the different components need to understand. As mentioned above, there is no such integrated TDWG schema yet, so the schema will try to incorporate as much of the existing standards as possible; TCS looks especially promising.
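To illustrate how an XML Schema can act as the gatekeeper for data exchange, here is a minimal sketch using the validation API that ships with the JDK (javax.xml.validation). The schema fragment, the element names TaxonName, nameString and reference, and the class name CdmSchemaCheck are all hypothetical and only stand in for the real CDM schema, which is not shown in this document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CdmSchemaCheck {

    // Hypothetical toy schema, NOT the actual CDM schema: a TaxonName has
    // exactly one nameString and any number of reference elements.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='TaxonName'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='nameString' type='xs:string' minOccurs='1' maxOccurs='1'/>"
      + "<xs:element name='reference' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:element>"
      + "</xs:schema>";

    /** Returns true if the XML document is well-formed and valid against TOY_XSD. */
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(TOY_XSD)));
            Validator validator = schema.newValidator();
            // With no custom ErrorHandler, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String oneName =
            "<TaxonName><nameString>Abies alba</nameString></TaxonName>";
        String twoNames =
            "<TaxonName><nameString>Abies alba</nameString>"
          + "<nameString>Abies pectinata</nameString></TaxonName>";
        System.out.println("one nameString valid:  " + isValid(oneName));
        System.out.println("two nameStrings valid: " + isValid(twoNames));
    }
}
```

A component that exchanges data with the platform could run a check of this kind on incoming and outgoing documents; the minOccurs and maxOccurs attributes are where cardinalities would be expressed in such a schema.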

The latest TDWG approach, which uses RDF and OWL ontologies for LSIDs (termed the "biodiversity bus" by GBIF and EoL), has been considered but rejected as the foundation of the platform because it is impossible to set the cardinality constraints that are very important for applications. There will be services that translate the EDIT CDM into TDWG RDF and vice versa, but TDWG RDF data that violates CDM cardinalities will be lost in this process, so full round-tripping is not possible.
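The loss of round-tripping can be made concrete with a small sketch. It assumes, purely for illustration, that an RDF-like record is a map from property name to a list of values and that the CDM side allows at most one value per property. The class name CardinalityDemo, the property names, and the policy of keeping the first value are hypothetical; the actual translation services may well decide differently which data is discarded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CardinalityDemo {

    /**
     * RDF-like to CDM-like: enforce "at most one value per property".
     * Assumption for this sketch: the first value is kept and the rest
     * are recorded in 'dropped' so the loss is at least visible.
     */
    static Map<String, String> toSingleValued(Map<String, List<String>> rdfLike,
                                              List<String> dropped) {
        Map<String, String> cdmLike = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rdfLike.entrySet()) {
            List<String> values = e.getValue();
            if (values.isEmpty()) {
                continue; // nothing to carry over
            }
            cdmLike.put(e.getKey(), values.get(0));
            for (String extra : values.subList(1, values.size())) {
                dropped.add(e.getKey() + "=" + extra);
            }
        }
        return cdmLike;
    }

    /** CDM-like to RDF-like: every single value becomes a one-element list. */
    static Map<String, List<String>> toMultiValued(Map<String, String> cdmLike) {
        Map<String, List<String>> rdfLike = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : cdmLike.entrySet()) {
            rdfLike.put(e.getKey(), Collections.singletonList(e.getValue()));
        }
        return rdfLike;
    }

    public static void main(String[] args) {
        Map<String, List<String>> original = new LinkedHashMap<>();
        original.put("nameString", Arrays.asList("Abies alba"));
        // Two values where the (hypothetical) CDM side allows only one.
        original.put("rank", Arrays.asList("species", "subspecies"));

        List<String> dropped = new ArrayList<>();
        Map<String, List<String>> roundTripped =
            toMultiValued(toSingleValued(original, dropped));

        System.out.println("identical after round trip: " + original.equals(roundTripped));
        System.out.println("dropped: " + dropped);
    }
}
```

Data that already respects the cardinalities survives the round trip unchanged in this sketch; only the surplus values are lost, which is the trade-off described above.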

The CATE project has undertaken a similar modelling effort. The platform development aims to create a shared domain model library in Java that can serve as a foundation for many other Java-based biodiversity projects, including CATE. Apart from the pure data model, it will contain an XML serialisation (marshalling) and persistence layer as well as some basic business logic.