As Model-Based Software Engineering continues to gain traction in our industry, data models will become more commonplace in software architectures and acquisition programs. This paper proffers a framework for considering the maturity of data models. Not only does this provide modelers and architects a way to characterize their models, it also enables program managers and contracting officers to write requirements for the creation (and subsequent evaluation) of models. This paper will also relate each of the models to existing technology and indicate where they align with (and are enabled by) the FACE Data Architecture standard.

Introduction

Abstract models that organize data elements and standardize how they relate to one another and to real-world properties are commonly referred to as data models. The architecture used to govern these data models (data model architecture) has evolved to incorporate advances in both theory and practice. Over years of evolution, however, it has become increasingly difficult to distinguish best practices from the particular tools, standards, and other approaches that may be used to achieve similar goals. This is understandable given the competing goals, costs, and requirements of the vastly different programs and projects in which data model architectures are applied. To engage more effectively with new data model architecture ideas, it is worth stepping back from the intricacies of individual approaches to recall why data models are important and what they strive to accomplish.

For any interface, its anticipated use, and the anticipated extent of that use, drive the rigor and specificity of its documentation. This is because, at a fundamental level, it is the documentation of an architecture’s interfaces that enables those interfaces to be managed, understood, shared, and extended.

Just as the strength of a chain is determined by its weakest link, the success of communication between an interface and its consumer depends on the weakest point in its documentation. In other words, the minimum rigor with which an interface must be documented is set by the highest level of comprehension required of anyone who uses it. As the number of integrators and / or organizations increases, so does the required rigor and specificity of the interface documentation.

Because interface documentation is critical to integration, two of its properties merit closer examination: how easily it can be managed and how well it scales. Together, these two factors provide a solid litmus test for categorically identifying levels of interoperability.

Managing interface documentation has been notoriously difficult. To remain effective, documentation must be maintained as interfaces change and updates are made, and as systems grow larger and more complex, this process becomes more daunting. To mitigate the difficulty, a single authoritative body is often given control over the single documented representation of those interfaces. While this may be sufficient for isolated, small, or one-off interface implementations, the trend toward distributed teaming makes a single nexus of control increasingly difficult, if not impossible, to sustain. The need for distributed control of documentation grows year by year.

Scalability of interface documentation refers to the ability of that documentation to grow, change, and evolve over time while retaining traceability to its original provenance. Traceability allows a characterization of the data’s meaning to be retained through all future iterations of the documentation, without the need to recharacterize it.

Scalability also affects the documentation’s portability, that is, the ability of interface documentation to be reused and shared. Reuse and sharing avoid the need for the physical documentation to grow in proportion to the data it characterizes. Only new concepts need to be fully characterized in subsequent iterations of the documentation, since certain fundamental concepts, such as location, stand the test of time. For example, the concepts of knowing where you are, locating and identifying things and items of interest nearby, deciding what to do about those things, and then actually doing something have remained the same for a long time. The fact that how we go about doing these things has evolved does not change the concept of what we are doing.¹

¹ As an exercise, consider the differences between a conceptual model describing an engagement for a modern naval combatant and one for a 200-year-old ship of the line. Compare not the interfaces or the specifics of the data, but the structure of the concepts themselves.

Interface Documentation Maturity Levels (IDMLs)

The ease with which interface documentation can be managed and scaled provides critical insight into the level of interoperability that the documentation enables for a system. It is this level of interoperability that determines an architecture’s place in the desired progression toward advanced approaches and best practices.

Interface Documentation Maturity Levels (IDMLs) provide a basis for qualifying and relating levels of interoperability based on interface documentation rigor and specificity. By identifying the specific IDML for a given interface or set of interfaces, it is possible to classify what level of interoperability that interface will have with all other interfaces also quantified by IDML.

The IDML Model with its seven levels represents an additive maturity concept progressing towards today’s most advanced practices in interface documentation. As levels in the model increase, the rigor and specificity of the documentation increases, as does its usability, management and scalability. (It should be noted that the notion of an IDML scale is conceptually taken from the long-standing concept of Technology Readiness Levels (TRLs) (Sadin) but is not intended to carry any particular connotations or usages of TRL.)

Recall that we earlier likened interface documentation to the links in a chain and interoperability to the strength of that chain. Although an interface may be categorized at a particular IDML, its interoperability is bounded not only by its own functionality but also by the functionality of the interface interacting with it. The level of interoperability between any two interfaces is limited by the lower of their two IDMLs. For example, if an IDML6 interface interacts with an IDML5 interface, the interoperability will be limited to IDML5 capabilities.
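The weakest-link rule amounts to taking a minimum over the levels involved. The Python sketch below is illustrative only: the `IDML` enumeration and the `effective_idml` function are hypothetical names, not part of the FACE standard or any tool, and generalizing the paper’s pairwise rule to more than two interfaces is our assumption.

```python
from enum import IntEnum


class IDML(IntEnum):
    """Hypothetical encoding of the seven Interface Documentation Maturity Levels."""
    SOURCE_CODE = 1
    ICD_IDD = 2
    ELECTRONIC_DATA_DEFINITION = 3
    INTERFACE_CENTRIC_MODEL = 4
    ENTITY_MODEL_DIRECT_PROJECTIONS = 5
    ENTITY_MODEL_CONTAINMENT = 6
    ENTITY_MODEL_RELATIONSHIPS = 7


def effective_idml(*levels: IDML) -> IDML:
    """Return the level that bounds interoperability: the weakest link wins.

    Assumption: the pairwise rule (lower of two IDMLs) extends to any
    number of interacting interfaces by taking the overall minimum.
    """
    if not levels:
        raise ValueError("at least one interface level is required")
    return min(levels)


# An IDML6 interface talking to an IDML5 interface is limited to IDML5.
print(effective_idml(IDML.ENTITY_MODEL_CONTAINMENT,
                     IDML.ENTITY_MODEL_DIRECT_PROJECTIONS).name)
```

Using an `IntEnum` keeps the levels ordered, so the built-in `min` expresses the rule directly.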

As an interface’s requirements change, it may be necessary to specify the interface documentation further, requiring a higher IDML. Progressing to a higher IDML ensures that the interface now meets higher standards, and new interfaces at that level may now be discovered and utilized. Depending on the current IDML, the transition to a desired level may require substantial changes to the data model architecture, and such possibilities need to be considered carefully during architecture planning. Because very few interfaces will need to interoperate only with other low-IDML interfaces, best practice recommends that an interface strive to reach the highest IDML reasonably attainable.

Early: Levels 1-2

When a program is small, relatively static, and/or wholly managed internally with no need for external interfaces, early maturity level documentation may suffice. Such documentation often meets the program’s needs because management is simple and scalability is not required. The most basic forms of interface documentation are characterized by these Early Maturity levels, which are limited to the specification of content. At this early stage, interface documentation may be captured in source code or in an ICD / IDD that simply records the content of the interface without machine-readable syntax or semantic meaning.

These two levels of documentation are not addressed by the FACE Data Architecture.

1: Source Code

At IDML1, one could reasonably argue that no overt interface documentation exists, since the source code constitutes the entire documentation. The code explicitly specifies the algorithm but only implicitly suggests syntax and semantics. Changes, and the tracing of changes, happen only in the source code, and no additional documentation is available to communicate the intent or use of that code. This approach is sufficient only as long as whoever modifies the code fully understands its original intent. Source code is the simplest possible form of interface documentation.

Figure 1: example source code
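As a hypothetical illustration of an IDML1 interface, the Python sketch below packs a status message whose only documentation is the code itself. The function `pack_status` and its parameters `t` and `n` are invented for this example.

```python
import struct


def pack_status(t, n):
    # The code is the entire interface specification. The format string
    # "<fH" implies little-endian byte order, a 4-byte float, then a 2-byte
    # unsigned integer, but nothing states what t and n mean, their units,
    # or which piece of equipment they describe.
    return struct.pack("<fH", t, n)


msg = pack_status(451.5, 2200)
print(len(msg))  # 6: a 4-byte float followed by a 2-byte unsigned short
```

An integrator can recover the syntax by reading the format string, but the semantics (is `t` a temperature, and in what units?) exist only in the original author’s head.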

2: Interface Control / Description Document (ICD / IDD)

An ICD / IDD is a familiar engineering artifact and a key element of systems engineering. It is often used as a tool to communicate information about interfaces, documenting the inputs and outputs of a system or the interfaces between systems. It provides information beyond the source code, including the meaning of the interface, how the data is transmitted, and the data formats used.

The ICD provides a useful vehicle for communicating interface specifications beyond source code. However, it still requires a knowledgeable user to read and interpret it, and then to implement code structures and type manipulation routines that align with the IDML2 documentation.

Figure 2: sample IDD template

Mid: Levels 3-5

As complexity and scope increase, interface management becomes more challenging. External interfaces necessitate more rigorous documentation and, once users have experienced the (sometimes prolonged) pain of a paper-document-based architecture, a push is often made for the creation of “The Data Model.”

This Mid Phase is characterized by machine readability, which becomes possible at this level because syntax is introduced alongside content in the form of electronic data definitions. As a result, syntax may be reused and shared with other systems. Sharing becomes particularly useful when moving beyond a controlled set of systems or when communication with external entities becomes necessary. Perhaps the program has grown larger or more complex, making it necessary to record revisions and updates and to trace changes, or the management team has become more distributed or experiences more turnover. At this level it becomes more important to pass clear and detailed information about the interfaces on to others.

The FACE Data Architecture addresses all subsequent forms of data modeling. This includes electronic data representations (e.g., the syntax of data presented at the Transport Service APIs) as well as the modeling structures necessary to document semantics. Four levels of maturity (IDML4 through IDML7) are dedicated to characterizing data models. Although many data models may be valid (or even standard-conformant), not all data models are created equal: some provide greater flexibility and robustness than others.

3: Electronic Data Definition

Electronic data definitions encompass many common types of electronic information such as interface definition language (IDL), extensible markup language (XML), XML schema definition (XSD), and other types of syntax-based documentation.

An electronic data definition adds the rigor of syntax, making the interface documentation machine-readable and thus creating the opportunity for additional automated processing of that documentation. The electronic data representation can be used both for in-memory type representations and as an encapsulation specification for signal exchange.

Electronic definitions also provide the benefit of rudimentary reusability. Specific portions of the documentation, including classes and nodes, may be shared. Because these components are reusable, sharing them prevents unnecessary duplication of effort.

Additionally, electronic data definitions can often be used to generate data type manipulation software. Such software is based solely on the structure of the data; no semantic understanding of the data is expected. While the addition of syntax is a major improvement over its predecessor, this format must still be accompanied by an ICD or IDD to provide the additional information needed to actually implement the interface.

Figure 3: Electronic data definition example in the form of a schema
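To show what "based solely on the structure" means, the Python sketch below treats a small list of `(field name, primitive type)` pairs as a stand-in for an electronic data definition such as IDL or XSD, and generates pack/unpack routines from it. The definition format, the `ENGINE_STATUS` message, and `make_codec` are hypothetical; real IDL or XSD tooling differs.

```python
import struct
from typing import Any

# Hypothetical machine-readable data definition (standing in for IDL or XSD).
# It fixes field names, order, and primitive types: syntax, but no semantics.
ENGINE_STATUS = [
    ("temperature", "float32"),
    ("rpm", "uint16"),
]

# Map the definition's primitive types onto struct format codes.
_FORMAT = {"float32": "f", "float64": "d", "uint16": "H", "int32": "i"}


def make_codec(definition):
    """Generate pack/unpack functions purely from the definition's structure."""
    fmt = "<" + "".join(_FORMAT[ftype] for _, ftype in definition)
    names = [name for name, _ in definition]

    def pack(record: dict[str, Any]) -> bytes:
        return struct.pack(fmt, *(record[n] for n in names))

    def unpack(data: bytes) -> dict[str, Any]:
        return dict(zip(names, struct.unpack(fmt, data)))

    return pack, unpack


pack, unpack = make_codec(ENGINE_STATUS)
wire = pack({"temperature": 451.5, "rpm": 2200})
print(unpack(wire))  # {'temperature': 451.5, 'rpm': 2200}
```

The generated code moves the data correctly, yet nothing in the definition says whether `temperature` is in kelvin or degrees Celsius, or which engine it describes; that information must still come from an accompanying ICD or IDD.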

4: Interface-centric Model

An interface-centric model is a data model built with a focus on the interface definition (commonly a message). The entities in this type of data model directly mirror the interfaces. In this way, programs can share information, and both message sets and model components are reusable.

However, reuse of the model is still severely limited. Because the model was built around the interface’s patterns, when an interface changes, the model must be rebuilt to continue mirroring its corresponding interface.

It is possible to realize this type of data model using the FACE Data Architecture. To create such a model, one would create entities that mirror the interfaces across all levels of abstraction within the data model.

Figure 4: Example of an Interface-centric Data Model

Although this construction is sufficient to pass the model conformance checks (i.e., it satisfies the meta-model and Object Constraint Language (OCL) rules), it adds no semantic information to the approach.
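The Python sketch below shows the shape of an interface-centric construction in miniature: one entity per message, each mirroring its message field for field. The message and attribute names are hypothetical, and a plain dictionary stands in for a FACE data model.

```python
# Hypothetical interface-centric (IDML4) model: one entity per message,
# each entity mirroring its message's fields exactly.
interface_centric_model = {
    "NavStatusMsg": ["latitude", "longitude", "altitude", "heading"],
    "TargetReportMsg": ["latitude", "longitude", "altitude", "track_id"],
    "WeatherMsg": ["latitude", "longitude", "wind_speed"],
}


def duplicated_attributes(model: dict[str, list[str]]) -> dict[str, list[str]]:
    """Find attribute names that are modeled separately in more than one entity."""
    owners: dict[str, list[str]] = {}
    for entity, attributes in model.items():
        for attr in attributes:
            owners.setdefault(attr, []).append(entity)
    return {attr: entities for attr, entities in owners.items() if len(entities) > 1}


print(sorted(duplicated_attributes(interface_centric_model)))
# ['altitude', 'latitude', 'longitude']
```

Each concept is modeled once per message rather than once per real-world thing, and the model cannot say whether the `latitude` in `TargetReportMsg` refers to the same thing as the one in `NavStatusMsg`. Adding or changing a message means adding or changing an entity.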

Consider for a moment that the interface is a cooking recipe. Just as the model is designed to reflect specific messages, an IDML4 grocery store is designed to reflect specific recipes. The store is organized by recipe, with one aisle for chili, a second for meatloaf, and a third for lasagna. The fact that all these recipes require ground beef, tomatoes, or onions is irrelevant: each aisle stocks every ingredient its own recipe needs, so the same ingredients appear in every aisle. With this design, although recipes can be reused, changes to or variations on a recipe require a change to the store layout. To add a new chili recipe, we might need one aisle for White Chili and another for Texas Chili.

It is because of this intrinsic weakness of the interface-centric model that, beyond IDML4, traditional methods of data modeling begin to break down.

5: Entity Model with Direct Projections

In order to advance any further in maturity, a message model must be wholly redesigned and rebuilt as a true entity model. At IDML5, the organization of data is reconsidered.

At IDML5, the grocery store is instead organized by commonality: one aisle for fresh produce, a second for meats, and a third for spices. This allows reuse not only of a recipe but also of the ingredients that make up a recipe. Likewise, in the model we can reuse not only entities but also their corresponding attributes. And when we change the recipe, the layout of the grocery store is unaffected, just as, when we change our desired message set, the model need not change.

In the Entity Model with Direct Projections, entity attributes directly project to the message attributes they represent. To prepare the recipe, we go directly to the aisles and gather the ingredients. Because ingredients are organized by commonality, we know where to find each ingredient quickly and efficiently. The same holds true for entities and their attributes. This reorganization now allows for basic data model reusability, across multiple message sets, all utilizing the same building blocks.

Figure 5: Example of a Data Model with Simple Semantics

This style of model is accommodated by the FACE Data Architecture. As each interface is documented, it is projected directly from the entity it represents. Although these types of projections are not highly expressive, they do capture the basic semantic of the attribute. For example, this might capture the mass of an aircraft.
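A minimal Python sketch of direct projections, assuming a simplified representation in which each message field names the entity attribute it projects from. The `Entity` and `Message` classes and all names are hypothetical and do not reproduce FACE meta-model syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity type with a fixed set of attributes."""
    name: str
    attributes: frozenset[str]


@dataclass(frozen=True)
class Message:
    """A message as a list of direct projections: (message field, entity, entity attribute)."""
    name: str
    projections: tuple[tuple[str, Entity, str], ...]

    def __post_init__(self):
        # Every message field must project from an attribute that exists on the entity.
        for msg_field, entity, attr in self.projections:
            if attr not in entity.attributes:
                raise ValueError(f"{self.name}.{msg_field}: {entity.name} has no attribute {attr!r}")


aircraft = Entity("Aircraft", frozenset({"mass", "altitude", "heading"}))

# Two different messages built from the same entity; neither changes the entity.
fuel_planning = Message("FuelPlanningMsg", (("aircraft_mass", aircraft, "mass"),))
nav_status = Message("NavStatusMsg", (("alt", aircraft, "altitude"),
                                      ("hdg", aircraft, "heading")))
```

Adding a message, or changing an existing one, only adds or edits projections; the `Aircraft` entity stays as it is. What a direct projection does not record is context: `aircraft_mass` projects from the `Aircraft` type, not from a particular aircraft within some larger structure.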

Advanced: Levels 6-7

The last two Maturity Levels covered in the scope of this paper consist of what may be referred to as semantic architecture solutions. These architectures are characterized by the introduction of semantics. It is these semantics that provide context for the existing content and syntax and give meaning to the message sets. It is at this point on the IDML scale that the model becomes more rigorous, testable, and fully reusable.

With the addition of this rigor, standards such as FACE begin to emerge to provide the ability to leverage new and interesting ways to characterize semantically unambiguous data. No longer are the models constrained by message-centric structures. Instead, the models are now free to be constructed with universally agreed upon structures that can be flexibly navigated with as much specificity as needed.

While IDMLs lie on a continuum of increasing data specificity, practical examples of what these levels look like are helpful. The emergence of FACE 2.1 provided a first step into IDML6, a practical example of how this additional rigor can be realized and utilized. It is here that we begin to see how specific representations of data can be leveraged in new and exciting ways. With the further improvements in the FACE 3.0 architecture, we begin to see a movement toward IDML7, and the ability to leverage semantic specificity expands greatly, providing even more rigor, flexibility, and capability.

6: Entity Model with Containment

An Entity Model with Containment is a well-structured data model that, for the first time, provides semantic meaning or context to the message sets. Entities in the model reflect their real-world analog, and message attributes project from their corresponding attributes through composed entities. It is this containment that builds the context of the attribute.

To use the recipe analogy, to get what we need for a specific recipe, we identify a grocery store, enter that store and then go to the aisles necessary for specific ingredients. The grocery store provides context. In this case, it provides the name and / or location where we purchase the ingredients. In a more relevant example, when requesting an engine temperature, a temperature of “an” engine provides no context or meaning.

However, if that temperature is requested from a specific engine on a specific air vehicle, we better understand the context and therefore the message itself.
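The engine-temperature example can be sketched in Python, assuming entities that compose other entities under named roles. The `Entity` class, the `resolve` function, and the role names `left_engine` and `right_engine` are hypothetical and stand in for a FACE entity model; they are not its syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity type; `contains` maps a role name to a composed entity type."""
    name: str
    attributes: set[str] = field(default_factory=set)
    contains: dict[str, "Entity"] = field(default_factory=dict)


def resolve(root: Entity, path: list[str]) -> str:
    """Follow a containment path from `root` and return a qualified name for the final attribute."""
    current = root
    for role in path[:-1]:
        if role not in current.contains:
            raise KeyError(f"{current.name} does not contain {role!r}")
        current = current.contains[role]
    if path[-1] not in current.attributes:
        raise KeyError(f"{current.name} has no attribute {path[-1]!r}")
    return ".".join([root.name, *path])


engine = Entity("Engine", attributes={"temperature"})
# The same Engine type is reused for both positions; the roles supply the context.
air_vehicle = Entity("AirVehicle", contains={"left_engine": engine, "right_engine": engine})

print(resolve(air_vehicle, ["left_engine", "temperature"]))
# AirVehicle.left_engine.temperature
```

The `Engine` entity is modeled once, but a projection that follows the containment path distinguishes the left engine’s temperature from the right engine’s, which a projection directly from `Engine.temperature` cannot do.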

7: Entity Model with Relationships

An Entity Model with Relationships extends the model concept even further. At this level, additional semantic behavior becomes fully machine readable and understandable. Not only are we able to trace messages through their containment to gain added context, but we can now also trace them through related contexts.

In the case of the grocery store, it may be important to us to “Shop Local.” We have a desire to purchase locally sourced produce for our recipe. Knowing that we can purchase the listed ingredients from the corner grocery store no longer provides sufficient information. We want to know on what farm those ingredients were grown. Many grocery stores now provide shoppers with information about the relationship they have with their food sources… “McCutcheon’s Farm Tomatoes” or “Locally Raised Eggs.” Understanding the relationship adds context and helps us meet the requirements of our recipe. In IDML7, the message is traced, not only through its containment (the grocery store) but also through a related context (a relationship with the source).

Similarly, a message may need a video product generated by a specific camera on a specific air vehicle. In this case, IDML6 falls short because it relies solely on containment for context. While the specific camera on the specific air vehicle can be characterized through containment (composition), the semantics of the relationship between that camera and its video product cannot be captured the same way. A video product is not “contained” in, or composed into, a specific camera; rather, the video product is related to that camera through the camera’s generation of the video. In this way, the message can receive both the context of the camera’s containment in the specific air vehicle and the context of the relationship between the camera and the video product.
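A minimal Python sketch of the camera example, assuming each entity keeps composed parts (`contains`) separate from named relationships (`related`). Real models may treat relationships as first-class elements with their own attributes; this simplification, the `Entity` class, the `describe` function, and names such as `nose_camera` and `generates` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity type with composed parts and named relationships kept apart."""
    name: str
    attributes: set[str] = field(default_factory=set)
    contains: dict[str, "Entity"] = field(default_factory=dict)
    related: dict[str, "Entity"] = field(default_factory=dict)


def describe(root: Entity, path: list[str]) -> list[str]:
    """Walk a path from `root`, labeling each step as containment or relationship."""
    steps, current = [], root
    for role in path[:-1]:
        if role in current.contains:
            current = current.contains[role]
            steps.append(f"contains {role} ({current.name})")
        elif role in current.related:
            current = current.related[role]
            steps.append(f"related via {role} ({current.name})")
        else:
            raise KeyError(f"{current.name} has no part or relationship {role!r}")
    if path[-1] not in current.attributes:
        raise KeyError(f"{current.name} has no attribute {path[-1]!r}")
    steps.append(f"attribute {path[-1]}")
    return steps


video = Entity("VideoProduct", attributes={"resolution", "frame_rate"})
# The video product is not a part of the camera; it is reached through a relationship.
camera = Entity("Camera", attributes={"field_of_view"}, related={"generates": video})
air_vehicle = Entity("AirVehicle", contains={"nose_camera": camera})

for step in describe(air_vehicle, ["nose_camera", "generates", "resolution"]):
    print(step)
```

The path combines both kinds of context: containment locates the specific camera on the specific air vehicle, and the `generates` relationship connects that camera to its video product without pretending the video is one of the camera’s parts.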

But wait… There’s more!

While this paper ends with IDML7 as the highest level of interoperability, this by no means implies that there are no levels beyond those described here. As with database construction and technology, there are normal forms that can be applied to data architectures over and above IDML7, opening up new and exciting ways to leverage the future of interoperability. We’ll introduce these in a forthcoming follow-on paper.

Putting it in Context

This paper briefly introduces the IDML framework for consideration and feedback. As we collectively move towards increasingly complex models and content, the need intensifies for a common framework through which we can characterize the solutions we build. It is worthwhile to spend time considering how these models differ from one another in purpose, construction and application as well as to examine the utility and limitations of each.

The objective of the IDML is not to endorse IDML7 as the “Self-Actualization” in Maslow’s hierarchy (Maslow) of data architectures. Rather, the characterizations provided in the IDML are intended to help implementations thoughtfully design and build models that are truly fit-for-purpose. The levels have been introduced to provide a firm footing and understanding of the capabilities and limitations of each particular solution.

As FACE continues to improve data architecture designs, the IDML framework aims to enrich our conversations and aid our efforts to build the practical data architecture designs we need today while anticipating the advanced data architecture best practices of tomorrow.

References

Maslow, A. H. (1943). A Theory of Human Motivation. Psychological Review, 50(4), pp. 370-396.

Sadin, S. R., Povinelli, F. P., and Rosen, R. (1988, October 1). The NASA technology push towards future space mission systems. Presented at the IAF, 39th International Astronautical Congress, Bangalore, India, Oct. 8-15, 1988.

Interface Documentation Maturity Levels (IDML): An Introduction