CLOSURE Toolchain User Manual for C++ Language

Prototype Release Version (v1.0)

Peraton Labs

May 8, 2024

1 CLOSURE Toolchain Overview

1.1 What is CLOSURE?

DARPA’s Guaranteed Architecture for Physical Systems (GAPS) is a research program that addresses software and hardware for compartmentalized applications where multiple parties with strong physical isolation of their computational environment, have specific constraints on data sharing (possibly with redaction requirements) with other parties, and any data exchange between the parties is mediated through a guard that enforces the security requirements.

Peraton Labs’ Cross-domain Language extensions for Optimal SecUre Refactoring and Execution (CLOSURE) project is building a toolchain to support the development, refactoring, and correct-by-construction partitioning of applications and configuration of the guards. Using the CLOSURE approach and toolchain, developers will express security intent through annotations applied to the program, which drive the program analysis, partitioning, and code auto-generation required by a GAPS application.

Problem: The machinery required to verifiable and securely establish communication between cross-domain systems (CDS) without jeopardizing data spillage is too complex to implement for many software platforms where such communication would otherwise be desired. To regulate data exchanges between domains, network architects rely on several risk mitigation strategies including human fusion of data, diodes, and hypervisors which are insufficient for future commercial and government needs as they are high overhead, customized to specific setups, prone to misconfiguration, and vulnerable to software/hardware security flaws. To streamline the design, development, and deployment of provably secure CDSs, new hardware and software co-design tools are needed to more effectively build cross-domain support directly into applications and associated hardware early in the development lifecycle.

Solution: Peraton Labs is developing CLOSURE (Cross-domain Language-extensions for Optimal SecUre Refactoring and Execution) to address the challenges associated with building cross-domain applications in software. CLOSURE extends existing programming languages by enabling developers the ability to express security intent through overlay annotations and security policies such that an application can be compiled to separable binaries for concurrent execution on physically isolated platforms.

The CLOSURE compiler toolchain interprets annotation directives and performs program analysis of the annotated program and produces a correct-by-construction partition if feasible. CLOSURE automatically generates and inserts serialization, marshaling, and remote-procedure call code for cross-domain interactions between the program partitions.

1.2 The C++ Challenge

As stated in [1], “writing a tool capable of assuring a property that falls out of the scope of the standard type checking performed by a compiler is a task comparable to writing the compiler itself.” This is especially true with respect to C++ where an extensive list of object-oriented features makes it difficult to conduct the required program analysis to track the control and data dependencies incurred by multiple inheritance, polymorphism, templates, exceptions, overrides, etc. Unlike C and Java for which extensive research and tools for formal dependency modeling exists [2] [3], the set of such tools for C++ are sparse. While a large subset of the C language has been formalized [4] [5], formalisms for analysis of C++ programs [1] [6] handle a limited subset of the C++ language. For example, cpp2v [7] does not support templates, virtual inheritance, floating point numbers, and many other critical features. Other approaches to formalize C++ for analysis exhibit other limitations that fall short of their applicability to real-world C++ programs. The alternative approach of defining type extensions and rewriting the program based on those extensions so that the type checker can check for properties is not pragmatic due to the learning curve, lack of tooling support for non-standard extensions, and cost of rewriting legacy programs in the new dialect. Much of the CLOSURE analyses occur at the LLVM-IR level, and that generated by C++ is far more difficult to understand without proper tools (names are mangled, classes and objects are refactored, etc.). Further complicating the effort is the lack of reflection in C++ (afforded to us in Java) for explicitly describing the types of fields and structure of opaque objects as they are marshalled and serialized for streaming communications across guard devices.

In the final months of GAPS Phase 3, we turned our focus to that of a preliminary CLOSURE C++ toolchain proof-of-concept design. In the following sections, we document our findings, initial approach, and early implementation. Our proof-of-concept toolchain builds around a modern C++ frontend AST, though we stick to programs of C++98 complexity, incrementally adding support for modern features as they become of interest. Several of the toolchain steps are conducted manually with plans to automate them in future research.

Our approach follows the same high-level workflow that has proven successful for us in C and Java:

  • Build a program model representation that consists of information extracted from the AST and points-to analysis
  • Extend application of CLOSURE Language Extensions (CLE) to Object-Oriented constructs (building upon our work for a subset of Java) to cover a subset of C++ adequate to meet the transition partner code
  • Extend the formal constraint model for C++ constructs
  • Code generation to manage cross-domain object instantiation, destruction, and methods

1.3 Architecture

The CLOSURE architecture for C++ follows the same workflow that has proven successful for C and Java with language-specific differences in annotation, modeling, constraint, and code-generation steps as illustrated in Figure :

Figure: CLOSURE Architecture

The key sub-modules of the C++ toolchain include:

  • CVI (CLOSURE Visual Interface): The editor built on top of VSCode [8]. Provides the IDE and co-design tools for cross-domain software development.
  • MULES (Multi-Level Security Source Tools): The tools used to analyze the annotated source code to produce enclave assignments. The Clang API, along with SVF for pointer analysis, is used to generate the program dependency model. Constraint solvers (MiniZinc and Z3) take as input the program dependency model and security policy constraints to produce enclave assignments.
  • CAPO (Conflict Analyzer Partition Optimizer): The constraint-based conflict analysis tools used to determine if a partitioning is feasible. Additional tools in CAPO auto-generate the additional logic needed to make a program cross-domain enabled (i.e., data type marshalling/serialization, RPCs for cross-domain data exchange, interfacing to the device drivers of cross-domain guards, DFDL [9] and rule generation, among others). CAPO also includes a post-partitioning verifier which checks that the partitioned program including auto-generated code is functionally equivalent to complies with developer security annotations.
  • MBIG (Multi-Target Binary Generation): Currently, for C++, MBIG is limited to x86 but could be extended to match the features of the C toolchain.
  • HAL (Hardware-Abstraction-layer): Abstracts hardware APIs of different cross-domain hardware devices. Currently, for C++, HAL has been tested in the minimal, localhost-mode using a single machine, though could easily extend to the other hardware supported in C toolchain.

Notable differences between the C++ proof-of-concept toolchain and its C/Java counterparts include:

  • Annotation of fields, methods (including constructors/destructors), and class definitions
  • Leveraging of the Clang AST to facilitate annotation and program analysis
  • A translation layer between LLVM registers and C++ declarations
  • Object-oriented constraints
  • RPC/Serialization code conducive to C++ semantics

Further details of the workflow are presented in following section.

1.4 Workflow

Like the CLOSURE architecture, the CLOSURE workflow for C++ mimics the workflow that has proven successful for C and Java with language-specific differences in tooling. The Clang API is used, along with SVF for pointer analysis, to interpret annotations and generate the program dependency model. Along with MiniZinc, Z3 has been added as a second constraint solver and is useful for diagnostics. The workflow is illustrated in Figure :