This document synthesizes learnings from Microsoft's Ontology-Playground, W3C standards (OWL, RDF), Azure Digital Twins (DTDL), and our own SDK's context graph and ontology extraction pipeline. It is aimed at engineers who want to understand how ontologies work, why they matter for AI agent systems, and how the BigQuery Agent Analytics SDK applies these concepts.
- What Is an Ontology?
- The Three Graph Models
- Ontology Standards Landscape
- Design Patterns for Ontologies
- Microsoft Ontology-Playground: Case Study
- Context Graphs for AI Agent Systems
- How the BQAA SDK Implements These Concepts
- Comparing Approaches
- Recommended Reading
An ontology is a formal, explicit specification of a shared conceptualization of a domain. In practical terms, it defines:
- Entity types (classes) — the kinds of things that exist (Customer, Order, Product)
- Properties (attributes) — typed data each entity carries (name: string, amount: decimal)
- Relationships — how entity types connect to each other (Customer places Order, Order contains Product)
- Constraints — rules about cardinality, valid values, required fields, and type hierarchies
The key distinction from a database schema: an ontology captures meaning, not just structure. It answers "what does this thing represent in the real world?" rather than "what columns does this table have?"
Without an ontology, the word "diagnosis" could mean a medical finding or a software analysis. With one, the type system disambiguates. Research shows ontology-grounded knowledge graphs achieve 98% accuracy vs. 37% for ungrounded LLMs, reducing hallucination from 63% to 1.7% (PubMed 2025, clinical study).
Ontologies give AI agents:
- Semantic grounding — formal definitions constrain LLM output
- Multi-hop reasoning — traversal follows typed relationships, not just vector similarity
- Auditability — every extracted entity has a type, provenance, and position in the graph
Three distinct but related concepts are often conflated. Understanding their differences is essential.
A graph where both nodes and edges carry key-value properties. Nodes have labels (types); edges have types and direction.
(alice:Customer {name: "Alice", tier: "Gold"})
-[:PLACES {date: "2026-01-15"}]->
(order1:Order {total: 99.50})
- Standards: No universal standard yet (Neo4j, TinkerPop, ISO GQL emerging)
- Strengths: Developer-friendly, performant traversal, rich edge properties
- Used by: Neo4j, BigQuery Property Graphs, Amazon Neptune, TigerGraph
A network of real-world entities grounded in semantics. Instance data structured according to an ontology.
:alice rdf:type :Customer .
:alice :places :order1 .
:order1 :total "99.50"^^xsd:decimal .
- Standards: RDF triples (subject-predicate-object), SPARQL for querying
- Strengths: Formal semantics, automated reasoning, cross-domain linkage
- Used by: Google Knowledge Graph, Wikidata, enterprise knowledge management
The abstract schema that defines what types of things exist and how they relate. Not the data itself.
:Customer rdf:type owl:Class .
:places rdf:type owl:ObjectProperty ;
rdfs:domain :Customer ;
rdfs:range :Order .
ontology + instance data = knowledge graph
An ontology is the schema; a knowledge graph instantiates it with concrete entities. A property graph is an implementation format for either.
The W3C standard for defining ontologies on the semantic web. Built on RDF and RDFS, grounded in Description Logic.
| Construct | Purpose | Example |
|---|---|---|
owl:Class |
Entity type | Customer, Order |
owl:ObjectProperty |
Relationship | places, contains |
owl:DatatypeProperty |
Attribute | hasName, totalAmount |
rdfs:subClassOf |
Inheritance | PremiumCustomer ⊂ Customer |
| Cardinality restrictions | Constraints | "exactly one identifier" |
owl:equivalentClass |
Equivalence | Two classes represent the same concept |
owl:disjointWith |
Exclusion | Customer and Product cannot overlap |
Key characteristic: Open World Assumption — what is not stated is not assumed false. This is the opposite of databases (Closed World Assumption).
OWL has three expressiveness profiles:
- OWL Lite — classification hierarchies, simple constraints
- OWL DL — full Description Logic, decidable reasoning
- OWL Full — maximum expressiveness, undecidable
The data model underlying OWL. Everything is a triple:
(subject, predicate, object). Serialized as RDF/XML, Turtle, JSON-LD,
or N-Triples.
Microsoft's JSON-LD-based modeling language for Azure Digital Twins. Designed for IoT and building management.
| Construct | Purpose |
|---|---|
| Interface | Top-level model unit (like a class) |
| Property | Stateful attribute with schema type |
| Telemetry | Time-series data (no backing storage) |
| Relationship | Typed edge to another Interface |
| Component | Embedded sub-interface (composition) |
| Command | Invocable operation |
Identification: Digital Twin Model Identifiers (DTMIs) —
dtmi:com:example:Room;1 — globally unique and versioned.
Composition:
extendsfor inheritance (up to 10 levels deep)- Components for embedding sub-interfaces
- Feature extensions (v3) for semantic types and units
W3C standard for validating RDF graphs against a set of constraints ("shapes"). Used alongside OWL when you need closed-world validation on an open-world model.
| Dimension | OWL/RDF | DTDL | YAML (BQAA SDK) |
|---|---|---|---|
| Data model | Triple graph | Property graph (JSON-LD) | Property graph |
| Expressiveness | Full DL axioms | Practical/constrained | Minimal/focused |
| Reasoning | Automated (DL solvers) | None | None |
| World assumption | Open | Effectively closed | Closed |
| IoT integration | Requires SSN/SOSA | Native | N/A (agent traces) |
| Developer accessibility | Steep | Moderate (JSON) | Low (YAML) |
| Physical bindings | No | No | Yes (BQ tables) |
These patterns are consistent across Microsoft's Ontology-Playground, Azure Digital Twins, and our own SDK:
| Element | Convention | Good | Bad |
|---|---|---|---|
| Entity types | Singular, TitleCase | Customer |
tbl_cust, Customers |
| Properties | camelCase, descriptive | loyaltyTier |
lt, TIER_LVL |
| Relationships | Verb-based, short | places, contains |
ownership, linked_to |
| IDs | Lowercase slugs | customer, work-order |
CUST_01 |
One entity, one concept: Avoid "god entities" with 30+ unrelated properties. If an entity models multiple real-world things, split it.
Sweet spot sizing: 3-8 entity types per ontology, 1-2 relationships per entity. Larger ontologies should be composed from smaller modules.
Identifier discipline: Every entity type must have at least one
primary key column. The SDK's KeySpec.primary accepts a list, so
composite primary keys are supported (e.g., [region, customer_id]).
This is similarly enforced in the Ontology-Playground and Azure Digital
Twins, though those ecosystems typically use a single @id field.
Extension over modification: Never modify a base ontology directly.
Add new interfaces that extend existing ones. This pattern is
critical for Azure Digital Twins industry ontologies (RealEstateCore)
and maps to OWL's rdfs:subClassOf.
| Pattern | Description | Example |
|---|---|---|
| Inheritance | Child type inherits parent's meaning/labels | PremiumCustomer extends Customer |
| Composition | Embed sub-models as named components | DTDL Components |
| Modular decomposition | Independent ontology modules with defined dependencies | FIBO subsets |
| Progressive building | Start small (3 entities), add 2-3 per iteration | Ontology-Playground lab steps |
- God entity: One class with 30+ properties modeling multiple concepts
- Generic names:
Item,Thing,Record,Data - Noun-based relationships:
ownershipinstead ofowns - Missing identifiers: Entities without primary keys
- Over-modeling: Not every internal table needs an ontology entity
- Circular one-to-ones: Probably the same entity — merge them
The Ontology-Playground is an open-source web application for learning ontology concepts and authoring ontologies for Microsoft Fabric IQ.
The current repository is best understood as a static ontology platform, not just a diagram editor. It is a React + TypeScript + Vite application with zero backend by default and a build-time compilation pipeline:
catalogue/stores curated ontology examples and metadata, which are compiled into a searchable catalogue with deep linkscontent/learn/stores markdown learning content and quizzes, which are compiled into the "Ontology School" experiencescripts/handles catalogue compilation, learning-content compilation, and RDF round-trip validationsrc/implements the graph explorer, visual designer, command palette, routing, and embeddable viewerpublic/hosts static assets for GitHub Pages / static-site deploymentapi/is optional and supports advanced flows like AI-assisted ontology building or GitHub contribution helpers when enabled
This separation is a useful design pattern: ontology content, pedagogy, and visualization are versioned as source artifacts, then compiled into static JSON/HTML assets for distribution.
The current repo exposes several distinct product surfaces:
- Interactive Graph Exploration — Cytoscape.js-based ontology visualization with search, click-to-inspect, and shareable routes
- Visual Ontology Designer — split-pane editor with undo/redo, validation, RDF/XML export, and JSON export
- Ontology School — structured courses, progressive domain learning paths, quizzes, and presentation-mode articles
- Embeddable Widget — a standalone viewer that can be injected into other web pages
- Catalogue Contribution Flow — GitHub sign-in and one-click PR creation for submitting ontology entries
- Natural Language Query Playground — early NL-to-ontology mapping interface for Fabric IQ-style interaction
For our learning document, this matters because the Playground is not just teaching OWL syntax. It is demonstrating how to package ontology knowledge as a developer product: examples, tutorials, embedded viewers, contribution workflows, and validation tooling all in one repo.
interface Ontology {
entityTypes: EntityType[]; // OWL Classes
relationships: Relationship[]; // OWL ObjectProperties
}
interface EntityType {
id: string;
name: string;
properties: Property[]; // OWL DatatypeProperties
// icon (emoji), color (hex) for visualization
}
interface Relationship {
from: string; // source entity ID
to: string; // target entity ID
cardinality: "one-to-one" | "one-to-many" | "many-to-one" | "many-to-many";
attributes?: Property[]; // properties on the edge itself
}The canonical format is RDF/XML. The Playground maps its model to OWL:
| App Concept | OWL Mapping |
|---|---|
| EntityType | owl:Class |
| Property | owl:DatatypeProperty (with rdfs:domain, rdfs:range to XSD types) |
| Relationship | owl:ObjectProperty |
| Ontology metadata | owl:Ontology with rdfs:label, rdfs:comment |
| Custom annotations | ont: namespace (ont:icon, ont:color, ont:cardinality, etc.) |
The ont: custom namespace extends OWL without breaking compatibility.
This is a clean pattern for adding application-specific metadata to
standard ontology formats.
-
Ontology as product, not just schema: The repo combines catalogue, designer, learning paths, embeds, and contribution flow. That is a useful lesson for enterprise ontology work: adoption depends on tooling and pedagogy, not just expressiveness.
-
Relationship attributes: Modeled as
owl:DatatypePropertywith anont:relationshipAttributeOfannotation linking them to theowl:ObjectProperty. This is a pragmatic solution since OWL cannot natively attach properties to object properties. -
Round-trip validation: The build pipeline serializes then re-parses every RDF file to verify fidelity. This catches subtle serialization bugs early.
-
Static-first distribution: The repo is designed to run as a static site with zero backend by default. This keeps ontology learning and visualization easy to deploy, fork, and review.
-
Progressive disclosure: Learning paths start with 3 entities and add 2-3 per step, with diff highlighting showing what changed. This mirrors our V4 ontology design philosophy.
-
Embeddability matters: The standalone viewer makes ontologies portable across docs, courseware, and apps. This is directly relevant to our own demos and educational materials.
| Domain | Example | Entities |
|---|---|---|
| Retail | Cosmic Coffee | Customer, Store, Product, Order, Region, Supplier |
| E-Commerce | Online commerce flow | Customer, Cart, Order, Product, Shipment |
| Healthcare | Clinical System | Patient, Provider, Encounter, Diagnosis, Medication |
| Finance | Banking | Account, Customer, Transaction, Branch, LoanProduct |
| Manufacturing | Industry 4.0 | WorkOrder, Machine, ProductionLine, Part, QualityCheck |
| Education | University | Student, Course, Instructor, Department, Enrollment |
External adapted ontologies: FIBO (financial industry), Schema.org (events, businesses, creative works), Pizza Ontology (classic OWL teaching example).
The "Ontology School" surface in the repo turns these examples into guided learning assets: core ontology fundamentals, six domain-specific learning paths, and a hands-on lab that progressively grows a supply chain ontology from a small seed graph to a richer domain model.
A context graph is a knowledge graph purpose-built for AI consumption. It extends a traditional knowledge graph with three additional layers:
Formal semantic contracts (via OWL classes, YAML specs, or DTDL interfaces) that make facts unambiguous. The ontology constrains what the LLM can extract and how it interprets relationships.
Subgraph extraction formatted for LLM context windows. Instead of dumping raw triples, the retrieval layer uses the ontology to:
- Follow typed relationships (semantic path traversal)
- Filter by entity type relevance
- Format as structured text (JSON-LD, Markdown tables, Turtle)
This is "Ontology RAG" — using the schema to guide retrieval rather than relying solely on vector similarity.
Agent actions, tool calls, decisions, and timestamps are encoded directly into the graph as first-class nodes. This makes agent interactions:
- Auditable — every decision has a typed node with provenance
- Temporally traceable — timestamps on nodes and edges
- Learnable — patterns in the graph reveal agent behavior
| Metric | Without Ontology | With Ontology |
|---|---|---|
| Factual accuracy | 37% | 98% |
| Hallucination rate | 63% | 1.7% |
| Multi-hop reasoning | Vector similarity only | Typed path traversal |
| Governance | Manual re-mapping | Formal definitions |
Source: PubMed 2025 clinical study on ontology-grounded KGs.
Recent research (arXiv, April 2025) proposes a neurosymbolic architecture:
- Role Ontology — what the agent is and can do
- Domain Ontology — entities, relationships, and constraints in the business domain
- Interaction Ontology — conversational patterns and workflow logic
Our SDK's context graph maps to this framework: the agent_events
schema defines the Role Ontology (event types, agent capabilities),
the YAML-driven extraction defines the Domain Ontology, and the
decision semantics (V3) capture the Interaction Ontology.
The SDK has evolved through three generations, each adding ontological sophistication.
The original design with two node types and two edge types:
TechNode (agent_events) --[Caused]--> TechNode
TechNode --[Evaluated]--> BizNode (extracted business entities)
- TechNode: Spans from the
agent_eventstable (LLM calls, tool invocations, agent completions). Keyed byspan_id. - BizNode: Business entities extracted by
AI.GENERATEper-span. Keyed byspan_id:node_type:node_value. - Caused: Parent-child span lineage from OpenTelemetry tracing.
- Evaluated: Cross-link from technical execution to business domain.
Entity types were hard-coded (Product, Campaign, Budget, etc.).
Added two new node types for EU audit compliance:
TechNode --[MadeDecision]--> DecisionPoint
DecisionPoint --[CandidateEdge]--> CandidateNode
- DecisionPoint: Captures where the agent chose between options.
- CandidateNode: Each option considered, with score, status (SELECTED/DROPPED), and rejection rationale.
- World-change detection: Fail-closed safety check for entity drift in long-running A2A tasks.
The current generation generalizes everything into a configuration-driven pipeline:
YAML spec
→ Schema Compiler (output_schema + extraction prompt)
→ AI.GENERATE (per-session extraction)
→ Hydration (JSON → typed nodes + edges)
→ Materialization (BigQuery tables)
→ Property Graph DDL (CREATE PROPERTY GRAPH)
→ GQL queries (MATCH patterns)
A single YAML file defines the entire ontology. The top-level graph:
wrapper is required by load_graph_spec():
graph:
name: ad_campaign_ontology
entities:
- name: Campaign
description: An advertising campaign
extends: MarketingAsset # label inheritance
binding:
source: campaigns # BigQuery table
keys:
primary: [campaign_id] # list — composite keys supported
properties:
- name: campaign_id
type: string
- name: budget
type: double
- name: status
type: string
relationships:
- name: targets
from_entity: Campaign
to_entity: Audience
binding:
source: campaign_audiences
from_columns: [campaign_id]
to_columns: [audience_id]
properties:
- name: bid_amount
type: doubleConfiguration-over-code: The YAML spec is the single source of truth. The same file drives prompt generation, output schema compilation, table DDL, property graph DDL, and GQL generation. No code changes needed to model a new domain.
Two-layer model separation: Spec models (schema-time Pydantic) are cleanly separated from extracted models (runtime Pydantic). This prevents tight coupling between the ontology definition and the AI extraction output.
Label-only inheritance: extends adds a parent label to a node
(like rdfs:subClassOf), but does NOT copy properties or bindings.
This maps directly to BigQuery Property Graph's multi-label support.
Union flattening: Since BigQuery's AI.GENERATE output schema
does not support anyOf / discriminated unions, all entity properties
are flattened into a single schema with an entity_name discriminator.
Type collisions across entities are detected at compile time.
Deterministic key-based IDs: Node IDs follow the pattern
{session_id}:{entity_name}:{k1=v1,k2=v2}. Edge references use the
same scheme via from_keys/to_keys. This makes deduplication
natural (last-write-wins for same key).
Session-scoped idempotency: Delete-then-insert ensures re-running extraction for the same sessions produces identical results.
All graph queries use BigQuery's native GQL on Property Graphs:
-- V2/V3: quantified-path traversal (1-to-N causal hops + business cross-link)
GRAPH agent_context_graph
MATCH (decision:TechNode)-[c:Caused]->{1,20}(step:TechNode)
-[e:Evaluated]->(biz:BizNode)
WHERE decision.session_id = @session_id
-- V4: spec-driven showcase query
GRAPH ontology_graph
MATCH (c:Campaign)-[t:targets]->(a:Audience)
WHERE c.session_id = @session_id| Dimension | Ontology-Playground | Azure Digital Twins | BQAA SDK V4 |
|---|---|---|---|
| Schema language | OWL (RDF/XML) | DTDL (JSON-LD) | YAML (custom) |
| Entity modeling | owl:Class |
DTDL Interface | EntitySpec |
| Properties | owl:DatatypeProperty |
DTDL Property/Telemetry | PropertySpec |
| Relationships | owl:ObjectProperty |
DTDL Relationship | RelationshipSpec |
| Inheritance | Not supported | extends (multi-parent) |
extends (label-only) |
| Physical binding | None | None (twin instance) | BQ table binding |
| Extraction | Manual authoring / AI builder | Manual upload | AI.GENERATE per-session |
| Materialization | Fabric IQ API | Azure Digital Twins API | BQ insert_rows_json |
| Query language | N/A (visualization only) | Azure DT Query Language | BigQuery GQL |
| Target audience | Business analysts | IoT / building engineers | ML / platform engineers |
| AI integration | Optional (GPT-4o-mini builder) | Event routing to downstream | Core (AI.GENERATE extraction) |
The most significant finding across all three systems: the same ontological infrastructure that powers digital twin platforms is precisely what AI agent systems need for semantic grounding. Digital twins model physical environments with formal schemas; context graphs extend this by making the knowledge AI-consumable and by capturing agent behavior as graph metadata.
The key difference is direction of data flow:
- Digital twins: sensors → ontology → digital twin graph → analytics
- Context graphs: agent traces → ontology → context graph → AI grounding
The ontology serves the same role in both: a formal contract that constrains interpretation and enables structured traversal.
- Azure Digital Twins — What Is an Ontology?
- Azure Digital Twins — Extending Ontologies
- Microsoft Ontology-Playground (live)