New Zealand | Building the Right Information Architecture for AI: A Security-First Approach

Building the Right Information Architecture for AI: A Security-First Approach

The Foundation That AI Rests On

Every AI system is only as good, and only as safe, as the data it can access. As organisations race to deploy Microsoft Copilot, AI agents, and automation workflows, the information architecture that underlies these systems has become a critical security domain.  

Get it wrong, and AI becomes a precision instrument for your adversaries: a system that knows exactly where your sensitive data lives, and already has the authorised access to retrieve it.  

Why Traditional Data Architecture Is Insufficient  

Legacy information architectures were designed for human-speed access patterns. A finance analyst queries the ERP system for a report. A support agent searches the knowledge base for an answer. These interactions are discrete, logged, and scoped.  

AI agents interact with data differently:  

  • They query at machine speed, potentially accessing large volumes of documents (in some implementations, thousands) in a single session
  • They operate across boundaries that humans would never cross – simultaneously accessing HR data, customer records, code repositories, and external APIs  
  • Where persistent memory or shared context stores are in use, they can retain and synthesise information across sessions, creating a risk of context leakage between users that standard session controls do not address
  • They create new data artefacts – embeddings, summaries, and even cached reasoning traces (records of the intermediate steps an AI model took to arrive at an answer) – all of which may contain sensitive information but are typically not covered by standard data classification policies

The Core Pillars of AI-Ready Information Architecture  

1. Data Classification That AI Can Enforce 

Most organisations have data classification frameworks – Public, Internal, Confidential, Restricted. In practice, many enterprise AI deployments never check those labels at the system level, so classification goes unenforced at retrieval time.

A secure AI information architecture maps classification labels to retrieval permissions at the system level. A Retrieval-Augmented Generation (RAG) system, for example, should not retrieve Restricted documents for a session that has only Confidential clearance. This requires:  

  • Metadata-aware vector stores that store classification labels alongside embeddings 
  • Query-time permission checks that filter retrieval results before they reach the model 
  • Dynamic permission scoping that adjusts based on the authenticated user’s role, not a shared service account  
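The three requirements above can be sketched in a few lines. This is a minimal illustration, not a production design: the `Chunk` type, the `retrieve` function, the in-memory list standing in for a vector store, and the strict ordering of the four labels are all assumptions made for the example. In a real deployment the `clearance` value would be derived from the authenticated user's role rather than passed in by hand.

```python
from dataclasses import dataclass

# Assumption: the four labels form a strict order, least to most sensitive.
LEVELS = {"Public": 0, "Internal": 1, "Confidential": 2, "Restricted": 3}


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    classification: str  # label stored alongside the embedding
    embedding: tuple     # toy 2-dimensional vector
    text: str


def dot(a, b):
    """Dot product, used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(store, query_embedding, clearance, top_k=3):
    """Return the most similar chunks the session is cleared to see."""
    ceiling = LEVELS[clearance]
    # The permission check runs before ranking, so nothing above the
    # session's clearance can reach the model's context.
    allowed = [c for c in store if LEVELS[c.classification] <= ceiling]
    ranked = sorted(
        allowed,
        key=lambda c: dot(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


store = [
    Chunk("handbook", "Internal", (0.9, 0.1), "Leave policy"),
    Chunk("payroll", "Confidential", (0.8, 0.6), "Salary bands"),
    Chunk("merger", "Restricted", (0.7, 0.7), "Acquisition plan"),
]

# A session with Confidential clearance never sees the Restricted chunk,
# even though it scores as highly as any other document.
results = retrieve(store, (1.0, 1.0), clearance="Confidential")
print([c.doc_id for c in results])
```

Filtering before ranking, rather than trimming the results afterwards, means a highly relevant Restricted document cannot crowd permitted documents out of the top results or leak through a later processing step.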

2. Data Lineage and Provenance  

When an AI agent produces an output – a report, a code change, a customer communication – you need to know exactly what data influenced that output. This is not just a compliance requirement; it is a security one.

Data lineage for AI means tracking:  

  • Which documents were retrieved to ground a response 
  • Which model version processed those documents 
  • What transformation or summarisation was applied 
  • Who initiated the session that produced the output  
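One way to capture those four items is a small, immutable record written every time the AI produces an output. The sketch below is illustrative only: `LineageRecord`, `record_lineage`, and all field names and sample values are hypothetical, and storing a hash of the output rather than the output itself is a design assumption, not a requirement stated above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageRecord:
    initiated_by: str         # who initiated the session
    model_version: str        # which model version processed the documents
    retrieved_doc_ids: tuple  # which documents grounded the response
    transformation: str       # e.g. "summarisation"
    output_sha256: str        # fingerprint of the output, not the output itself
    recorded_at: str          # UTC timestamp in ISO 8601 format


def record_lineage(user, model_version, doc_ids, transformation, output_text):
    """Build a lineage record for a single AI output."""
    return LineageRecord(
        initiated_by=user,
        model_version=model_version,
        retrieved_doc_ids=tuple(doc_ids),
        transformation=transformation,
        output_sha256=hashlib.sha256(output_text.encode("utf-8")).hexdigest(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


record = record_lineage(
    user="analyst@example.com",
    model_version="model-v1",
    doc_ids=["doc-17", "doc-42"],
    transformation="summarisation",
    output_text="Quarterly revenue summary",
)
print(json.dumps(asdict(record), indent=2))
```

Hashing the output lets an investigator confirm later that a given report matches a logged record without the lineage log itself becoming another copy of sensitive content.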

Without this, you cannot investigate AI-assisted incidents, you cannot demonstrate regulatory compliance, and you cannot identify when a data breach occurred through an AI channel.  

3. Separation of Training and Inference Data

One of the most under-appreciated risks in enterprise AI is the conflation of training data and inference data. Sensitive operational data – customer PII, internal financials, M&A intelligence – should never enter a training pipeline without explicit de-identification and governance approval.  

Establish clear architectural boundaries:  

  • Inference data (what the AI reads at runtime) should be scoped by the current user’s permissions 
  • Training data (what the AI was trained on) should be treated as a permanent knowledge asset, requiring its own data governance lifecycle  

Feedback loops that use inference outputs to improve training must be explicitly governed, with legal, privacy, and security review.
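The boundary described in this section can be expressed as a gate in front of the training pipeline. The following is a simplified sketch under stated assumptions: the `Record` type, its three boolean flags, and the `admit_to_training` function are hypothetical, and in practice each flag would be set by a real classification, de-identification, or approval process rather than by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    contains_sensitive_data: bool
    de_identified: bool = False
    governance_approved: bool = False


def admit_to_training(record):
    """Sensitive records need both de-identification and governance approval."""
    if not record.contains_sensitive_data:
        # Assumption: non-sensitive records may enter training directly.
        return True
    return record.de_identified and record.governance_approved


candidates = [
    Record("faq-1", contains_sensitive_data=False),
    Record("crm-9", contains_sensitive_data=True),
    Record("crm-10", contains_sensitive_data=True, de_identified=True),
    Record("crm-11", contains_sensitive_data=True, de_identified=True,
           governance_approved=True),
]

admitted = [r.record_id for r in candidates if admit_to_training(r)]
print(admitted)  # ['faq-1', 'crm-11']
```

Note that `crm-10` is rejected even though it has been de-identified: under this sketch, de-identification without governance approval is not enough.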

4. Secure Vector Database Design 

RAG architectures are among the most widely adopted patterns in enterprise AI knowledge systems. They work by converting documents into vector embeddings stored in a vector database, which the AI queries at runtime.  

Vector databases are often the weakest link:  

  • They commonly use a single shared embedding space with no user-level access control
  • They may retain deleted document embeddings (the original is gone, but its semantic content persists)
  • They are rarely covered by standard Data Loss Prevention (DLP) policies  

Design principles for secure vector stores:  

  • Implement per-user or per-role collection-level isolation 
  • Apply right-to-erasure policies that remove embeddings, not just source documents
  • Include vector databases in your DLP and Cloud Security Posture Management (CSPM) coverage
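The first two design principles can be illustrated with a toy in-memory store. This is a sketch only: `IsolatedVectorStore` and its methods are hypothetical names, and real vector databases expose their own collection, namespace, and delete APIs that would be used instead.

```python
from collections import defaultdict


class IsolatedVectorStore:
    """Toy store with one collection per role and erasure by document ID."""

    def __init__(self):
        # role -> {doc_id -> list of embeddings derived from that document}
        self._collections = defaultdict(dict)

    def add(self, role, doc_id, embedding):
        self._collections[role].setdefault(doc_id, []).append(embedding)

    def visible_docs(self, role):
        """A caller can only ever see its own role's collection."""
        return sorted(self._collections.get(role, {}))

    def erase(self, doc_id):
        """Remove every embedding derived from doc_id, in all collections."""
        removed = 0
        for docs in self._collections.values():
            removed += len(docs.pop(doc_id, []))
        return removed


vector_store = IsolatedVectorStore()
vector_store.add("hr", "contract-7", (0.1, 0.2))
vector_store.add("hr", "contract-7", (0.3, 0.4))  # second chunk, same document
vector_store.add("finance", "budget-1", (0.5, 0.6))

print(vector_store.visible_docs("finance"))  # ['budget-1']
print(vector_store.erase("contract-7"))      # 2
print(vector_store.visible_docs("hr"))       # []
```

Keying embeddings by source document ID is what makes erasure tractable: when the original is deleted, every chunk derived from it can be found and removed in one pass.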

5. Governance Structures That Enable AI

Strong information architecture is not just about access control – it is about creating the governance structures that allow AI to be used productively without becoming ungovernable.  

This means:  

  • An AI Data Governance Policy that defines what data AI systems can access, under what conditions, and with what logging requirements 
  • A Data Access Review process specifically for AI service accounts – reviewed quarterly, not annually 
  • An AI Data Inventory that maps every AI system to every data source it accesses  
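An AI Data Inventory and the quarterly review can start as something very simple. The sketch below assumes a hypothetical `InventoryEntry` structure, made-up system and account names, and treats "quarterly" as 90 days; none of these details come from a specific tool or standard.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


@dataclass(frozen=True)
class InventoryEntry:
    ai_system: str
    service_account: str
    data_sources: tuple
    last_access_review: date


def overdue_reviews(inventory, today):
    """Entries whose last access review is older than the review interval."""
    return [e for e in inventory if today - e.last_access_review > REVIEW_INTERVAL]


def systems_using(inventory, data_source):
    """Every AI system that reads from the given data source."""
    return sorted(e.ai_system for e in inventory if data_source in e.data_sources)


inventory = [
    InventoryEntry("support-bot", "svc-support", ("knowledge-base", "crm"),
                   date(2025, 5, 1)),
    InventoryEntry("finance-agent", "svc-finance", ("erp", "crm"),
                   date(2025, 1, 15)),
]

today = date(2025, 6, 30)
print([e.ai_system for e in overdue_reviews(inventory, today)])  # ['finance-agent']
print(systems_using(inventory, "crm"))  # ['finance-agent', 'support-bot']
```

The second query is the one that matters during an incident: given a compromised data source, it answers which AI systems could have exposed it.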

The Security Payoff  

Organisations that invest in AI-ready information architecture do not just reduce risk – they unlock capability. When your data is well classified, traceable through its lineage, and access-controlled, you can deploy AI agents with genuine confidence.

Your AI systems become more accurate (they access the right data), more auditable (you know what they accessed), and more defensible (you can prove compliance).  

Insentra works with organisations to design information architectures that are secure by design, aligned to the AI Momentum Framework, so AI deployments are governed from day one, not retrofitted after the fact. Contact our team to discuss an Architecture Review for your environment. 
