Writing

A Guide to Auditing Generative AI

By Sachin Mehta • 12 Dec 2025 • 7 min read

AI Governance Audit Cloud & Platforms

The rise of generative AI has turned every office chat into a potential Black Swan for risk-aware auditors. Tools like Microsoft Copilot, Power Platform LLM agents, ChatGPT Enterprise or Google’s Gemini can supercharge productivity – but they also introduce hidden tail-risk. As Nassim Taleb might warn, an AI assistant could be a “turkey” enjoying 1,000 days of friendly data before Thanksgiving: one unexpected prompt leak can collapse the business. Recent real-world incidents highlight this danger: for example, a Hong Kong finance employee was fooled into sending $25 million by a deepfake “CFO” on a video call. Generative AI is projected to cost U.S. companies $40B in fraud losses by 2027 (32% CAGR from 2023). The message is clear: auditors must treat enterprise AI like any other high-stakes system – probing confidentiality, integrity and compliance.

Savvy auditors know the tech is only half the story. The regulatory environment is catching up fast. In financial services, the EU’s Digital Operational Resilience Act (DORA) mandates that all ICT risks including AI platforms be managed with the same rigor as critical banking systems. DORA requires encryption, backup, third‑party oversight and periodic resilience testing; generative AI systems fall squarely under its ICT risk and incident‑reporting rules. Similarly, frameworks like Microsoft’s Responsible AI Standard and the NIST AI Risk Management Framework encourage “trustworthiness by design” i.e. bake in transparency, bias checks and auditability from Day 1.

In parallel, the EU AI Act (effective 2025/26) classifies AI systems by risk. For example, “chatbots” and general-purpose LLMs are currently treated as “limited-risk” systems, requiring only transparency obligations. In practice this means any employee interacting with ChatGPT or Gemini should be informed they’re talking to AI. However, if an LLM is used for high-stakes decisions (credit scoring, customer onboarding, interviews and hiring, medical advice, etc.), it may become “high-risk” and face stringent documentation, testing and human‑oversight mandates. Auditors must therefore map each AI use-case to the AI Act’s categories and verify compliance: e.g. confirming that required risk assessments and disclosures are in place.

Regulatory update — August 2025 The GPAI model provisions under EU AI Act Title VIII entered into force August 2025, adding obligations for general-purpose AI models with systemic risk designation (including GPT-4 class models and Gemini Ultra). These obligations include registration with the EU AI Office, technical documentation, training-data transparency, copyright compliance policies, and adversarial testing. The “limited-risk / transparency only” characterisation applies to AI systems in the narrow sense; GPAI models with systemic risk now carry the full Title VIII obligation set.

Generative AI risk often lurks in surprising places. Consider a prompt injection attack: a malicious user could craft input that bypasses safeguards and extracts confidential data or makes the model reveal hidden instructions. Or imagine an engineer using ChatGPT to draft trading algorithms: the AI’s unverified code introduces systemic risk (a Tiny modelling flaw could have cascading effects). Another scenario: a report hallucinated by Gemini omits a crucial risk factor, leading to a blind spot in the balance sheet – a classic missing-antifragility trap. Practical research shows AI tools can leak data if not properly managed. For instance, generative systems often pull from vast context; if an insider prompts a Copilot with sensitive customer info, that data could end up inadvertently revealed (or stored) unless strict DLP is enforced.

We also must heed the “turkey problem”: many AI systems perform well until a sudden, extreme event occurs. The recent trend of ransomware and deep-fake financial frauds illustrates that legacy controls often fail against novel AI tricks. Algorithmic bias and model flaws are other silent dangers – if an AI-assisted loan decision tool systematically skews against a protected class, it could trigger compliance disasters. Auditors should thus challenge assumptions of normality and inspect the tail: deliberately pushing models with edge-case prompts to uncover hidden vulnerabilities.

Logging and Monitoring:

For Microsoft Copilot (Microsoft 365 Copilot, Copilot Studio, Copilot in Power Apps/Automate, etc.), auditability is built in. Microsoft automatically logs all user interactions with Copilot and connected AI apps into the Microsoft 365 Audit log. Each record notes who asked what and which files or data sources were accessed. Auditors should verify that Azure AD audit logging is enabled and that Copilot logs are retained (180 days by default for pay-as-you-go scenarios). Copilot Chat is even more transparent: both prompts and responses are stored in the user’s Exchange mailbox for eDiscovery/audit. These logs are critical evidence; as a control expectation, auditors should ensure logs are periodically reviewed for anomalies (e.g. unusual data accessed by AI).

Data Protection Controls:

Governance Features:

Data Ownership & Isolation:

Enterprise-tier LLM services are designed for corporate use. For ChatGPT Enterprise, OpenAI explicitly does not train on customer data by default, and the business retains ownership of all inputs/outputs. Google’s Gemini for Workspace similarly “keeps all prompts and content within your organization”. It will not share your data with other customers, nor use it to train external models, without permission. As a result, a key control is simply to ensure employees use the enterprise or business edition – not free public versions. Auditors should confirm that only licensed organizational accounts (with SSO/MFA) can access these AI apps.

Security & Compliance:

Both OpenAI and Google maintain robust compliance postures. ChatGPT Enterprise completed a SOC 2 audit and offers encrypted data storage (AES‑256 at rest, TLS in transit). Gemini for Workspace runs on Google Cloud with FedRAMP High authorization and inherits Workspace’s encryption and DLP controls. For example, Gemini automatically applies the organization’s existing data protection policies (scanning for malware/PII, enforcing region restrictions). Auditors should check that these integrations are active: e.g. verify that Gemini searches or email summaries respect Gmail’s confidential mode or DLP rules. They should also ensure logs of AI interactions feed into enterprise SIEM/Audit: Google’s Audit Logs or OpenAI’s activity reports. (For instance, Microsoft Purview now offers auditing for ChatGPT Enterprise as well.)

Human Oversight Controls:

Since outputs can be unpredictable, a core control is human review. Establish policies (and evidence of training) requiring that any AI-generated analysis or content used in decisions be vetted. Nightfall’s (nightfall.ai) guidance suggests having developers avoid feeding proprietary code to the AI and mandating code reviews of AI-written code for security bugs. In practice, auditors might sample AI-generated documents to check they have human approval stamps or to see if factual inaccuracies were corrected. It’s wise to use templated template prompts (with redacted fields) so real data isn’t exposed.

Testing Tip: Use the AI yourself – try querying ChatGPT Enterprise or Gemini with dummy confidential phrases. Confirm that (a) it refuses to reveal a password or access a file you don’t have permissions for, and (b) the interaction is logged in your corporate monitoring tool. Also, simulate a “hallucination” by asking for specific factual details and check if staff recognize the error.

All three platforms emphasize data protection by default, but auditors must verify the implementation: e.g. ensure only authorized users can invoke them, and that AI queries are contained within approved environments. The subtle differences above mostly reflect integration and control points – not ranking which is “best,” but outlining each tool’s context so auditors know what to audit.

In summary, as enterprises rush into the AI era, IT auditors should apply classic control principles to these new tools: enforce least privilege, log everything, keep humans in the loop, and map to standards (DORA’s resilience checks, EU AI Act’s transparency rules, IEEE/NIST AI risk standards, etc.). In Taleb’s terms, we must avoid fragility: anticipate failures and stress-test our AI controls. With thoughtful governance (see table above), auditors can turn generative AI from a liability into a fully controlled advantage.

Sources and Refernces:

Sources and References

Collaborate

Corrections, counterexamples, and build ideas welcome. sachin@rtapulse.com • Discussions • Issues • How to collaborate.

Disclosures

Practitioner opinion. Not legal or regulatory advice. No vendor relationships. Full disclosures.

Request a topic

Agentic AI: The Next Audit Frontier

Generative AI deployments have moved past single-turn chat interfaces. Agentic AI systems take sequences of actions autonomously, call tools, read and write files, send requests to external services, and operate across multiple steps without continuous human instruction. Microsoft Copilot Studio allows business users to build agents that query SharePoint, send emails, update CRM records, and call external APIs. AutoGen and similar frameworks build multi-agent pipelines where one AI model orchestrates the work of several others. In each case, the control surface is fundamentally different from a user typing a prompt and reading a response.

In January 2026, Singapore's MAS published the first formal agentic AI governance framework globally. It introduces a five-tier autonomy taxonomy ranging from human-in-the-loop to fully autonomous operation, with specific governance requirements at each tier. The framework addresses agent orchestration, tool access controls, audit logging for agent actions, and human override mechanisms. No equivalent framework exists in the EU or UK yet, but the direction of travel is established.

For IT auditors, agentic AI introduces four control questions that standard generative AI audit methodology does not cover.

Tool access governance. An agent's effective permissions are the union of all the tools it has been given access to. A Copilot Studio agent with access to SharePoint, Exchange, and an external CRM has a combined access scope that may far exceed what any individual user holds. The access review methodology needs to assess agent tool grants, not just user account permissions.

Action logging and audit trail. When an agent takes an autonomous action, the audit trail needs to capture what the agent did, the instruction that triggered the action, and the outcome. Microsoft 365 logs Copilot interactions in Purview, but coverage for custom agents varies significantly by implementation. The audit programme needs to verify that agent actions are logged at a sufficient level of detail to support investigation.

Prompt injection in agentic workflows. A standard chat interface prompt injection requires a user to interact directly with a compromised input. In an agentic workflow, the agent itself may fetch and process external content as part of its task. A malicious instruction embedded in a webpage, document, or email that the agent retrieves can redirect the agent's actions without any user involvement. This is a live risk in any agentic deployment that processes external content.

Human override and halt mechanisms. The MAS framework requires that agents at higher autonomy tiers have documented human override and halt mechanisms. The audit question is whether those mechanisms exist, whether they are tested, and whether they would work under operational conditions when an agent is mid-task. A halt mechanism that has never been exercised is not a control.

The EU AI Act's prohibited practice provisions entered into force in February 2025. They cover certain agentic AI use cases explicitly: AI systems that deploy subliminal techniques, exploit vulnerabilities of specific groups, or operate social scoring mechanisms are prohibited regardless of autonomy level. For financial institutions building AI agents that interact with customers, the prohibited practice boundary needs to be part of the design review, not an afterthought in the compliance assessment.

Updated March 2026 Agentic AI section added: Copilot Studio agent controls, MAS Singapore five-tier autonomy framework (January 2026), tool access governance, action logging, prompt injection in agentic workflows, and EU AI Act prohibited practice provisions (active February 2025). Core article content unchanged.

← All Field Notes

AI in IT Audits →