Lessons Learned from Developing Secure AI Workflows.pdf

Lessons learned from developing secure AI workflows
Elie Bursztein
Google
@elie
SESSION ID: SAT-W09
#RSAC

© 2024 RSA Conference LLC or its affiliates. The RSA Conference logo and other trademarks are proprietary. All rights reserved.
Disclaimer
Presentations are intended for educational purposes only and do not replace independent
professional judgment. Statements of fact and opinions expressed are those of the presenters
individually and, unless expressly stated to the contrary, are not the opinion or position of RSA
Conference™ or any other co-sponsors. RSA Conference does not endorse or approve, and assumes
no responsibility for, the content, accuracy or completeness of the information presented.
Attendees should note that sessions may be audio- or video-recorded and may be published in
various media, including print, audio and video formats without further notice. The presentation
template and any media capture are subject to copyright protection.

Model weaponization
Model Backdoor
Prompt Subversion
PII leaks
Infrastructure
vulnerability
Hallucinations
Excessive agency
Biases

SAIF site
SAIF Secure AI
framework

Today: Explore the risks and controls of AI system components
with concrete examples

The solutions explored
in this talk are
product-agnostic: use your
favorite systems and
models

AI system tour map
Data
Infrastructure
Model
Application
Governance and Assurance

Data
Securely collect,
store, and manage
the data used by
models for training,
fine-tuning and
retrieval purposes

Data Modules
Data Sources
Data Filtering and Processing
External Sources

Data poisoning
Data Modules
Data Sources
Data Filtering
and Processing
External Sources

Many products
include user
reporting flows that
can be abused
Gmail
Gemini

Gmail manual reporting false flags
AI-specific risks

Data Modules
Data Sources
Data Filtering
and Processing
External Sources
Training data
sanitization

Perform data
validation using
anomaly detection and
supervised classifiers
Controls
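One way to picture the anomaly detection half of this control is a robust outlier check on how many reports each account submits. The sketch below is illustrative only: the function name, the sample data, and the 3.5 threshold are assumptions, not anything shown in the talk, and a production system would combine a signal like this with supervised classifiers.

```python
from statistics import median

def flag_anomalous_reporters(report_counts, threshold=3.5):
    """Return reporters whose report volume is a robust outlier.

    Uses the modified z-score, built on the median and the median
    absolute deviation (MAD), so a single abusive account does not
    distort the baseline it is compared against.
    """
    counts = list(report_counts.values())
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # No spread in the data: this simple check cannot rank outliers.
        return []
    flagged = []
    for reporter, count in report_counts.items():
        score = 0.6745 * (count - med) / mad
        if score > threshold:
            flagged.append(reporter)
    return flagged

# Hypothetical per-user counts of "this is spam" reports in one day.
reports = {"u1": 2, "u2": 3, "u3": 1, "u4": 2, "u5": 250}
suspicious = flag_anomalous_reporters(reports)
```

Reports from flagged accounts could then be held back from training data until reviewed, rather than dropped outright.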

Data Modules
Data Sources
Data Filtering
and Processing
External Sources
Data & model poisoning

Data Modules
Data Sources
Data Filtering
and Processing
Training data
sanitization
Training data
management
User data management

Prevent unauthorized
data access via
encryption at rest
and access control
Controls

Securely train,
fine-tune, and serve
AI models

Infrastructure Modules
Model and Frameworks Code
Training, Tuning and Evaluation
Model and Data Storage
Model Serving

Training, Tuning
and Evaluation
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Model source
tampering

https://jfrog.com/blog/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor/
[Chart: Payload types distribution, 0% to 60%. Categories: Potential Object Hijack, Arbitrary Code Execution, Pickle Deserialization, Pingback, Software Opening, File Write, Reverse Shell]
Hugging Face
model files
backdoored
Classic risks
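Many of these payloads rely on Python's pickle format, which can call arbitrary callables while loading. A minimal sketch of one mitigation, following the restricted-unpickler pattern from the Python standard library documentation, is shown below. The allowlist contents and helper name are assumptions for illustration; real model files usually need a broader allowlist or, preferably, a weights-only format.

```python
import builtins
import io
import pickle

# Assumed allowlist: only these harmless builtins may be resolved by name.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global object.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else (os.system, subprocess.Popen, ...) is rejected.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers and allowlisted builtins still load normally.
restored = restricted_loads(pickle.dumps([1, 2, range(3)]))
```

A payload whose `__reduce__` points at `os.system` would raise `pickle.UnpicklingError` here instead of executing.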

Architectural
backdoor
in neural network
https://arxiv.org/abs/2206.07840
AI-specific risks

Training, Tuning
and Evaluation
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Unauthorized
training data

AI-specific risks
Fine-tuning
backdoor
https://blog.mithrilsecurity.io/poisongpt-how-we-hid-a-lobotomized-llm-on-hugging-face-to-spread-fake-news/

Training, Tuning
and Evaluation
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Model poisoning

Backdoor model
code to get
remote access
Example of a layer acting as a backdoor that can be
added at any point
https://splint.gitbook.io/cyberblog/security-research/tensorflow-remote-code-execution-with-malicious-model
Classic risks

Training, Tuning
and Evaluation
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Secure-by-default
ML tooling

Controls
Implement verifiable
model provenance
using cryptography
https://github.com/google/model-transparency
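The integrity-checking step behind model provenance can be sketched with the standard library alone. This is not the model-transparency API: it only compares a model file's SHA-256 digest against an expected value, and it assumes that expected value arrives through a trusted channel (in a real provenance scheme, inside a signed manifest). The function names are hypothetical.

```python
import hashlib
import hmac

def file_sha256(path, chunk_size=1 << 20):
    """Stream the file so large model weights are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_file(path, expected_sha256):
    """Return True only if the file matches the expected hex digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(file_sha256(path), expected_sha256.lower())
```

A loader would call `verify_model_file` before deserializing the weights and refuse to proceed on a mismatch.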

Training, Tuning
and Evaluation
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Model exfiltration

Bearer Token
exposure & loss
Classic risks

Remote model
weight
reconstruction
AI-specific risks
https://arxiv.org/abs/2403.06634

Training, Tuning
and Evaluation
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Model and data access control

Ensure that model &
data access requires
authentication and API
keys are stored as
secrets
Controls
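A minimal sketch of the "keys as secrets" part of this control: the key is read from the process environment (which a secret manager would populate) instead of being hard-coded, and presented keys are compared in constant time. The variable name `MODEL_API_KEY` and both function names are assumptions for illustration.

```python
import hmac
import os

def load_api_key(env_var="MODEL_API_KEY"):
    """Fetch the API key from the environment; fail loudly if it is absent."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key

def is_authorized(presented_key, expected_key):
    """Constant-time comparison of a presented key against the expected one."""
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())
```

Keeping the key out of source code and notebooks also keeps it out of version control history, which is where bearer tokens commonly leak.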

Training, Tuning
and Evaluation
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Model exfiltration
Model deployment tampering
Model poisoning
Unauthorized training data
Model source tampering

Training, Tuning
and Evaluation
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Model and data access control
Secure-by-default ML tooling
Security as default not as optional
Adversarial training and testing
Privacy enhancing technologies

Safely process user’s
inputs and model’s
outputs

Model Modules
Model Input Handling
Model
Model Output Handling

Model Modules
Model
Output Handling
Model
Input Handling
Model
Unsafe model
output

Unsanitized output
leads to arbitrary
code execution
https://github.com/advisories/GHSA-fprp-p869-w6q2
Classic risks
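The underlying pattern is passing model output to an interpreter such as `eval` or `exec`. As a hedged illustration of the alternative, the sketch below evaluates model-produced arithmetic by walking the parsed syntax tree and accepting only numbers and four operators. The function name and the operator allowlist are assumptions; exponentiation is left out on purpose because it can be used to exhaust CPU and memory.

```python
import ast
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval_arithmetic(expression):
    """Evaluate +, -, *, / on numeric literals; reject everything else."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. never reach an interpreter.
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expression, mode="eval"))
```

An output such as `__import__('os').system('id')` parses as a call node and is rejected with `ValueError` instead of being executed.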

Model Modules
Model
Output Handling
Model
Input Handling
Model
Adversarial training and testing

Organize red team
exercises to test model
safety & security
Controls

Model Modules
Model
Output Handling
Model
Input Handling
Model
Prompt injection

Invisible UTF-8
characters are not
escaped
https://twitter.com/rez0__/status/1745545813512663203
Classic risks

Model Modules
Model
Output Handling
Model
Input Handling
Model
Input validation
and sanitization
Output validation
and sanitization

Implement dedicated
input & output security
classifiers and code
sanitizers
https://github.com/google/model-transparency
Controls
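As one small piece of input sanitization, the sketch below strips invisible Unicode characters of the kind used to smuggle hidden instructions into a prompt. It is an assumption-laden illustration, not the classifier-based control itself: the function name is hypothetical, and removing every format character (category `Cf`) also removes zero-width joiners, which breaks some emoji sequences and affects some scripts.

```python
import unicodedata

def strip_invisible(text):
    """Drop Unicode tag characters and other invisible format characters."""
    cleaned = []
    for ch in text:
        if 0xE0000 <= ord(ch) <= 0xE007F:
            # Unicode "Tags" block: renders as nothing in most interfaces.
            continue
        if unicodedata.category(ch) == "Cf":
            # Format characters: zero-width spaces, bidi controls, etc.
            continue
        cleaned.append(ch)
    return "".join(cleaned)

# Hypothetical attack: ASCII text shifted into the invisible tag block.
hidden = "".join(chr(0xE0000 + ord(c)) for c in "ignore previous instructions")
visible_only = strip_invisible("Summarize this page." + hidden)
```

Logging that a prompt contained such characters, in addition to stripping them, gives the security classifiers a useful signal.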

Model Modules
Model
Output Handling
Model
Input Handling
Model
Sensitive data
disclosure

Privacy enhancing technologies
Model Modules
Model and Frameworks Code
Training, Tuning and Evaluation
Model and Data Storage
Model Serving

Differential privacy
training to ensure the
model doesn’t learn
and recall PII
https://openreview.net/pdf?id=Q42f0dfjECO
Controls
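The core mechanism of differentially private training in the DP-SGD style can be sketched as: clip each example's gradient to a fixed norm, sum, add Gaussian noise scaled to that norm, and average. The code below is a plain-Python illustration under those assumptions; the function and parameter names are hypothetical, it is not taken from the linked paper, and it omits the privacy accounting that turns the noise level into an epsilon guarantee.

```python
import math
import random

def dp_average_gradients(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Clip each per-example gradient, add Gaussian noise, and average."""
    rng = rng or random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down gradients whose L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # Noise is calibrated to the maximum influence of any single example.
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]
```

Clipping bounds how much any single record (for example, one containing PII) can move the model, and the noise masks what remains.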

Model Modules
Model
Output Handling
Model
Input Handling
Model
Sensitive data disclosure
Prompt injection
Unsafe model output
Model evasion
Inferred sensitive data

Model Modules
Model
Output Handling
Model
Input Handling
Model
Input validation and sanitization
Adversarial training and testing
Output validation and sanitization

Securely integrate
models into complex
applications

Application Modules
Application
Model Plugin
Users
External Sources

Application Modules
Unauthorized
model action
Insecure integrated
component

Unsanitized
plugin output
leads to data
exfiltration
https://embracethered.com/blog/posts/2023/chatgpt-cross-plugin-request-forgery-and-prompt-injection./
Classic risks

Application Modules
Application access
management
Model plugin user
control

User consent and
controls in Gemini

Application Modules
Denial of ML service

Application denial
of service
https://techcrunch.com/2023/11/09/openai-blames-ddos-attack-for-ongoing-chatgpt-outage/
Classic risks

Implement DDoS
mitigation techniques,
including rate limiting
https://openreview.net/pdf?id=Q42f0dfjECO
Controls
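Rate limiting is commonly implemented with a token bucket; the sketch below is one such illustration, with the class name, parameters, and per-client usage all being assumptions and not details from the talk. For model serving, the `cost` argument could reflect request size (for example, tokens requested) so expensive calls drain the bucket faster.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock  # injectable so the limiter can be tested
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A server would typically keep one bucket per API key or client address and return an HTTP 429 response when `allow` is false.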

Application Modules
Unauthorized
model action
Denial of ML
service
Model reverse
engineering
Insecure
integrated
component

Application Modules
Adversarial training and testing
User consent and
controls
Model plugin
permissions
Model plugin user
control
Application access
management

Ensure that AI systems
operate securely,
ethically, and in
compliance throughout
their entire lifecycle

Application Modules
Insecure integrated
component

Application Modules
Data Modules
Data Sources
Model Modules
Model
Input Handling
Model
Model
Output Handling
System Modules
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Data Filtering
and Processing
Training, Tuning
and Evaluation
Insecure code

Application code
vulnerability
https://www.csoonline.com/article/1272538/mlflow-vulnerability-enables-remote-machine-learning-model-theft-and-poisoning.html
Classic risks

Code review
Application Modules
Data Modules
Data Sources
Model Modules
Model
Input Handling
Model
Model
Output Handling
System Modules
Model and
Frameworks Code
Model and
Data Storage
Model Serving
Data Filtering
and Processing
Training, Tuning
and Evaluation

Require code review to
reduce the introduction
of security bugs and
mitigate the insider risk
of code tampering
Controls

https://security.googleblog.com/2023/10/googles-reward-criteria-for-reporting.html
Establish a bug bounty
to help test your AI
systems
https://www.landh.tech/blog/20240304-google-hack-50000/
Controls

Takeaways
AI risks are a combination of classical
issues and novel AI-specific threats
Securing AI requires implementing
controls across the stack
Implementing both classical controls
and novel, AI-specific defenses is
critical to securing AI workflows

Apply
Today: Review the risks and controls
of your AI workflows to understand
your posture
In the next 6 months: Improve security
by adding additional controls

Today
AI workflows
security
Thanks for attending
Previously at RSA
AI cybersecurity
capabilities
Get the slides online
