Skip to content

Update: Secure AI/ML Model Ops Cheat Sheet #1781

@maheshkukreja

Description

@maheshkukreja

What is missing or needs to be updated?

Quite a few suggestions, need inputs on how to best handle things - should we break it into multiple cheatsheets or maintain a single cheatsheet?

  • RAG infrastructure security not covered (vector DB isolation, retrieval pipeline hardening, index integrity)
  • Vector/Embedding weaknesses missing (inversion, poisoning, cross-tenant leakage in vector DBs)
  • Unbounded consumption / denial-of-wallet not addressed (runaway chains, token floods, recursion)
  • Model supply-chain gaps (unsafe deserialization/pickle, model-malware scanning, provenance)
  • Privacy/extraction testing absent (training-data extraction, embedding inversion; unlearning playbooks)
  • Compliance mapping outdated (NIST GenAI Profile, ISO/IEC 42001, EU AI Act timelines)
  • Hardware/runtime isolation absent (GPU tenancy, device-memory scrubbing, sandboxing)

How should this be resolved?

  • Add "RAG Infrastructure Security": vector DB tenant isolation, retrieval pipeline hardening, index integrity controls, trust boundary enforcement
  • Add "Vector & Embedding Security": treat embeddings as sensitive; tenant-scoped namespaces; RBAC/MAC; encryption in transit/at rest; bulk-export limits; outlier/poison detection; hybrid retrieval & diversity controls
  • Expand "Inference API Security" with Unbounded Consumption: per-tenant budgets; token/output caps; recursion/chain-depth limits; kill-switches; real-time cost telemetry & alerts
  • Strengthen "Model Storage & Artifacts": prefer safetensors; block unsafe deserialization; pre-ingest malware scanning; signature & hash pinning; registry provenance
  • Enhance "Monitoring & Logging": security telemetry (tokens, tool calls, outbound domains, IPI indicators); privacy-first logging/redaction; retention & purge workflows
  • Expand "Incident Response & Governance": periodic extraction/inversion tests; unlearning/rollback procedures; map controls to NIST AI-600-1 & ISO/IEC 42001; note EU AI Act milestones
  • Add "Hardware & Runtime Isolation": avoid cross-tenant GPU sharing on affected devices; device-memory scrubbing; microVM/gVisor sandboxing; confidential compute where available

Metadata

Metadata

Assignees

No one assigned

    Labels

    ACK_WAITINGIssue waiting acknowledgement from core team before to start the work to fix it.HELP_WANTEDIssue for which help is wanted to do the job.UPDATE_CSIssue about the update/refactoring of a existing cheat sheet.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions