Alibaba-AAIG

Alibaba Artificial Intelligence Governance Laboratory

Hi there 👋 This is Alibaba AAIG 🌊

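AI is the land that carries civilization and the bedrock of its development: solid, visible, and standing for productivity, creativity, and possibility. AI safety is the ocean that defines the land's boundaries, deep and full of unknowns. It surrounds, permeates, and shapes the land, nurturing channels of trust while concealing storms of lost control. We are committed to building safe, reliable, and trustworthy technical defenses in the vast ocean of artificial intelligence, exploring the far boundaries of AI safety, and safeguarding the sustainable development of intelligent technology.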

🌊 Safety is not a high wall of iron, but an organic ecosystem that, like the ocean, can purify, adapt, and repair itself.

🐠 Safety ecosystem components

Ocean AI Ecosystem

📦 Current members at a glance

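Name & link | Ocean metaphor | Technical category | Description
🦪 Oysters Family | Oyster: grit in, pearls out | Safety alignment | Risky inputs pass through advanced alignment techniques and come out as high-quality AI outputs that reflect intended values
🐚 Shells Family | Shell: lightweight protection | Basic guardrails | Simple, fast safety blocking at the first checkpoint on input and output
🌿 Kelp Family | Kelp: filtering and steering | Advanced filtering / steering | Dynamically filters content and steers model behavior onto safe paths
🐙 Octopus Family | Octopus: eight arms probing from many angles | Testing suite | Builds a multi-dimensional safety evaluation system to comprehensively test a model's safety and resilience
🦈 Sharks Family | Shark: apex predator | Jailbreak attack toolkit | Precisely uncovers a model's various latent safety vulnerabilities
🪼 Jellyfish Family | Jellyfish: transparent and visible | Model interpretability framework | Reveals risk concepts inside the model and selectively suppresses or erases specific neurons that may lead to unsafe behavior
[Your idea] | ~ | ~ | ~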

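🐋 Why become a "pioneer" of these waters?

  • Tackle the most cutting-edge safety challenges

Face the unknown risks of the AI era head-on: prompt injection, alignment failure, ethical black holes... There are no standard answers here; they are waiting for you to create them!

  • Create your own "sea creature"

Name a safety component through a PR: from [Jellyfish Neuron Interpreter] to [Octopus Evaluation Platform], your idea will have a lasting place in the history of AI safety!

  • Big-company backing × community self-governance

Outstanding proposals will be included in the "AI Safety White Paper" and published with author credit.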

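🤝 Join us and contribute

We are building an ocean ecosystem for AI safety, in which every contributor is an indispensable symbiont. We believe AI safety should be open, collaborative, and creative.
Whether it is an idea, code, or a suggestion, your participation makes our ecosystem richer and more robust.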

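  • Star the repository to get the latest updates
  • 🍴 Fork the repository to build your own ocean component
  • 🔄 Open a pull request to add a new "sea creature" to the ecosystem
  • 💡 Share risk cases, propose safety solutions, or design new ocean-themed safety tools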

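Contact us

🌊 A single drop may not change the ocean, but countless drops are reshaping the direction of the waves
Alibaba AAIG: advanced safety safeguarding advanced intelligence.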

Alibaba AAIG @ 2025 - Open Source under MIT License

Pinned

  1. Oyster Public

    The Oyster series is a set of safety models developed in-house by Alibaba-AAIG, devoted to building a responsible AI ecosystem.

    Python 57 3

  2. S-Eval Public

    Forked from IS2Lab/S-Eval

    S-Eval: Towards Automated and Comprehensive Safety Evaluation for Large Language Models

    3

  3. Strata-Sword Public

    Strata-Sword is a hierarchical Chinese-English jailbreak safety benchmark based on quantified reasoning complexity, developed in-house by Alibaba-AAIG.

    Python 16 1

Repositories

Showing 10 of 20 repositories
  • .github Public

    Hi,

    0 0 0 0 Updated Oct 24, 2025
  • Jellyfish-Family Public

    Jellyfish Family is an AI safety interpretability and purification module. It detects risky concept neurons within models and gently, yet effectively, removes harmful ones, preserving creativity and beauty in outputs while ensuring safety.

    9 0 0 0 Updated Oct 20, 2025
  • Kelp Public

    Kelp is a novel plug-in framework that enables streaming risk detection within the LM generation pipeline.

    Python 2 Apache-2.0 1 0 0 Updated Oct 20, 2025
  • Kelp-Family Public

    Kelp Family is an AI safety content filtering and steering module. Inspired by the way kelp naturally traps impurities in ocean currents and guides the flow of water, it dynamically identifies and filters harmful content and gently steers model behavior back onto safe paths, ensuring compliance while preserving fluency and creativity.

    8 0 0 0 Updated Oct 20, 2025
  • Oysters-Family Public

    Oysters Alignment is an AI alignment module. Inspired by oysters, which turn gritty, risky inputs into pearls, it refines complex and risky user requirements through alignment techniques to produce high-quality AI outputs that reflect intended values.

    9 0 0 0 Updated Oct 19, 2025
  • Shark-Family Public

    Shark Family is an AI safety red-teaming and jailbreak attack module. It harnesses powerful optimization and automated strategies to generate highly effective jailbreak prompts that penetrate diverse model defenses, serving as the sharpest "spear" for extreme stress testing.

    14 0 0 0 Updated Oct 19, 2025
  • Octopus-SEval Public

    Octopus is an automated LLM safety evaluator designed to help establish a security governance framework for large models and accelerate their safe and controllable application.

    Python 5 Apache-2.0 0 0 0 Updated Oct 14, 2025
  • Octopus-Family Public

    Octopus Family is a testing suite developed in-house by Alibaba-AAIG and designed for eight-armed, multi-faceted probing. It builds a multi-dimensional safety assessment system to comprehensively evaluate the safety and robustness of AI models.

    10 1 0 0 Updated Oct 14, 2025
  • Shells-Family Public

    Shells Family is an in-house developed testing suite, designed with a unique "shell-like" lightweight protection mechanism. It provides basic safety guardrails at the input and output ends of AI systems, quickly blocking high-probability safety threats.

    11 0 0 0 Updated Oct 14, 2025
  • Shells-NDM Public Forked from lorraine021/NDM

    A Noise-driven Detection and Mitigation Framework Against Sexual Content in Text-to-Image Generation

    0 1 0 1 Updated Oct 14, 2025