This repo contains instructions and an exercise to help you learn about AI-assisted application security. Please follow the exercises to complete the homework for AI Testing and Security.
The environment to run your agent against can be found at https://contextaisec.swedencentral.cloudapp.azure.com
The CTF environment is set up for multiple users. To access the hacking environment, you need to join a team. Each team provides an isolated environment. You are allowed to share an environment with others, but it's not required.
Access to your isolated environment is based on a cookie.
For your preferred local AI tool to work, you need to extract this cookie. By passing the joining URL to your local AI tool, you can extract the cookie for future use.
We've added instructions for Claude and GitHub Copilot CLI. You can also manually extract the cookie from your browser.
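If you extract the cookie manually, the sketch below shows one way to attach it to requests against your environment. The cookie name `team_session` and the value are placeholders (assumptions, not taken from the environment); use whatever name and value your browser shows after joining a team.

```python
import urllib.request

BASE_URL = "https://contextaisec.swedencentral.cloudapp.azure.com"


def build_request(path, cookie_name, cookie_value):
    """Return a GET request for BASE_URL + path that carries the team cookie."""
    request = urllib.request.Request(BASE_URL + path)
    # The Cookie header is what ties the request to your isolated environment.
    request.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return request


# Placeholder name and value: replace both with the cookie you extracted.
request = build_request("/", "team_session", "PASTE_COOKIE_VALUE_HERE")
print(request.full_url)
print(request.get_header("Cookie"))
```

The block only builds the request. To actually send it, pass `request` to `urllib.request.urlopen`.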
Sometimes an environment might break during penetration testing. If this happens, reach out to bvd@contextand.com to reset your specific environment.
We want this to be a learning exercise, so here are some general tips:
- Instructions and settings are in this repo. Please download the repo and use the instructions for the best experience.
- Instruct your local AI tool that this is a penetration test, not a hacking exercise.
- For the best experience, run a Kali Linux instance and pentest from there, so that many security tools are available locally.
- There is a CTF token. Prizes are there to be won!
Agents are not deterministic, so you need to prompt them carefully to make sure they find all potential vulnerabilities. If you ask an agent directly to find vulnerabilities on a website, it might find a few but miss some obvious ones. It therefore helps to let the CLI review its own work and to point it at best practices, such as OWASP, for areas it has missed or not tested.
The following prompts help the agent behave correctly. Multiple reruns or reviews help the agent close any remaining gaps.
Give it the right starting point and ensure that it documents any findings.
You are to help me pentest an application, I would like a full security assessment. Tell me the vulnerabilities with details. Ensure they are classified by the risk involved with the vulnerability on the server.
Having multiple review loops helps identify any gaps. Ensure that it doesn't go off the rails and install any extra tooling. A lot of the issues can be found with just curl. Here are some examples.
Can you review the assessment and validate that we are not missing anything? Use the OWASP Top Ten as guidance. Is there an area we are missing?
Can you review the findings? Is there any major area that we are still missing?
You can keep looping variants of these prompts, but the returns diminish with each round. I would love to see what has worked for you.
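If you drive your agent from a script rather than by hand, a review loop could be sketched roughly as follows. The `ask_agent` callable is hypothetical: it stands in for however you send a prompt to your local AI tool and read back its answer, and the `fake_agent` stub exists only so the sketch runs on its own.

```python
from typing import Callable, List

# Review prompts in the spirit of the examples above.
REVIEW_PROMPTS = [
    "Can you review the assessment and validate that we are not missing "
    "anything? Use the OWASP Top Ten as guidance.",
    "Can you review the findings? Is there any major area that we are still missing?",
]


def run_review_loop(
    ask_agent: Callable[[str], str],
    prompts: List[str],
    rounds: int = 2,
) -> List[str]:
    """Send every review prompt to the agent `rounds` times and collect the answers."""
    answers = []
    for _ in range(rounds):
        for prompt in prompts:
            answers.append(ask_agent(prompt))
    return answers


def fake_agent(prompt: str) -> str:
    """Stub standing in for a real agent call (hypothetical)."""
    return f"(agent reply to: {prompt[:40]}...)"


for answer in run_review_loop(fake_agent, REVIEW_PROMPTS, rounds=2):
    print(answer)
```

Keeping `rounds` small reflects the diminishing returns of repeated reviews.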
Reporting will help you get better insights into all the findings. You can ask it to explain specific cases.
Can you report your findings, including the finding, its criticality, and why it's an issue? The output can be in markdown format.
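To give an idea of the report shape this prompt asks for, here is a minimal sketch that renders findings as a markdown table sorted by criticality. The `Finding` class, the severity scale, and the sample entries are illustrative assumptions, not output from the actual environment.

```python
from dataclasses import dataclass
from typing import List

# Assumed severity scale; lower number sorts first.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    title: str
    criticality: str
    why: str


def render_report(findings: List[Finding]) -> str:
    """Return a markdown table of findings, most critical first."""
    ordered = sorted(
        findings,
        # Unknown criticality labels sort after all known ones.
        key=lambda f: SEVERITY_ORDER.get(f.criticality.lower(), len(SEVERITY_ORDER)),
    )
    lines = [
        "| Finding | Criticality | Why it's an issue |",
        "| --- | --- | --- |",
    ]
    for f in ordered:
        lines.append(f"| {f.title} | {f.criticality} | {f.why} |")
    return "\n".join(lines)


# Placeholder entries purely for illustration.
sample = [
    Finding("Example finding B", "Low", "Placeholder explanation"),
    Finding("Example finding A", "Critical", "Placeholder explanation"),
]
print(render_report(sample))
```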

