A fully autonomous AI agent that joins your Google Meet classes, marks attendance, answers questions, captures slides, and saves detailed notes — completely free.
Features · Architecture · Tech Stack · Project Structure · Setup · Usage · Screenshots · Troubleshooting
Google Meet AI Attendance Agent is a Python-based autonomous bot that joins your Google Meet sessions on your behalf. It uses a pipeline of free and open-source tools to:
- Listen to the teacher via real-time audio capture and local transcription
- Understand context — whether it's an attendance roll call or a question directed at you
- Respond automatically in the Meet chat with contextually appropriate answers
- See the screen and capture slide content via OCR every 60 seconds
- Save timestamped class notes combining the audio transcript and slide text
The entire pipeline runs on locally-run open-source models, with zero recurring cost and zero cloud dependencies.
| Feature | Description |
|---|---|
| Direct Join | Pass a Meet URL directly and the agent joins immediately |
| Live Transcription | Captures system audio and transcribes with Faster-Whisper locally on CPU — no cloud STT, no API cost |
| Smart Attendance Detection | 4-layer keyword engine (exact → pattern list → fuzzy-spaced → Levenshtein) catches every Whisper mis-transcription of your name |
| Question Answering | Classifies audio and chat triggers as attendance or question, sends transcript context to the local Ollama LLM, replies in one sentence |
| Identity System Prompt | Every Ollama call is pre-injected with a persona prompt so the bot always speaks as you in the first person, never in the third person |
| Slide OCR | Screenshots the screen every 60 seconds and extracts text with Tesseract |
| AI-Generated Class Notes | At end of session, Ollama reads the full session log and writes a structured Markdown study guide exported as .txt + .pdf |
| Ghost Mode | Joins with mic and camera off — completely silent to other participants |
| Hallucination Filtering | Multi-layer output sanitizer strips prompt echoes, third-person self-references, and known Whisper garbage phrases before sending |
| Configurable | A single config.py controls everything: your name, Whisper model size, Ollama model, headless mode, and more |
The diagram below shows the complete data flow from meeting URL input to notes generation:
```mermaid
flowchart TD
    %% ── Entry ───────────────────────────────────────────────────────────
    subgraph ENTRY ["Entry"]
        A[main.py]
        A --> REC{Recovery?}
        REC -- Yes --> FIX[Fix logs\nRepair corrupted state]
        REC -- No --> SENSES
    end
    %% ── Senses ──────────────────────────────────────────────────────────
    subgraph SENSES ["Senses"]
        direction LR
        AUD[Audio\nMicrophone stream]
        SCR[Screenshots\nVisual frame capture]
        CHT[Chat\nMessage listener]
    end
    FIX --> SENSES
    ENTRY --> AUD
    ENTRY --> SCR
    ENTRY --> CHT
    %% ── Brain ───────────────────────────────────────────────────────────
    subgraph BRAIN ["Brain"]
        direction TB
        W[Whisper\nSpeech-to-text]
        W --> CLS[Classifier\nIntent detection]
        CLS -- Attendance --> FAST[Fast reply\nInstant response]
        CLS -- Question --> OLL[Ollama\nLLM reasoning]
    end
    AUD --> W
    SCR -. side log .-> LOG
    CHT -. side log .-> LOG
    %% ── Output ──────────────────────────────────────────────────────────
    subgraph OUTPUT ["Output"]
        direction TB
        INJ[Injection\nReply into chat]
        INJ --> END{Meet end?}
        END -- No --> INJ
        END -- Yes --> PDF[PDF generator\nDetailed meeting report]
    end
    FAST --> INJ
    OLL --> INJ
    %% ── Log files ───────────────────────────────────────────────────────
    LOG[Log files\nPersistent transcript]
    W -. side log .-> LOG
    LOG --> PDF
    %% ── Styles ──────────────────────────────────────────────────────────
    style ENTRY fill:#f0f0ff,stroke:#9999cc
    style SENSES fill:#e8f5e9,stroke:#81c784
    style BRAIN fill:#fff8e1,stroke:#ffb74d
    style OUTPUT fill:#e8f0fe,stroke:#7986cb
    style FIX fill:#ede7f6,stroke:#9575cd,color:#000
    style PDF fill:#c8e6c9,stroke:#66bb6a,color:#000
    style LOG fill:#bbdefb,stroke:#64b5f6,color:#000
    style FAST fill:#ffccbc,stroke:#ff7043,color:#000
    style OLL fill:#ffccbc,stroke:#ff7043,color:#000
    style W fill:#ffe0b2,stroke:#ffa726,color:#000
    style CLS fill:#ffe0b2,stroke:#ffa726,color:#000
```
The diagram above also exists as a static image in the repo.
How the pipeline works, step by step:
- **Entry (`main.py`)** — The orchestrator starts and immediately hits a *Recovery?* decision. If a previous session left corrupted logs or broken state, the agent repairs them first. Otherwise it launches all three sense threads simultaneously.
- **Senses — 3 parallel inputs** — *Audio* captures the microphone/system stream via PulseAudio loopback. *Screenshots* take a visual frame every 60 seconds. *Chat* listens to the Meet chat panel for incoming messages. All three write side-logs to the persistent transcript continuously.
- **Brain — Whisper STT** — Raw audio chunks are transcribed by Faster-Whisper locally on CPU (INT8, Urdu language hint). The transcript is deduplicated, garbage-filtered, and passed to the classifier.
- **Brain — Classifier / Intent Detection** — `classify_and_respond()` routes the transcript down one of two paths: attendance (keywords like `present`, `haziri`, `roll call`) triggers the fast reply path; everything else goes to Ollama.
- **Fast Reply** — Attendance gets an instant static response (`"Present sir, mic kharab hai."`) with no LLM call — the fastest possible path to the chat.
- **Ollama LLM Reasoning** — Questions are passed to the local Ollama model, pre-injected with the identity system prompt. The raw output is sanitized to strip hallucinations before returning.
- **Output — Injection into Chat** — The reply is typed into the Meet chat by Playwright. The agent then loops on *Meet end?*, continuing to respond until the meeting ends.
- **Log Files / Persistent Transcript** — OCR slide text, audio transcripts, and chat messages are all side-logged to a persistent file throughout the session.
- **PDF Generator** — When the meeting ends, the full persistent transcript is fed to Ollama's dedicated note-taker prompt, producing a structured Markdown study guide exported as a detailed PDF meeting report.
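The Entry → Senses fan-out above can be sketched with Python's standard threading primitives. This is an illustrative toy, not the real `main.py`: the function names and worker bodies here are stand-ins for the actual audio/screenshot/chat loops.

```python
import queue
import threading
import time

def run_senses(duration_s: float = 0.3) -> list:
    """Toy fan-out: three 'sense' threads feed one shared side-log queue,
    mirroring the Senses -> Log files flow in the diagram above."""
    side_log = queue.Queue()
    stop = threading.Event()

    def sense(name: str, interval_s: float) -> None:
        # Stand-in for the real audio / screenshot / chat capture loops.
        while not stop.is_set():
            side_log.put(f"{name}: captured frame")
            time.sleep(interval_s)

    threads = [
        threading.Thread(target=sense, args=(name, 0.05), daemon=True)
        for name in ("audio", "screenshots", "chat")
    ]
    for t in threads:
        t.start()
    time.sleep(duration_s)  # the real agent runs until the meeting ends
    stop.set()
    for t in threads:
        t.join()
    return list(side_log.queue)

entries = run_senses()
```

The real agent replaces the `time.sleep(duration_s)` window with the *Meet end?* loop from the Output stage.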
| Component | Tool | Cost |
|---|---|---|
| Browser Automation | Playwright | Free / Open-source |
| Audio Capture | PulseAudio virtual loopback | Free (Linux system tool) |
| Speech-to-Text | Faster-Whisper — CPU · INT8 · local | Free / No API calls |
| AI Brain | Ollama — local LLM, fully offline | Free / No API key needed |
| OCR | Tesseract OCR | Free / Open-source |
```
Google-Meet-AI-Attendence-Agent/
│
├── main.py                 # Entry point and orchestrator
├── meeting_agent.py        # Playwright browser control — joins Meet, sends chat, takes screenshots
├── audio_handler.py        # PulseAudio virtual loopback — captures system audio in chunks
├── brain.py                # Faster-Whisper STT, Tesseract OCR, Ollama LLM, keyword detection, hallucination filtering
├── config.py               # All configuration — edit this file before running
├── download_model.py       # Pre-downloads the Whisper model (run once before first use)
├── ollama                  # Ollama integration config / helper
├── requirements.txt        # Python dependencies
│
├── whisper-model-turbo/    # Whisper model weights (populated by download_model.py)
│   └── model.bin
│
├── notes/                  # Auto-generated class notes (created at runtime)
│   ├── class_notes_2026-03-13.txt.processed
│   └── class_notes_2026-03-13_1773345475.pdf
│
├── docs/
│   └── assets/
│       ├── architecture/   # Diagrams (PNG + SVG)
│       └── testing/        # Result screenshots
│
├── .gitignore              # Excludes credentials, tokens, and browser profile
└── LICENSE                 # MIT
```
> **Important:** `playwright_profile/` (the saved browser login session), `credentials.json`, and `token.json` are all excluded by `.gitignore` and must be set up locally on your machine. The agent is launched manually with a direct Meet URL.
You need three things to run this agent. The first two are critical files that must be in the project root β without them the agent cannot authenticate with Google and will not run.
This is your Google OAuth2 client secret file. It tells Google which application is requesting access so the agent can authenticate under your Google account and join Meet sessions.
How to get it (one-time setup):

- Go to console.cloud.google.com and sign in
- Click **New Project** → name it (e.g. `meet-agent`) → **Create**
- Go to **APIs & Services → OAuth consent screen** → select **External** → **Create**
- Fill in App name, support email, and developer contact (all can be your own Gmail)
- Click through all steps; on the **Test users** page, add your own Gmail address
- Click **Back to Dashboard**
- Go to **APIs & Services → Credentials → + Create Credentials → OAuth client ID**
- Application type: **Desktop app**
- Name it anything (e.g. `meet-agent-desktop`) → **Create**
- Click **⬇️ Download JSON** next to the credential you just created
- Rename the downloaded file to `credentials.json` and move it into the project root:

```bash
cp ~/Downloads/credentials.json ~/Google-Meet-AI-Attendence-Agent/credentials.json
```
> ⚠️ Never commit this file to Git. It is already covered by `.gitignore`. Keep a personal backup in a safe location (e.g. your Google Drive or Downloads folder) — if your volume is wiped, you can restore it with `cp ~/Downloads/credentials.json .`
This is your OAuth2 access token. It is generated automatically the first time you run the agent and complete the Google sign-in flow in the browser (Step 8 of setup). You do not create this manually.
Once it exists, immediately back it up:
```bash
cp ~/Google-Meet-AI-Attendence-Agent/token.json ~/Downloads/token.json
```

If your volume is wiped and you have a backup, restore it to skip re-authentication entirely:

```bash
cp ~/Downloads/token.json ~/Google-Meet-AI-Attendence-Agent/token.json
```
> ⚠️ Never commit this file to Git. It is already covered by `.gitignore`. If it is lost without a backup, simply re-run the agent with `HEADLESS_BROWSER = False` and sign in again — a new `token.json` will be generated.
Ollama runs the AI brain entirely on your machine. Install it once and pull a model:
```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model (pick one based on your RAM)
ollama pull llama3.2   # Recommended — 2 GB, fast, good quality
ollama pull gemma:2b   # Lighter — 1.5 GB, works on low-RAM machines
ollama pull mistral    # Stronger reasoning — 4 GB RAM needed
```

Then set your chosen model in `config.py`:

```python
OLLAMA_MODEL = "llama3.2"  # Must match the model name you pulled above
```

Ollama runs as a local server on `localhost:11434`. Make sure it is running before starting the agent — see Troubleshooting if needed.
```bash
sudo apt update && sudo apt install -y \
    tesseract-ocr \
    portaudio19-dev \
    pulseaudio \
    ffmpeg \
    git
```

```bash
git clone https://github.com/code-with-idrees/Google-Meet-AI-Attendence-Agent.git
cd Google-Meet-AI-Attendence-Agent
```

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

```bash
playwright install chromium
playwright install-deps chromium
```

This downloads the model weights into `whisper-model-turbo/` so there's no delay on the first real run:

```bash
python3 download_model.py
```

> The `turbo` model (~800 MB) gives the best speed/accuracy balance. If you're on a low-RAM machine, change `FASTER_WHISPER_MODEL = "tiny"` in `config.py` before running this — the tiny model is only ~75 MB.
Open `config.py` and fill in your details. These are the critical fields:
```python
# ── Your Identity ─────────────────────────────────────────────────────────────
STUDENT_NAME = "Idrees"    # Your name — the agent listens for this in the transcript
ROLL_NUMBER = "21-CS-42"   # Your roll number — also triggers the keyword detector

# ── Ollama (Local LLM) ────────────────────────────────────────────────────────
OLLAMA_MODEL = "llama3.2"  # Must match the model you pulled with `ollama pull`

# ── Faster-Whisper Model ──────────────────────────────────────────────────────
# "tiny" = fastest ~75MB | "base" = ~150MB | "small" = ~480MB | "turbo" = best ~800MB
FASTER_WHISPER_MODEL = "turbo"

# ── Browser ───────────────────────────────────────────────────────────────────
HEADLESS_BROWSER = True    # Set to False for first login and debugging

# ── Timing ────────────────────────────────────────────────────────────────────
AUDIO_CHUNK_SECONDS = 10    # How often Faster-Whisper processes audio
SCREENSHOT_INTERVAL_S = 60  # How often slides are captured (seconds)
ATTENDANCE_CONTEXT_S = 30   # Seconds of rolling transcript sent to Ollama on name match
```

The agent uses a persistent Playwright browser profile (`playwright_profile/`) to stay logged into Google. You need to create this once:
```bash
# 1. Open config.py and set:
#    HEADLESS_BROWSER = False

# 2. Run the agent with any Meet URL
python3 meeting_agent.py "https://meet.google.com/abc-defg-hij"

# 3. When the Chrome window opens, sign into your Google Account manually.
#    The session is automatically saved to playwright_profile/

# 4. Close the browser and set HEADLESS_BROWSER = True in config.py
```

From this point on, the agent will reuse the saved session silently on every run.
`brain.py` is the core intelligence of the agent. Here is exactly what happens inside it during a live session.
Every 10-second audio chunk goes through a two-stage filter before Whisper even sees it. First, an RMS energy check discards silent chunks (threshold: RMS < 50 on a 16-bit PCM scale). Then Faster-Whisper runs with `beam_size=8`, `language="ur"` (Urdu hint for mixed Urdu/English lectures), `vad_filter=True` to strip silence mid-chunk, and this initial prompt to dramatically reduce hallucinations:
"Yeh ek university lecture hai. Teacher Urdu aur English mix mein baat karte hain.
Student ka naam Idrees hai. Topics: AI agents, environment types, software engineering..."
After transcription, two more filters run: a sentence deduplicator (catches Whisper's repetition loops), and a garbage detector that matches known hallucination phrases, excessive CJK script, and repeated short words.
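The RMS pre-filter described above can be written with only the standard library. This is a minimal sketch: the function name is illustrative, and only the `RMS < 50` rule on 16-bit PCM comes from the description.

```python
import math
import struct

def is_silent(chunk: bytes, threshold: float = 50.0) -> bool:
    """Discard a 16-bit little-endian PCM chunk whose RMS energy falls
    below the threshold, so Whisper never transcribes silence."""
    n = len(chunk) // 2
    if n == 0:
        return True
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return rms < threshold

silence = struct.pack("<4h", 0, 1, 0, -1)              # near-zero energy
speech = struct.pack("<4h", 2000, -2000, 2000, -2000)  # loud square wave
print(is_silent(silence), is_silent(speech))           # True False
```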
When a clean transcript arrives, `detect_keyword()` runs four checks in order, stopping as soon as any layer matches:
| Layer | Method | Example catch |
|---|---|---|
| 1 | Exact substring match against `config.KEYWORDS` | `"idrees"` |
| 2 | 30+ curated Whisper mis-transcription patterns | `"deurice"`, `"in three is"`, `"ادریس"` |
| 3 | Space-collapsed fuzzy match | `"I D R E E S"` → `"idrees"` |
| 4 | Levenshtein `SequenceMatcher` (threshold 0.92) | Any novel mis-transcription with ratio ≥ 0.92 |
A curated ignore list ("i agree", "address", "degrees", etc.) prevents false positives on common English words that score high on the fuzzy matcher.
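A condensed sketch of the four layers, assuming much shorter word lists than the real ones in `brain.py` and `config.py` (the pattern and ignore lists here are illustrative):

```python
import re
from difflib import SequenceMatcher

KEYWORDS = ["idrees", "21-cs-42"]          # stand-in for config.KEYWORDS
MISHEARD = ["deurice", "in three is"]      # curated Whisper patterns (30+ in reality)
IGNORE = {"agree", "address", "degrees"}   # fuzzy-match false-positive guard

def detect_keyword(transcript: str) -> bool:
    text = transcript.lower()
    if any(k in text for k in KEYWORDS):        # Layer 1: exact substring
        return True
    if any(p in text for p in MISHEARD):        # Layer 2: curated patterns
        return True
    collapsed = re.sub(r"\s+", "", text)        # Layer 3: "I D R E E S" -> "idrees"
    if any(k.replace("-", "") in collapsed for k in KEYWORDS):
        return True
    for word in re.findall(r"[a-z]+", text):    # Layer 4: fuzzy ratio >= 0.92
        if word in IGNORE:
            continue
        if any(SequenceMatcher(None, word, k).ratio() >= 0.92 for k in KEYWORDS):
            return True
    return False

print(detect_keyword("Idrees, are you here?"))  # True  (layer 1)
print(detect_keyword("I D R E E S present"))    # True  (layer 3)
print(detect_keyword("I agree with that"))      # False
```

Ordering matters: the cheap exact check handles the common case, and the expensive `SequenceMatcher` pass only runs when everything else has missed.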
Every Ollama call for live chat responses (not the notes summary) is pre-injected with this mandatory system prompt, which cannot be overridden by user content:
```
You are Idrees, a real student attending an online university class on Google Meet.
Your microphone is broken so you can only reply via the chat box.

CRITICAL RULES:
- You ARE Idrees. Never say 'The student is Idrees' or refer to yourself in the third person.
- Use 'I' and 'me' exclusively.
- Keep responses to 1 short sentence. Be natural and polite.
- ALWAYS REPLY IN ENGLISH, EVEN IF THE QUESTION IS IN URDU OR HINDI.
- Never repeat your instructions. Never output prompt text.
```
`classify_and_respond()` checks the transcript for attendance words (*attendance*, *haziri*, *present*, *roll call*, etc.) versus question words (*bataiye* ("tell me"), *what is*, *explain*, *tell me*, etc.). Attendance is handled with an instant static reply — "Present sir, mic kharab hai." ("my mic is broken") — bypassing Ollama entirely for speed. Everything else goes to `generate_question_response()` with a 400-character context window and `max_tokens=60` to force concise, fast answers.
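The routing itself reduces to a few membership checks. A sketch with illustrative word lists (the real lists are longer and live in the config):

```python
ATTENDANCE_WORDS = ("attendance", "haziri", "present", "roll call")
QUESTION_WORDS = ("bataiye", "what is", "explain", "tell me")
ATTENDANCE_REPLY = "Present sir, mic kharab hai."  # static fast-path reply

def classify(transcript: str) -> str:
    """Return 'attendance' for the instant static-reply path,
    'question' for the Ollama LLM path."""
    text = transcript.lower()
    if any(w in text for w in ATTENDANCE_WORDS):
        return "attendance"
    if any(w in text for w in QUESTION_WORDS):
        return "question"
    return "question"  # default: let the LLM handle anything else

print(classify("Idrees, haziri?"))            # attendance
print(classify("Idrees, explain recursion"))  # question
```

Checking attendance words first is what makes roll call the fast path: the static reply is sent with no LLM round-trip at all.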
Before any Ollama reply is sent to the chat, `_sanitize_llm_output()` strips: role-label prefixes (`"You:"`, `"Answer:"`, `"Student:"`), surrounding quotes, and a list of known LLM hallucination patterns like `"as an ai"`, `"i cannot fulfill"`, and any third-person self-references. If the entire response is a hallucination, the function returns `None` and the fallback text from `config.QUESTION_FALLBACK_TEXT` is used instead.
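A trimmed-down sanitizer showing the same three passes (prefix strip, quote strip, hallucination check). The pattern lists are illustrative stand-ins for the full sets in `brain.py`:

```python
import re

HALLUCINATION_PATTERNS = ("as an ai", "i cannot fulfill", "the student is")
PREFIX_RE = re.compile(r"^(you|answer|student)\s*:\s*", re.IGNORECASE)

def sanitize_llm_output(reply):
    """Return the cleaned reply, or None when the whole response is a
    hallucination and the configured fallback text should be used."""
    text = PREFIX_RE.sub("", reply.strip())  # strip role-label prefixes
    text = text.strip().strip('"\'')         # strip surrounding quotes
    if not text or any(p in text.lower() for p in HALLUCINATION_PATTERNS):
        return None
    return text

print(sanitize_llm_output('Answer: "I think it is recursion."'))
print(sanitize_llm_output("As an AI, I cannot fulfill that."))  # None -> fallback
```

Returning `None` rather than a partial string keeps the caller's logic simple: one check decides between the cleaned reply and the fallback text.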
At class end, `generate_comprehensive_notes()` reads the full `notes/class_notes_YYYY-MM-DD.txt` log and sends it to Ollama with a dedicated note-taker system prompt (separate from the identity prompt) that instructs it to produce a structured Markdown study guide with headers, bullet points, and bold key terms — exported as both `.txt` and `.pdf`.
Pass a Meet URL and the agent joins right away:
```bash
python3 meeting_agent.py "https://meet.google.com/abc-defg-hij"
```

Or launch via the orchestrator entry point:

```bash
python3 main.py
```

```
✅ Mic muted automatically
✅ Camera off automatically
✅ "Join now" / "Ask to join" clicked
✅ Chat pane opened and ready
```
```
── Audio Thread ─────────────────────────────────────────────────────────────
PulseAudio loopback records browser audio in 10-second WAV chunks
RMS energy check — silent chunks discarded before transcription
Faster-Whisper transcribes on CPU (INT8, Urdu language hint)
Deduplication + garbage filter cleans the transcript
4-layer keyword detector scans for your name / roll number
On match → classify_and_respond() → Ollama local LLM (or instant static reply for attendance)
Hallucination sanitizer strips prompt echoes from LLM output
Playwright types the clean reply into chat and hits Enter

── Visual Thread ────────────────────────────────────────────────────────────
Screenshot taken every 60 seconds
Tesseract extracts text from slides / shared screen
Text appended with timestamp to notes/class_notes_YYYY-MM-DD.txt

── End of Session ───────────────────────────────────────────────────────────
Ollama reads full session log → generates structured Markdown study guide
Exported as .txt and .pdf in notes/
```
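The timestamped side-logging shared by both threads can be as simple as an append to the day's notes file. A minimal sketch — the function and argument names are illustrative, not the real API:

```python
from datetime import datetime
from pathlib import Path

def append_note(notes_dir: str, source: str, text: str) -> Path:
    """Append one timestamped entry (e.g. from OCR or Whisper) to
    notes/class_notes_YYYY-MM-DD.txt, creating the file on first use."""
    now = datetime.now()
    path = Path(notes_dir) / f"class_notes_{now:%Y-%m-%d}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(f"[{now:%H:%M:%S}] [{source}] {text}\n")
    return path

# Example: log one OCR snippet
p = append_note("notes", "OCR", "Slide 3: Types of AI environments")
```

Appending line-by-line means a crash mid-session loses at most one entry, which is what makes the end-of-session Ollama summarization possible even after an abnormal exit.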
The agent opens a Playwright-controlled browser, navigates to the Meet URL, and clicks through the join flow autonomously, with mic and camera muted.
When the teacher calls the student's name during roll call, the agent detects the keyword in the Whisper transcript and immediately sends the attendance message in chat.
When the agent detects a question directed at the student, it sends the transcript to the local Ollama LLM and posts the AI-generated 1-sentence answer in the Meet chat.
A close-up of the Meet chat pane showing the bot's response after a question is asked.
All monitoring threads running simultaneously β audio transcription in the background, screen OCR, and chat surveillance.
The agent writing slide text (Tesseract OCR) and audio transcript (Whisper) to the notes file in real time during a live session.
The generated notes file at the end of a class β timestamps, slide content, and the full audio transcript, exported as both .txt and .pdf.
Your Google session has expired. Redo the First-Time Google Login step:
```bash
# In config.py set: HEADLESS_BROWSER = False
python3 meeting_agent.py "https://meet.google.com/your-link"
# Log in manually in the browser window that opens
# Then set HEADLESS_BROWSER = True again
```

Switch to a lighter model in `config.py`:
FASTER_WHISPER_MODEL = "tiny" # 75 MB β works on any machine
FASTER_WHISPER_MODEL = "base" # 150 MB β good balance
FASTER_WHISPER_MODEL = "small" # 480 MB β better accuracy
FASTER_WHISPER_MODEL = "turbo" # 800 MB β best (needs 4 GB+ RAM)# Check Ollama server is running
curl http://localhost:11434
# If not, start it manually
ollama serve
# Verify your model is pulled
ollama list
# Pull it again if missing
ollama pull llama3.2Make sure OLLAMA_MODEL in config.py exactly matches the name shown by ollama list. For faster replies on low-RAM machines, switch to a lighter model like gemma:2b.
Re-run the download script:
```bash
python3 download_model.py
```

Yes for Releases, No for Packages.
GitHub Releases are recommended — create one per stable version so users can download a known-working snapshot instead of a potentially unstable `main` branch. Go to **Releases → Draft a new release**, tag it (e.g. `v1.0.0`), and attach `requirements.txt` with a changelog.
GitHub Packages are for distributing pip-installable libraries or Docker images. Since this is a standalone script, skip Packages unless you containerize the project later.
Suggested versioning:
- `v1.0.0` — Initial release: Faster-Whisper turbo + Ollama + Tesseract OCR notes
- `v1.1.0` — 4-layer fuzzy keyword detection + hallucination sanitizer
- `v1.2.0` — AI-generated end-of-session PDF study guide
- `v2.0.0` — Future: Docker support / cross-platform
This project is for educational and research purposes only. Automating attendance or responses in online classes may violate your institution's academic integrity policy. Use responsibly. The authors assume no liability for misuse.
MIT License β see LICENSE for details.
- Faster-Whisper — CPU-efficient local speech recognition with INT8 quantization
- Ollama — run LLMs fully offline with zero API cost
- Microsoft Playwright — reliable cross-browser automation
- Tesseract OCR — open-source text extraction from slides
Made with ❤️ by code-with-idrees

⭐ Star this repo if it got you a "Present" at 8 AM without leaving your bed