This repository serves as the technical companion to GeoGuessr.ai, our modern AI Coach for GeoGuessr. It is forked from the foundational geoguessr-ai project by Stelath, and this document has been updated to showcase the evolution of this technology.
The original geoguessr-ai project (preserved in full below) was a fantastic exploration into building a custom CNN model to identify one of five US cities. It represents a classic, hands-on approach to machine learning and was an inspiration for many.
However, the AI landscape has evolved dramatically. The challenge is no longer just about building a model that can guess, but about creating a tool that can teach.
This is the philosophy behind GeoGuessr.ai. We've shifted the focus from "solving" to "coaching." Instead of training a limited model from scratch, we leverage advanced prompt engineering on state-of-the-art vision models (like Google's Gemini 2.5 Pro and OpenAI's o3) to teach them how to think like a world-champion player across the entire globe.
Our AI Coach can:
- Identify hundreds of subtle "meta clues" (like specific bollards, road lines, and car parts).
- Explain its reasoning step-by-step, turning every analysis into a lesson.
- Help you train your own brain to spot these patterns.
This new, coaching-focused philosophy is live and accessible to everyone.
➡️ Experience the AI Coach at GeoGuessr.ai (3 free analyses per day, no signup required)
The following sections are the original, preserved README.md from Stelath's project. It's a valuable look into the foundational challenges and methodologies that paved the way for modern tools. We are immensely grateful for this foundational open-source contribution.
In order to train the model, I first had to create a reasonably large dataset of Google Street View images to train it on. To do this, I wrote a Python script (get_images.py) that downloads a large set of photographs from 5 cities in the US via the Google Street View API. The API requires latitude and longitude coordinates for each request, so I used address data from Open Addresses to get the coordinates of random street addresses in each of the 5 cities.
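For reference, here is a minimal sketch of the kind of request get_images.py issues. It uses the public Street View Static API endpoint; the image size, function name, and error handling here are illustrative assumptions rather than the script's exact code.

```python
# Minimal sketch of fetching one Street View image for a coordinate.
# The endpoint and parameters follow the public Street View Static API;
# the exact flags used by get_images.py may differ.
import requests

API_KEY = "YOUR_GSV_API_KEY"  # placeholder

def fetch_street_view(lat: float, lng: float, out_path: str) -> None:
    """Download a single 640x640 Street View image at (lat, lng)."""
    params = {
        "size": "640x640",           # maximum free-tier resolution
        "location": f"{lat},{lng}",
        "key": API_KEY,
    }
    resp = requests.get("https://maps.googleapis.com/maps/api/streetview",
                        params=params, timeout=30)
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

fetch_street_view(33.7490, -84.3880, "atlanta_example.jpg")
```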
Before training the model, I reformatted the targets to give the AI a better way to guess the location: I turned the GPS coordinates into multi-class targets by encoding each digit of a coordinate as a one-hot array of length ten (one slot per possible digit).
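A small sketch of this digit-wise encoding idea follows; the exact precision, padding, and sign handling used by dataset_builder_multi_label.py are assumptions here.

```python
# Illustrative sketch of the digit-wise one-hot targets described above.
# Assumption: fix the coordinate to a set precision, zero-pad it, then
# one-hot each digit over the classes 0-9 (sign handling omitted for brevity).
import numpy as np

def digits_to_one_hot(coord: float, decimals: int = 4) -> np.ndarray:
    """Encode e.g. 33.7490 as a (num_digits, 10) array of one-hot digit vectors."""
    text = f"{abs(coord):09.{decimals}f}".replace(".", "")  # "33.749" -> "00337490"
    one_hot = np.zeros((len(text), 10), dtype=np.float32)
    for i, ch in enumerate(text):
        one_hot[i, int(ch)] = 1.0
    return one_hot

print(digits_to_one_hot(33.7490).shape)  # (8, 10): eight digits, ten classes each
```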
After trying a couple of different model architectures, I settled on a 50-layer WideResNet (wide_resnet50_2), which gave results comparable to the 101-layer variant while taking far less GPU memory. The model was trained on a dataset of 50,000 images; it reached its best performance around 20 epochs in and then began to slowly overfit.
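For illustration, this is how that backbone can be instantiated from torchvision's model zoo (which the -a wide_resnet50_2 flag in the training command below refers to). The output size is an assumption based on the digit targets sketched above, not the repo's exact head.

```python
# Sketch of building the backbone named by the -a flag in the training command.
import torch
import torchvision.models as models

model = models.wide_resnet50_2(weights=None)  # train from scratch, no pretrained weights
# Replace the ImageNet head with one sized for the multi-label digit targets
# (assumption: 2 coordinates x 8 digits x 10 classes = 160 outputs).
model.fc = torch.nn.Linear(model.fc.in_features, 160)

x = torch.randn(1, 3, 224, 224)
print(model(x).shape)  # torch.Size([1, 160])
```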
There are several ways the model could be improved:
- Using a custom architecture better suited to guessing locations than generic image classification.
- Adding far more layers so the model can pick up more complexity, though this would require a larger GPU.
- Arguably the biggest potential improvement, and the one that most closely mirrors how GeoGuessr is actually played: training a 3D CNN on an array of images from each location, giving the model far more data to work with and make accurate predictions (see the sketch below).
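As a purely hypothetical sketch of that last idea, several headings from one panorama could be stacked along the depth axis and fed to an off-the-shelf video model:

```python
# Hypothetical sketch of the proposed 3D-CNN direction: stack multiple views
# from one location along the depth axis and let a video model consume them.
import torch
from torchvision.models.video import r3d_18

model = r3d_18(weights=None)
# Assumption: four headings (0, 90, 180, 270 degrees) from one panorama.
views = torch.randn(1, 3, 4, 112, 112)  # (batch, channels, views, height, width)
print(model(views).shape)  # torch.Size([1, 400]) with the default 400-way head
```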
While the model is by no means perfect, it is surprisingly accurate given the limited input it receives. Interestingly, this reveals that while many would regard American cities as similar, there are clearly significant differences in their landscapes, so much so that the AI was able to take advantage of them and correctly predict the city it was in the majority of the time.
In order to run a pretrained model, you can download it from Google Drive: or you can use the Google Colab:
You can use the get_images.py script to download a database of images through Google Cloud. Google allows 28,500 free Google Street View API calls each month, so keep that in mind: anything more and you will be charged $0.007 per image (for example, pulling 50,000 images in a single month would cost roughly (50,000 − 28,500) × $0.007 ≈ $150). You will also have to set up a Google Cloud account. Alternatively, you can download the database of Google Street View images I created here.
After you have a database of images, running the dataset_builder_multi_label.py script will preprocess all of the images; then running main.py will begin training the model.
Here is a set of commands that would be used to train a model on 25,000 images (keep in mind you will need a cities folder containing .geojson files from Open Addresses):
```
python -m get_images --cities cities/ --output images/ --icount 25000 --key (YOUR GSV API KEY HERE)
python -m dataset_builder_multi_label --file images/picture_coords.csv --images images/ --output geoguessr_dataset/
python -m main geoguessr_dataset/ -a wide_resnet50_2 -b 16 --lr 0.0001 -j 6 --checkpoint-step 1
```