AUTOMATED IMAGE VOTING SYSTEM

Python • PyTorch • Transformers

This was my first programming project, built with Claude guiding the implementation. I found a crypto platform paying users to vote on which AI-generated images best matched text prompts. Easy beer money, until I realized I was clicking through hundreds of comparisons daily. I figured computers should be good at this, so I decided to automate it.

I started knowing nothing about programming and ended up deploying a PM2-managed daemon built on PyTorch and CLIP models. In the end, I made some beer money.

The Journey

The Problem

Each voting round showed three AI-generated images and a text prompt. Pick the best match, submit, repeat. After a few hundred rounds, the pattern became clear: this was exactly the kind of repetitive task computers excel at. Time to learn how to make that happen.

Image voting interface showing three options

Making It Run

The bot needed to run 24/7, so we set up PM2 to keep it alive and restart it if anything crashed. Every 5 seconds, it polls the API for new voting rounds, processes the images, and submits votes. JWT tokens handle authentication, with automatic refresh when they expire. Simple loop, but it had to be reliable.
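
In Python, the core of that loop wasn't much more than the sketch below. The endpoint paths, field names, and token handling are my reconstruction rather than the platform's actual API, and the round processing is stubbed out here:

    import time
    import requests

    API = "https://platform.example/api"   # placeholder base URL
    REFRESH_TOKEN = "..."                  # kept out of the repo in the real setup

    def refresh_jwt():
        """Trade the refresh token for a fresh access token (endpoint name assumed)."""
        resp = requests.post(f"{API}/auth/refresh", json={"refresh_token": REFRESH_TOKEN})
        resp.raise_for_status()
        return resp.json()["access_token"]

    def fetch_round(token):
        """Poll for the next round; None means nothing pending, "expired" means refresh the JWT."""
        resp = requests.get(f"{API}/rounds/next",
                            headers={"Authorization": f"Bearer {token}"})
        if resp.status_code == 401:
            return "expired"
        resp.raise_for_status()
        return resp.json() or None

    def process_round(round_data, token):
        """Stub: download the three images, score them, submit a vote (sketched under How It Works)."""
        ...

    access_token = refresh_jwt()
    while True:
        round_data = fetch_round(access_token)
        if round_data == "expired":
            access_token = refresh_jwt()   # token expired, grab a new one
        elif round_data:
            process_round(round_data, access_token)
        time.sleep(5)                      # poll every 5 seconds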

Bot dashboard showing real-time activity

Three Attempts

Version 1: Started with the bot picking random pictures. I was still amazed just watching my mouse move on its own. In testing, this version scored 33% accuracy (exactly what chance gives you with three options), but it proved the automation worked and motivated me to keep improving it.
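
Version 1 really was this simple; something like the following, with illustrative names:

    import random

    def pick_random(option_ids):
        """Version 1 in its entirety: ignore the prompt and pick one of the three options."""
        return random.choice(option_ids)

    # The original script then clicked the chosen option with mouse automation,
    # which is why I got to watch the cursor move on its own.
    print(pick_random(["left", "middle", "right"]))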

Version 2: Learned about computer vision and NLP. Used ResNet18 to extract image features, OCR to read any text in images, and sentiment analysis to match against prompts. Hit 40% accuracy, but struggled with abstract concepts that didn't have clear visual or textual signals.
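
Roughly what that pipeline looked like. The model choices match the description above, but the glue code is my approximation:

    import torch
    import pytesseract
    from PIL import Image
    from torchvision import models, transforms
    from transformers import pipeline

    # ResNet18 with the classification head removed, used as a generic feature extractor
    resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    resnet.fc = torch.nn.Identity()
    resnet.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    sentiment = pipeline("sentiment-analysis")   # default DistilBERT sentiment model

    def describe_image(path):
        """Extract the three weak signals v2 relied on: visual features, embedded text, its sentiment."""
        image = Image.open(path).convert("RGB")
        with torch.no_grad():
            features = resnet(preprocess(image).unsqueeze(0)).squeeze(0)   # 512-dim vector
        text = pytesseract.image_to_string(image).strip()                  # OCR, often empty
        mood = sentiment(text)[0] if text else None
        return features, text, mood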

Version 3: Discovered CLIP. The bot could now pick up on semantic relationships between images and language: we switched to CLIP's joint embeddings, scored each image against the prompt with cosine similarity, and accuracy jumped to 90%. Turns out the problem was already solved; I just needed to find the right tool. I stopped at this version because I figured that aiming for "perfect" results might get me caught.
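
A sketch of the CLIP scoring, using the Hugging Face openai/clip-vit-base-patch32 checkpoint as a stand-in for whichever CLIP variant the bot actually ran:

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    model.eval()

    def best_match(prompt, image_paths):
        """Embed the prompt and all candidate images, return the index of the closest image."""
        images = [Image.open(p).convert("RGB") for p in image_paths]
        inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
        with torch.no_grad():
            out = model(**inputs)
            image_embeds = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
            text_embeds = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
            similarity = image_embeds @ text_embeds.T     # cosine similarity, shape (3, 1)
        return int(similarity.argmax())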

Three versions showing accuracy progression

How It Works

The final system runs in a continuous loop: fetch voting rounds from the API, download the three images, pass them through CLIP's vision transformer along with the text prompt, calculate similarity scores, submit the highest-scoring match. PM2 manages the process with automatic restarts and memory limits to keep it stable.
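
Putting the pieces together, one iteration of the loop looks roughly like this. The endpoints and JSON field names are placeholders, and score_fn stands in for the CLIP scorer sketched earlier:

    import tempfile
    import requests

    def process_round(round_data, token, score_fn, api="https://platform.example/api"):
        """One pass of the loop: download the three images, score them, vote for the winner."""
        paths = []
        for url in round_data["image_urls"]:
            resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
                tmp.write(resp.content)
                paths.append(tmp.name)

        winner = score_fn(round_data["prompt"], paths)      # index of the best-matching image
        vote = requests.post(f"{api}/rounds/{round_data['id']}/vote",
                             headers={"Authorization": f"Bearer {token}"},
                             json={"choice": winner},
                             timeout=30)
        vote.raise_for_status()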

System architecture diagram

What I Learned

For the first time, I could wrap my head around more abstract ideas about technology. I found a different way of thinking, and mostly, I was hooked.

More importantly, I learned that you don't need to know everything before starting. I began with zero programming experience and a simple goal: automate clicking buttons. By the end, I had at least a glimpse of APIs, authentication, process management, PyTorch, and vision transformers.