Alright, let’s talk about my little experiment with Prodigy and spaCy, aimed at labeling some text data for a project. I wanted to see if I could quickly whip up a custom recipe to make the annotation process smoother. My goal was to annotate some text about “streets” and related terms.
Getting Started
First things first, I installed Prodigy. It was pretty straightforward using pip. I already had spaCy set up, so that part was easy.
Then, I created a new dataset in Prodigy. Just a simple command-line thing. I called it “street_data” – keeping it simple, you know?

Building a Simple Recipe
I wanted a custom recipe because the built-in ones weren’t exactly what I needed. I cooked up a basic Python script for a custom recipe. It basically loaded my text data and used spaCy to pre-process it. This meant that the words were already split up (tokenized), which made my life easier during annotation.
- I created a new Python file. Let’s name it ‘street_*’.
- I wrote some simple Python code with Prodigy’s recipe decorator.
- I imported spaCy.
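The stream-building part of a recipe like this can be sketched roughly as follows. It's only a minimal sketch under some assumptions: so that it runs without Prodigy or spaCy installed, a simple regex tokenizer stands in for spaCy's tokenizer, and `tokenize`, `make_tasks`, the `STREET` label, and the `STREET_TERMS` list are hypothetical names made up for illustration. In a real recipe, a generator like this would be handed back as the stream from the function wrapped in Prodigy's recipe decorator.

```python
import re

# Hypothetical seed terms; the real project's term list is not shown in this post.
STREET_TERMS = {"street", "avenue", "road", "lane"}


def tokenize(text):
    """Stand-in for spaCy's tokenizer: split text into word and punctuation tokens."""
    return [
        {"text": m.group(), "start": m.start(), "end": m.end(), "id": i}
        for i, m in enumerate(re.finditer(r"\w+|[^\w\s]", text))
    ]


def make_tasks(texts):
    """Yield one annotation task per candidate token, highlighted as a span."""
    for text in texts:
        tokens = tokenize(text)
        for tok in tokens:
            if tok["text"].lower() in STREET_TERMS:
                yield {
                    "text": text,
                    "tokens": tokens,
                    "spans": [{
                        "start": tok["start"],
                        "end": tok["end"],
                        "token_start": tok["id"],
                        "token_end": tok["id"],
                        "label": "STREET",  # hypothetical label name
                    }],
                }


tasks = list(make_tasks(["Turn left on Main Street.", "The weather is nice."]))
```

Each task carries the full text, its tokens, and one highlighted span, so the annotator only has to make a yes/no decision about that span.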
The Annotation Process
I fired up the Prodigy interface using my new custom recipe.
The interface was clean and user-friendly. I could see the text, and all I had to do was click “accept” or “reject” based on whether the highlighted word or phrase was related to “streets”.
Important note: for words that I didn’t think were important, I chose “ignore”. It’s really useful.

It wasn’t perfect at first. spaCy’s pre-processing helped, but I still had to do some manual tweaking. Some things that weren’t relevant slipped through, and I had to reject them. And sometimes, it missed stuff that was relevant, so I had to correct it.
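To make the accept/reject/ignore decisions a bit more concrete, here is a small sketch of how answered examples might be sorted afterwards. It assumes each annotated example is a dict with an "answer" key set to "accept", "reject", or "ignore", which is how Prodigy records decisions as far as I know. The sample data and the `split_by_answer` helper are hypothetical.

```python
from collections import defaultdict


def split_by_answer(examples):
    """Group annotated examples by their answer, leaving out ignored ones."""
    groups = defaultdict(list)
    for eg in examples:
        answer = eg.get("answer")
        if answer == "ignore":
            continue  # ignored examples carry no signal either way
        groups[answer].append(eg)
    return dict(groups)


# Made-up examples for illustration only
annotated = [
    {"text": "Main Street", "answer": "accept"},
    {"text": "street food", "answer": "reject"},
    {"text": "the", "answer": "ignore"},
    {"text": "Elm Road", "answer": "accept"},
]
groups = split_by_answer(annotated)
```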
Iterating and Improving
After annotating a few hundred examples, I stopped and trained a new spaCy model using the data I’d labeled. The command was easy enough, just pointed it at my dataset and told it where to save the new model.
The goal here was to get the model behind Prodigy’s suggestions to learn from my annotations. The next time I used it, it should be smarter and make fewer mistakes.
And you know what? It worked! The second round of annotation was much faster. Prodigy was highlighting more of the right stuff and less of the wrong stuff. I still had to do some corrections, but it was definitely an improvement.
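One way to put a number on that kind of improvement, rather than going by feel, would be to compare the share of accepted suggestions between rounds. The sketch below assumes each annotated example carries an "answer" key with "accept", "reject", or "ignore". The two rounds of answers are made-up placeholders, not real results, and `acceptance_rate` is a hypothetical helper.

```python
def acceptance_rate(examples):
    """Fraction of accepted examples among those accepted or rejected."""
    decided = [eg for eg in examples if eg.get("answer") in ("accept", "reject")]
    if not decided:
        return 0.0  # nothing decided yet, so no meaningful rate
    accepted = sum(1 for eg in decided if eg["answer"] == "accept")
    return accepted / len(decided)


# Placeholder answers for two annotation rounds (illustration only)
round_one = [{"answer": a} for a in ["accept", "reject", "reject", "ignore", "accept"]]
round_two = [{"answer": a} for a in ["accept", "accept", "reject", "accept", "accept"]]

rate_one = acceptance_rate(round_one)  # 2 accepted out of 4 decided
rate_two = acceptance_rate(round_two)  # 4 accepted out of 5 decided
```

A rising rate from one round to the next would suggest the retrained model is proposing more of the right things.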
Final Thoughts
Overall, I’m pretty happy with how this experiment went. I was able to create a custom annotation workflow pretty quickly, and the iterative training process actually made a noticeable difference. It’s not a perfect solution, and I’m sure I could refine it further, but for a quick and dirty way to label text data, it worked pretty well. I just annotated, trained, annotated some more, and trained again. I kept repeating until I felt like the results were good. It’s good to be able to label data and train a model so fast!