This post is the first in a short technical series documenting the architecture and development of POPOUT.NYC - as always, intended to be accessible to everyone, including my less-techy readers.
I’m starting with the backend architecture and data flow, setting the stage to explore the frontend, learnings (funky AI stuff!), and process in subsequent posts. For more on the origins and vision of POPOUT, check out my previous post.
the challenge at hand
Flyers are a wonderful means of disseminating information. They have remained popular for centuries, once carved into wooden blocks and now composed of pixels. However, the very qualities that have made them endure are not well-suited to certain conveniences of the digital age: the artfulness and variety with which they display critical information make them inherently difficult to organize and aggregate.
Rather than strip away the aesthetic value to gain well-structured information (imagine a spreadsheet of events; gross), what if we could have both, with no tradeoff? Allow artists to produce flyers without restraint, don’t require a human to put additional effort into re-encoding the information in a structured format - just meet the information in its natural state (the flyer) and pull it out “automagically”.
The latest AI models are capable of doing just this, effectively and cheaply1. In a one-off fashion, it’s quite trivial - I encourage you to try it yourself if you’re curious. Send an image containing some text to ChatGPT, ask the model questions referencing the text, et voilà! You’ve just performed the core operation of POPOUT.NYC.
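If you’d rather do it in code than in the chat window, the same one-off trick is a few lines of Python against OpenAI’s API (a minimal sketch - the file name and question are placeholders):

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Read a flyer image and base64-encode it for the API.
with open("flyer.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What event is this flyer for, and when is it?"},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```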
The primary difficulty in systematizing this operation for POPOUT has been maintaining a high degree of accuracy and consistency at scale given the variance of inputs and LLMs’ noted capacity for “hallucination”. To put it another way, fitting a “mushy” AI processing step into a (typically rigid and deterministic) data pipeline is a new type of challenge. (And likely one that is here to stay…)
Below is a diagram that provides a high-level view of the data flow:
down the rabbit hole
These steps provide an approximate framework2 for how data moves through POPOUT:
Source
Classify
Extract
Model
Display
Deploy
This post will cover the first four steps, which comprise what I consider to be the “backend” of this system. They are explained in more detail below. If the flow diagram is not making sense on its own, I encourage you to read through the steps and then revisit. If you have questions and care enough to ask, I would be honored to answer !! Please leave ’em in the comments for others’ benefit.
1. data sourcing and curation
All of the flyers for POPOUT come from a hand-picked list of local venues. Inputs for this system (host info, venue info, flyers themselves) are obtained via processes that are semi-manual to maintain a non-algorithmic, human feel3. That said, expanding POPOUT’s coverage and sourcing capabilities is on the roadmap.
Flyer images and relevant context are stored as files on my laptop. Everything happens locally until the data is turned into JSON and deployed to POPOUT.NYC.
2. flyer classification (using AI!)
The classification stage is intended to simplify and improve the accuracy of the subsequent extraction stage. The classifier determines whether a flyer is indeed for an event, and (if so) what type of event it describes.
I am performing this classification with OpenAI’s gpt-4o model, using the structured-output, multi-modal prompting supported by llamaindex + Pydantic. If that is a lot of mumbo jumbo to you, read on for a more concrete description:
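Here’s a simplified sketch of that model (my real field descriptions are wordier, and the example event types are illustrative):

```python
from pydantic import BaseModel, Field

class FlyerClassified(BaseModel):
    """Structured output requested from the LLM for one flyer image."""

    is_event: bool = Field(
        description="Whether or not this flyer is advertising an event."
    )
    event_type: str = Field(
        description=(
            "The type of event the flyer describes, "
            "e.g. 'music', 'food', 'art', 'market', or 'other'."
        )
    )
```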
I am giving the LLM an image of the flyer and some context on venue/host via a general prompt, and asking it to output data in the structure of the FlyerClassified object above. That output is simpler than it looks: basically, whether or not this is an event (is_event), and what type of event it is (event_type).
The descriptions of those fields (the description strings above) are hopefully useful to you in deciphering what I’m requesting - same for the LLM. There is no magic here; you get what you ask for. Upon receiving this structured output based on unstructured input, we’ve taken our first step towards refined data.
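For the curious, the classification call itself looks something like this (a sketch - llamaindex’s APIs shift between versions, and the prompt here is abbreviated):

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Load the flyer image from disk as an ImageDocument.
image_docs = SimpleDirectoryReader(input_files=["flyers/example.png"]).load_data()

program = MultiModalLLMCompletionProgram.from_defaults(
    output_parser=PydanticOutputParser(output_cls=FlyerClassified),
    image_documents=image_docs,
    prompt_template_str=(
        "This flyer was posted by {host} at {venue}. "
        "Classify it according to the requested fields."
    ),
    multi_modal_llm=OpenAIMultiModal(model="gpt-4o"),
)

classified = program(host="Example Host", venue="Example Venue")
print(classified.is_event, classified.event_type)
```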
event_type is useful in and of itself (providing filter functionality, per the above) but also adds context for subsequent processing. A more sophisticated implementation of the extraction step could use type-specific prompting and data structures to extract information that is relevant for only one type of event (e.g. the menu of a food event).
3. unstructured flyer text extraction (AI again)
Now, knowing via the classifier step that this flyer is indeed describing an event, we attempt to extract event-related information from the flyer. Again, using a structure described by a pydantic model for an Event:
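A simplified sketch of that model (the real one has more fields and longer descriptions; note that end_time is a plain string, to leave room for flyer-speak like “late”):

```python
from datetime import date
from pydantic import BaseModel, Field

class Event(BaseModel):
    """Structured event details requested from the LLM for one flyer."""

    event_title: str = Field(
        description="Title of the event, using LANGUAGE DIRECTLY FROM THE FLYER."
    )
    event_date: date = Field(description="Date of the event, as YYYY-MM-DD.")
    start_time: str = Field(description="Start time of the event, as HH:MM.")
    end_time: str = Field(
        description="End time of the event, as HH:MM, or a phrase like 'late'."
    )
    items_offered: list[str] = Field(
        description="Items offered at the event, using LANGUAGE DIRECTLY FROM THE FLYER."
    )
```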
Again, if you’re reading this without knowing Pydantic, note the field descriptions - these are the requests I am making of the LLM. I’ve tuned the language based on naive trial and error to try to elicit consistent and accurate results. Not shown here is the general prompt that contains added context on the venue, host, and the event_type we captured in the prior step.
This current setup is not perfect, but it’s good enough. The model consistently returns correct date and time formatting, and there are few enough edge cases so far that I can address them specifically (e.g. the reference to “late” or “sold out” in end_time). Asking for “LANGUAGE DIRECTLY FROM THE FLYER” was a crucial insight, as without this the event_title and items_offered had a sanitized and obviously robotic tone.
The AI is still weird sometimes, or can’t extract complete information from the flyer. Failed extractions are written to “triage” to be manually examined and retested such that I can improve results over time4. Successfully processed events are captured in the same database with the fields from the above Event model.
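The routing is simple - something like this, where run_extraction is a stand-in for the extraction version of the llamaindex program sketched earlier:

```python
from pydantic import ValidationError

def run_extraction(image_path: str, host: str, venue: str) -> Event:
    """Stand-in for the extraction program (image + context in, Event out)."""
    raise NotImplementedError

triage: list[dict] = []   # failures, queued for manual review and re-testing
events: list[Event] = []  # successes, bound for the events table

def handle_flyer(image_path: str, host: str, venue: str) -> None:
    """Route one flyer: validated Events go forward, failures go to triage."""
    try:
        events.append(run_extraction(image_path, host=host, venue=venue))
    except (ValidationError, ValueError) as err:
        triage.append({"flyer": image_path, "error": str(err)})
```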
4. structured data modeling and storage
I am storing all “structured” data (basically everything that is not a flyer image file) in a local SQL database5, depicted somewhat misleadingly in the diagram as several different greenish cylinders - more precisely, each of those cylinders represents a table in the same database.
The structure of this data is defined by pydantic models representing each of the relevant “domain concepts”. Each of these models has “attributes” for the data points of concern. For example, the Venue model has attributes name, address, and boro. Note that you’ve already seen the models for Event and FlyerClassified above.
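That one is small enough to sketch in full (give or take a field):

```python
from pydantic import BaseModel

class Venue(BaseModel):
    """A venue whose flyers POPOUT tracks."""

    name: str
    address: str
    boro: str  # e.g. "Brooklyn" - POPOUT is NYC-specific, after all
```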
Pydantic makes it easy to “dump” these models to a tabular format: each column in the table corresponds to an attribute of the model, and each row is an “instance” of that model. Putting it all together, here’s part of the table for venues:
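(The rows below are stand-ins rather than real POPOUT venues; the mechanics are what matter.)

```python
import duckdb
import pandas as pd

venues = [
    Venue(name="Example Bar", address="123 Example St", boro="Brooklyn"),
    Venue(name="Sample Gallery", address="45 Sample Ave", boro="Queens"),
]

# One row per model instance, one column per attribute.
df = pd.DataFrame([v.model_dump() for v in venues])

con = duckdb.connect("popout.duckdb")
con.execute("CREATE OR REPLACE TABLE venues AS SELECT * FROM df")
print(con.execute("SELECT * FROM venues").df())
```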
to be continued…
That is more than enough for now. If you’re still reading, thank you for bearing with me. The portion of the system described above is what I would consider to be “in my wheelhouse” - the AI stuff is new to me, but data pipelines in Python + SQL are my bread and butter.
In the next post, we’ll venture into what was very much uncharted territory for me: frontend web development!!! To build the frontend of POPOUT without spending weeks or months learning JavaScript, I relied on Anthropic’s Claude Sonnet LLM to do much of the heavy lifting. That approach to development was also a novel one for me, with a learning curve all its own.
I’m biased, but the stuff coming up is probably more compelling in that it is close to a still under-documented technological frontier. I feel fortunate to have had the opportunity to build skills in this area on a project I care about, and I sure as hell hope they’ll take me somewhere. See you soon…
3. a video shop, not netflix; a radio station, not spotify’s “daylists” (ugly)
4. I have yet to actually do this since arriving at this version of things, but I could !! After completing this series, I have plans to make significant changes to the architecture, and those “bad” cases will be useful for development.
5. duckdb !! they have my wholehearted endorsement; more on “the stack” in a later post