
Recent Yieldbot Intent Streams Related to Steve Jobs

At Yieldbot, our focus is on the collection, organization and realtime activation of visit intent in publisher content. We do this not as a network but on a publisher-by-publisher basis because of one simple fact: every publisher has a unique audience and unique content. So even when the keyword is the same across publishers, the intent associated with it varies from domain to domain.

The original purpose of this post, however, was not to point out the flaws of network-based keyword buying versus the performance advantage of Yieldbot’s publisher-direct model. Nor was it to show how well we understand publisher-side intent at the keyword level and how we use that intelligence in an automated way to achieve the highest degree of relevant matching.

The original purpose of the post was to fulfill a request from a few people who had asked me to share more data visualizations of our Intent Streams™ after we shared a few in a recent blog post about our data visualization methods.

It occurred to me the other day that the best representative example from the last month was intent around “Steve Jobs,” so below we are sharing our 30-day Intent Streams™ from four publishers.

If you’re new to our streamgraphs: the width of each stream measures the pageviews of an intent associated with the root intent “Steve Jobs.” The other useful data points in these visualizations are the emergence, increase, decrease and elimination of associated intent over time, as well as how many terms are associated with the root intent.
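To make that encoding concrete, here is a minimal sketch of how the data behind one streamgraph might be shaped, assuming raw records of (day, associated term, pageviews) for a single root intent. The record layout and the function names `stream_widths` and `lifecycle` are hypothetical, not Yieldbot’s pipeline; the numbers are toy values.

```python
from collections import defaultdict


def stream_widths(records, days):
    """One series per associated term: total pageviews per day (the stream's width)."""
    position = {day: i for i, day in enumerate(days)}
    series = defaultdict(lambda: [0] * len(days))
    for day, term, pageviews in records:
        if day in position:  # skip records outside the look-back window
            series[term][position[day]] += pageviews
    return dict(series)


def lifecycle(series):
    """(first, last) day index with nonzero width for each term.

    The first index is where a term's stream emerges; the last is the final day
    it appears before it is eliminated (or the window ends).
    """
    spans = {}
    for term, widths in series.items():
        active = [i for i, width in enumerate(widths) if width > 0]
        if active:
            spans[term] = (active[0], active[-1])
    return spans


# Toy data for illustration only (not Yieldbot numbers).
days = ["d1", "d2", "d3"]
records = [
    ("d1", "steve jobs apple", 120),
    ("d2", "steve jobs apple", 80),
    ("d2", "steve jobs tribute", 300),
    ("d3", "steve jobs tribute", 150),
]
series = stream_widths(records, days)
print(len(series), "associated terms")  # 2 associated terms
print(lifecycle(series))  # {'steve jobs apple': (0, 1), 'steve jobs tribute': (1, 2)}
```

The number of keys in `series` is the "how many terms" reading; stacking the per-term lists day by day gives the streamgraph's layers.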


Another way we visualize intent data is with a scatter plot. Here you see how the “Steve Jobs tribute” intent performs compared with the other intent related to Steve Jobs, with the number of entrances (aka landings) on the y-axis and the bounce rate of each intent on the x-axis.
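As a rough illustration of the two axes, the sketch below turns per-intent counts into scatter points, using bounce rate as it is commonly defined in web analytics (bounces divided by entrances). The input layout `{intent: (entrances, bounces)}` and the function names are assumptions, and this computes values from historical counts, whereas the plot described here shows predicted values.

```python
def bounce_rate(bounces, entrances):
    """Fraction of entrances (landings) that left without viewing a second page."""
    return bounces / entrances if entrances else 0.0


def scatter_points(stats):
    """Map {intent: (entrances, bounces)} to {intent: (x=bounce rate, y=entrances)}."""
    return {intent: (bounce_rate(bounces, entrances), entrances)
            for intent, (entrances, bounces) in stats.items()}


# Toy counts for illustration only.
stats = {"steve jobs tribute": (400, 100), "steve jobs biography": (200, 120)}
print(scatter_points(stats))
# {'steve jobs tribute': (0.25, 400), 'steve jobs biography': (0.6, 200)}
```

Intent toward the upper left (many landings, low bounce rate) is the most attractive; the guard for zero entrances keeps intent that has never been a landing term from dividing by zero.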

It’s important to note that the analytics in this scatter plot are predictive: we are estimating performance over the next 30 days. The four streamgraphs, by contrast, were based entirely on historical data (in their case, a 30-day look-back, as noted on their x-axes).

We hope you find this intent data as interesting as we do.


We Love the Mess of Ad Tech and Wouldn't Want It Any Other Way

My introduction to ad tech (roughly): "Ad networks are a mess, you wouldn't believe what a technical mess the industry is".

That was in my first meeting with David Cancel (then at Lookery; he has since founded Performable, which was acquired by HubSpot) as I was bouncing an idea off of him that touched on the edges of the ad tech space.

Fast-forward six months from then (almost exactly two years ago), and I got a Twitter DM from David: "Friend is starting a new startup in the ad space. Looking for a CTO and/or help. Any interest?"

About a month later I was the CTO of Yieldbot.

Two years on, I can say he was definitely right: ad tech is a mess. Part of that is how it's evolved, and part of it is structural and baked into its nature.

I'll step it up and say this: ad serving may be the most complex distributed application there is.

The proof is in explaining why.

You Control Almost Nothing

There are so many degrees of freedom it can make your head spin.

You basically have a micro-application embedded in a variety of site architectures, each with its own particular constraints, whose users are distributed around the world running all manner of execution environments (browsers).

When you have your own site or application, you still have to deal with (or choose which to support among) the myriad browsers and browser versions, complete with differences in JavaScript version (or even whether JavaScript is enabled) and issues like which fonts are available on client systems.

If you are creating a destination site, that keeps your team's hands full. If you're serving ads, that's just a warmup, because you also don't control the site architecture you are embedded in.

You might need to sit behind any number of ad servers that the publisher runs everything through.

You might be in iframes on the page.

You might need to execute code in specific places relative to other ad serving technology also embedded on the page.

Navigation through the site may or may not involve full page refreshes.

But that's not all...

Distributed Worldwide with Time Constraints

Remember those environments you don't control? They are the customers' websites, and they don't want their users' UX degraded.

Serve ads, optimize them for relevance, and don't slow down page load times.
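One common way to reconcile "optimize for relevance" with "don't slow down the page" is a hard time budget on ad selection with a fallback when the budget is blown. The sketch below assumes a hypothetical `score_candidates` callable and a 50 ms budget; it illustrates the general pattern, not how Yieldbot's servers are built.

```python
import concurrent.futures
import time

# Shared worker pool; the request path never waits on it longer than the budget.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def choose_ad(score_candidates, fallback_ad, budget_s=0.05):
    """Return score_candidates()'s pick if it finishes within budget_s seconds.

    On a timeout or any scoring error, return fallback_ad instead so the
    publisher's page is never held up or broken by ad selection.
    """
    future = _POOL.submit(score_candidates)
    try:
        return future.result(timeout=budget_s)
    except Exception:  # includes concurrent.futures.TimeoutError
        return fallback_ad


def slow_scorer():
    """Hypothetical scorer that blows the budget."""
    time.sleep(0.2)
    return "relevant ad"


print(choose_ad(lambda: "relevant ad", "house ad", budget_s=0.5))  # relevant ad
print(choose_ad(slow_scorer, "house ad"))  # house ad
```

A late result is simply discarded; the worker thread still finishes its work in the background, which is why the pool is shared and bounded rather than created per request.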

Most websites have a user base with some geographic focus, even if it's as broad as the US or the US plus Europe.

But for ad serving, your user base is the set of users aggregated across all of the sites using your service. The world is your oyster, and it's also the footprint you need to serve. Quickly.

But wait! There's more!

Content Relevant To The User At That Moment

At least if you want to be as cool as Yieldbot.

Look, a CDN serving up a static image can satisfy all of the above if all you want to do is serve the same image to every user across all of your customers' websites.

Scattershot, low-value ads picked more or less at random would approximate that level of ease as well.

But there's no sport (or value) in that!

Our goal here is actually to serve content that is the most relevant to what the user is doing at that particular moment. When done right (we do), everyone wins.

So - simply serve the content that best fits what the user is doing at that moment, where they came from, and what they've expressed interest in *right now*, on whatever they happen to be running on, and wherever they happen to be. And make it snappy, would ya?
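A toy version of "what they're doing, where they came from, and what they've shown interest in right now" is a weighted overlap score between an ad's terms and three per-visit signals. Everything here is an assumption for illustration: the term-set representation, the 0.5/0.3/0.2 weights, and the names `relevance` and `best_ad`. Device and location, also mentioned above, are not modeled.

```python
def relevance(ad_terms, page_terms, referrer_terms, recent_terms,
              weights=(0.5, 0.3, 0.2)):
    """Weighted share of each visit signal's terms that the ad also covers.

    Signals (assumed representation): terms on the current page, terms from the
    referrer (e.g. a search query), and terms from the visitor's recent interest.
    """
    ad = set(ad_terms)
    score = 0.0
    for weight, signal in zip(weights, (page_terms, referrer_terms, recent_terms)):
        signal = set(signal)
        if signal:  # an empty signal contributes nothing
            score += weight * len(ad & signal) / len(signal)
    return score


def best_ad(ads, page_terms, referrer_terms, recent_terms):
    """Return the id of the highest-scoring ad (the first one wins ties)."""
    return max(ads, key=lambda ad_id: relevance(
        ads[ad_id], page_terms, referrer_terms, recent_terms))


# Toy inventory for illustration only.
ads = {
    "tribute-book": ["steve", "jobs", "biography"],
    "phone-case": ["iphone", "case"],
}
print(best_ad(ads, ["steve", "jobs", "tribute"], ["steve", "jobs", "biography"],
              ["iphone"]))  # tribute-book
```

Here the book scores 0.5 × 2/3 + 0.3 × 1 ≈ 0.63 against 0.2 for the phone case, so the page and referrer context outweigh the older interest signal.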

We Wouldn't Want It Any Other Way

So, that's what I signed up for - and I love it. And so does the rest of the Yieldbot team.

We started our first intent-based ad serving on a live site a couple of months after coding began, and we started learning real-world lessons immediately. Twenty months later, it's still going.

I've always loved to work on systems with complex dynamics, so considering all of the above it's not that surprising I ended up finding my way to ad tech.

What do I love about building Yieldbot technology? All of the above is only half the story. We also do Big Data™ analytics so our system can learn the intent of the users coming to our publishers' sites. We provide publishers with visualizations and data views that teach them what intent is bringing visitors to their sites. And *then* we enable them to serve ads that make that intent actionable.



Hacking Display Advertising

Being as passionate as we are about the huge advances in dynamic web languages and event-based programming, we find it tough to love display advertising. Display advertising was never about web programming or data networking. It was nested on the web as a rogue aggregation and delivery mechanism, and its ability to deliver relevance remains hindered by this disjointed architecture. It is not threaded into site experiences or the realtime goals of visits to the pages where it resides. This is exactly why we’re hacking it into something else.

Vanilla Sky

In Q2 2007, while most of the industry was living in some sort of vanilla sky of Behavioral Targeting, one company came in and paid what at the time seemed to most people like way too much money to own a controlling interest in display. No, I’m not referring to AOL buying TACODA. The company I’m talking about has maintained a focus since day one on hacking what you are interested in at that very moment. Unlike other content aggregators, it tied its advertising system into the core user experience of its pages and the realtime relevance they delivered. Its stated Display strategy has little to do with cookie matching and everything to do with realtime context and creative optimization, with the purpose of “capturing relevant moments.” It is now the most powerful company in Display. That company is, of course, Google.

Lucid Dreaming

The first lesson here is about the medium itself. This is a different medium, and the old media buying and selling template breaks here. Behavioral Targeting may have changed its name to the less scary “Audience Buying,” but seven years later performance expectations have not been met, and it has dragged display into the mud of issues like privacy, ad verification, cookie stuffing and more.

By contrast, Search and email, the most important of the web’s applications, have little use for tracking people across the web, let alone for reach and frequency measures. They are the opposite of that. Search (and the web itself) was built by hackers to solve information management and retrieval problems.

The second lesson is that in this medium three pieces of data are valuable: context, timing and performance. The rest is just pipes. Understanding the context of an impression or click at the moment the page is loading, and being able to optimize the message, is what the web was built for. It took Search to turn that into a marketing channel, but the channel's growth (~20% YoY with no end in sight) and size ($46B in 2013 per eMarketer estimates) show how powerful that data is and how helpful understanding it can be to consumers.

Waking Up

The fact that it is referred to as display “advertising” is reason enough to know it’s from another time. This medium kills advertising. Everything on the web is marketing. As Suzie Reider, national director of display sales for Google said recently “display needs to move beyond advertising and into interacting.” Yesterday, Krux CEO Tom Chavez wrote a thoughtful blog post on how it is time for display to move beyond advertising. We agree and we're walking the walk.

This doesn’t mean that publishers won’t show ‘graphical’ units, as Google calls them. Of course they will. It doesn’t mean that prime real estate won’t be turned over to these units; it will. We’re headed to a world with fewer messages that will be bigger and more interactive. But if we have learned anything from Search, it is that format and size don’t matter when the message is relevant, helpful and useful at that moment.

Billions of Relevant Moments

As long as the technology to understand context and timing keeps progressing as fast as it is (and at places like Betaworks, where realtime is the thesis of value creation in the new medium, startups are hacking away), there is a bright future. Search has proven that the web is the greatest and most democratic marketing medium ever created. Hackers working with dynamic web languages and event-driven programming can unlock an order of magnitude more relevant moments. There are literally billions of them out there waiting to be captured and created. At Yieldbot we see this scale every day in the inventory of web publishers, and if you’re a hacker and remaking the staid idea of advertising appeals to you, we would be interested in speaking with you.


One Slick Way Yieldbot Uses MongoDB

As a key/value store. Wait, what?  Yeah, MongoDB as a key/value store.

Why? Because we were already using Mongo and planned to move some of our data to a key/value store, but we didn't want to wait until we had decided on a specific solution before making the transition in our code.

First, as you might have guessed, here's how easy a "Python library for MongoDB as a key/value store" is:


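def kv_put(collection, key, value):
    """key/value put interface into mongo collection"""
    # Upsert keyed on _id. Older PyMongo exposed save(..., safe=True) for this;
    # current PyMongo acknowledges writes by default and uses replace_one(upsert=True).
    collection.replace_one({'_id': key}, {'_id': key, 'v': value}, upsert=True)


def kv_get(collection, key):
    """key/value get interface into mongo collection"""
    doc = collection.find_one({'_id': key})
    if doc is not None:
        return doc['v']
    return None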


Note: kv_get() returns None if nothing is found, so technically this doesn't gracefully handle the case where you want None to be a possible value.
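If you do need None as a storable value, one conventional Python fix is a private sentinel default, the same idea `dict.get` and `getattr` rely on. This is a sketch rather than the code we run: `kv_get_or` is a hypothetical name, and `DictCollection` is a hypothetical in-memory stand-in that mimics only the `find_one`/`replace_one` call shapes so the example runs without a MongoDB server.

```python
_MISSING = object()  # private sentinel meaning "caller gave no default"


def kv_get_or(collection, key, default=_MISSING):
    """Like kv_get, but a stored None comes back as None, while a missing key
    raises KeyError unless a default is supplied."""
    doc = collection.find_one({'_id': key})
    if doc is None:
        if default is _MISSING:
            raise KeyError(key)
        return default
    return doc['v']


class DictCollection:
    """Hypothetical in-memory stand-in with the find_one/replace_one call shapes
    used here, so the sketch runs without a MongoDB server."""

    def __init__(self):
        self._docs = {}

    def replace_one(self, spec, replacement, upsert=False):
        if upsert or spec['_id'] in self._docs:
            self._docs[spec['_id']] = dict(replacement)

    def find_one(self, spec):
        doc = self._docs.get(spec['_id'])
        return dict(doc) if doc is not None else None


coll = DictCollection()
coll.replace_one({'_id': 'stored-none'}, {'_id': 'stored-none', 'v': None}, upsert=True)
print(kv_get_or(coll, 'stored-none'))  # None (the stored value)
print(kv_get_or(coll, 'absent', default='n/a'))  # n/a
```

With a real PyMongo collection, the same function works unchanged, since `find_one` returns None when no document matches.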

What was the pain point?

We basically found that we had collections of nightly analytics results that were really big, with indexes that were really, really big. The index requirements were going way beyond our 70GB RAM server. We didn't want to shard our Mongo server because of the cost involved, so we decided to take a different approach. Since this data was the read-only output of analytics, where we once had collections whose entries were covered by multiple indexes, we now have collections whose entries are pages of the old entries in a defined sorted order, accessed as key/value.
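Here is a minimal sketch of that paging idea, under assumptions the post doesn't spell out: pages of a fixed size, keys of the form "<name>:<page number>", and reads by position in the sort order. `build_pages`, `read_range`, `index_entries` and `PAGE_SIZE` are all hypothetical names. A plain dict stands in for the store; against Mongo you would write each page with `kv_put(collection, key, page)` and read with `lambda k: kv_get(collection, k)`.

```python
import math

PAGE_SIZE = 1000  # hypothetical; keep each page well under MongoDB's 16 MB document limit


def build_pages(name, sorted_rows, page_size=PAGE_SIZE):
    """Split rows that are already in their defined sort order into key/value pages.

    Keys look like "<name>:<page number>" (a naming scheme assumed for this sketch).
    """
    return {
        f"{name}:{i // page_size}": sorted_rows[i:i + page_size]
        for i in range(0, len(sorted_rows), page_size)
    }


def read_range(get, name, start, stop, page_size=PAGE_SIZE):
    """Rows [start, stop) in sort order, fetching only the pages that cover them.

    `get` is any key -> value lookup returning None for a missing key
    (e.g. a dict's .get, or lambda k: kv_get(collection, k)).
    """
    if stop <= start:
        return []
    first_page, last_page = start // page_size, (stop - 1) // page_size
    rows = []
    for page_no in range(first_page, last_page + 1):
        rows.extend(get(f"{name}:{page_no}") or [])
    offset = first_page * page_size
    return rows[start - offset:stop - offset]


def index_entries(n_rows, n_indexes, page_size=PAGE_SIZE):
    """Index entry counts: (one per row per index before, one _id entry per page after)."""
    return n_rows * n_indexes, math.ceil(n_rows / page_size)


store = build_pages("top_terms", list(range(10)), page_size=3)
print(read_range(store.get, "top_terms", 4, 10, page_size=3))  # [4, 5, 6, 7, 8, 9]
print(index_entries(10, 3, page_size=3))  # (30, 4)
```

Entry counts are not bytes, but they show why the index footprint shrinks: instead of every row appearing in every index, only the `_id` index remains, with one entry per page. The trade-off is that you can only look things up by page key, which is acceptable for read-only results consumed in a known order.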

How did the change work out? Great. We still haven't switched from MongoDB for this data. We still plan to, but in a startup, once you address a pain point, you move on to the next one.

You definitely can't argue that MongoDB isn't flexible.