http://yieldbot.com/blog/yieldbots-first-annual-super-bowl-intent-scorecard Yieldbot's First Annual Super Bowl Intent Scorecard 2015-02-03T19:14:00+00:00 2015-02-03T19:32:18+00:00 Jonathan Mendez http://yieldbot.com/ With the size of our platform now reaching 1 billion consumer sessions a month across our install base of premium publishers, we thought it made sense to take a look at the real-time intentions of consumers leading up to the Super Bowl. We focused on what everyone agrees is the best part of the game: food.

Of note, Yieldbot sold out real-time intent for "Super Bowl" this year across 4 advertisers. Performance data shows they are glad they bought it.

If you are interested in buying other tent-pole or event related intent please contact sales@yieldbot.com

http://yieldbot.com/blog/yieldbot-2014-review-by-the-numbers Yieldbot 2014 Review by the Numbers 2014-12-22T18:31:00+00:00 2014-12-22T18:46:38+00:00 Jonathan Mendez http://yieldbot.com/ It’s been an amazing 2014 at Yieldbot. Since one of our core values is being data driven, I am going to share some of our amazing internal data.

Everything we are and we accomplish starts with the team. In 2014 our team grew from 29 to 62 people. We opened 2 new offices (Chicago and Bentonville) and moved into 2 larger offices in Portland and New York City.

Yieldbot is web-scale. In 2014 our technology collected and processed consumer activity on 8 billion consumer sessions across almost 400 premium publisher desktop and mobile websites. That equated to over 23 billion page views of first-party data and, accounting for ad slots, over 50 billion real-time decisions. We do not measure “unique visitors” since we are a “cookieless” technology, but on an aggregate level we likely have one of the largest footprints in digital advertising. Certainly we have one of the largest in premium content, and the largest collection and use of first-party publisher data for monetization. Woohooo!

On the advertising side we ran over 1500 campaigns this year, many of which are “always-on.” Over 240,000 creative variations ran on Yieldbot in 2014, producing a platform-wide CTR of .35%. It was not unusual to see desktop campaigns at .75% CTR or higher and mobile campaigns at 3% CTR. Reminder – these are display ads in IAB standard units.

For 2014 our top-line revenue grew 6.5X after growing 29X last year. Our gross profit grew over 14X this year after growing 46X last year. Our first 3 years of revenue compare almost identically to RocketFuel, which ended up being the #1 fastest-growing technology company in the D&T Fast 500 its first year of eligibility. It’s possible we are the fastest-growing company in advertising technology right now, and despite one-time costs associated with two office moves, December 2014 will be our first month of EBITDA profitability.

Our growth is fueled entirely by the performance of the media. We sell on a performance basis – 70% of our campaigns run on a Cost-Per-Click (CPC) model. Our success is measured most by the performance of the audiences we are helping drive to the marketer’s own sites and digital experiences. In that regard, we regularly beat the performance of paid search and crush other display technologies for our clients (ask sales@yieldbot.com for the case studies). Also, 2014 was our first full year of offering mobile ads. 25% of our revenue in 2014 was from mobile; in Q4 it is 33% of our revenue. Mobile is growing fast.

Yieldbot will continue to be guided by the fundamental truth that we can never be successful unless our publishers are. Our commitment to publishers and more specifically to the people that consume their media must be forefront in our minds. Everything flows from that.

2014 was a banner year for our publishers. For our largest publishing partner, a media titan, we are their single largest partner - digital or print. They will generate over $6M from Yieldbot in 2014. Many other publishers are in the million-dollar club, and many more will be soon as quarter-over-quarter revenue to publishers grows fast.

A large part of our publisher success is that we delivered CPMs 2-4X what publishers are getting from ad exchanges or other partners such as SSPs. Since the backbone of our tech is machine learning (or as I like to say, machine “earning”), our most successful publishers are the ones our technology has been on the longest. They are getting CPMs in the $5-$10 range, on both desktop and mobile. Maybe our best story of the year is a large site with 22M unique visitors a month that had been a huge Google partner but is now getting RPMs from Yieldbot as high as 11X Google's, and for the first time ever has a partner delivering more total monthly revenue than Google. Cha-ching!

One other amazing thing happened on the publisher side of our business this year. Our success with their first-party data led to publishers asking to use our technology to serve their own direct-sold campaigns, and to the creation of a service model for Yieldbot. These publishers, some of the largest on the web, are now delivering CTR performance 5X what they were delivering running campaigns with data from their DMP and using their ad server waterfall to make decisions. In addition, they are delivering back-end metrics that allow them to compete for budgets going to Facebook and Google. Ask pubteam@yieldbot.com for the case studies.

We made other incredible advances with our technology this year as well. We increased our lead in understanding real-time consumer activity with major efforts around real-time data streaming and the quality of our intent scoring and optimization decisions. We also helped a number of our publishers with issues around fraudulent traffic (at no charge) since we have our own systems to validate visitors. As a performance technology the onus/cost is on us to ensure we are serving to quality traffic.

We did it all in a fully transparent manner, building trust with our direct relationships at the agencies and with publishers. If publishers want to know their highest RPM pages by referrer source for Monday mornings we are happy that we can provide that data. If advertisers want to know what creative headline drove the highest conversion rate in the afternoon we are happy that we can provide that data. What we’re most proud of is that not only have we built a business that can provide this deep level of learning, but also by the time we have that data it is already being acted upon by our tech to improve business results.

2014 was the year Yieldbot got on the map and truly started blowing people’s minds with our technology and our results. For our company it was clearly “the end of the beginning.” Next year, it will not be enough to simply outperform our competitors. We are focused on transforming an entire industry so that advertisers, publishers and consumers all receive the benefits of relevance. Namely, increasing the value of digital media for every participant. We are excited for 2015 and thankful to everyone that helped us get this far in our journey.

http://yieldbot.com/blog/rise-of-the-intelligent-publisher Rise of the Intelligent Publisher 2014-11-10T04:07:00+00:00 2014-11-10T13:26:25+00:00 Jonathan Mendez http://yieldbot.com/ Last week was yet another digital advertising event week in New York City – ad:tech and surrounding events. Everyone involved was talking about all the great advances in advertising technology happening right now. Nobody was talking about the advances in publisher technology. Frankly, there isn’t much to talk about. Except that there is.

A certain group of publishers have advanced light years ahead of the current discourse. Advanced beyond the idea of programmatic efficiency. Advanced beyond the idea of creating segments by managing cookies. Advanced far beyond unactionable analytic reports related to viewability. These publishers, many of whom are the most revered names in media, have deployed machine learning and artificial intelligence in their media. They have moved beyond optimizing delivery of impressions to predictive algorithms optimizing ad server decisions in real-time against the performance of their media.

The movement towards publishers understanding and optimizing their media is one of retaining the value of ownership. It may seem obvious that publishers own their media but they don’t really. In reality when the buyer is the one that is optimizing the performance of that media and keeping the learning – even using that learning as leverage in what they are buying – the value in ownership, in fact the entire value of a publishing business, is called into question.

The smartest and most forward thinking publishers understand the stakes. We are entering a world where all media is performance media. Brands are getting smarter about attribution and measurement. They are getting smarter about marketing mix models and seeing the value in digital over other channels. They are gaining insight into how digital influences purchase. In this new world the imperative of publishers to be as knowledgeable about their media as brands is not about competitive intelligence. It is about better performance for these very brands – their customers!

Publishers will not survive in a world where they do not know when, why, where, and how someone is walking into their store. They will not survive if they do not know what customers are buying and how much they are paying. No business could survive that lack of data and intelligence. In fact, no customer really wants that type of store either. Customers want to buy products that serve their intended purpose. Customers want sellers to understand — even predict — what they will be interested in. Customer experience extends to buying media the same as buying anything else. This means marketer performance.

Publishers are ultimately responsible for the performance of their media and the happiness of their customers. Yet, many never think about it or feel helpless to do anything about it. They are the ones that will not make the transition to the performance media economy.

Like all intelligent systems, at the core this is about data. Vint Cerf famously said Google didn’t have better algorithms, just more data. The reality is no one has more data than publishers. Each landing on the site, each pageview by a consumer carries well over a hundred dimensions of data. That data is fundamental to the structure of the web – it is linked data – and the serialization of it through each site session creates exponentially more of it. Unlike cookie data, which does not scale, this data is web-scale. And it can be captured, organized and activated in real-time.

The data is a window into the consumer mindset and journey. Importantly for publishers it is an explicit first-party value exchange between the publisher and the consumer. It is an exchange of intent for content, of mindset for media. It is what brands want more than anything else. The moment, the zeitgeist, the exact time and place a consumer is considering, researching, comparing, evaluating, learning, and discovering what and where to buy. It is the single most valuable moment in media. It is right time, right place, right message. It is uniquely digital, uniquely first-party and owned solely by the publisher. It is a gold mine that Facebook and Google recognize and they have focused their recent publisher-side initiatives on capturing from publishers either unsuspecting or incapable of extracting the value themselves.

As publishers begin to understand these moments themselves, activate them for marketers and optimize the performance of the media against them, an amazing thing happens. The overall value of the media increases. It increases because of intelligence. It increases because of performance. Most important, it increases because of value being delivered to consumers. It also opens up new budgets. Over time, these systems will get smarter. With more data, publishers can even begin to sell based on performance thereby eliminating a host of issues around impression based buying and increasing overall RPM by orders of magnitude with higher effective CPMs and smarter, more efficient allocations of impressions.

Unfortunately for the hungry advertising trade publications, you will not hear these most advanced publishers talking about this on panels or writing blog posts about it. Does Google talk about Quality Score? Does Facebook talk about News Feed? You will not hear industry trade groups arguing for publishers to sell performance to their customers either. In fact, not one panel all of last week discussed first-party data created by the publisher.

Thankfully there is no hype related to this. Only performance. The people who need to know, know. As much as I’d like to share the names of every one of those publishers and agencies with you, more so I want to honor their competitive advantage in the marketplace.

The best publishers have the best content. The best content delivers the best consumers. The best consumers deliver the best performance. This is not new. The rise of the intelligent publisher in collecting, organizing, machine learning, activating and algorithmically optimizing this first-party data stream in real-time is. It’s the most groundbreaking thing in media happening right now and will be for some time because it has swung the data advantage pendulum to the publishers for the first time since data has mattered in digital media.

http://yieldbot.com/blog/tf-idf-using-flambo TF-IDF using flambo 2014-07-22T16:58:00+00:00 2014-07-22T18:29:28+00:00 Muslim Baig http://yieldbot.com/ flambo is a Clojure DSL for Spark created by the data team at Yieldbot. It allows you to create and manipulate Spark data structures using idiomatic Clojure. The following tutorial demonstrates typical flambo API usage and facilities by implementing the classic tf-idf algorithm.

The complete runnable file of the code presented in this tutorial is located in the flambo.example.tfidf namespace, under flambo's test/flambo/example directory. We recommend you download flambo and follow along in your REPL.

What is tf-idf?

TF-IDF (term frequency-inverse document frequency) is a way to score the importance of terms in a document based on how frequently they appear across a collection of documents (corpus). The tf-idf weight of a term in a document is the product of its tf weight:

tf(t, d) = (number of times term t appears in document d) / (total number of terms in document d)

and its idf weight:

idf(t) = ln((total number of documents in corpus) / (1 + (number of documents with term t)))
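Plugged into a concrete case, the arithmetic is simple. Here is a minimal Python sketch of the two formulas; the numbers are made up purely for illustration:

```python
import math

# tf(t, d): how often the term occurs relative to the document's length
tf = 2 / 10                  # term appears 2 times among 10 total terms

# idf(t): log of corpus size over (1 + document frequency)
idf = math.log(8 / (1 + 3))  # 8 documents in the corpus, term in 3 of them

tfidf = tf * idf             # 0.2 * ln(2), roughly 0.1386
```

A term that appears in every document gets an idf near (or below) zero, which is exactly the down-weighting the formula is designed to produce.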

Example Application Walkthrough

First, let's start the REPL and load the namespaces we'll need to implement our app:

lein repl
user=> (require '[flambo.api :as f])
user=> (require '[flambo.conf :as conf])

The flambo api and conf namespaces contain functions to access the Spark API and to create and modify Spark configuration objects, respectively.

Initializing Spark

flambo applications require a SparkContext object which tells Spark how to access a cluster. The SparkContext object requires a SparkConf object that encapsulates information about the application. We first build a spark configuration, c, then pass it to the flambo spark-context function which returns the requisite context object, sc:

user=> (def master "local")  ;; special string: run Spark in local mode
user=> (def conf {})         ;; empty config map, for illustration
user=> (def env {})          ;; empty executor environment
user=> (def c (-> (conf/spark-conf)
                  (conf/master master)
                  (conf/app-name "tfidf")
                  (conf/set "spark.akka.timeout" "300")
                  (conf/set conf)
                  (conf/set-executor-env env)))
user=> (def sc (f/spark-context c))

master is a special "local" string that tells Spark to run our app in local mode. master can be a Spark, Mesos or YARN cluster URL, or any one of the special strings to run in local mode (see README.md for formatting details).

The app-name flambo function is used to set the name of our application.

As with most distributed computing systems, Spark has a myriad of properties that control most application settings. With flambo you can either set these properties directly on a SparkConf object, e.g., (conf/set "spark.akka.timeout" "300"), or via a Clojure map, (conf/set conf). We set an empty map, (def conf {}), for illustration.

Similarly, we set the executor runtime environment properties either directly via key/value strings or by passing a Clojure map of key/value strings. conf/set-executor-env handles both.

Computing TF-IDF

Our example will use the following corpus:

user=> (def documents
	 [["doc1" "Four score and seven years ago our fathers brought forth on this continent a new nation"]
	  ["doc2" "conceived in Liberty and dedicated to the proposition that all men are created equal"]
	  ["doc3" "Now we are engaged in a great civil war testing whether that nation or any nation so"]
	  ["doc4" "conceived and so dedicated can long endure We are met on a great battlefield of that war"]])

where doc# is a unique document id.

We use the corpus and spark context to create a Spark resilient distributed dataset (RDD). There are two ways to create RDDs in flambo:

  • parallelizing an existing Clojure collection, as we'll do now:

    user=> (def doc-data (f/parallelize sc documents))

  • reading a dataset from an external storage system

We are now ready to start applying actions and transformations to our RDD; this is where flambo truly shines (or rather burns bright). It utilizes the powerful abstractions available in Clojure to reason about data. You can use Clojure constructs such as the threading macro -> to chain sequences of operations and transformations.

Term Frequency

To compute the term frequencies, we need a dictionary of the terms in each document filtered by a set of stopwords. We pass the RDD, doc-data, of [doc-id content] tuples to the flambo flat-map transformation to get a new, stopword filtered RDD of [doc-id term term-frequency doc-terms-count] tuples. This is the dictionary for our corpus.

flat-map transforms the source RDD by passing each tuple through a function. It is similar to map, but the output is a collection of 0 or more items which is then flattened. We use the flambo named function macro flambo.api/defsparkfn to define our Clojure function gen-docid-term-tuples:

user=> (def stopwords  ;; an illustrative stopword set; the original is not shown
         #{"a" "all" "and" "any" "are" "in" "of" "on"
           "or" "our" "so" "that" "the" "this" "to" "we"})
user=> (f/defsparkfn gen-docid-term-tuples [doc-tuple]
         (let [[doc-id content] doc-tuple
               terms (filter #(not (contains? stopwords %))
                             (clojure.string/split content #" "))
               doc-terms-count (count terms)
               term-frequencies (frequencies terms)]
           (map (fn [term] [doc-id term (term-frequencies term) doc-terms-count])
                (distinct terms))))
user=> (def doc-term-seq (-> doc-data
                             (f/flat-map gen-docid-term-tuples)
                             f/cache))

Notice how we use pure Clojure in our Spark function definition to operate on and transform input parameters. We're able to filter stopwords, determine the number of terms per document and the term-frequencies for each document, all from within Clojure. Once the Spark function returns, flat-map serializes the results back to an RDD for the next action/transformation.

This is flambo's raison d'être. It handles all of the underlying serializations to/from the various Spark Java types, so you only need to define the sequence of operations you would like to perform on your data. That's powerful.

Having constructed our dictionary we f/cache (or persist) the dataset in memory for future actions.

Recall term-frequency is defined as a function of the document id and term, tf(document, term). At this point we have an RDD of raw term frequencies, but we need normalized term frequencies. We use the flambo inline anonymous function macro, f/fn, to define an anonymous Clojure function to normalize the frequencies and map our doc-term-seq RDD of [doc-id term term-freq doc-terms-count] tuples to an RDD of key/value, [term [doc-id tf]], tuples. This new tuple format of the term-frequency RDD will be later used to join the inverse-document-frequency RDD and compute the final tf-idf weights.

user=> (def tf-by-doc (-> doc-term-seq
                          (f/map (f/fn [[doc-id term term-freq doc-terms-count]]
                                   [term [doc-id (double (/ term-freq doc-terms-count))]]))
                          f/cache))

Notice again how we were easily able to use the Clojure destructuring facilities on the arguments of our inline function to name parameters.

As before, we cache the results for future actions.

Inverse Document Frequency

In order to compute the inverse document frequencies, we need the total number of documents:

user=> (def num-docs (f/count doc-data))

and the number of documents that contain each term. The following step groups the [doc-id term term-freq doc-terms-count] tuples by term, counting the documents associated with each term. This is combined with the total document count to get an RDD of [term idf] tuples:

user=> (defn calc-idf [doc-count]
         (f/fn [[term tuple-seq]]
           (let [df (count tuple-seq)]
             [term (Math/log (/ doc-count (+ 1.0 df)))])))
user=> (def idf-by-term (-> doc-term-seq
                            (f/group-by (f/fn [[_ term _ _]] term))
                            (f/map (calc-idf num-docs))))

Now that we have both a term-frequency RDD of [term [doc-id tf]] tuples and an inverse-document-frequency RDD of [term idf] tuples, we perform the aforementioned join on the "terms" producing a new RDD of [term [[doc-id tf] idf]] tuples. Then, we map an inline Spark function to compute the tf-idf weight of each term per document returning our final RDD of [doc-id term tf-idf] tuples:

user=> (def tfidf-by-term (-> (f/join tf-by-doc idf-by-term)
                              (f/map (f/fn [[term [[doc-id tf] idf]]]
                                       [doc-id term (* tf idf)]))
                              f/cache))

We again cache the RDD for future actions.

Finally, to see the output of our example application we collect all the elements of our tf-idf RDD as a Clojure array, sort them by tf-idf weight, and for illustration print the top 10 to standard out:

user=> (->> tfidf-by-term
            f/collect
            ((partial sort-by last >))
            (take 10)
            clojure.pprint/pprint)
(["doc2" "created" 0.09902102579427793]
 ["doc2" "men" 0.09902102579427793]
 ["doc2" "Liberty" 0.09902102579427793]
 ["doc2" "proposition" 0.09902102579427793]
 ["doc2" "equal" 0.09902102579427793]
 ["doc3" "civil" 0.07701635339554948]
 ["doc3" "Now" 0.07701635339554948]
 ["doc3" "testing" 0.07701635339554948]
 ["doc3" "engaged" 0.07701635339554948]
 ["doc3" "whether" 0.07701635339554948])
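As a sanity check, the top-ranked weight above can be reproduced outside Spark. Here is a plain-Python sketch of the same computation for the term "created" in doc2. Note that the stopword set below is our assumption (the tutorial's actual set is not shown), chosen so that doc2 keeps 7 terms:

```python
import math

# Assumed stopword set; the tutorial's actual set is not shown.
STOPWORDS = {"a", "all", "and", "any", "are", "in", "of", "on",
             "or", "our", "so", "that", "the", "this", "to", "we"}

doc2 = ("conceived in Liberty and dedicated to the proposition "
        "that all men are created equal")
num_docs = 4  # total documents in the tutorial's corpus

terms = [t for t in doc2.split(" ") if t not in STOPWORDS]
tf = terms.count("created") / len(terms)  # 1 of the 7 remaining terms
idf = math.log(num_docs / (1 + 1))        # "created" occurs in 1 document
print(tf * idf)                           # matches doc2's weight, about 0.0990
```

The same arithmetic with doc3's 9 remaining terms yields the 0.0770 weights shown for "civil", "Now", and the other doc3 terms.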

You can also save the results to a text file via the flambo save-as-text-file function, or an HDFS sequence file via save-as-sequence-file, but we'll leave those APIs for you to explore.


And that's it, we're done! We hope you found this tutorial of the flambo API useful and informative.

flambo is being actively improved, so you can expect more features as Spark continues to grow and we continue to support it. We'd love to hear your feedback on flambo.

http://yieldbot.com/blog/marcelines-instruments Marceline's Instruments 2014-06-25T17:59:15+00:00 2014-06-25T17:59:15+00:00 Homer Strong http://yieldbot.com/ Last December Yieldbot open-sourced Marceline, our Clojure DSL for Storm’s Trident framework. We are excited to release our first major update to Marceline, version 0.2.0.

The primary additions in this release are wrappers for Storm’s built-in metrics system. Storm’s metrics API allows topologies to record and emit metrics. Read more on Storm metrics in the official documentation. We run production topologies instrumented with Marceline metrics and have found it to be stable; YMMV! Please file issues on GitHub if you encounter bugs or have ideas for how Marceline could be improved. See the Metrics section of the README for usage. Also note that Marceline’s metrics can be useful for any Clojure Storm topologies, either with vanilla Storm or Trident.

Marceline’s exposure of Storm metrics has been very useful for monitoring the behavior of Yieldbot’s topologies. Friction around instrumentation has been greatly reduced. Code smells are down. Metrics now entail fewer lines of code and less duplication. An additional architectural benefit is that dependencies on external services can be isolated to individual topology components. It is painless to add typical metrics while maintaining enough flexibility for custom metrics when necessary. We have designed Marceline’s metrics specifically with the goal to leverage Storm’s metrics API unobtrusively.

As Yieldbot’s backend scales it is increasingly crucial to monitor topologies. Simultaneously, new features require iterations on what quantities are monitored. While topology metrics are primarily interesting to developers, these metrics are often directly related to data-driven business concerns. Several of Yieldbot’s Key Performance Indicators (KPIs) are powered by Storm and Marceline, so the availability of a fantastic metrics API translates to greater transparency within the organization.

If you’re interested in such data engineering topics as this, check out some of the exciting careers at Yieldbot!

-- @strongh

http://yieldbot.com/blog/our-series-b-funding Our Series B Funding 2014-06-06T05:42:00+00:00 2014-06-06T05:43:17+00:00 Jonathan Mendez http://yieldbot.com/ Today we announced we have received $18M of funding into Yieldbot in a Series B raise. New investor SJF Ventures led the round, which included existing investors RRE Ventures, NAV Ventures and Common Angels, with an additional contribution from City National Bank.

In an era when everyone wants to bucket your business into a category on a slide or an acronym we have consistently defied categorization. If you want to call us any one of these names below it would be accurate:

A big data company

A real-time analytics company

A predictive analytics company

A machine learning technology

A creative testing service

A traffic fraud detector service

A media company

A demand partner

A search media company

A mobile media company

A social media company

A shopper marketing company

A performance display company

A yield optimizer

A real-time ad server

This funding will help ensure we continue to be able to define ourselves. When you create something entirely new and never seen before you get that privilege.

So how do we define ourselves? One word, performance.

Performance for our buyers based on back-end metrics. We are told more often than not that we are competing on back-end metrics with paid search ads and blowing away anything they’ve ever seen in display. That defines us.

Performance for our publishers based on effective CPM and revenue. That defines us. For our oldest partners we are their largest source of revenue at the highest non-direct CPM. In fact, the CTRs our technology generates are so high that we are now working closely with direct and programmatic sales teams to run more campaigns through Yieldbot.

When marketer performance is driving effective CPMs higher and everyone is feeling great about it, you know you have something special. There have only been a few technologies that have ever done that at scale for publishers -- none of them using real-time or click-stream data. That ability to create high value media with our technology, that defines us.

Best of all we are making ads better on the web by making them more relevant to consumers. That defines us.

This funding however isn’t about defining Yieldbot. Our focus is on our customers. It’s more accurate to say it’s about redefining what is possible with display advertising. It’s about new buyers into new inventory. It’s about new technology applied in new ways. It’s about new levels of performance for marketers and publishers. It’s about a new channel of real-time media that is created explicitly by consumers in a privacy friendly way. We are creating media value in the digital ad ecosystem at levels no one ever has before and in ways no one ever has before. That is why we raised money. It is value we will continue to invest in for marketers, publishers and consumers.

The next chapter begins…

http://yieldbot.com/blog/how-publishers-can-take-advantage-of-mobile-social How Publishers Can Take Advantage of Mobile Social 2014-05-27T19:00:00+00:00 2014-05-27T20:38:02+00:00 Jaan Janes http://yieldbot.com/ Pinterest continues to make headlines as its footprint expands and it launches its first ad products, but what is the value of its traffic to both publishers and advertisers through the many outbound links it drives daily? What devices are being used? What is the impact on ad performance from this traffic? How does it compare to the other largest source of social traffic, Facebook?

Yieldbot is integrated with many leading publishers that actively work with Pinterest to drive traffic and “pinning” activity. For many of these publishers, our data shows that Pinterest has become one of the top 3 sources of inbound referrer traffic (along with Google search and direct traffic). However, these visitors tend to bounce at a very high rate. Even with a lower number of referrals in April 2014 on sites on the Yieldbot platform, Facebook drove 12% more page views for publishers than Pinterest.

The influence of mobile is profound. For people coming to publishers from social media on mobile, pageview traffic regularly approaches or exceeds 50% of total mobile page views.

As important as pageview traffic from social media is the efficacy of that traffic for advertisers. On Yieldbot, the CTR associated with mobile social referrers always beats desktop traffic.

Yieldbot CTR Index By Session Referrer Source

(Index is compared to overall Yieldbot CTR in April 2014)

Facebook Mobile 117

Twitter Mobile 112

Pinterest Mobile 96

Facebook 84

Pinterest 68

Twitter 28

The fact is these visitors will click relevant messaging. They are in a media consumption pattern that screams “give me relevant information!” Publishers that can harness technologies that deliver relevant advertising will have great success in mobile.

Key Findings:

1) There is an enormous opportunity for publishers to drive high quality traffic from social media, especially Facebook and Pinterest.

2) The acceleration of mobile traffic from social media is profound.

3) Ad engagement, as defined by CTR, is highest from mobile social referrers.

Recommended Actions:

1) Mobile-first audiences are going to dictate the future for publishers, and publishers need to quickly align their programming and marketing strategies to produce quality content experiences for this audience.

2) Publishers need to consider session referrer source to improve ad targeting and to work with marketers to develop ad creative that resonates with inbound social media audiences as opposed to other sources of traffic.

There is an enormous market opportunity for publishers to harness the power of social media and Yieldbot is actively working with its publishers to build new revenue streams while delivering results for its ad partners.

http://yieldbot.com/blog/publishers-sitting-on-a-goldmine Publishers Sitting on a Goldmine 2014-02-06T14:58:13+00:00 2014-02-06T14:58:14+00:00 Rich Shea http://yieldbot.com/

If I were asked for one slide that best captured the motivation behind the technology and business direction of Yieldbot, it would still be this original one. It IS the elevator pitch. And the points it makes are clear.

This slide is actually from the presentation that @jonathanmendez first made to me over four years ago when he was pulling me in to build Yieldbot with him. All this time later, it still best sums up the motivation behind our technology and how we view our market. I was fairly new to ad tech in 2009, and the thing I most remember thinking after our first conversation was: "It's not already done that way?".

Everything captured in that slide is as true in 2014 as it was in 2009 before any lines of code for Yieldbot had been written. In an industry that embraces the pivot, it's nice to have a slide that could still be used five years from now as much as it could five years ago.

Intent is Generated in the Publisher Domain

Search gets a peek at users' intent during part of their navigation through content on the web. But when a user is searching, they're searching *for something*. Search is a tool for getting the user to what they want. Publisher content is where users spend their time discovering new interests. Or digging deeper into existing interests. Their clickstream through the site and what they interact with is how they express what they are REALLY interested in.

Yieldbot's analytics discover the intent of the publisher's users, using relevance as a tool for effectively monetizing those users' interests. When advertising is relevant to the user's intent, it is not seen as intrusive. Done right, it can augment the user's experience by delivering something that they are interested in. Something that matches their intent.

Only Pubs Can Effectively Harvest Visitor Event & Contextual Data

Publishers have a direct relationship with their users. As users consume content they are signaling what they are interested in. Yieldbot's technology lets publishers leverage this relationship, taking the value created by the content and making it monetizable when and where possible.

Yieldbot's javascript is loaded directly on the publisher's page. This arrangement is similar to first-generation web analytics. But while first-generation web analytics was useful for understanding what happened in the past, Yieldbot is driven by real-time understanding of what is happening on the publisher's site and in the users' sessions, and can take action on it in real-time. Yieldbot's decisioning algorithms take into account the full range of context, such as the referrer to the session and the time of day.

Only Pubs Can Weave Ad Optimization Tech Into the Experience

The publisher owns the experience on their site. This is where the three dimensions of optimization come together - the visitor and their intent, the context of their clickstream, and the creative for the message. This is what Yieldbot's technology does - our machine learning algorithms running in real-time with our ad decisioning pick the most relevant action presented in the way that's most engaging.

Other approaches, by not optimizing based on relevance, are by definition optimizing on the wrong thing for the user and publisher (and advertiser for that matter). Retargeting for example optimizes to a cookie that was set some time in the past on some other domain, without regard for whether the message is relevant to the user at that moment. That's why retargeting examples are so jarring. More often than not their message is no longer relevant, and the experience is a reminder to the user that they're being followed around the web; in the process detracting from the experience on the current publisher site. Contextual, as another example, only looks at one of the factors for relevance and can only drive simple targeting rules that are broadly defined.

The key here is that optimizing for relevance wins. Both brand and performance advertisers want relevance and will pay a premium either for relevant placement or performance (a natural byproduct of relevance). Publishers have a double-win of relevant messaging alongside their content from an experience point of view, as well as collecting the premiums that relevance brings from the advertisers. And the user always benefits from relevance in their overall experience of getting at what they are interested in.

Only Pubs Can Accomplish the Above Without Privacy Issues

Yieldbot does not track users across the web. We support Do Not Track initiatives because we don't think relevance for anyone involved (the user, the advertiser, and the publisher) depends on this type of tracking to be effective. Our results back this up.

With Yieldbot, the value of the publisher's media is not diluted by taking the insights about a user's intent and trying to monetize them somewhere else. Let retargeters dilute the value of the publisher's media and dilute the relevance to the user. By contrast, Yieldbot makes decisions based on the user's intent on this particular site at this particular time.

Pubs Are Sitting on a Goldmine

For all of the reasons above, publishers are indeed sitting on a goldmine. Their domain is where users engage with content and express their interests. If this is the information age, then the value is where the information is. Publishers own the value of the web. Until we built Yieldbot, they were just lacking the technology that allows them to realize that value.

-- @shearic

http://yieldbot.com/blog/intent-targeting-vs.-contextual-targeting Intent Targeting vs. Contextual Targeting 2014-01-17T05:00:00+00:00 2014-01-17T16:42:40+00:00 Rich Shea http://yieldbot.com/

The image above is an example of the problem with Contextual targeting with the type of unfortunate pairing of ad and content we all come across now and again. Look closely at the headline of the content behind the interstitial ad for "10 Best Cruise Ship Water Slides", and you'll see it's about the passing of Gilligan's Island star Russell Johnson.

How Contextual Gets Confused

There have been worse combinations, to be sure. But there's little chance that a visitor will be interested in a cruise-related ad just because the subject of the underlying story is a star from a television show that featured as its set-up an ill-fated cruise.

The culprit here is so-called "Contextual" targeting. This ad technology scans the words or phrases in the content of the page, picks out the common ones, and then matches ads to pages based on a list of words or phrases associated with the ad. In this particular case it may have been the several instances of the word "island" on the page that triggered the match.
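As a rough illustration, the word-frequency matching described above can be sketched in a few lines of Python. The `contextual_match` function and the ad keyword lists here are hypothetical, invented for this example, and not any real ad server's API:

```python
from collections import Counter
import re

def contextual_match(page_text, ad_keywords, top_n=5):
    """Naive contextual targeting: pick the most frequent words on the
    page and match any ad whose keyword list overlaps with them."""
    words = re.findall(r"[a-z]+", page_text.lower())
    common = {w for w, _ in Counter(words).most_common(top_n)}
    return [ad for ad, kws in ad_keywords.items() if common & set(kws)]

# The repeated word "island" in a Gilligan's Island obituary is enough
# to trigger the cruise ad -- exactly the failure mode described above.
page = "island actor island show the island castaway the the"
ads = {"cruise-deals": ["cruise", "island", "beach"],
       "tv-dvds": ["television", "sitcom"]}
print(contextual_match(page, ads))  # -> ['cruise-deals']
```

Note that nothing in this scheme asks why the visitor is on the page; a single over-represented word decides the match.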

Really, it's a shame that this approach is called contextual, because in truth every fact involved in and leading up to that ad impression is context, but that's the terminology we're stuck with.

Why Intent Is Better

In contrast, Yieldbot's Intent-based targeting comparatively discounts the value of the content on the page itself. Instead, multiple Intent Signals, each derived from a unique "intent source", are used to make a real-time determination of the appropriate ad for the specific pageview -- if any.

To serve an ad, Yieldbot takes a handful of Intent Signals and brings them together to determine if any ads are appropriate to display for the pageview and which ads are most relevant. Sometimes it's determined there actually is no ad relevant enough, and that's fine too.

Yieldbot's intent sources include data derived from external referrer links, the path of pages through the session, and key attributes of the page itself. Pages themselves have associations with intent based on the sessions of past visitors, and these are correlated with the signals for the current user for the pageview. An independent matching decision is made on every single pageview, taking all of these factors into account.
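That kind of decision can be sketched minimally as a weighted scoring pass with a relevance threshold. The signal names, weights, threshold, and `pick_ad` helper below are all invented for illustration; Yieldbot's actual scoring is certainly more sophisticated:

```python
def pick_ad(signals, ads, threshold=0.5):
    """Combine weighted intent signals into one relevance score per ad;
    serve nothing when no ad clears the threshold."""
    best_ad, best_score = None, threshold
    for ad, keywords in ads.items():
        # Sum the weights of the signals that match this ad's keywords.
        score = sum(weight for kw, weight in signals.items() if kw in keywords)
        if score > best_score:
            best_ad, best_score = ad, score
    return best_ad  # None means "no ad was relevant enough"

# Hypothetical signals; real ones would come from the referrer,
# the session path, and attributes of the page itself.
signals = {"running shoes": 0.4, "marathon training": 0.3, "recipes": 0.1}
ads = {"shoe-sale": ["running shoes", "marathon training"],
       "cookware": ["recipes", "baking"]}
print(pick_ad(signals, ads))           # -> shoe-sale
print(pick_ad({"recipes": 0.1}, ads))  # -> None
```

The important design point is the `None` branch: declining to serve any ad is a first-class outcome, not an error.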

Ultimately what makes intent better than contextual is that intent is about what this specific user is interested in right now. Pages themselves can be about many things, so at best contextual can parse that out and try to infer what the page might be "mostly about". Intent brings into focus what actually interests this user, whether or not it's one of the dominant concepts on the page.

Intent Enables Display Bought Like Search

In the Yieldbot Marketplace, Intent inventory is purchased in the same manner as Search (the grand-daddy of intent), through the use of keywords. Stemming from these dynamics, there are at least three reasons why Yieldbot would avoid the type of targeting faux pas illustrated above.

First, if contextual is about "what" (what is on the page), then intent is much more about "why" (why the user is visiting that page). The nature of intent is about the question: why are people (and this current person) visiting this particular page at this particular time? A frequently repeated word or phrase in the content of the page hardly factors into the equation. The keywords associated with the intent bringing users to that page are likely to be different and more relevant.

Second, negative keywords are a much more natural fit when thinking about intent. In this case it's likely that a campaign about cruises, even if it had "island" as a keyword to match and that was coming through with an initially strong intent signal, would likely have negative keywords that would filter the ad from consideration (perhaps "die" or "food poisoning" for example).
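A sketch of how negative keywords veto an otherwise-matching campaign follows; the `eligible` helper and the campaign data are hypothetical, using the example negatives suggested above:

```python
def eligible(campaign, page_keywords):
    """A campaign is eligible when at least one positive keyword matches
    and no negative keyword does."""
    kws = set(page_keywords)
    if kws & set(campaign.get("negative", [])):
        return False  # a negative keyword vetoes the match outright
    return bool(kws & set(campaign["keywords"]))

# Hypothetical cruise campaign with negative keywords attached.
cruise = {"keywords": ["cruise", "island"],
          "negative": ["die", "food poisoning"]}
print(eligible(cruise, ["island", "beach"]))  # -> True
print(eligible(cruise, ["island", "die"]))    # -> False
```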

Third, intent-based ad serving is inherently about relevance. As such, Yieldbot's ad serving uses real-time updated performance data to aid in determining relevancy. If an irrelevant ad did get impressions for a short time, Yieldbot's machine learning algorithms would determine that the ad was not relevant in that situation.
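As a generic sketch of that feedback loop (this is not Yieldbot's actual algorithm; the prior and learning rate are invented for illustration), a per-ad CTR estimate can be updated on every impression so that an irrelevant ad quickly loses out:

```python
def update_ctr(stats, ad, clicked, alpha=0.05):
    """Exponentially weighted CTR estimate: every impression nudges the
    estimate toward 1.0 (clicked) or 0.0 (not clicked)."""
    prev = stats.get(ad, 0.01)  # small optimistic prior for new ads
    stats[ad] = (1 - alpha) * prev + alpha * (1.0 if clicked else 0.0)
    return stats[ad]

stats = {}
# An irrelevant ad keeps getting impressions but never clicks...
for _ in range(100):
    update_ctr(stats, "irrelevant-ad", clicked=False)
# ...so its estimated CTR decays toward zero and it stops being chosen.
print(stats["irrelevant-ad"] < 0.001)  # -> True
```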

Moral of the Story

It's all about relevance, and for relevance intent is king. For advertisers and publishers alike, serving a non-relevant ad impression is a wasted opportunity. As a result, advertisers in the Yieldbot Marketplace are seeing higher ROAS (return on ad spend) and publishers are getting higher CPMs.

-- @shearic

http://yieldbot.com/blog/yieldbot-2013-year-in-review Yieldbot 2013 Year in Review 2013-12-19T06:27:00+00:00 2014-01-17T16:44:58+00:00 Jonathan Mendez http://yieldbot.com/ The first full year of our real-time marketplace for consumer intent was an amazing year of growth and learning. Since our business is about delivering measured performance I thought I would deliver some of Yieldbot’s performance to our friends, partners, buyers and anyone else that cares to know.

First-Party Data

We’ve continued to attract an amazing group of premier media partners to our platform. Our technology is deployed with 3 of the top 5 and 7 of the top 10 media companies. To give you some sense of our scale we recorded and analyzed 471 million user sessions just in the month of November. In little more than a year Yieldbot grew big. In total we saw 10 billion page-views in 2013 up from 2.7 billion in 2012. Most of that in Q3 and Q4. That’s a lot of data.

Publisher Revenue

Data is meaningless without associating revenue to it. In 2013 we established ourselves as a top media partner in the space. A number of publishers are now at over $1 million annual run rates in revenue from Yieldbot.

On average Yieldbot paid publishers 142% more on a CPM basis than they got from Ad Exchanges and SSPs. Some media companies are getting over 300% higher returns. We have created a true mid-tier for premium publishers at a 97% fill rate where one did not exist before. No small accomplishment if you understand this side of the business.

A New Channel

All that publisher revenue comes from our machine learning that creates real-time “queries” from the click-stream. Unlocking the real-time intent of consumers during site sessions has created, for the first time, a direct and transparent path for Search advertisers into premium media.

In 2013 we ran 436 campaigns vs 58 in 2012 - many of which are now “always-on” campaigns. 7 of the top 10 SEM agencies in the United States buy Yieldbot and we count 2 of the top 5 largest US Search spenders as customers. All told, our advertiser list is a who’s who of the largest and most respected brands in the world. Top buyers in the marketplace are spending mid-six figures annualized.

What everyone also discovered this year is that if Yieldbot performs like search but can scale like display, it will crush the competition in other CPC campaigns like Social, Retargeting and Mobile. It's a truly amazing development, because you can count on one finger the companies that can capture all four budgets.


NONE of this happens unless consumers find the ads relevant and then take action in the marketer's environment. That is entirely our function: helping consumers get relevant information and helping brands connect with consumers.

In that regard, we hit a grand-slam in 2013. We established that Yieldbot has the ability to generate performance equal to, and in many cases greater than, Paid Search. From e-commerce giants getting $4:$1 ROAS on Yieldbot, to major CPG brands getting over 60% conversion rates, to 34% lift in “intent to purchase” and everything in between, Yieldbot set new benchmark after new benchmark for our buyers’ back-end performance data. The best part for everyone is that we’re only getting smarter.


From incredible work across our data collection, to our data processing, to our real-time data streaming, to our databases, to our real-time matching, to our wicked fast ad server, to our creative technology, to our analytics, to our UIs and reporting, and whatever else I’m missing (oh, open-sourcing cool stuff), we just continue to lead the development of technology for first-party data analytics, real-time decisioning and marketplace optimization. There is not a better team on the planet. Proof: only 14 (really smart) people (and a bunch of machines) built this monster.

Business Growth

We did more revenue yesterday than we did in the entire month of December last year. Overall, Q4/13 will achieve 2,400% growth vs the same period last year. To say we're growing fast is an understatement. Our growth rate for the first full year is a rarefied feat. It has gotten crazy at times, but the team has been amazing all year, enjoying the start-up craziness they signed up for.

Overall we grew headcount this year from 14 to 30. Some companies will brag about how many employees they have; as you already read, I like to brag about how few. We also had the great fortune of adding media legend Cathie Black to our Board of Directors. If you're interested in being a part of what we're doing, let us know.


It has been a great year both for the digital media industry and for our business. There is no doubt where this is going. When you listen to the top brands and media buyers in the world, you hear that they are moving more budgets into digital in an effort to capture the relevant moments when consumer need matches the promise of their brands. Yieldbot will be there to deliver that every step of the consumer journey in 2014. We can’t wait for the year to begin. To all our friends and associates, we send you happy and healthy holiday wishes from all of us at Yieldbot.

http://yieldbot.com/blog/say-hello-to-marceline-clojure-trident-dsl Say Hello to Marceline 2013-12-16T10:13:00+00:00 2013-12-17T17:49:16+00:00 Soren Macbeth http://yieldbot.com/ Yieldbot is pleased to announce the public release of Marceline, a Clojure DSL for Trident.

Storm plays a central role in Yieldbot's real-time data processing systems. From data collection and ETL, to powering on-line machine learning algorithms, we rely heavily on Storm to process vast amounts of data efficiently. Trident is a high-level abstraction on top of Storm, analogous to Cascading for Hadoop. Trident, like Cascading, is written in Java. This simply would not do.

Clojure, a lisp dialect which runs on the JVM, forms the base of the software stack for the data team at Yieldbot. We love Clojure because it allows us to quickly and interactively build our data processing systems and machine learning algorithms. Clojure gives us the REPL-based development of a dynamic, functional language with the performance and stability of the JVM. With Marceline, we get the best of both worlds by being able to develop and test our Trident topologies in Clojure, uberjar them up and ship them off to our production environments.

Marceline is still young, but we have been running it in production without issue. For more information about how to use it, including examples, please see the README on GitHub. Special thanks to Dan Herrera and Steven Surgnier for their help testing, writing documentation and providing additional examples.

-- @sorenmacbeth

http://yieldbot.com/blog/real-time-relevance-matters-on-cyber-monday Real-time Relevance Matters on Cyber Monday 2013-12-04T21:16:19+00:00 2013-12-04T21:16:20+00:00 Jonathan Mendez http://yieldbot.com/ Digital ads with messaging specific to Cyber Monday greatly outperformed ads that did not mention it.

From its early incarnation by Shop.org, Cyber Monday has turned into an online holiday shopping benchmark. At Yieldbot we decided to take a look across our retail advertiser base consisting of some of the largest brands and retailers in the world to see how advertisers that took advantage of this 24-hour period fared compared to those that did not modify their messaging specifically for Cyber Monday.

Our dataset includes almost a dozen retailers running millions of impressions and receiving over ten thousand clicks and hundreds of conversions during Cyber Monday on the Yieldbot platform. We compared this same set of advertisers against the previous Tuesday, which we picked because it was far enough from Thanksgiving Day and Black Friday behavior that could skew the data, yet close enough to Cyber Monday for data consistency.

The results show that overall there was heightened consumer purchase intent on Cyber Monday. However, marketers that took advantage of the contextual real-time relevance of the day fared significantly better.

CTR on ads without Cyber Monday messaging increased 12%

CTR on ads with Cyber Monday messaging increased 43%

For the more important conversion rate the performance difference was striking.

Conversion Rate on ads without Cyber Monday messaging increased 15%

Conversion Rate on ads with Cyber Monday messaging increased 225%
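Read as relative lifts over each group's own baseline, the arithmetic looks like this. The baseline conversion rates below are hypothetical; only the lift percentages come from the data above:

```python
def pct_increase(before, after):
    """Relative change from a baseline, in percent."""
    return round((after - before) / before * 100, 1)

# Hypothetical conversion rates (in %); only the lifts are from the post.
print(pct_increase(1.00, 1.15))  # 15.0  -> ads without Cyber Monday messaging
print(pct_increase(1.00, 3.25))  # 225.0 -> ads with Cyber Monday messaging
```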

Conclusion: Being able to adjust messaging to real-time context is a marketing practice that demands more attention. Advertisers and Marketers that do this will increasingly come out on top in the hyper-competitive fight for consumer attention and action.

http://yieldbot.com/blog/how-yieldbot-influences-consumer-actions-online-and-in-store How Yieldbot Influences Consumer Actions Online and In-Store 2013-12-02T15:01:00+00:00 2013-12-02T15:08:58+00:00 Jonathan Mendez http://yieldbot.com/ Driving Coupon Conversions and Measurable Lift in Purchase Intent

Search is often leveraged to “close the loop” for other marketing channels. Some go as far as to call it “the net” for its success in catching previously exposed consumers and funneling them to online purchase or brand engagements. Display and traditional advertising are generally utilized to fill that proverbial “intent” net and build demand for specific products and lift in brand metrics. In tandem, and through measurable attribution models, they work to fulfill an efficient customer acquisition funnel.

Yieldbot has created a new marketing channel in the premium publisher environment that does both, leveraging intent signals and the buying and matching principles of Search (1st party data, in-session/real-time matching, CPC pricing, and keyword-targeted ad creative). Yieldbot closes the loop on consumers by meeting real-time consumer intent with a relevant ad to drive immediate action. It also builds demand for products by putting in-view, relevant ads in front of consumers when they are most receptive to the message, during lean-forward content experiences, ultimately increasing the likelihood of ad engagement and message receptivity.

As evidence, a major food brand was bringing a new product to market. Their major holding company's Search agency ran a campaign using Yieldbot with the goal of driving offline/in-store purchases. To measure the campaign's effectiveness, Yieldbot, in partnership with Nielsen, ran a control vs. exposed study of Yieldbot's impact on consumer purchase intent, and utilized a 3rd party ad-serving tracking technology to measure coupon downloads that occurred on the food brand's website.

For this brand the Yieldbot campaign was crafted into several thematic groups of keywords, creative and publishers. The keywords (used as a proxy for the consumer's real-time intent) and the ads (HTML text) were matched with each other based on messaging. Additionally, the ads were written to match the context of the publishers in each grouping, and their design was automatically matched by Yieldbot to the look and feel of each publisher. This combination creates hyper relevance and allows for a seamless consumer experience.

The campaign results spoke volumes. Yieldbot, targeting real-time intent groups for energy, healthy living, fitness, wellness, and healthy eating, drove a campaign-average 19% coupon download rate, and a 33.7% lift in consumer purchase intent for those that either “definitely will” or “probably will” purchase the product within the next 30 days. Additionally, the campaign drove engagement rates (CTR) 6x above the industry average. We were not given specific data points for other marketing channels (Search, display, video); however, when budget cuts for the brand had to be made, Yieldbot was still standing.

The campaign’s success can be attributed to three major components:

  1. Text ads built specifically for the Yieldbot publisher marketplace. The Yieldbot marketplace leverages the same principles as Search, as mentioned; however, there is an added layer of data that Yieldbot sees in order to determine intent: the publisher environment. It must be taken into consideration to ensure hyper relevance.
  2. Custom premium publisher marketplace and above the fold placements. The Yieldbot marketplace is comprised of the most premium publishers on the web including the websites of 3 of the 5 largest magazine corporations in the world. These environments contain the highest value and most desired consumers for advertisers. Yieldbot technology and algorithms matched the desired intent, demographics, and psychographics for the advertiser. Also, all ads in the Yieldbot marketplace are delivered in above the fold placements significantly increasing viewability and overall effectiveness.
  3. 1st party data used to target real-time intent. Yieldbot captures, organizes and activates 1st party data from the visitor click-streams of our publishing partners to inform ad decisioning. An ad is delivered only if there is a real-time match of intent and advertiser on that specific page-view. No 3rd party cookies, no unknown URLs. The Yieldbot technology is constantly running machine learning and using pattern recognition across these click-streams and pages to deliver the benefits of relevance to the advertiser, the publisher and, most of all, the consumer.
http://yieldbot.com/blog/publisher-traffic-and-ad-performance-from-social-networks Publisher Traffic and Ad Performance from Social Networks 2013-09-19T00:00:00+00:00 2013-10-14T17:52:23+00:00 http://yieldbot.com/

The Yieldbot platform sees over a billion and a half page-views monthly across leading premium publishers, with a cross-section of the web’s largest and best women’s programming: food/recipe, home/garden and health/wellness. Many of these sites are active programmers on a range of social platforms from Facebook to Pinterest to Tumblr to Twitter. The following data from Yieldbot shows that the level of traffic and engagement with ads is highly variable across social sources.

Social Media as a Traffic Source

Pinterest is by far the largest source of social media traffic to publishers on the Yieldbot platform. “Pinning” is known to be a common activity in women’s programming verticals, including style, hobbies and projects like home improvement.

Yieldbot Monthly Page Views Index by Original Referrer Source

The above data shows Pinterest's dominance as a domain referrer source to publishers. Interestingly, this dominance does not extend to mobile, where Facebook greatly outperforms Pinterest. Also interesting is that publishers are not generating any significant visitors from Tumblr or Twitter.

Ad Engagement from Social Media Traffic

While Pinterest desktop drives by far the most social platform traffic to publishers on the Yieldbot platform, advertiser performance (as defined by CTR) is below par relative to other sources of social traffic.

Yieldbot Advertiser CTR Index by Original Referrer Source

Key Findings:

1) Social media presents a rich and large opportunity for driving traffic to premium publishers, especially from Pinterest, but publishers need to do a better job of generating volume on Facebook. Both Tumblr and Twitter, despite their large audiences, are virtually meaningless to publishers as a traffic source.

2) Ad engagement, as defined by CTR, indexes higher from Facebook than other social platforms. Increases in traffic from Facebook will benefit the publisher’s advertisers given the high CTR levels.

Action Items:

1) Social traffic is not performing well in content because the ads are not geared toward social referrers. Huge upside here. We call this the “inbound social” opportunity.

2) Non-social-referred traffic is clicking at higher rates, so why not push this traffic to brands' social presences? We call this the “outbound social” opportunity.

http://yieldbot.com/blog/introducing-our-real-time-intent-reporting-tool Introducing Our Real Time Intent Reporting Tool 2013-07-03T14:46:00+00:00 2013-09-27T18:27:05+00:00 http://yieldbot.com/


http://yieldbot.com/blog/third-verse Third Verse 2013-06-27T00:00:00+00:00 2013-10-04T17:13:28+00:00 http://yieldbot.com/ I’m excited to announce our Series A-1 round of funding for Yieldbot. The $5M round was led by RRE Ventures, which has invested at every stage of our development and continues to show amazing support for the original vision we’ve been executing on for the past 3 years. Existing investors New Atlantic Ventures, kbs+ Ventures, Common Angels, Neu Ventures and Board member Cathie Black participated as well. To date Yieldbot has raised $10.5 million over three rounds of funding.

It’s an exciting time for Yieldbot. We spent our first two years building a technology foundation and now enter the second year of our real-time intent marketplace. We have proven the value of our real-time intent technology and data by delivering performance for Marketers that meets and exceeds the ROI they are getting from Search. Never before has this type of performance been seen in another channel by these marketers. That’s why we are fortunate to count some of the largest direct Search advertisers and many SEM arms of major agency holding companies as clients.

Our ROI performance allows the other constituent in our marketplace, and the source of the data, premium publishers, to connect directly to SEM budgets for the first time since the inception of Paid Search. Considering the mounting monetization challenges premium publishers face, this is a major market development. We’re fortunate to count leading media companies like Meredith, Rodale, Remedy Health, BlogHer, BabyCenter, Internet Brands and many others as partners. We’re closing in on 1.5 billion page views a month of real-time intent. Compared to search query volume, this would make Yieldbot the 3rd largest real-time intent source on the web behind Google and Yahoo/Bing.

All of this wouldn’t be possible without an amazing team of (at this writing 25) people starting with our CTO Rich Shea and our Chief Data Scientist Soren Macbeth. I also want to mention our platform architect Dave White and our javascript guru Erik Solen who have both been building Yieldbot since the early days.

It’s still early days. We’re travelling through uncharted territory of first-party big data mixed with real-time decisioning. That’s the way we like it. The learning we’ve accumulated over the last three years in this space has been tremendous. We’re excited to learn more and keep building something different. More than one marketer told us this past year that Yieldbot is their secret weapon. A year from now we anticipate Yieldbot will not be so secret anymore.

http://yieldbot.com/blog/year-1-of-yieldbot-marketplace Year 1 of Yieldbot Marketplace 2013-05-04T20:27:00+00:00 2013-10-04T18:46:19+00:00 http://yieldbot.com/

This post celebrates an anniversary. A year ago today and based on two years of prior work building our real-time intent technology, we launched what we think is by far the best business model for the web. We launched a real-time keyword marketplace. It’s one of the few ever created outside of Search.
A year ago we started with 3 publisher web sites and 1 advertiser. Yieldbot now is deployed on over 60 premium websites and content networks within the three verticals serving 125 cost-per-click campaigns.
We all love stats at Yieldbot so here are some more stats about Yieldbot.
We’ve made 5,753,199,447 real-time match decisions (most have been NOT to serve an ad)
Yieldbot crunches 14TB of data a day
Revenue has doubled every 2 months since launch.
The most important stat is of course results. The vast majority of Yieldbot advertisers get performance at or exceeding their Search Marketing efforts. Our work has no precedent there.
Our second advertiser gave us an idea we were onto something. After 6 weeks buying on Yieldbot they reported that our conversion rate was 35% better than Paid Search. Then our third advertiser told us a similar story. We’ve kept hearing it and it never gets old.
Yieldbot marketplace connects the two largest channels of digital ad supply and demand. Premium publishers get the performance technology they absolutely need to survive digital. Search Marketers get much-needed new inventory. Parties that never did business with each other before now do it transparently and in real-time through Yieldbot. It’s a wonderful thing to behold.
You might be asking yourself how does this work? What is Real-Time Intent?
People browse the web with purpose and direction. A real-time “why” is the most powerful targeting dimension Marketing has ever known.
Yieldbot works to understand the real-time “why” behind a person clicking to any and every page of the web. We do this through our advanced keyword navigation path analytics, our applied data science and our intelligent real-time match decisions.
So one year in, we’ve won the trust of many of the leading media brands and advertising budgets on the web. Now it’s time for Yieldbot to build on that foundation. We’ve launched our Mobile solution, and our real-time intent has caught the interest of Social advertisers as well as Search. We’re psyched.
We still fully manage the campaigns ourselves and source our own demand. Until we make it a fully self-serve business, we know that to a large extent we cannot truly be a market. We are working towards opening Yieldbot up through APIs to a number of demand partners by early 2014.
Real-time click-stream intelligence is an area where few have ever ventured. Our data science team, our engineering team, our sales and business development team and our account strategy teams are learning for ourselves what Yieldbot is, what it can be and what it should be. The early results are astounding to even us. Year 1 was an incredible journey. The journey continues. We hope you join us on it.

http://yieldbot.com/blog/byod-todays-driving-force-in-ad-targeting BYOD – Today’s Driving Force in Ad Targeting 2013-05-03T00:00:00+00:00 2013-10-04T17:20:25+00:00 http://yieldbot.com/ As we move through the second quarter of the year, the remarkable shift in the balance of power in media buying and selling online continues and the ad side has never been more empowered. But, there are tactics that publishers can deploy to fight back and create value and insights that drive results for their advertisers while satisfying their users.

Today, media buyers are not only defining narrower audiences for media publishers to deliver, but more importantly, they are coming to publishers with defined data sets of their own – hence BYOD (Bring Your Own Data) – and they only want to buy the users they are interested in. No one else – just the users they want. This trend has accelerated tremendously in the past 12 months because of the confluence of two powerful developments that have had a compounding effect when combined.

First, new ad side data and insights have come on the market and taken hold with many advertisers. For example, retargeting has proven to be an effective means of driving business from existing customers, especially for retailers, and many companies work to service that need.

Second, the rise of programmatic buying and exchanges has given these new ad side tools efficient access to more and more users to scale, bypassing the typical publisher direct relationship. And the introduction of FBX (the Facebook Exchange) allows BYOD advertisers access to literally billions and billions of impressions at a fraction of the expense of traditional display inventory.

BYOD in online advertising is part of a broader trend toward addressable media that goes well beyond online. For simplicity, let’s define addressable media as the targeting of advertising to a specific consumer or household using data, delivered through a media channel. The data can come from either the advertiser side or the publisher side, sometimes a combination of the two, and frequently will include a third-party data provider. For instance, an insurance company recently ran a TV campaign for renter’s insurance, in collaboration with two satellite TV providers, targeted only to households that had been identified as rentals.

This shift is certainly turning traditional media selling online on its ear. Just a few years ago, a media seller would be touting the content of a site, the overall user demographics of the site and its index against a desired demographic set and it would have been enough to win an advertiser’s business. Today, an advertiser might not care about any of those factors when using their BYOD strategy, and they are likely to want to isolate just a fraction of a site’s users.

And, this trend is even scarier for online publishers with inventory on exchanges, where they likely have no idea why the advertiser is buying media from them. The publisher will know the advertiser and the CPM rate but is far removed from the why. Was it a retargeted ad for a golf retailer? Was it a data match for a Caribbean traveler? Was it a user interested in a new car?

Do you want to be a publisher in a business where you don’t know why advertisers are buying your media?

There’s a slippery slope here and many publishers have started to experience the downhill trend – a loss of control, a loss of insights, a loss of advertisers and lower CPM’s overall, not higher.

It’s imperative that the online publishing community embraces a BYOD strategy of its own. Publishers must look to their audience and their behaviors and create meaningful insights that fuel ad matches with the needs of marketers. This BYOD effort by publishers has to begin with true first-party data about the site and the user experience. While many publishers are embracing DMPs (Data Management Platforms), consider the data sets being utilized by the DMP. They are often commoditized third-party data sets that advertisers already use on exchanges and elsewhere. And unless you have massive scale as a publisher, this commoditized data won’t really be helpful. Publishers need to create new first-party data that is differentiated and put it to use to meet the needs of advertisers and also their site’s users, who are seeking a relevant, timely experience both in the content they seek and the ads they interact with.

There are few companies today that have taken on the BYOD challenge directly to help publishers. Yieldbot is one of those companies and we are working daily to provide our publisher partners incredible intelligence into first-party session-based user intent, the most powerful targeting signal of all. Yieldbot is also creating a new channel of advertising revenue for publishers by matching search advertisers with real-time user intent, creating a highly relevant and satisfying experience for users.

The opportunity has never been greater and by all measures the overall online ad spend will continue to grow. But, the risks are also high and the distribution of dollars between search, social, mobile, display, exchanges and new platforms remains to be determined. Advertisers have a lot of options and if you are a publisher whose data would you prefer to be working with? This isn’t a winner take all market, but publishers that embrace BYOD and create true data and insights into their site experience and users stand a strong chance to create separation in the market and drive value for their advertising partners and their audience.

http://yieldbot.com/blog/yieldbot-is-first-new-marketing-channel-to-perform-as-well-as-search Yieldbot is First New Marketing Channel to Perform as well as Search 2013-04-09T17:25:00+00:00 2013-10-04T18:01:36+00:00 http://yieldbot.com/ I know…that’s a pretty bold statement. But it’s true. Buried in our press release about Cathie Black joining our board were the quotes every marketing technology company since 2004 or so has hoped to hear from its customers:

"We use Yieldbot for a number of different clients here at TargetCast, and have been seeing search-like performance on the front end, and more importantly, on the back end of our campaigns. Yieldbot has opened up a whole new marketplace for our clients to connect directly with the real-time mindset of their consumers in the publisher domain.”

- Philippe Sloan, TargetCast SVP Interactive Marketing Director

This is not an accident. We have spent the last 2.5 years building for this. Our goal was always to capture this intent where it is most valuable, premium content, and connect it to Search Marketers starving for a new channel. Our thesis, born from years of experience in performance marketing, was always that consumer intent in media outside of Search was just as valuable as Search intent. Why wouldn’t it be? Search does not create any intent. Intent moves to Search for navigation and then Search pays it forward into content.

Building a technology to capture real-time intent and then activating it to achieve Search levels of marketer performance are why CTOs of other companies have left and joined our team. Why people from Criteo, Kayak, Microsoft, Wall Street Journal, IAC have left those great companies to be part of something revolutionary. None of this happens without the amazing team of people at Yieldbot.

On the data side, the brightest minds in real-time data, led by Chief Data Scientist Soren Macbeth, are pioneering new ways of capturing, shaping and activating first-party data on premium publishers every day. On the platform side, the brightest minds in piping data, led by our CTO Rich Shea, are building a self-learning platform that makes real-time decisions, at a scale that would make Yieldbot the fourth largest search engine in the US.

“Yieldbot’s intent-based advertising has provided us with a critical supplement to our paid search efforts. The conversion rate performance and number of conversions are equal to, and have sometimes exceeded, those of our SEM.”

- Marin Rowe, Alliance Health Networks Marketing Manager

So it’s time to toot our horn a little.

Search divisions at 3 of the 5 major agency holding companies buy real-time intent on Yieldbot. One of the largest direct buyers of Search buys real-time consumer intent on Yieldbot. We have case study after case study from leading brand after leading brand showing our performance levels for volume and conversions matching and often exceeding SEM.

People said it couldn’t be done. After 2.5 years of building and 10 months of our marketplace our advertisers are telling us with confidence it’s been done. We’re telling them and you that we’re just getting started. It will only get better.

http://yieldbot.com/blog/yieldbot-appoints-media-pioneer-cathleen-black-to-its-board-of-directors Yieldbot Appoints Media Pioneer Cathleen Black to its Board of Directors 2013-03-27T00:00:00+00:00 2013-10-07T16:20:33+00:00 http://yieldbot.com/ Black will advise Yieldbot on its strategic direction as the Company continues to grow its real-time consumer intent marketplace for advertisers and premium publishers.

NEW YORK, New York – March 27, 2013 – Yieldbot today announced it has appointed Cathleen Black to its Board of Directors, effective March 18, 2013. An experienced media executive and business leader, Black will advise Yieldbot on its strategic direction as the Company grows its real-time consumer intent marketplace.

“We are extremely excited to welcome Cathie to Yieldbot’s Board of Directors,” said Jonathan Mendez, Yieldbot Founder and CEO. “Cathie’s leadership and knowledge within the media and publishing industry is extensive. Her experience and perspective will be invaluable to Yieldbot as we continue to bring new revenue streams to publishers and better results for advertisers through our real-time consumer intent marketplace.”

“It is very exciting to join the board of such a promising company at this early stage and to play a role in its growth and success,” said Black. “Yieldbot’s technology is at the forefront of the real-time data and analytics evolution, and has the potential to bring important new insight to publishers as to how they view their data and use it to deliver better experiences to audiences in the form of content and advertising.”

Black was President, then Chairman of Hearst Magazines, one of the world’s largest publishers of monthly magazines, for 15 years, and oversaw such titles as Cosmopolitan, Esquire, Good Housekeeping, Harper’s Bazaar, O, The Oprah Magazine, and nearly 200 international editions. Called “the First Lady of Magazines,” Black has been named to Fortune Magazine’s and Forbes’ “Most Powerful Women in Business” lists numerous times. Black was also President and Publisher of USA Today during its first 7 years.

Black served on the boards of IBM and the Coca-Cola Company for 20 years, before becoming Chancellor of New York City Schools in 2010. She is a member of the Council on Foreign Relations, and her book, Basic Black: The Essential Guide for Getting Ahead at Work (and in Life), is a New York Times bestseller.

Black joins Yieldbot 10 months after the launch of its real-time consumer intent marketplace. The marketplace currently captures 1.6 billion real-time consumer intentions each month across its base of premium web publishers. Yieldbot makes these intentions available for purchase by search marketers to match ads against in real time. The Company has doubled its advertising revenue every two months since the marketplace’s launch.

"We use Yieldbot for a number of different clients here at TargetCast, and have been seeing search-like performance on the front end, and more importantly, on the back end of our campaigns,” said TargetCast SVP Interactive Marketing Director Philippe Sloan. “Yieldbot has opened up a whole new marketplace for our clients to connect directly with the real-time mindset of their consumers in the publisher domain.”

“Yieldbot’s intent-based advertising has provided us with a critical supplement to our paid search efforts,” said Alliance Health Networks Marketing Manager Marin Rowe. “The conversion rate performance and number of conversions are equal to, and have sometimes exceeded, those of our SEM.”

Yieldbot’s publisher base has grown exponentially, as well, tripling page views each quarter since the marketplace’s launch.

“Yieldbot is looking at publisher data in a way that no other company has done before,” said Black. “Its technology can be very powerful for media companies seeking to transform their businesses in today’s digital environment.”

“Cathie’s whole career has been forward-looking,” said Mendez. “Cathie’s joining us validates the importance of what we’re doing for publishers and premium content providers.”

With Black’s appointment, Yieldbot’s Board of Directors now comprises five members, including Scott Johnson, Founder & Managing Partner, New Atlantic Ventures; Jonathan Mendez, Founder & CEO, Yieldbot; Jerry Neumann, Partner, Neu Venture Capital; and Eric Wiesen, General Partner, RRE Ventures.

To learn more about Yieldbot’s products and services, its real-time consumer intent marketplace, and how publishers can increase their yields while advertisers strengthen their intent-based marketing results, please visit our website.

About Yieldbot

Founded in 2010, Yieldbot captures, organizes and activates real-time consumer intent before, after and outside of search, creating a new marketing channel for advertisers and new revenue streams for publishers. Through its real-time consumer intent marketplace, Yieldbot enables advertisers to deliver ads at the exact moment consumers are most open to receiving relevant marketing messages. The Yieldbot marketplace currently captures 1.6 billion intentions each month and frequently delivers performance equivalent to, or stronger than, search marketing performance.

For media inquiries, please email info@yieldbot.com.

http://yieldbot.com/blog/why-yieldbot-built-its-own-database Why Yieldbot Built Its Own Database 2013-02-04T17:48:00+00:00 2013-10-04T18:49:23+00:00 http://yieldbot.com/ About a year ago we made the decision to switch over all of our configuration to a new database technology that we would develop in-house, which we ended up calling HeroDB.

The motivating principle behind what we wanted in our configuration database was a reliable concept of versioning. We had previously tried to manually implement some concept of versioning in the context of available database technology. This would keep around older versions of objects in a separate area with a version number identifying them, and application logic would move copies of whole objects around as changes were made to them. Data events would contain versions of the objects that they were related to. While this did some of what we wanted, it was clear that this was not the solution we were looking for.

Enter Git

At the core of Git is a simple key-value data store. You can insert any kind of content into it, and it will give you back a key that you can use to retrieve the content again at any time.
- Pro Git, Ch 9.2
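That content-addressed behavior is easy to picture concretely: Git computes a blob's key as the SHA-1 of a small object header followed by the raw content, so identical content always gets the identical key. A minimal sketch of that keying scheme (illustrative, not HeroDB code):

```python
import hashlib

def git_blob_key(content: bytes) -> str:
    """Compute the key Git assigns a blob: the SHA-1 of a
    header ("blob <size>\\0") followed by the raw content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The key is derived from the content itself, so the same
# bytes always map to the same key.
key = git_blob_key(b"hello\n")
```

Because a key can never point at changed data, caching by sha (one of the pros listed below) is always safe.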

While we had these challenges thinking about the management of data in our application, we were managing one of our types of data with perfect version semantics. Our source code. Some simple searches told us we weren’t the first to think about leveraging Git to manage data, but there also wasn’t anything serious out there for what we were looking for.

So we thought hard about what Git would be able to provide us as a database (there are definitely both pros and cons) and how it intersected with what we were looking for in versioning our configuration data. A few of the things we liked:

  • every change versioned
  • annotated changes (comments) and history (log)
  • every past state of the database inspectable and reproducible
  • reverting changes
  • caching (by commit sha) - a specific version of data doesn’t change

There were definitely cons, which we decided would be worth it for the strength of the benefits we’d be getting. A few of the cons we decided to live with:

  • comparatively slow, both reads and writes
  • size concerns, would shard
  • no native foreign keys

Some of these can be mitigated. For instance, read performance can be improved (with caveats) by having cloned repos for read that are updated as changes are made to the “golden” repo. To mitigate size concerns, and because there is no native concept of foreign keys, data can be sharded with no penalty to what can be expressed in the data.

What We Did

Once we decided we liked the sound of having Git-like semantics on our data, we went looking for projects already available that provided what we wanted. Not satisfied with what we found, our next step was to look for a suitable programmatic interface to Git. In the end we found a good solution in a project named Dulwich (https://github.com/jelmer/dulwich), a pure Python implementation of Git interfaces.

With Dulwich as the core, we implemented a REST API providing the data semantics that we wished to expose.

In terms of modeling data, we took Git’s concept of Trees and Blobs, conceiving of Trees as objects and Blobs as attributes, with the content of the Blobs being the values of the attributes. The database itself exists as a Git “bare” repo. Data is expressed in JSON, where ultimately all values are encoded in Blobs and where Trees represent nesting levels of objects.
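That object model amounts to flattening nested JSON into (path, blob-content) pairs, where each nesting level becomes a Tree and each leaf becomes a Blob holding the JSON-encoded value. A small illustrative sketch of the mapping (not the actual HeroDB code):

```python
import json

def flatten(obj, prefix=""):
    """Flatten a nested dict into (path, blob) pairs: dicts
    become Trees (directories), leaf values become Blobs whose
    content is the JSON-encoded value."""
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from flatten(value, path + "/")  # a sub-Tree
        else:
            yield path, json.dumps(value)          # a Blob

records = {"alice@example.com": {"name": "Alice", "age": 18}}
# list(flatten(records)) yields
#   ("alice@example.com/name", '"Alice"') and
#   ("alice@example.com/age", '18')
```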

The following simple example illustrates the JSON representation of a portion of our database realized in Git and what that looks like in a working tree.

Example JSON:

{"alice@example.com": {"name": "Alice", "age": 18}, "bob@example.com": {"name": "Bob", "age": 22}}

In cloned repo:

$ find .   # (.git directory omitted for clarity)
.
./alice@example.com
./alice@example.com/name
./alice@example.com/age
./bob@example.com
./bob@example.com/name
./bob@example.com/age

$ cat alice@example.com/name
"Alice"
$ cat alice@example.com/age
18

What’s magical about representing your data this way is that it is very understandable and easy to work with when realized in a filesystem view of the repo (i.e., the working tree). The database can literally be interacted with using the same Git and system tools that developers use in their everyday work. The database is copied locally by cloning the repo. Navigating through the data is done via `cd` and `ls`, data can be searched using tools like `find` and `grep`, etc. Best of all, data can be modified, committed with an appropriate comment, and pushed back to production if need be.
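A complete edit cycle against such a repo looks like ordinary development work. The repo name and values below are illustrative, and the final push is shown as a comment since it assumes a remote:

```shell
# Work in a scratch directory; in practice you would `git clone` the config repo.
cd "$(mktemp -d)"
git init -q config && cd config
git config user.email "dev@example.com" && git config user.name "Dev"

# The working tree IS the database: directories are objects, files are attributes.
mkdir alice@example.com
echo '18' > alice@example.com/age
git add -A && git commit -qm "Initial config"

# Change a value and record why, exactly like a code change.
echo '19' > alice@example.com/age
git commit -qam "Correct Alice's age per account update"

# Every past state stays reproducible by commit.
git show HEAD~1:alice@example.com/age

# In production you would then push back to the golden repo:
# git push origin master
```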

Interesting Directions to Evolve

Thinking about managing your data the same way you manage your source code leads to some interesting thoughts about new things that you can do easily or layer on top of these capabilities. A few that we’ve thought of or done in the last year:

  • Reproduce exactly in a test environment the state of the database at some point in time in production in order to reproduce a problem.
  • Discover the exact state of the database correlating to data events (by including commit sha).
  • Analyze the effect of specific changes to configuration on behavior of the platform over time.
  • Targeted undo (revert) of a specific configuration change that caused some ill effect.
  • Expose semantics in a product UI that map to familiar source control semantics: pull/branch/commit/push

Why We Did It

The short answer of why we did it was because we considered history such an important aspect of the database. Building a notion of history into the database itself was the best way to ensure the ability to correlate data events like clicks on ads back to configuration of monetary settings that drove the original impression in an auditable fashion. Not finding the solution to our problem anywhere we followed one of Yieldbot’s core principles and JFDI’ed.

The simplicity of the approach hit two strong notes for us. First, the simplicity brought with it a kind of elegance. It is easy to understand how the database handles history and to reason about the form of the data in the database. We also immediately got functionality like audit logging built into the database for free. And ultimately, for a technical team that at the time was four developers, with our hands full building rich custom intent analytics, performance-optimized ad serving, a rich javascript front-end to manage, explore and visualize those analytics, and a platform that scales out to a worldwide footprint, this simplicity let us stay focused on our core mission of building that product.

We’re discussing our work with HeroDB later this week:

Yieldbot Tech Talks Meetup (Feb 7, 2013 @ 7PM): www.meetup.com/Yieldbot-Tech-Talks/events/101101302/

Interested in joining the team? We're hiring!

http://yieldbot.com/blog/yieldbot-2012-review Yieldbot 2012 Review 2012-12-31T00:00:00+00:00 2013-10-04T18:10:43+00:00 http://yieldbot.com/ 2012 was a huge year of progress at Yieldbot. We started off the year by taking our second round of funding in February from true VCs that didn’t need to see “traction” before making their bet. With that investment we grew the company over the course of the year from 5 to 19 full-time employees in New York and Boston. Companies are about people first and we have put a together a tremendous team of data scientists, developers, engineers, strategists and sales people who have joined us from companies like Criteo, Microsoft, Kayak and the Wall Street Journal.

After two years of development from that small team of 5 we launched our real-time consumer intent marketplace in May. It was worth the effort. The amount of paid clicks in the Yieldbot marketplace has doubled every two months since we launched. Doubling the size of your business 4 times in 8 months presented numerous scaling challenges that touched all parts of the business. Our team handled them incredibly well.

We have a good number of the world’s leading brands and many other marketers, large and small, extending their Search budgets into Yieldbot to buy real-time intent on a performance (Pay Per Click) basis. Speaking of performance, in many cases advertisers are seeing results from Yieldbot traffic as good as or better than what they see in Paid Search. As most industry observers know, this is heretofore unseen.

Best of all we truly unlocked a direct path for Search Marketing budgets to reach Premium Publishers and buy intent in real-time. This is an industry first and we consider it a monumental achievement in digital media. Yieldbot’s largest publisher partner is on a $2M 2013 run rate for new revenue - Search revenue. These are budgets they have never touched and their direct sales teams have never called on. We truly have created a new channel. That doesn’t happen very often.

In 2012 Publisher partners were stacking Yieldbot behind their sponsorships/direct sold impressions and ahead of exchange/network. That’s a great starting place but we aim to create much more value as our technology improves in 2013. We saw over 15M different consumer intentions across more than 2.7B page-views in 2012. The more data we capture the better we perform. This is one reason Yieldbot overall platform CTR (click-through rate) has gone up every month even as impression levels have skyrocketed. There are efficiencies created as markets get larger and those will benefit both Yieldbot advertisers and publishers.

Major initiatives around automation and artificial intelligence were also started late in 2012 that will make optimization of Yieldbot performance completely automated. From campaign set-up and launch through goal management, the use of first party data and ad server integration creates an opportunity to reshape what is possible with marketing technology and reduce the resources necessary to manage campaigns.

We head into 2013 fully aware that we have not accomplished anything close to our goals and we are still at the beginning of building our business. We have 2 new verticals launching Q1 and more growth to manage ahead of us. There is also pressure that comes from the sheer enormity of the opportunity in front of us. That’s a good thing. It was Billie Jean King that said “pressure is a privilege” and we’re privileged to be solving problems that bring the highest quality consumers, world’s top marketers and premium content publishers together in a way that delivers relevance and value to each simultaneously.

Should old acquaintance be forgot and never thought upon? Maybe. But if you did not make the acquaintance of Yieldbot in 2012 and you are a Search Marketer or Premium Publisher we hope you do in 2013. In the meantime, Happy New Year!

http://yieldbot.com/blog/an-amazing-search-insider-summit-with-one-thing-missing An Amazing Search Insider Summit with One Thing Missing 2012-12-19T00:00:00+00:00 2013-10-04T18:14:48+00:00 http://yieldbot.com/ I have recently returned from arguably the most in-depth, interesting and well-attended event involving the search industry, Mediapost’s Search Insider Summit. After spending the past nine months attending a number of other events and conferences and indoctrinating myself into this incredible community on behalf of Yieldbot, SIS clearly stood out. The content of the conference encompassed a number of the past, current and future trends of a business that is still incredibly young despite its size, both in sheer economics and in the volume of businesses big and small participating in it. Due to that dynamic combination of youth and size, there are many exciting new things emerging as bright entrepreneurs with new ideas and the existing industry behemoths continue to innovate and bring new methodologies, technologies and ideas to the table.

So first a brief outline of a couple of the new opportunities discussed that were most interesting and then a dive into what was so obviously missing from the dialogue.

Creative – if there was one point on which it seemed almost all of these experts agreed, it was that new creative formats, creative technologies and creative optimization methodologies will continue to be at the forefront of the industry’s innovation. As GYB (Google, Yahoo, Bing) and the other “traditional” platforms continue to experiment and innovate with SERPs that are more dynamic and visually pleasing, for the dual purposes of better engagement and the ability to attract more dollars from brands with higher levels of concern about image, there are interesting companies well represented that are doing cool stuff to leverage this need. One that comes to mind is Datapop, a real technology innovator in this space. Then there is the simple blocking and tackling of better text copy optimization, tackled by folks like BoostCTR, less a technology and more a time-saving method for the drudgery of copy testing.

New Platforms – the other area where many of the most interesting discussions took place, and where the opportunities for industry growth seem most fertile, was new platforms. These ranged from the increasing growth of mobile and tablet search, with its nuances relative to desktop, to the coming of more vertical search encompassed by Yelp, Amazon and others (surprisingly, none of which were represented at SIS, with the exception of the very cool Intent Media, but which were much talked about). Even really cool (but maybe a little scary from a privacy perspective) technologies like visual search from Xbox were hot topics. These new platforms create new markets and new audiences on top of which the best practices of the search industry are being built, and their rapid growth is representative of their promise.

And now the missing…

It was never clearer to me than at this great conference that content publishers (not to be confused with eCommerce publishers) are a glaring afterthought to the leading innovators of the most successful and impactful part of the digital advertising industry… the search community. There was not a content publisher in the room, nor was there a mention of one on the stage (except briefly by me). Yet I bet that if asked in a vacuum (and I did a bit of asking), most of this talented group knows that premium content publishers are hardcore SEO buffs, often buy Paid Search and are sitting on a treasure trove of first-party data. When properly harvested (as we do at Yieldbot), this data can illuminate the “search-like” behaviors of web visitors in their sessions. Selling these visitors’ premium publisher intent in the currency of keywords to the very marketers that make up the search ecosystem (which was so well represented at SIS) represents an enormous opportunity for market expansion. Not just for those publishers to access search budgets, but also for search marketers to find new ways of pinpointing the users they want in real time, as those users express interest in a specific category. Utilization of this real-time data can (and does) yield results in some cases even better than traditional search itself, all over the marketing funnel (from branding to conversion).

We heard and talked about the utilization of third-party search data and third-party site visitation in marketers’ and eCommerce platforms as a new data set for the search marketplace to leverage its methodologies in buying performance media. There is no question that there is a place for that, and some have done quite well. But nowhere in this paradigm is there mention of using FIRST PARTY DATA in REAL TIME within publisher content to make ad decisions on the same pubs from which that data is harvested. In that paradigm, publishers can participate at least on par in the search (and performance) digital marketplace, like all these old and new search platforms that marketers know and want more of.

So in the vein of a good search creative call to action we say: Come talk to us, content publishers! And next Search Insider Summit, join the conversation and participate in the marketplace that our friend Murthy from Adchemy proclaimed (to paraphrase) the dawn of a new era in search, the most successful digital advertising platform ever. You can and should be a major voice in that room.



http://yieldbot.com/blog/data-as-a-currency Data as a Currency 2012-10-08T00:00:00+00:00 2013-10-04T18:18:53+00:00 http://yieldbot.com/ This past week was Advertising Week in New York and Yieldbot did a session with our friends @bitly during the AdMonsters OPS event titled “Data as a Currency.” The main portion of our presentation was the first public demonstration of Yieldbot’s publisher analytics product. Prior to that we led off with a brief overview of how we create currency with our data and by currency we mean real dollars to publishers and advertisers on our platform. Below is the presentation we gave. If you would like to learn more please email info@yieldbot.com

Data as Currency - OPS from yieldbot
http://yieldbot.com/blog/driving-traffic-the-publisher-panacea Driving Traffic – The Publisher Panacea 2012-09-27T00:00:00+00:00 2013-10-04T18:21:04+00:00 http://yieldbot.com/ The dream of web banners and selling impressions to large brand budgets is over. The value of audience data has surpassed the value of the ad impression. With this backdrop the future for publishers is this simple. They will get one more chance to enter the click based economy on their own terms (meaning owning their media and data) or they will lose control of their own business and become a media tool of Google and Facebook.

The last few years of rapid change in the ad tech world, away from ad networks and towards ad exchanges, have been confusing ones for premium publishers. First, they turned to the idea of “Private Exchanges”: places where advertisers could come with their data and buy directly on the publisher’s inventory. The reality is there was no demand from advertisers, because impressions are cheaper elsewhere. 18 months after being all the rage, nobody talks about “Private Exchanges” anymore.

The new shiny object for publishers is Data Management Platforms, or DMPs. While pubs seem to like the content-optimization qualities of these platforms, there are real issues with using DMPs effectively with their media. DMPs add complexity, and publishers are not technologists or marketers. Most important, DMPs do not solve the underlying problems for publishers: audience data is becoming commoditized, and the value of their media on an impression basis continues to sink like a stone.

While publishers have been fumbling, the simple Click Economy has grown to the neighborhood of $40-50 billion in annual revenue. This advertising economy includes search, contextual, email, and affiliate: everything that drives traffic, i.e. clicks, to marketers. Its newest entrants are Facebook, Twitter and retargeting companies like Criteo, all of which are experiencing massive growth. In this economy two things stand out: the ad impressions are free, and the performance-based business model drives value higher to those with the best data intelligence.

The businesses that were built to sell impressions to brands - Yahoo, AOL, Microsoft, NYT.com - have been passed by these businesses that drive traffic. Even choruses of “the click is dead” from fearful self-interested impression supporters cannot stop the basic fact that the web has always been monetized one way or another through traffic arbitrage and always will be. To she who sends the most valuable traffic goes the spoils.

The inflection point is now.

What we’re seeing from Yahoo is representative of the change that Publishers must make in order to survive. Bringing in one of the sharpest minds on Search as CEO to help save the struggling web banner company should be a beacon to all publishers. Your business is a utility. You deliver valuable content. That includes advertising. The value of that advertising is based on the quality of your audience to the advertiser as measured by a performance metric. Your ability to increase the value is based on the relevance of the message.

If that sounds a lot like Search, it should. Those are the core tenets of that marketplace. A market that started later but has grown to roughly twice the size of web banners. Publishers are paying dearly for missing that boat.

Think how much revenue the New York Times could be making if, instead of web banners, its digital revenue focus had been delivering traffic and conversions to sites like Expedia, Amazon, eBay, Bankrate, Home Depot, and on and on. It certainly would have been larger than IAC, the company that just acquired About.com from them. IAC’s market value is 4x the New York Times’. For another perspective, the Times will do roughly $300 million this year in digital revenues. IAC will do over $2 billion.

Quietly taking advantage of publisher fumbling is Google. Google has expanded its grip on not only publisher media, through its Ad Exchange and AdSense, but also publisher data, through Google Analytics and its DFP ad server. Most publishers will openly admit that Google knows how much money they make and more about their audience than they do. When you don’t know who is coming into your store, how much they are paying, or what they are leaving with, any business will die. That is exactly what is happening. Facebook is soon to take the same approach as it builds out its ad network on the back of all the JavaScript publishers have installed on their sites over the past few years.

The fact is that publishers are already driving valuable traffic; they are just not getting paid for it. A few years ago the New York Times said that 25% of its traffic leaves and goes to Google. Doing some back-of-the-napkin math, that’s about 20 million exits a month to Google, and at Google’s estimated RPM of $80 that’s about $20 million a year in ad revenues the Times delivers to Google from intent the Times itself has generated. There needs to be an endcap.
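The napkin math above checks out, assuming RPM means revenue per 1,000 visits (the figures and the $80 rate are the post’s estimates, not audited numbers):

```shell
# Back-of-the-napkin check of the ~$20M/year figure:
# 20M exits/month at an $80 RPM (revenue per 1,000), annualized.
awk 'BEGIN {
  exits_per_month = 20000000   # ~25% of NYT traffic exiting to Google
  rpm = 80                     # assumed Google revenue per 1,000 visits
  annual = exits_per_month / 1000 * rpm * 12
  printf "$%.1f million per year\n", annual / 1000000
}'
```

This lands at $19.2 million, which rounds to the “about $20 million a year” quoted above.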

Better yet, there needs to be a publisher controlled marketplace where the true value of traffic from premium publishers is understood, captured and passed on to marketers. Where the data is transparent and the optimizations generate mutual benefits to the publisher, marketer and site visitor. This would create a new channel. The opportunity is massive and real. Some premium publishers are already doing it and experiencing incredible revenue growth. Let us know if you want to be one of them.

http://yieldbot.com/blog/the-databases-of-yieldbot The Databases of Yieldbot 2012-09-21T00:00:00+00:00 2013-10-04T18:52:19+00:00 http://yieldbot.com/ Last night we held our first Yieldbot Tech Talks meetup, where we went into detail about the technologies that we use for data handling and storage in various parts of our platform.

MongoDB excelled as an initial choice of a flexible database that allowed for just getting stuff done and focusing on the million other things that need to be thought about when iterating around an initial idea. As our needs evolved in various areas of data we then transitioned to the best technologies and approaches for those specific cases. We now use (in addition to MongoDB) redis, ElephantDB, HBase, and our own DB technology called HeroDB that we use for configuration data.

Here are the slides:

Yieldbot Tech Talk, Sept 20, 2012 from yieldbot

Like how we roll? We’re hiring! Find out more.

http://yieldbot.com/blog/doing-production-deploys-with-git Doing Production Deploys with git 2012-09-06T19:04:00+00:00 2013-10-04T19:07:23+00:00 http://yieldbot.com/ At Yieldbot we follow a schedule of daily (or more frequent) releases, coordinated by Chef with git smack in the middle.

To start, like many we’re happy users of GitHub for coordinating our code repositories across our development team, and our use is pretty straightforward for that. Developers change code, pull down everyone else’s changes, merge, push up their changes, etc. It’s like a “hub” for our “git” usage (whoah!).

git != GitHub

It’s important to remember though that GitHub, while making a certain usage pattern convenient, is not synonymous with “git”; a fact that we make use of heavily.

Using git’s ability to set up multiple remotes or have default remotes not up on GitHub provides for some powerful options in how code can be moved around and managed. Something we take advantage of every day both in production (which we’ll talk about in this post) and in our development environments (covered in a later blog post).

git + Chef == a tasty treat

For this post the important thing to understand about Chef is that we have a central Chef server that serves as the coordination point for deployment. The servers throughout the platform run chef-client to find out from the Chef server what they should be running and Chef and its cookbooks make it so.

To control what version of our code is deployed, we have a git repo on the Chef server cloned from the repo up on GitHub. The core of our deploy procedure is to pull the latest “master” branch from GitHub into the repo on the Chef server, tag it with the name of the release, and also tag it with a special tag that moves, called “production” (i.e. “git tag -f production”).

Using a Chef recipe, each of the servers in our platform is also set up with the git repo, but with the repo on the Chef server as its origin. During a deploy these repos fetch from the Chef server remote and sync to the “production” tag.

Once the repo on the Chef server is set up with the “production” tag in place where desired for the deploy, the actual deploy is triggered by poking the servers in the platform to run chef-client.
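The flow above can be sketched end-to-end with three throwaway local repos standing in for GitHub, the Chef server, and a platform (UI) server. The paths, tag names, and “release-” naming convention here are illustrative, not Yieldbot’s actual tooling:

```shell
set -e
WORK=$(mktemp -d)

# Stand-in for the repo up on GitHub.
git init -q --bare "$WORK/github.git"
git -C "$WORK/github.git" symbolic-ref HEAD refs/heads/master

# A developer pushes some work to "GitHub".
git clone -q "$WORK/github.git" "$WORK/dev" 2>/dev/null
git -C "$WORK/dev" -c user.name=Dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "feature work"
git -C "$WORK/dev" push -q origin master

# The Chef server pulls the latest master, names the release, and
# force-moves the special "production" tag onto it.
git clone -q "$WORK/github.git" "$WORK/chef-server"
git -C "$WORK/chef-server" tag release-2012-09-06
git -C "$WORK/chef-server" tag -f production
git -C "$WORK/chef-server" push -q origin --tags   # tag state back to "GitHub"

# A platform server uses the Chef server's repo as its origin and
# syncs to whatever "production" currently points at.
git clone -q "$WORK/chef-server" "$WORK/ui-server"
git -C "$WORK/ui-server" fetch -q --tags --force origin
git -C "$WORK/ui-server" checkout -q production
```

After the checkout, the platform server’s working tree is exactly the commit the Chef server tagged as “production”.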

Note in the diagram above that in addition to the usual usage of git between developers Erik and Sean, the Chef server is also set up with GitHub as the remote. Below the Chef server are two examples of servers (UI and DB) that are set up with the Chef server as their remote. Note that UI and DB really only pull from the Chef server repo. The Chef server in turn mostly pulls from GitHub, although it does push back to GitHub the state of tags as manipulated during the deploy.


You can easily change the location of the “production” tag on the Chef server repo and then resync the servers in the platform to deploy any level of code desired.
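Under this scheme a rollback is just moving the tag. Here is a minimal sketch in a scratch repo (the release names are made up; only the tag mechanics match the post):

```shell
set -e
REPO=$(mktemp -d)/repo
git init -q "$REPO"
git -C "$REPO" config user.email chef@example.com
git -C "$REPO" config user.name "Chef Server"

# Two successive deploys, each moving "production" forward.
git -C "$REPO" commit -q --allow-empty -m "release 1"
git -C "$REPO" tag release-1
git -C "$REPO" tag -f production
git -C "$REPO" commit -q --allow-empty -m "release 2"
git -C "$REPO" tag release-2
git -C "$REPO" tag -f production        # normal deploy: tag moves forward

# Roll back: point "production" at the earlier release; the next
# chef-client run on each server will resync to it.
git -C "$REPO" tag -f production release-1
```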

During a deploy, our servers are not dependent on contacting GitHub. To start a deploy we sync the Chef server’s repo with GitHub, but we could just as easily push to the Chef server’s repo from somewhere else, or even add a different remote to the Chef server’s repo to pull from another location instead.

If any changes are made to a system that make it deviate from what was last deployed, a “git diff” in the repo on that server can expose exactly what changes were made on that specific server (if you’ve worked with deployed code before, you know this is huge).
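A minimal illustration of that drift check, using a throwaway repo and a hypothetical config file:

```shell
set -e
SRV=$(mktemp -d)/srv
git init -q "$SRV"
git -C "$SRV" config user.email ops@example.com
git -C "$SRV" config user.name Ops

# The state of the server as of the last deploy.
echo "max_workers = 4" > "$SRV/app.conf"
git -C "$SRV" add app.conf
git -C "$SRV" commit -q -m "state as of last deploy"

# Someone hot-fixes the box by hand...
echo "max_workers = 8" > "$SRV/app.conf"

# ...and the deviation is immediately visible:
git -C "$SRV" status --short   # " M app.conf"
git -C "$SRV" diff             # the exact change made on this server
```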

Next Up

In a follow-on blog post we’ll cover some more advanced uses of multiple git remotes that we leverage in our development environments.

http://yieldbot.com/blog/what-if-first-party-data-told-you-everything-you-thought-about-online-advertising-is-wrong What If (First Party Data Told You) Everything You Thought About Online Advertising is Wrong 2012-08-20T00:00:00+00:00 2013-10-04T19:10:59+00:00 http://yieldbot.com/ Now that we have a few months and over a billion ad match decisions on referrer data at Yieldbot we’re seeing some very interesting performance results. Segmenting performance metrics by referrer source has always been 101 for e-commerce but amazingly has rarely if ever been capably measured/optimized for web publisher monetization. However, referrers are very important to Yieldbot because referrer data is a primary data source in our intent decisioning algos.

We look at referrers in two ways at Yieldbot. First, the session referrer: the referring source of the visitor session on the publisher, at both the domain and URL level. Second, the match referrer: the referring source, again at both the domain and URL level, of the actual ad match impression.

Session referrers are always external domains. They can be bookmarks, emails, search, social and every other way a person can get to a website.

Match referrers can be both external (for every ad match on a site landing) and internal URLs (for every ad match during a site session).

What we’ll share about CTR of internal match referrers is that they are the highest performing. CTR in-session — user page view 2 and above — as a whole performs 2X better than ad match impressions on landing. This counters much of what you hear in ad tech circles and more closely mirrors Search where CTR increases with query refinement.

It’s also a great validation of our thesis that the serialization of session data can operate ad match rules with higher performance than a single page level rule. Again, this looks like e-Commerce to a great extent because click path data is delivering the rule and gaining decision intelligence with each additional user click in session.

What we’ll share about CTR performance on external match referrer data is a Top 10 ranking for the past 3 months. Everyone loves Top 10 lists, right?

Top 10: Yieldbot CTR by External Match Referrer

Much of what we see here runs counter to the prevailing wisdom about online advertising.

The idea that publisher direct traffic does not click on ads. The idea that Facebook traffic has no commercial value. The idea that search traffic has the most obvious intent. The idea that Pinterest is the future of the commercial web. So far, our data suggests that all of those assumptions need to be called into question.

From day 1 at Yieldbot we’ve been about breaking new ground by using first-party publisher analytics to understand and redefine the relationship between visitors and monetization. We’re incredibly excited that these early results are so fascinating, and we’re digging deeper. It’s possible that everything we thought we knew about online advertising was wrong. It’s likely online advertising has never been done right for publishers and advertisers at the site level. The times they are a-changin’.

http://yieldbot.com/blog/the-missing-why-for-publishers The Missing “Why” for Publishers 2012-08-05T00:00:00+00:00 2013-10-04T19:14:17+00:00 http://yieldbot.com/ During my time at The Wall Street Journal Digital Network, one problem we did not have, at least in aggregate, was traffic. The Network was one of the largest players in the business/finance category in both NNR & comScore, growing year over year. And the page views per visit for subscribers, not to mention engagement, were some of the highest…if not THE highest…of any other business property. Those readers who paid for content got their money’s worth!

Sideways traffic - that driven from search, portals, aggregators, et al. - was a different story. While it helped drive unique user growth, the traffic was mostly in and out. After landing on an article, we were lucky to see a user turn an additional page before leaving the site. We tried several methods to address this: we employed technologies to surface related or recommended content based on the content of the page, general subject matter, even social sharing. We made assumptions based on referral source and surfaced headlines accordingly – those who came from the Yahoo homepage or Facebook must be interested in general, non-business articles. If they came from a Y! Finance quote page they must be interested in articles about the company ticker they were looking up. And if they came from Drudge they must be looking for Opinion Page content. But despite these efforts, we never managed to significantly move the needle on those page views per visitor numbers.

That is because we never really understood WHY a user came to a site.

These were simply inferences, and though we tested surfacing content based on them, we never managed to significantly move the page views per user numbers. Again, because we never understood “why” the user was coming to our site.

Yieldbot’s core value proposition is just that: looking at a variety of cues that inform what drives a user into a site, all in-session, and distilling that intent into keywords that can then be used as a real-time match rule to improve monetization. We drive performance for advertisers and we drive revenue for publishers. But we also generate understanding.

When we distill intent we see how many pages a user turns when they come to the site looking for, say, “ETF.” We see the percentage of the time users who come into a site with the intent of “ETF” return to the site. And we see how often that intent causes users to leave, or “bounce.” Beyond immediate monetization improvements due to better ad matching, this data can be used by a publisher to understand traffic trends and make programming and promotion decisions.

Let’s say site XYZ.com gets 10M UUs per month. Any good SEO person will tell you that the sweet spot for search traffic is about 20%. Any less and you’re not optimized; much more and you’re one Google algorithm tweak away from falling off a traffic cliff. The other 8M users are coming directly or through other “sideways” means such as partner sites, portals, social media, etc. If that accounts for 20% of total traffic, you’re doing well (give those biz & audience development folks raises!). But more than likely that traffic is driving 2 or fewer PVs/visit (4M PVs/month). By understanding why those users come to your site, surfacing content that is of interest to them, and getting them to turn an additional page, that’s now 6M PVs per month from that segment, an increase of 50%. If you’re a subscription-based site or have a newsletter product, think of applying that understanding to help drive subs!

At Yieldbot, we’re just getting started at generating value for our publisher partners. We distill intent into keywords and match them to ads in real time, at the exact time a reader is most open to them. And the same can be done for content. After all, for publishers it is the successful marriage of page content, visitor intent and relevant advertisements that improves your business. Any of these three pieces working in a silo lowers your value.

More to come!

http://yieldbot.com/blog/proving-display-can-perform-better-than-search Proving Display Can Perform Better than Search 2012-08-04T19:16:00+00:00 2013-10-04T19:20:43+00:00 http://yieldbot.com/

What more can we say here? The numbers speak for themselves.

  • Yieldbot Conversion Rate: 35.59% (Goal 2) / 26.04% (Goal 3)
  • Google CPC Conversion Rate: 29.23% (Goal 2) / 20.65% (Goal 3)
  • Google Organic Conversion Rate: 7.81% (Goal 2) / 6.10% (Goal 3)

That’s a 26% higher conversion rate vs. Google Paid Search for Goal 3, and a 326% higher conversion rate than Google Organic Search.
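The lifts can be re-derived from the Goal 3 rates quoted above (straight rounding gives ~327% on the organic comparison, so the 326% figure presumably comes from unrounded campaign data):

```shell
# Re-deriving the Goal 3 lifts; awk just does the arithmetic.
awk 'BEGIN {
  yb  = 26.04   # Yieldbot, Goal 3
  cpc = 20.65   # Google CPC, Goal 3
  org =  6.10   # Google Organic, Goal 3
  printf "vs paid search: +%.0f%%\n", (yb / cpc - 1) * 100
  printf "vs organic:     +%.0f%%\n", (yb / org - 1) * 100
}'
```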

This is the first two weeks of data for the campaign. As hard as it is to believe, these numbers are from before any optimization has taken place.

These are IAB standard 300 x 250 and 728 x 90 units. But the realtime decisioning on the ad call is anything but standard. In fact, we believe these results bear out that Yieldbot has redefined relevance in display and created a new advertising channel in the process. A channel unlike any other.

We’ve worked for over two years to build Yieldbot on the thesis that first-party publisher data to harvest intent, plus realtime decisioning to match against it, would deliver performance to rival Search. Even we didn’t think one of our early campaigns would outperform it. It’s great news for marketers and publishers. For us, the best news is that we’re just getting started.

http://yieldbot.com/blog/redefining-premium-content-towards-cpm-zero Redefining Premium Content Towards CPM Zero 2012-08-02T00:00:00+00:00 2013-10-04T19:19:55+00:00 http://yieldbot.com/ Say Goodbye to Hollywood

When Ari Emanuel, co-CEO of talent agency William Morris Endeavor, said that Northern California is just pipes and needs premium content, it was clear that he just doesn’t get it. There is no such thing as premium content. There are only two things premium on a mass scale anymore: distribution and devices.

Massive media fragmentation fueled by the Internet has forever redefined what is ‘premium’ content. The democratization of media – the ability for a critical mass of people (now virtually the entire world) to create, distribute and find content killed the old model of premium. Modern Family is a good TV show but when I can more easily stream a concert like this through my HDTV at any moment I want I’m pretty sure “premium content” has been redefined.

Since the web is the root cause of death for premium content, it makes sense that the effect is nowhere better exemplified than in web publishing. Since the advent of display advertising, publishers have sought to categorize and value their content in ways that were familiar to traditional media buyers. No media channel has promoted the idea of, or value of, premium content more than digital. Thus, print media’s inside front and back covers became the homepages and category pages on portals. Like print, these were areas where the most eyeballs could be reached.

But a funny thing happened in digital behavior. People skipped over the front inside cover and went right to content that was relevant to them. Search’s ability to fracture content hierarchies and deliver relevance not only became the most loved and valuable application of the web, it destroyed the idea of premium content altogether. In reality, premium never really existed in a user-controlled medium, because it was never based on anything that had to do with what the user wanted. It was based on the traditional ad metric of “reach,” when in this medium decisions about what is premium are determined by on-demand availability and relevance.

Sinking of the Britannica

The beauty of this medium is in the measurement of it. Validation for the drowning of premium, beyond the fact that Wikipedia destroyed Encyclopedia Britannica, rests in the performance of digital media. A funny thing happened as advertising performance became more measured. Advertisers discovered premium didn’t matter nearly as much as they thought. There were better ways to drive performance that yielded better and more measurable results. The ability to match messaging to people, on request and in a relevant way, was more valuable in this medium than some content provider’s idea of what was “premium.” In this medium the public, not the publisher, determines what is premium.

As realtime rules-based matching technology continues to improve, performance advertising and marketing continues to grow at the expense of premium advertising. Today, despite those trying to hold on to the past, premium is little more than an exercise in brand borrowing. Despite the best efforts of the IAB to bring brand advertising to digital, it has fallen as a percentage of ad spend for five straight years. In the world we live in today, Mr. Emanuel’s $9 billion upfront for network TV primetime advertising is $1.5 billion less in ad revenue than Google made last quarter.

What this all means for the future of digital media (and thus all media eventually) is that it’s headed to “CPM Zero.” Look around: all the digital advertising powers - Google, Facebook, Twitter, Amazon - are selling based on one thing. Performance. They are not selling on the premium sales mechanism of CPM. When “CPM Zero” happens, and it will, these forces pushing the digital ad industry forward win. They own the customer funnel, and they will own the future of marketing and advertising. That raises one big question. Where does this leave content creators and publishers?

Don’t Fear the Reaper

Publishers will never be able to put the CPM sales genie back in the bottle. CMOs and advertisers are already finding out that they are paying too much for premium. Go ask GM what they think. What publishers are finding out is that they are no longer selling their media; it’s being bought. Purchased from a marketplace with infinite inventory in a wild west of data. Therein lies the publisher’s ace in the hole and the strategies and tactics digital publishers (and eventually broadcasters) can use to combat the death of premium.

Like Search, publishers need two crucial components in their marketplaces. First, they need the tension of scarcity in the marketplace. That will drive up demand and force advertisers to spend time working on improving their performance. This was the cherry on the sundae for Google, as a $1 billion industry - conversion testing and content targeting - grew out of nowhere to support spends in search. Most every dollar saved with optimization went to drive more volume, or back to Google. Second, they need a unique currency for the marketplace. Keywords were a completely new way to buy media; nothing has ever worked better. Facebook is selling actions with Open Graph. Ultimately advertisers are buying customers, not keywords or actions, but there is a unique window of opportunity for publishers at this moment in time to create something new and uniquely people-focused, not page-focused.

The tactics used to fuel these strategies all rely on one natural resource: data. Publishers have diamonds and gold beneath the surface of their properties. Mining these data nuggets and using them to improve the performance of their media is the sole hope publishers have of competing in the world of “CPM Zero.” Only publishers can uniquely wrap their data with their media and drive performance in a manner unique to the marketplace. That’s what Google does. That’s what Facebook does. That’s what Twitter does. The scarcity mentioned above is created because the realtime understanding of site visitor interest and intent can only be derived using first-party data as rules, with integration into the publisher ad server for delivery. So pubs are really left with one choice: take control of their data and use it for their benefit, creating an understanding of why people are buying their media and how it performs. Or let Google, Facebook, third parties et al. come in and grab their data, and know nothing about why it’s being bought and what it’s being sold for.

The ability to match messaging to people on-request and in a relevant way is within the publisher’s domain. It is the most premium form of advertising currency ever created and will deliver an order of magnitude more value. It will fuel the 20% YoY growth of digital advertising and marketing for the next 15 years. Who captures the majority of that value, the advertiser or the publisher, is the only question remaining.

http://yieldbot.com/blog/measure-twice-cut-over-once Measure Twice, Cut (over) Once 2012-06-05T00:00:00+00:00 2013-10-04T19:30:19+00:00 http://yieldbot.com/

This past weekend we did a deploy at Yieldbot unlike any other we’ve done before.

At its completion we had:

  • upgraded from using Python 2.6 to 2.7.3;
  • reorganized how our realtime matching index is distributed across systems;
  • split off monitoring resources to separate servers;
  • moved out git repos that were submodules to be sibling repos;
  • changed servers to deploy code from Chef server instead of directly from github;
  • completely transitioned to a new set of servers;
  • suffered no service interruption to our customers.

The last two points deserve some emphasis. At the end of the deploy, every single instance in production was new - including the Chef server itself. Everything in production was replaced, across our multiple geographic regions.

Like many outfits, we do several deploys a week, sometimes several a day. Having no service disruption is always critical, but in most deploys it is also fairly straightforward. This one was big.

The procedures we had in place for carrying it out were robust enough, though, that we didn’t even internally notify anyone on the business side of the company when the transition was happening. The only notification was getting sign-off from Jonathan (CEO) on Friday morning that the cut-over would probably take place soon. In fact, we didn’t notify anyone *after* the transition took place either, unless you count this tweet:

I suppose we cheated a little by doing it late on a Saturday night though. :-)


We had a few kinds of data to consider: realtime streaming, analytics results, and configuration data.

Realtime Streaming and Archiving

For archiving of realtime stats, the challenge was going to be the window of time during which old systems were still receiving requests while new servers were starting to take their place. In addition to zero customer impact, we demanded zero data loss.

This was solved mostly by preparation. By having the archiving include the name of the source doing the archiving, the old and new servers could both archive data to the same place without overwriting each other.
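A hedged sketch of that idea: key each archive by the writing host’s name so two generations of servers can share one destination without collisions. The directory and naming scheme below are invented for illustration; the post doesn’t describe Yieldbot’s actual archive layout.

```shell
set -e
ARCHIVE=$(mktemp -d)   # stands in for shared archive storage

archive_stats() {
    # key = source name + timestamp: unique per host, so no collisions
    printf '%s\n' "$2" > "$ARCHIVE/$1-$(date +%Y%m%d%H%M%S).log"
}

archive_stats old-edge-1 "stats from the outgoing server"
archive_stats new-edge-1 "stats from its replacement"
ls "$ARCHIVE"          # both archives survive side by side
```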

Analytics Results

We currently have a number of MongoDB servers that hold the results of our analytics processes, and store the massive amounts of data backing the UI and the calculation of our realtime matching index.

Transitioning this mostly relied on MongoDB’s master-slave capabilities. We brought up the new instances as slaves pointing to the old instances as their master. When it was time to go live on the new servers, a re-sync with Chef reverted them to acting as masters.

There was a little bump here where an old collection ran into a problem in the replication and was ending up much larger in the new instance than in the old one. Luckily it was an older collection that was no longer needed, and dropping it altogether on the old instance got us past that.

Configuration Data

Transitioning the config data was made easy by the fact that it uses a database technology we created here at Yieldbot called HeroDB (which we’ll have much more to say about in the future).

The beneficial properties of this database in this case are that it is portable and can be easily reconciled against a secondary active version. So we copied these databases over and had peace of mind that we could reconcile later as necessary with ease.


We tested the transition in a couple different ways.

As we talked about in an earlier blog post, we use individual AWS accounts for developers with Chef config analogous to the production environment. In this case we were able to bring up clusters in test environments along the way before even trying to bring up the replacement clusters in production.

We also have test mechanisms in place already to test proper functioning of data collection, ad serving, real time event processing, and data archiving. These test mechanisms can be used in individual developer environments, test environments, and production. These proved invaluable in validating proper functioning of the new production clusters as we made the transition.

The Big Switch - DNS

DNS was the big switch to flip to take the servers from “ready” to “live.” To be conservative, we placed one of our new edge servers (which would serve a fraction of the real production traffic in a single geographic region) into the DNS pool and verified everything worked as expected.

Once verified, we put the rest of the new edge servers across all geographic regions into the DNS pools and removed all of the old edge servers from the DNS pools.

The switch had been flipped.

There Were Bumps (but no bruises)

There were bumps along the way. Just none that got in our way. Testing as we went, we were confident that functionality was working properly and could quickly debug anything unexpected. As any fine craftsman knows, you cut to spec as precisely as possible, and there’s always finish work to get the fit perfect.

Chef, FTW!

The star of the show, other than the team at Yieldbot that planned, coded, and executed the transition, was Chef.

We continue to be extremely pleased with the capabilities of Chef and the way we are making use of it. No doubt there are places where it is tricky to get what we want, and of course there’s a learning curve in understanding how the roles, cookbooks, and recipes all work together, but when it all snaps into place, it’s like devops magic.

http://yieldbot.com/blog/the-future-of-marketing-and-advertising-belongs-to-software The Future of Marketing and Advertising Belongs to Software 2012-04-15T00:00:00+00:00 2013-10-04T19:40:46+00:00 http://yieldbot.com/ Since day 1 I’ve described what we are building as “a web technology for marketing and advertising - not an advertising and marketing technology for the web.” Of course it’s a play on words but the purpose is to more clearly define our product. It is software. As we begin to open source some of the tools we have created we are reminded everyday that Yieldbot is a software company. That’s a good thing.

I ventured into display advertising because it had a weak technology stack supporting a pre-digital business model. The de-facto intelligence in display advertising is a 1-10 ranking system – the ad server waterfall – with a single unit of measurement – the impression – that was in fact always different. From a software perspective, display advertising is a massive opportunity.

My web software experience was first in e-commerce, where I watched amazing software be created by the likes of ATG and Endeca. Then in Search, where I watched Google and Yahoo employ thousands-strong armies of engineers. Most recently at Offermatica/Adobe Test&Target, where the software serves billions of highly optimized and dynamic web experiences every week. Software. Software. Software.

Michael Walrath, founder of Right Media, said recently:

“In order to build a truly disruptive and highly valuable company delivering enterprise software for digital advertising, the new solution has to be an order of magnitude better than the existing systems. It is not enough to deliver an incrementally better version of the existing systems. If there is to be a resurgent disruptor in the advertising technology space it has to change the game. It must attack the white space…”

What I love about this quote is that it frames the market opportunity as enterprise software and software that must do something where nothing has been done before.

Yieldbot attacks this challenge everyday. Massive batch and realtime and predictive analytics. Machine-learning and automated intelligence. Differentiated and highly dynamic units of measurement. The visualization of data and the ability to make it actionable. A white space where the focus is not on buying or selling media – but on how well media and people can be matched in realtime.

Matching differently. That is our disruption.

The enterprise software I admired and mentioned above all looked to solve the matching problem. Display advertising’s main problem, as I wrote two years ago, is the only place where the “order of magnitude better than existing systems” can be achieved. This is because new, more intelligent methods of matching can fundamentally revalue the media around something besides impressions and cookies. We believe that something is realtime visit intent.

It’s an amazing time to build software. There is more technology to get more understanding and create more intelligence at a lower cost than ever before. The advances in analytics, databases, and languages create an order of magnitude more power. I couldn’t think of anything more exciting to be working on in this day and age than software, or a better group of people to be doing it with. The future of marketing and advertising belongs to software.

http://yieldbot.com/blog/relevant-news Relevant News 2012-04-07T14:15:00+00:00 2013-10-07T14:37:52+00:00 http://yieldbot.com/

“The enemies of advertising are the enemies of freedom.” - David Ogilvy

Exciting news for Yieldbot and lovers of relevance today as we’re announcing a new Series A round of funding led by New Atlantic Ventures (NAV) and RRE Ventures. Seed investors kbs+p Ventures, Common Angels and Neu Venture Capital also participated again in this round.

The funny thing about raising money in media technology is that very few VCs actually understand it and even fewer have vision for where it’s headed. We’re fortunate to bring together a team of investors that live and breathe this stuff and proudly represent New York’s media leadership and Boston’s technology leadership in a way that mirrors Yieldbot’s own corporate footprint.

The funds will be used to continue development and bring to market our Yieldbot for Publishers (YFP) realtime intent-graph™ technology (launched July 2011) and our Yieldbot for Advertisers (YFA) realtime intent marketplace that launched in alpha this month. Together YFP and YFA create a valuable media channel of realtime consumer intent that delivers an order of magnitude more relevant ad matching and performance.

From day one, two years ago, we wanted to bridge the largest digital inventory source, Web Publishers, with the largest and best digital ad spends, Search advertisers, in a way that brings a more relevant web experience to people. We’ve progressed an extremely long way with a small team and relatively little funding so far. Today we’re putting dry powder in our muskets and continuing to battle. The enemies of freedom are only so because they know not relevance.

http://yieldbot.com/blog/introducing-pascalog Introducing Pascalog 2012-04-04T00:00:00+00:00 2013-10-04T20:58:58+00:00 http://yieldbot.com/

(Shared under Creative Commons Attribution-ShareAlike license: Flickr user Timitrius)

Today, the dev team at Yieldbot is excited to announce plans to open source one of our prized internally developed technologies: Pascalog.

Technology often evolves more in cycles than linearly, with past patterns showing through as more recent innovations are made.

For a while we were doing all of our analytics in Cascalog, and things were going great. As a Clojure DSL written on top of the Hadoop Cascading API, Cascalog is a brilliant technology for efficiently processing large data sets with very tersely written code.

In fact, we even wrote about those experiences here and here.

But we found ourselves writing things like this:

(<- [!pub !country !region !city !kw !ref !url ?s]
    (rv-sq !pub !country !region !city !kw !ref !url ?pv-id ?c)
    (c/sum ?c :> ?s))

We thought that there had to be a better way. When we realized that Clojure, being a Lisp, has its foundations in the 1960s, we knew the next logical step would be an upgrade into the 1970s.

Wouldn’t we want to write something more like:

program HelloWorld;
begin
  writeln('Hello, World!');
end.

And we immediately set upon bringing the best of software development of the 1970s, Pascal, into the Big Data world of the 2010s. Pascalog was born. (Who couldn’t love a language that wants you to end your programs with a “.”?)

This also fit well with internal discussions we were having at the time lamenting the complexity of managing a Hadoop cluster and the efficiencies that might be gained by combining all the functionality back into one processing environment on a mainframe. That dream is on hold until we find a suitable hardware vendor, but there was certainly no reason to hold Pascalog development back for that.

Data is a readln() Away

In Pascalog we’ve done the heavy lifting. By adapting readln() to be bound to a Cascading Tap, you read data in the way you’ve done since your Turbo Pascal days.

It didn’t take us long to realize that you’d want to save the results of your calculations somewhere, so in a follow-on version we added the mapping of writeln() to an output Cascading Tap.
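To make the readln()/writeln() mapping concrete, a hypothetical Pascalog job (names ours, purely illustrative) that counts the lines flowing through its input tap might read:

```pascal
program CountPageViews;
var
  line: string;
  n: integer;
begin
  n := 0;
  { readln() is bound to the input Cascading Tap }
  while not eof do
  begin
    readln(line);
    n := n + 1;
  end;
  { writeln() is bound to the output Cascading Tap }
  writeln('pageviews=', n);
end.
```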

Configuring your input and output taps and mapping them to readln() and writeln() is as easy as configuring an INI file.

An upcoming version which should be available shortly will also allow the readln() of one Pascalog program to be mapped to the writeln() of an upstream Pascalog program, allowing you to daisychain your Pascalog programs.

Why Pascal?

We make it sound above like we jumped onto the Pascal bandwagon right away, but in truth we considered several alternatives from the 1970’s.

Of particular interest was the ability to write nested procedures. We’ve grown accustomed to this from our Python development on other parts of the platform, and this allows us to migrate between the two worlds more seamlessly (compared to, say, Fortran).

The availability of a goto statement is also a great feature to bail you out if you start getting a little too lost in your control flow. This has become a lost art.

We did consider C, but couldn’t get over the hump of having it named “Clog”.

The Future

We’re furiously looking for a Pascal Meetup group where we can make a live presentation. If you know of one, please let us know!

We have a long list of features in mind to build, but we also want to hear back from the community.

Visit github.com/yieldbot/pascalog to get started! We’re looking forward to the pull requests. If you have live questions there’s usually one of us hanging out on CompuServe under user ID [73217, 55].

http://yieldbot.com/blog/development-as-ops-training Development as Ops Training 2012-04-03T21:08:00+00:00 2013-10-04T21:09:26+00:00 http://yieldbot.com/

It’s become fairly well understood that “Dev” and “Ops” are no longer separate skill sets but are combined into a role called “DevOps”. This role has become one of the hottest and hardest to fill.

At Yieldbot we’ve taken a pretty hardcore approach to putting together Dev and Ops into DevOps that serves us well and should be a great repeatable pattern.

Chef + AWS Consolidated Billing

The underlying philosophy we have is that the development environment should match as closely as possible the production environment. When you’re building an analytics and ad serving product with a worldwide distributed footprint that can be a challenge.

Our first building block is the use of Chef (and on top of that ClusterChef, which is now Ironfan). Using these tools we’ve fully defined each role of the servers in a given region (by defining them as a cluster), and all of the services that they run. We coordinate deploys through our Chef server with knife commands, and Chef controls everything from the OS packages that get installed, to the configuration of application settings, to the configuration of DNS names, etc.
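As a rough sketch of what defining a cluster looks like in the Ironfan DSL (cluster, facet, and role names here are illustrative, not our production config):

```ruby
# Illustrative Ironfan cluster definition; names and sizes are made up.
Ironfan.cluster :webserving do
  cloud(:ec2) do
    region 'us-east-1'
    flavor 'm1.small'
  end

  facet :webnode do
    instances 2
    role      :nginx   # roles/recipes Chef converges on each node
  end
end
```

knife commands then bring the cluster up, and Chef converges each node to its assigned role.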

The second building block is that every developer at Yieldbot gets their own AWS account as a sandbox. We use the AWS “Consolidated Billing” feature to bring the billing all under our production account. This lets us see a breakdown of everybody’s charges and means we get one single bill to pay.

The last detail is that every developer uses a unique suffix that is used to make resource references unique when global uniqueness is necessary. This is mostly used for resolving S3 bucket names. For any S3 bucket we have in production such as “foo.bar”, the developer will have an equivalent bucket named “foo.bar.<developer>”.
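The convention is trivial to express; a toy helper (function name is ours, for illustration):

```python
def bucket_name(prod_bucket, developer=None):
    """Return the S3 bucket a given environment should use.

    Production passes no suffix; a developer sandbox appends
    ".<developer>" so names stay unique in S3's global namespace.
    """
    if developer is None:
        return prod_bucket
    return "%s.%s" % (prod_bucket, developer)

bucket_name("foo.bar")           # production -> 'foo.bar'
bucket_name("foo.bar", "alice")  # sandbox    -> 'foo.bar.alice'
```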

Doing Two Things at Once

With all of that as the status quo, developers are almost always doing two things: developing/testing (the Dev), and learning/practicing how the platform is managed in production (the Ops).

Everyone has their own Chef server, which is interacted with the same way that the production Chef server is. As they deploy the code they are working on into their own working environment, they’re learning/doing exactly what they would do in production.

All of this was put in place over the last year while the development team was static, during which time we switched from Puppet to Chef.

But the power of this approach really hit home recently as we’ve started to add more people to the team. The first thing a new hire does is go through our process of getting their development environment set up. There are still bumps along the way, and new hires hit problems and take part in ironing them out. The great thing about this approach though is that each bump is a lesson about how the production environment works and a lesson in problem solving in that environment.

The Differences

Having said all that, there are a couple differences that we’ve consciously put in place between development and production, with the driving force being cost.

The instances are generally sized smaller, since the scale needed for production is much greater. Amazon’s recent addition of 64-bit support on the m1.small was a great help.

We use several databases (a mix of MongoDB, Redis, and an internally developed DB tech) that are distributed on different machines in production that we collapse together onto a single instance with a special role called “devdb”.
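Collapsing the databases is just a Chef role whose run list carries all the database recipes on one box; a minimal sketch (recipe names illustrative):

```json
{
  "name": "devdb",
  "description": "All production DBs collapsed onto a single dev instance",
  "run_list": [
    "recipe[mongodb]",
    "recipe[redis]"
  ]
}
```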


We’ll have to have some future blog posts about how we import subsets of production data into development for testing, and the like.

We also use Chef with ClusterChef/Ironfan for managing the lifecycle of our dynamic Hadoop clusters. Yet another good topic for a post all its own.

Have experience with a similar approach or ideas about how to make it even better? We want to hear about it.

http://yieldbot.com/blog/realtime-kills-everything Realtime Kills Everything 2012-04-02T00:00:00+00:00 2013-10-07T14:12:57+00:00 http://yieldbot.com/ Our first ad campaigns are live and the results are exciting. The campaign ran on a premium publisher in the women’s lifestyle vertical and beat the publisher’s control group on Click-Through-Rate (CTR) by 77% on the 728 x 90 unit and 194% on the 300 x 250. There were over 1M impressions in the campaign served on this domain over a 2-week period. Yieldbot is now serving the entire campaign.

Most exciting to us are some of the individual results:

  • The best performing keyword has a CTR of 1.56%.
  • The best creative unit (a 300 x 250) is getting 1.01%

We are running IAB standard banner units. This is not text. This is not rich media.

According to MediaMind, the industry average CTR for the campaign vertical is 0.07%.

The most matched keyword intent has a CTR of 0.43%. It also has a CPC of $5.

That math works out to an eCPM of $21.44. That’s pretty exciting stuff. Even more so when you factor in that this campaign is running in what was unsold inventory.
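For anyone checking the math, eCPM for a CPC buy is just CTR × CPC × 1000 (the $21.44 figure presumably comes from the unrounded CTR; the rounded 0.43% gives $21.50):

```python
def ecpm(ctr, cpc):
    """Effective CPM: expected revenue per 1,000 impressions of a CPC campaign."""
    return ctr * cpc * 1000

round(ecpm(0.0043, 5.00), 2)    # rounded CTR of 0.43% at a $5 CPC -> 21.5
round(ecpm(0.004288, 5.00), 2)  # an unrounded CTR of ~0.4288% -> 21.44
```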

When I shared the results with one of our Board Members he asked me, what at the time I thought was a simple question. “Why are the results so good?” But then, I actually had to think hard about the answer. I had to boil down a year of beta testing and then another year of building a scalable platform into what deserved to be a simple answer.


Realtime was my one word answer. Never before was every page view of intent for this publisher’s visitors captured in realtime - let alone used to make a call to an ad server at that very moment.

Realtime is different. Realtime kills everything before it. As such, Yieldbot is not building ad technology for the web. We are building web technology for ads. Since nothing is more important for advertising success than timing it makes sense that nothing is more valuable for results than realtime.

Realtime was a big buzzword for a while but the hype has died down. That’s good. In the Hype Cycle we’re now somewhere moving from the “Trough of Disillusionment” to the “Slope of Enlightenment.” It is however this ability of the web to react in realtime that makes the future of the medium so exciting.

Twitter of course is the best representative example. Twitter changed everything about media that came before it. It used to be that breaking the story was the big deal. Now even online news seems stodgy compared to people giving realtime updates that planes have landed on rivers or that people are being killed, and opining on a live show right along with it.

As technology continues to get better at processing the trillions of inputs from millions of people going about their daily lives – doing everything from driving their car to work, buying a pack of chips, surfing the web – the web will respond in realtime. Because of that it will be relevant. The idea of an ad campaign will seem like owning a 32-volume set of Encyclopedia Britannica. Everything becomes response because the technology is responsive. Calculations need inputs. The web will be measuring just about everything you do and know the moment you are doing it. Nothing will be sold. Everything will be bought.

It’s that realtime pull that creates these new valuations of the media. That new value of the media is what we have been working to create at Yieldbot. That is why these results are so exciting. Best of all, we’re just getting started. We’ve got a bunch of new campaigns about to get underway and we’re only going to get smarter and more relevant. We’ll continue to keep you posted on how it’s going and if you’re running Yieldbot you’ll know yourself. In realtime.

http://yieldbot.com/blog/working-at-yieldbot Working at Yieldbot 2012-01-31T00:00:00+00:00 2013-10-07T14:29:21+00:00 http://yieldbot.com/ We’re adding more developers to our team and pushing things to the next level. If you like seriously interesting and challenging work in the areas we’re looking for help in, you should be talking to us. You’ll have a single-digit employee number, so you’ll be getting in early and powering us on our way to fulfilling the huge potential we’re sitting on.

What can you expect if you decide to jump in and join us on our mission to make the web experience more relevant? For one thing, no shortage of interesting hard problems to solve, and the latest tools to do it with.

A Great Environment

For our development environment we each have an AWS sandbox that deploys the same code as production, so everyday work is production devops training too, with a Mac for your local dev environment. You’ll have Campfire group chat up all day, and be in the middle of all the important conversations around what we need to do and how we need to do it, from the CEO on down.

The language and tools you use most during the day will depend on what part of the platform you’re focusing on.

A Distributed Realtime Platform

A large part of the core platform is in Python. All of the code around scheduling of tasks and management of the platform is found here, as well as the key ad serving logic and realtime event processing. You’ll be making use of MongoDB, redis, and ElephantDB. You’ll be solving problems on running the platform distributed across several data centers worldwide. You’ll likely be doing some devops stuff here too, and loving the ease with which Chef lets you get that done.

Bleeding Edge Analytics

If you’re working on our analytics then you are loving the use of Cascalog (a Clojure DSL that runs over the Cascading API on Hadoop). The power-to-lines-of-code ratio here is ridiculous. More than that, you’ll be writing realtime analytics in Storm. That’s not cutting edge, it’s definitely bleeding edge.

Focus on UX

To work on the UI you’re pushing the limits on the latest Javascript UI tools like D3.js and Spine.js. Have you thought about how clean client-side MVC should be done? Spine is it. We’re serious about quality of UX here. If you’re serious about it too, this is where you should be.

An Awesome Team

The team you’ll be joining has been there before. We’ve founded and built successful products, platforms, and companies. We know our industry and what it takes to be successful. And we’re doing it.

The most important thing that keeps us developers here at Yieldbot energized is that we’re building something people want, that’s been clear from the beginning. Our mission to make the web experience more relevant resonates with users, publishers, and advertisers.

If you’re up for the challenge contact us at jobs@yieldbot.com. We have some seriously challenging work you can get started on right away.

http://yieldbot.com/blog/how-yieldbot-defines-and-harvests-publisher-intent How Yieldbot Defines and Harvests Publisher Intent 2012-01-02T00:00:00+00:00 2013-10-07T14:48:18+00:00 http://yieldbot.com/ The first two questions we usually get asked by publishers are:

1) What do you mean by “intent”?

2) How do you capture it?

So I thought it was time to blog in a little more detail about what we do on the publisher side.

The following is what we include in our Yieldbot for Publishers User Guide.

Yieldbot for Publishers uses the word “intent” quite a bit in our User Interface. Webster’s dictionary describes intent as a “purpose” and a “state of mind with which an act is done.” Behavioral researchers have also said intent is the answer to “why.” Much like the user queries Search Engines use to understand intent before serving a page, Yieldbot extracts words and phrases to represent the visitor intent of every page view served on your site.

Since Yieldbot’s proxies for visit intent are keywords and phrases the next logical question is how we derive them.

Is Yieldbot a contextual technology? No. Is Yieldbot a semantic technology? No. Does Yieldbot use third-party intender cookies? Absolutely not!

Yieldbot is built on the collection, analytics, mining and organization of massively parallel referrer data and massively serialized session clickstream data. Our technology parses out the keywords from referring URLs – and after a decade of SEO almost every URL is keyword rich – and then diagnoses intent by crunching the data around the three dimensions of every page-view on the site: 1) what page a visitor came from, 2) what page a visitor is about to view, and 3) what happens when it is viewed.
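Pulling candidate keywords out of a keyword-rich URL is easy to sketch (a toy illustration only, not Yieldbot’s actual parser):

```python
from urllib.parse import urlparse

def keywords_from_url(url):
    """Split the last path segment of a URL into candidate keywords.

    A toy version of referrer-keyword extraction: take the slug,
    drop any file extension, and split on hyphens/underscores.
    """
    slug = urlparse(url).path.rstrip("/").split("/")[-1]
    slug = slug.rsplit(".", 1)[0]  # drop ".html" and the like
    return [w for w in slug.replace("_", "-").split("-") if w]

keywords_from_url("http://example.com/recipes/slow-cooker-chili.html")
# -> ['slow', 'cooker', 'chili']
```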

Those first two dimensions are great pieces of data but it is coupling them with the third dimension that truly makes Yieldbot special.

We give our keyword data values derived from on-page visitor actions and provide the data to Publishers as an entirely new set of analytics that allow them to see their audience and pages in a new way – the keyword level. Additionally, our Yieldbot for Advertisers platform (launching this quarter) makes these intent analytics actionable by using these values for realtime ad match decisioning and optimization.

For example: Does the same intent bounce from one page and not another? Does the intent drive two pages deeper? Does the intent change when it hits a certain page or session depth? How does it change? These are things Yieldbot works to understand because if relevance were only about words, contextual and semantic technology would be enough. Words are not enough. Actions always speak louder.

All of this is automated and all of this is done on a publisher-by-publisher level because each publisher has unique content and a unique audience. The result is what we call an Intent Graph™ for the site, with visitor intent segmented across multiple dimensions of data like bounce rate, pages per visit, return visit rate, geography, or time.

Here’s an example of analytics on two different intent segments from two different publishers:

For every (and we mean every) visitor intent and URL we provide data and analytics on the words we see co-occurring with primary intent as well as the pages that intent is arriving at (and the analytics of what happens once it gets there). We also provide performance data on those words and pages.

Yieldbot’s analytics for intent are predictive. This means that the longer Yieldbot is on the site the smarter it becomes - both about the intent definitions and how those definitions will manifest into media consumption. And soon all the predictive analytics for the intent definitions will be updated in realtime. This is important because web sites are dynamic “living” entities - always publishing new content, getting new visitors and receiving traffic from new sources. Not to mention people’s interests and intent are always changing.

http://yieldbot.com/blog/serendipity-is-not-an-intent Serendipity Is Not An Intent 2011-12-30T00:00:00+00:00 2013-10-07T14:44:02+00:00 http://yieldbot.com/

Wired had two amazing pieces on online advertising yesterday, and while Felix Salmon’s piece The Future of Online Advertising could be Yieldbot’s manifesto, it is the piece Can ‘Serendipity’ Be a Business Model? that deals more directly with our favorite topic, intent.

The piece discusses Jack Dorsey’s views on online advertising and where Twitter is going with it. I had a hard time connecting the dots.

“…all of that following, all of that interest expressed, is intent. It’s a signal that you like certain things,”

Following a user on Twitter is not any kind of intent other than the intent to get future messages from that account. If it’s a signal that you like certain things it’s a signal akin to the weak behavioral data gleaned from site visitations.

Webster’s dictionary describes intent as a “purpose” and a “state of mind with which an act is done.” Intent is about fulfilling a specific goal. Those goals fall into two classes, recovery and discovery.

Dorsey goes on:

When it (Google AdWords) first launched, Dorsey says, “people were somewhat resistant to having these ads in their search results. But I find, and Google has found, that it makes the search results better.”

At the dawn of AdWords I sat with many searchers studying their behavior on the Search Engine Results Pages. What I and others like Gord Hotchkiss who also studied searcher behavior at the time learned, was people were not as much resistant to Search Ads as they were oblivious to them. People did not know they were ads!

Search ads make the results better because they are pull. Your inputs into the system are what pull the ads. So how does this reconcile with the core of Twitter’s ad products, which are promotions? Promos need scale to be effective. Promos are push. Precisely the opposite of Search, where the smallest slices of inventory (exact match) produce the highest prices and best ROI.

Twitter is the greatest discovery engine ever created on the web. But discovery can be serendipitous or not. Sometimes, as Dorsey alludes to, you discover things you had no idea existed. More often, you discover things after you have intent around what you want to discover. This is an important differentiation for Twitter to consider because it’s a different algorithm.

Discovery intent is not an algo about “how do we introduce you to something that would otherwise be difficult for you to find, but something that you probably have a deep interest in?” There is no “introduce” and no “probably” in the discovery intent algo. Most importantly, there is no “we.” It’s an algo about “how do you discover what you’re interested in.”

Discovering more about what you’re interested in has always been Twitter’s greatest strength. It leverages both user-defined inputs and the rich content streams where context and realtime matching can occur. Just like Search.

If Twitter wants to build a discovery system for advertising it should look like this.

http://yieldbot.com/blog/using-lucene-and-cascalog-for-fast-text-processing-at-scale Using Lucene and Cascalog for Fast Text Processing at Scale 2011-11-06T00:00:00+00:00 2013-10-07T15:07:23+00:00 http://yieldbot.com/ Here at Yieldbot we do a lot of text processing of analytics data. In order to accomplish this in a reasonable amount of time, we use Cascalog, a data processing and querying library for Hadoop written in Clojure. Since Cascalog is Clojure, you can develop and test queries right inside of the Clojure REPL. This allows you to iteratively develop processing workflows with extreme speed. Because Cascalog queries are just Clojure code, you can access everything Clojure has to offer, without having to implement any domain specific APIs or interfaces for custom processing functions. When combined with Clojure’s awesome Java interop, you can do quite complex things very simply and succinctly.

Many great Java libraries already exist for text processing, e.g., Lucene, OpenNLP, LingPipe, Stanford NLP. Using Cascalog allows you take advantage of these existing libraries with very little effort, leading to much shorter development cycles.

By way of example, I will show how easy it is to combine Lucene and Cascalog to do some (simple) text processing. You can find the entire code used in the examples over on Github.

Our goal is to tokenize a string of text. This is almost always the first step in doing any sort of text processing, so it’s a good place to start. For our purposes we’ll define a token broadly as a basic unit of language that we’d like to analyze; typically a token is a word. There are many different methods for doing tokenization. Lucene contains many different tokenization routines which I won’t cover in any detail here, but you can read the docs to learn more. We’ll be using Lucene’s Standard Analyzer, which is a good basic tokenizer. It will lowercase all inputs, remove a basic list of stop words, and is pretty smart about handling punctuation and the like.

First, let’s mock up our Cascalog query. Our inputs are going to be 1-tuples of a string that we would like to break into tokens.
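A sketch of that query’s shape (paths and names assumed for illustration; tokenize-string is defined in the next step):

```clojure
;; Illustrative sketch -- reads strings one per line, emits one token per tuple.
(defn tokenize-textfile [in-path out-path]
  (let [src (hfs-textline in-path)]
    (?<- (hfs-textline out-path)
         [?token]
         (src ?line)
         (tokenize-string ?line :> ?token))))
```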


I won’t waste a ton of time explaining Cascalog’s syntax, since the wiki and docs are already very good at that. What we’re doing here is reading in a text file that contains the strings we’d like to tokenize, one string per line. Each one of these strings will be passed into the tokenize-string function, which will emit one or more 1-tuples, one for each token generated.

Next let’s write our tokenize-string function. We’ll use a handy feature of Cascalog here called a stateful operation. It looks like this:
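The shape of that stateful operation, sketched (exact names are assumptions, not the original post’s code):

```clojure
(defmapcatop tokenize-string {:stateful true}
  ;; 0-arity: runs once per task -- build the Lucene analyzer
  ([] (load-analyzer))
  ;; 1+n-arity: the analyzer arrives as the first parameter
  ([analyzer text] (emit-tokens (tokenize-text analyzer text)))
  ;; 1-arity: clean up when the task is done
  ([analyzer] (.close analyzer)))
```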


The 0-arity version gets called once per task, at the beginning. We’ll use this to instantiate our Lucene analyzer that will be doing our tokenization. The 1+n-arity version passes the result of the 0-arity function as its first parameter, plus any other parameters we define. This is where the actual work will happen. The final 1-arity function is used for cleanup.

Next, we’ll create the rest of the utility functions we need to load the Lucene analyzer, get the tokens and emit them back out.
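Sketched with Lucene 3.x-era interop (class and helper names are assumptions, not the original post’s code):

```clojure
(defn load-analyzer []
  (StandardAnalyzer. Version/LUCENE_36))

(defn tokenize-text [analyzer text]
  ;; Pull tokens off the analyzer's TokenStream via the CharTermAttribute.
  (let [stream (.tokenStream analyzer "contents" (StringReader. text))
        term   (.addAttribute stream CharTermAttribute)]
    (loop [tokens []]
      (if (.incrementToken stream)
        (recur (conj tokens (.toString term)))
        tokens))))

(defn emit-tokens [tokens]
  ;; each token becomes its own 1-tuple
  (map vector tokens))
```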


We make heavy use of Clojure’s awesome Java Interop here to make use of Lucene’s Java API to do the heavy lifting. While this example is very simple, you can take this framework and drop in any number of the different Lucene analyzers available to do much more advanced work with little change to the Cascalog code.

By leaning on Lucene, we get battle hardened, speedy processing without having to write a ton of glue code thanks to Clojure. Since Cascalog code is Clojure code, we don’t have to spend a ton of time switching back and forth between different build and testing environments and a production deploy is just a `lein uberjar` away.

http://yieldbot.com/blog/recent-yieldbot-intent-streams-related-to-steve-jobs Recent Yieldbot Intent Streams Related to Steve Jobs 2011-10-30T00:00:00+00:00 2013-10-07T16:16:39+00:00 http://yieldbot.com/ At Yieldbot our focus is on collection, organization and realtime activation of visit intent in publisher content. We do this not as a network but on a publisher-by-publisher basis because of this simple fact: every publisher has a unique audience and unique content. What that means is that even if the keyword is the same across publishers, the intent associated with it varies in each domain.

The original purpose of this post however was not to point out the flaws of network-based keyword buying vs the performance advantage of Yieldbot’s publisher-direct model. Nor was the purpose to show you how much we truly understand publisher-side intent at the keyword level and how we use that intelligence in an automated way to achieve the highest degrees of relevant matching.

The original purpose of the post was to meet the request of a few people that had asked me to share some more data visualization of our Intent Streams™ after we originally shared a few on our recent blog post about our data visualization methods.

It occurred to me the other day that the best representative example over the last month was intent around “Steve Jobs” so below we are sharing our 30-day Intent Streams™ from four publishers.

If you’re new to our streamgraphs, the width of the stream is the measure of pageviews of intent associated with the root intent “Steve Jobs.” The other useful data points in these visualizations are the emergence, increases, decreases and elimination of the associated intent over time, as well as how many terms are seen to be associated with the root intent.

Another way we visualize intent data is in a scatter plot. Here you see the performance of “Steve Jobs tribute” compared to the other intent related to Steve Jobs, with the number of entrances (aka landings) on the y-axis and the bounce rate of that intent on the x-axis.
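The quantities on those axes are straightforward to compute. Here is a rough sketch in Python; the function names and data shapes are illustrative, not Yieldbot’s actual pipeline:

```python
def bounce_rate(entrances, bounces):
    """Share of landing visits that left without a second pageview."""
    return bounces / float(entrances) if entrances else 0.0

def scatter_points(term_stats):
    """Map {term: (entrances, bounces)} to (term, x, y) tuples:
    x = bounce rate, y = entrances, matching the axes described above."""
    return [(term, bounce_rate(e, b), e)
            for term, (e, b) in sorted(term_stats.items())]
```

A term with many entrances and a low bounce rate lands in the high-value corner of the plot.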

It’s important to note that in this scatter plot visualization the analytics are predictive. We are estimating performance forward over the next 30 days. The four streamgraph visualizations were based entirely on historical data; in their case, a 30-day look-back, as noted on their x-axes.

We hope you find this intent data as interesting as we do.

http://yieldbot.com/blog/we-love-the-mess-of-ad-tech-and-wouldnt-want-it-any-other-way We Love the Mess of Ad Tech and Wouldn’t Want it Any Other Way 2011-10-24T15:11:00+00:00 2013-10-07T15:12:11+00:00 http://yieldbot.com/ My introduction to ad tech (roughly): “Ad networks are a mess, you wouldn’t believe what a technical mess the industry is”.

That was in my first meeting with David Cancel (then at Lookery; he has since founded Performable, acquired by HubSpot) as I was bouncing an idea off of him that touched on the edges of the ad tech space.

Fast forward 6 months from then (almost exactly two years ago), I got a Twitter DM from David: “Friend is starting a new startup in the ad space. Looking for a CTO and/or help. Any interest?”

About a month later I was the CTO of Yieldbot.

Two years on I can say he was definitely right, ad tech is a mess. Part of it is how it’s evolved and part of it is structural and baked into its nature.

I’ll step it up and say this: ad serving may be the most complex distributed application there is.

The proof is in explaining why.

You Control Almost Nothing

There are so many degrees of freedom it can make your head spin.

You basically have a micro-application running embedded in a variety of site architectures each with their own particular constraints, whose users are distributed around the world running all manner of execution environments (browsers).

When you have your own site or application, you still have to deal with (or choose which of) the myriad browsers and browser versions to support, complete with differences in JavaScript version (or whether JavaScript is enabled at all) and issues like which fonts are available on client systems.

If you are creating a destination site, that keeps your team’s hands full. If you’re serving ads, that’s just a warmup. Because you also don’t control the site architecture that you are embedded in.

You might need to sit behind any number of ad servers the publisher might be running everything through.

You might be in iframes on the page.

You might need to execute code in specific places relative to other ad serving technology also embedded on the page.

Navigation through the site may or may not involve full page refreshes.

But that’s not all…

Distributed Worldwide with Time Constraints

Remember those environments you don’t control? They are the customers’ websites, and they don’t want their users’ UX degraded.

Serve ads, optimize them for relevance, and don’t slow down page load times.

Most websites have some level of focused geographic distribution to their users. Even if it’s as broad as US or even US+Europe.

But for ad serving, your user base is the set of users aggregated across all of the sites using your service. The world is your oyster. And the footprint of what you need to service. Quickly.

But wait! There’s more!

Content Relevant To The User At That Moment

At least if you want to be as cool as Yieldbot.

Look, a CDN serving up a static image can satisfy all of the above if all you want to do is serve the same image to every user across all of your customers’ websites.

Scattershot low value ads picked fairly at random would approximate that level of ease as well.

But there’s no sport (or value) in that!

Our goal here is actually to serve content that is the most relevant to what the user is doing at that particular moment. When done right (we do), everyone wins.

So - simply serve the content that best fits what the user is doing at that moment, where they came from, and what they’ve expressed interest in *right now*, on whatever they happen to be running on, and wherever they happen to be. And make it snappy, would ya?

We Wouldn’t Want It Any Other Way

So, that’s what I signed up for - and I love it. And so does the rest of the Yieldbot team.

We started our first intent-based ad serving on a live site a couple of months after coding began, and were learning real-world lessons immediately. And 20 months later it’s still going.

I’ve always loved to work on systems with complex dynamics, so considering all of the above it’s not that surprising I ended up finding my way to ad tech.

What I love about building Yieldbot technology? All of the above is only half the story. We also do Big Data(tm) analytics so our system can learn the intent of the users coming to our publishers’ sites. We provide publishers visualizations and data views that teach them what the intent coming to their sites is. And *then* we enable them to serve ads that make that intent actionable.


http://yieldbot.com/blog/hacking-display-advertising Hacking Display Advertising 2011-10-15T15:32:00+00:00 2013-10-07T15:33:46+00:00 http://yieldbot.com/

Being as passionate as we are about the huge advances in dynamic web languages and event-based programming, it is tough to love display advertising. Display advertising was never about web programming or data networking. It was nested onto the web as a rogue aggregation and delivery mechanism. The ability of display to deliver relevance remains hindered by this disjointed architecture. It is not threaded into site experiences and the realtime goals of the visits on the pages where it resides. This is exactly why we’re hacking it into something else.

Vanilla Sky

In Q2 2007, while most of the industry was living some sort of vanilla sky of Behavioral Targeting, one company came in and paid what at the time seemed to most people way too much money to own a controlling interest in display. No, I’m not referring to AOL buying TACODA. The company I’m talking about has maintained a focus since day one on hacking what you are interested in at that very moment. Unlike other content aggregators, it tied its advertising system into the core user experience of its pages and the realtime relevance they delivered. Their stated Display strategy has little to do with cookie matching and everything to do with realtime context and creative optimization with the purpose of “capturing relevant moments.” It is now the most powerful company in Display. That company is of course Google.

Lucid Dreaming

The first lesson here is about the medium itself. This is a different medium and the old media buying and selling template breaks here. Behavioral Targeting may have changed names to the less scary “Audience Buying” but seven years later performance expectations have not been met and it has dragged display into the mud of issues like privacy, ad verification, cookie stuffing and more.

By contrast, Search and email, the most important of the web’s applications, have little use for tracking people across the web, let alone reach and frequency measures. They are the opposite of that. Search (and the web itself) was built by hackers to solve information management and retrieval problems.

The second lesson is that in this medium three pieces of data are valuable: context, timing and performance. The rest is just pipes. Understanding the context of an impression or click at the moment the page is loading, and the ability to optimize the message, is what the web was built for. It took Search to turn it into a marketing channel, but the growth (~20% YoY with no end in sight) and size ($46B in 2013 per eMarketer estimates) of that channel show how powerful that data is and how helpful understanding it can be to consumers.

Waking Up

The fact that it is referred to as display “advertising” is reason enough to know it’s from another time. This medium kills advertising. Everything on the web is marketing. As Suzie Reider, national director of display sales for Google, said recently, “display needs to move beyond advertising and into interacting.” Yesterday, Krux CEO Tom Chavez wrote a thoughtful blog post on how it is time for display to move beyond advertising. We agree and we’re walking the walk.

This doesn’t mean that publishers will not show ‘graphical’ units, as Google calls them. Of course they will. It doesn’t mean that prime real estate isn’t going to be turned over to these units; it will be. We’re headed to a world with fewer messages that will be bigger and more interactive. But if we have learned anything from Search, it is that format and size don’t matter when the message is relevant, helpful and useful at that moment.

Billions of Relevant Moments

As long as the technology to understand context and timing keeps progressing as fast as it is (and at places like Betaworks, where realtime is the thesis of new-medium value creation, startups are hacking away at it), there is a bright future. Search has proven that the web is the greatest and most democratic marketing medium ever created. The hackers working with dynamic web languages and event-driven programming can unlock an order of magnitude more relevant moments. There are literally billions of them out there waiting to be captured and created. At Yieldbot we see this scale every day in the inventory of web publishers, and if you’re a hacker and remaking the staid idea of advertising appeals to you, we would be interested in speaking with you.

http://yieldbot.com/blog/one-slick-way-yieldbot-uses-mongodb One Slick Way Yieldbot Uses MongoDB 2011-10-05T00:00:00+00:00 2013-10-07T15:40:52+00:00 http://yieldbot.com/ As a key/value store. Wait, what? Yeah, MongoDB as a key/value store.

Why? Because we were already using Mongo and planned to move some of our data to a key/value store. But we also didn’t want to wait until we made a decision on a specific solution to make the transitions in our code.

First, as if you wouldn’t have guessed, here’s how easy a “Python library for MongoDB as a key/value store” is:

def kv_put(collection, key, value):
    """key/value put interface into mongo collection"""
    collection.save({'_id': key, 'v': value}, safe=True)

def kv_get(collection, key):
    """key/value get interface into mongo collection"""
    v = collection.find_one({'_id': key})
    if v:
        return v['v']

Note: kv_get() returns None if nothing is found, so technically this doesn’t gracefully handle the case where you want None to be a possible stored value.
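If that ever matters, a sentinel default in the style of dict.get() closes the gap. A minimal sketch, using a hypothetical in-memory stand-in for the Mongo collection so it runs anywhere:

```python
class FakeCollection:
    """In-memory stand-in for a pymongo collection (illustration only)."""
    def __init__(self):
        self.docs = {}

    def save(self, doc, safe=True):
        self.docs[doc['_id']] = doc

    def find_one(self, query):
        return self.docs.get(query['_id'])

_MISSING = object()  # sentinel no caller can ever store as a value

def kv_put(collection, key, value):
    collection.save({'_id': key, 'v': value}, safe=True)

def kv_get(collection, key, default=_MISSING):
    """Like dict.get(): distinguishes 'key absent' from a stored None."""
    doc = collection.find_one({'_id': key})
    if doc is None:
        if default is _MISSING:
            raise KeyError(key)
        return default
    return doc['v']
```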

What was the pain point?

We basically found that we had collections of nightly analytics results that were really big, and whose indexes were really, really big. The index requirements were going way beyond the 70GB of RAM on our server. We didn’t want to shard our Mongo setup because of the cost involved, so we decided to take a different approach. Since this data was the read-only output of analytics, where we once had collections whose entries were multiply indexed, we now have collections whose entries are pages of the old entries in a defined sort order, accessed as key/value.
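The paging itself is simple bookkeeping. A minimal sketch of the scheme; the page size and key format here are illustrative assumptions, not our production values:

```python
PAGE_SIZE = 1000  # entries per stored document; tune to your access patterns

def page_key(result_name, page_num):
    """Key for one page of a sorted result set, e.g. 'top_terms:3'."""
    return '%s:%d' % (result_name, page_num)

def split_into_pages(result_name, sorted_entries, page_size=PAGE_SIZE):
    """Yield (key, page) pairs ready for a key/value put. Only the page
    documents are indexed (by _id), not the entries inside them."""
    for page_num, start in enumerate(range(0, len(sorted_entries), page_size)):
        yield page_key(result_name, page_num), sorted_entries[start:start + page_size]
```

Readers then fetch pages by computed key instead of running indexed queries, which is exactly the access pattern that keeps the index tiny.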

How did the change work out? Great. We still haven’t switched from MongoDB for this data. Still plan to, but in a startup once you address a pain point you move on to the next one.

You definitely can’t argue that MongoDB isn’t flexible.

http://yieldbot.com/blog/the-only-chart-you-need-to-know-the-future-of-advertising The Only Chart You Need to Know the Future of Advertising 2011-10-01T00:00:00+00:00 2013-10-07T15:59:54+00:00 http://yieldbot.com/

all media is performance media

http://yieldbot.com/blog/how-yieldbot-uses-d3.js-jquery-for-streamgraph-data-visualization-and-navigation How Yieldbot uses D3.js + jQuery for Streamgraph Data Visualization and Navigation 2011-09-26T16:02:00+00:00 2013-10-07T16:07:12+00:00 http://yieldbot.com/ One problem we needed to solve early on at Yieldbot was understanding intent trends in the publisher data. This couldn’t just be shallow understanding. We needed to expose multiple data trends at the same time around thresholds and similarity. Our need:

  1. Allow what we call the “root intent” trend to be apparent.
  2. Break out the “other words” that are associated with the root intent.

Making this happen in an integrated fashion meant we needed some flexible and powerful tools. We found that d3.js and jQuery UI were the right tools for this job.

Looking through the excellent documentation and examples from d3.js we saw the potential to build exactly the type of visualization we needed. We used a stacked layout with configurable smoothing to allow good visibility into both the overall and individual trends. Smoothing the data made it very easy to follow the individual trends throughout the visualization.
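In spirit, that layout boils down to two steps: smooth each term’s pageview series, then stack the layers on a shared baseline. A toy Python sketch of those steps (d3’s stack layout does the real work in the browser; nothing here is Yieldbot code):

```python
def smooth(series, window=3):
    """Centered moving average; shrinks the window at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        seg = series[max(0, i - half):i + half + 1]
        out.append(sum(seg) / float(len(seg)))
    return out

def stack(layers):
    """Turn per-term pageview series into stacked (bottom, top) bands,
    the geometry a streamgraph renders as filled areas."""
    bands, baseline = [], [0.0] * len(layers[0])
    for layer in layers:
        top = [b + v for b, v in zip(baseline, layer)]
        bands.append((baseline, top))
        baseline = top
    return bands
```

Smoothing first is what makes each individual band easy to follow by eye without changing the overall trend.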

Having settled on this information-rich way to visualize the data with d3, we then took the prototype static visualization and made it into a dynamic piece of our interface. It was very important to us that the data be more than just a visualization - we wanted it to be navigation. We wanted the data to be a tangible, clickable part of the interface. The result is that each of the intent layers is clickable and navigates to another, deeper level of data.

With the core functionality in hand, we used the jQuery UI Widget Factory to provide a configurable, stateful widget that encapsulates the implementation details behind a consistent API. This makes using the widget very easy. Creating a trend visualization is just a one-liner, while the raw power and flexibility is wrapped up and contained in the implementation of the widget.

Here are a few examples of this visualization in action:

With this reusable widget in hand we could use this trend visualization in numerous places across our application. This provides a consistency to our interface that is an extremely important UI concern for such a data-intensive product.

Our approach to developing innovative data visualizations has proven consistently repeatable: we now have 3 additional visualizations in the product and have played around with many more than that. Each time, we follow the same steps when creating a new data viz.

Throughout the process, the flexibility that d3 provides meant we never bumped into a wall where the framework complexity jumped drastically. It appears that the wall of complexity is still far off in the distance, if it exists at all. As our understanding of d3 increased, and with the use of prototypes driven by live data, we were able to quickly iterate on ideas and design. This flexibility will continue to be one of the many long-term benefits we get from using d3.

Data visualization plays an important role in our product and we’re excited to keep using it to solve data comprehension problems. Not to mention it really brings the data to life. If you’re interested in data visualization or this process, we’d love to hear your thoughts.

http://yieldbot.com/blog/looking-for-a-few-good-devs Looking for a Few Good Devs 2011-09-26T00:00:00+00:00 2013-10-07T16:00:13+00:00 http://yieldbot.com/ At Yieldbot we’re a small team building incredible technology that’s getting major publishers and advertisers hooked. We’re always looking for the best technical talent to join our development team and work with us on getting to the next level, and as CTO I think it’s only fair that you know what we’re looking for. :-)

What you need to know:

  • We’re a small talented team looking to become a bigger talented team.
  • We work on tough interesting problems, use cutting-edge “big data” technology, and enjoy winning.
  • We’re working on revolutionizing how web advertising works.
  • This is gonna be big.

What we need to know:

  • You like to solve tough problems and have a history of winning.
  • You can code in a few different languages and are expert with one of them.
  • That language is Python or you are py-curious.
  • You want to make big contributions on a small team and build a valuable business.

Some technology stuff:

  • Python’s big here, but we rock Clojure and Javascript too.
  • We use Django. Bonus if you know it, but if you don’t it ain’t so hard.
  • We wrestle some serious big data, and use Hadoop to do it.
  • MongoDB and HBase compete for our love.
  • Chef and the Vagrant knifed Puppet and we’ve never been the same.

If you’re a fit, dust off your Python and contact us:


''.join(map(chr, map(lambda x: x[0] + x[1], zip(map(ord, str(dir)), yieldbot))))

http://yieldbot.com/blog/rise-of-the-publisher-arbitrage-model Rise of the Publisher Arbitrage Model 2011-09-15T00:00:00+00:00 2013-10-07T16:12:31+00:00 http://yieldbot.com/

The business of the web is traffic. Always has been. Always will be. That’s why I was a bit surprised with how much play Rishad Tobaccowala’s quote received last week in the WSJ:

"Most people make money pointing to content, not creating, curating or collecting content."

The original lessons of how to make money on the web got lost because at some point everyone with a web site thought they could go on forever just selling impressions. They didn’t foresee two things:

A) The massive amounts of inventory being unleashed on the web. In the last two years alone Google’s index has gone from 15B pages to 45B pages.

B) The explosive rise of the ad exchange model with its third party cookie matching business that gave advertisers the ability to reach a publisher’s audience off the publisher site and on much cheaper inventory.

Fortunately, the more things change on the web the more they stay the same – especially the business models. Traffic arbitrage, the web’s original model, is more viable than ever for publishers and likely their only hope of building a sustainable business in the digital age.

The web is literally built around traffic. Here’s what it looks like right now.

From Search to Email to Affiliate, the value and monetization of the web occur in sending and routing these clicks or, as Tobaccowala said, “pointing out,” at a higher return than your cost of acquiring the traffic.

Google of course is the biggest player in the arb business. 75% of the intent Google harvests costs them nothing. They’ve been able to leverage all the intent generation created in other channels, like TV, that shifts to Google for free. Just take a look at how much TV drives Search. But Google also pays for traffic. Last year they spent over $7.3 billion, or 25% of revenue, on what they call TAC (traffic acquisition cost) to get intent. You need this kind of blended model to be as successful as Google is with arb.

With the growing (and free) flow of intent-generating traffic to publishers, it is time they got in this game. Organic Search continues to drive higher percentages of traffic to top publishers (and the Panda update has pushed that even higher). YouTube, Facebook and Twitter are also sending more traffic all the time at no cost to pubs. The seeds of a huge arb model continue to be sown.

Vivek Shah, CEO of Ziff Davis, speaking to Ad Exchanger about ‘What Solutions are Still Needed For Today’s Premium, Digital Publisher’, put it this way:

“…you need to invest in technology that can sort through terabytes of data to find true insights into a person’s intent…not just surface “behaviors.”

This speaks directly to the arb model and how it becomes a true revenue engine for publishers. Once you understand the intent that is present on your site, you can quantify its value. Once you quantify the value, you can figure out what intent you need to get more of, seed that intent with new content, and figure out how much you can spend to drive traffic to it. This is exactly the model we’ve been working on with publishers - using Yieldbot to qualify and quantify the intent on their sites.
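The arithmetic behind that spend decision is back-of-envelope stuff. A sketch with made-up numbers; the margin target and per-visit figures are purely illustrative:

```python
def max_cost_per_visit(revenue_per_visit, target_margin):
    """Most you can pay to acquire a visit while keeping target_margin
    (e.g. 0.25 means you keep 25% of the revenue the visit generates)."""
    return revenue_per_visit * (1.0 - target_margin)

def arb_profit(visits, revenue_per_visit, cost_per_visit):
    """Profit on an arbitrage buy: monetized value in, acquisition cost out."""
    return visits * (revenue_per_visit - cost_per_visit)
```

For example, if a quantified intent is worth $0.10 per visit and you want a 25% margin, you can bid up to $0.075 per visit for traffic carrying that intent.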

So the future of the web looks a lot like the past. There will be marketing levers and technologies that optimize what traffic you drive in, and there will be marketing technologies that optimize where, and at what value, that same traffic goes out. Everything else you do in your business will support those value creation events. Yieldbot just wanted to “point out” that for publishers.

http://yieldbot.com/blog/why-publishers-are-the-bread-in-the-intent-sandwich Why Publishers are the Bread in the Intent Sandwich 2011-08-30T00:00:00+00:00 2013-10-07T16:14:44+00:00 http://yieldbot.com/

There are a few main theses that I’ve spent my 14-year online career successfully operating under. The most successful one has always been to leverage the fact that the web is the only user-controlled medium. The more ability you give visitor events to define run-time or dynamic rules, the better your ability to deliver relevance. Put another way, there is no better segmentation than self-segmentation.

In marketing terms this means pull instead of push. The best representative example of this is, of course, Search. A visitor action provides an input (the query) and everything else processes based off that rule*. Relevance is delivered (and the most successful advertising technology is born) by pulling content to data input rules. Dynamic landing page optimization works the same way.

So why does this matter to publishers, whose business is one of “pushing” content? It matters because of one incredible fact about Search that seems to get lost on publishers. Neither the intent that precipitates the query nor the content used to deliver relevance to it belongs to Search. People bring their intent to Search, and Search sends it to content created by digital publishers.

Search may be the meat in the intent sandwich, but you can’t have a sandwich without bread. Bread is the media generating intent and receiving intent. Publishers are the bread in the intent sandwich, and bread is what publishers have been leaving on the table.

Two years ago I wrote about the opportunity to build an intent-harvesting platform for publishers, and Chris Dixon followed up with a piece, Why Content Sites are Getting Ripped Off. In the ensuing time our team went ahead and built that platform. Because of the complex level of intelligence and scale needed, it has taken until now to bring Yieldbot to market. In fact, we spent a full year in invite-only beta doing nothing but learning. Now that we’ve released Yieldbot we’re finding out even more amazing things about intent on the publisher side, and the opportunities are obvious and bode very, very well for the future of ad-supported publishers.

The most important thing may be that publisher inventory for realtime intent dwarfs Search. As David Koretz figured out a couple of years ago, the top 200 pubs generate 2000% more pageviews than Google. We see that disparity every day in our data. The amount of inventory backed by first-party (better) data also dwarfs that backed by third-party data, as Chris O’Hara at Traffiq recently pointed out:

"You have an entire ecosystem built around audience targeting using 3rd party data. The problem? The companies with better and deeper first-party data have a lot more audience"

We also see that most of the advertisers who are buying this intent in Search are not buying from the pubs that have the exact same intent on their sites. Even better, the pubs’ intent is more down-funnel, has greater context, is less competitive, has the influential power of being in a branded domain, and can leverage creative in ways Search cannot. Publisher-side intent should be more valuable to advertisers than Search, with its SERP landscape of crowded text-link ads and arcane rules.

As we work with pubs to understand this information arbitrage opportunity, it’s clear that intent matched with timing and context can improve visit monetization by an order of magnitude. And why shouldn’t it? Yieldbot structures the data from the page that is clicked on and the subsequent pages that are clicked to. This scenario happens millions of times a day on large sites, and these (click) streams of data provide rich realtime intent data that fuel our intent classifications and matching rules.
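For a flavor of the signal in those click streams, here is a toy sketch that pulls search keywords out of a referrer URL. The engine-to-parameter table is illustrative and incomplete, and real intent classification goes far beyond a single referrer:

```python
from urllib.parse import urlparse, parse_qs

# query-string parameter that carries the search terms, per engine (illustrative)
QUERY_PARAMS = {'www.google.com': 'q', 'www.bing.com': 'q', 'search.yahoo.com': 'p'}

def referrer_keywords(referrer_url):
    """Return the search keywords from a search-engine referrer URL,
    or None if the referrer isn't a recognized search engine."""
    parsed = urlparse(referrer_url)
    param = QUERY_PARAMS.get(parsed.netloc)
    if not param:
        return None
    values = parse_qs(parsed.query).get(param)
    return values[0].lower().split() if values else None
```

Aggregated across millions of landings, even a signal this simple starts to reveal which intent a site is attracting.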

The bottom line is publishers are in the unique position to both classify the intent on their sites and use pull rules once that intent is recognized to deliver the highest levels of relevance in realtime - just like Search. Even better, they never lose ownership of their data and can monetize it directly with advertisers - even using their own ad server. This is game changing. This is the bread in the Intent sandwich. This is Yieldbot.

The Yieldbot team will be writing a lot more about our data and technology right here on our blog. We hope you join the conversation about the power shift to publishers and their data in the ad ecosystem.

* There are additional rules used with the query as well, such as geo, time and query number; however, the query itself is the primary rule.