Why Yieldbot Built Its Own Database

Posted February 4th, 2013 by admin in Insight


About a year ago we made the decision to switch over all of our configuration to a new database technology that we would develop in-house, which we ended up calling HeroDB.

The motivating principle behind what we wanted in our configuration database was a reliable concept of versioning. We had previously tried to manually implement some concept of versioning in the context of available database technology. This would keep around older versions of objects in a separate area with a version number identifying them, and application logic would move copies of whole objects around as changes were made to them. Data events would contain versions of the objects that they were related to. While this did some of what we wanted, it was clear that this was not the solution we were looking for.

Enter Git

at the core of Git is a simple key-value data store. You can insert
any kind of content into it, and it will give you back a key that you
can use to retrieve the content again at any time.
– Pro Git, Ch 9.2
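To make the quoted idea concrete, here is a minimal Python sketch of a content-addressable key-value store. The key derivation follows Git's blob format under its default SHA-1 object format (a `blob <size>\0` header followed by the content). The `TinyObjectStore` class and its method names are hypothetical, made up for illustration, and are not part of Git or HeroDB.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the key Git would assign to this content as a blob (SHA-1 format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class TinyObjectStore:
    """Hypothetical in-memory store: insert content, get back a key."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, content: bytes) -> str:
        key = git_blob_id(content)
        self._objects[key] = content  # same content always maps to the same key
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


store = TinyObjectStore()
key = store.put(b'{"name": "Alice"}')
print(key, store.get(key))
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
```

Because the key is derived from the content, storing the same value twice yields the same key, which is what makes a given version of the data immutable.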

While we had these challenges thinking about the management of data in our application, we were already managing one of our types of data with perfect version semantics: our source code. Some simple searches told us we weren’t the first to think about leveraging Git to manage data, but there also wasn’t anything serious out there for what we were looking for.

So we thought hard about what Git would be able to provide us as a database (there are definitely both pros and cons) and how it intersected with what we were looking for in versioning our configuration data. A few of the things we liked:

  • every change versioned
  • annotated changes (comments) and history (log)
  • every past state of the database inspectable and reproducible
  • reverting changes
  • caching (by commit sha) – specific version of data doesn’t change
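The caching point in the list above can be sketched as follows. Since the data at a given commit sha never changes, a cache keyed by sha and path never needs invalidation. This is a minimal sketch, not HeroDB's implementation: `CommitCache`, `fake_loader`, and the sha and path strings are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Tuple


class CommitCache:
    """Hypothetical read-through cache keyed by (commit sha, path)."""

    def __init__(self, loader: Callable[[str, str], bytes]) -> None:
        self._loader = loader  # assumed to read a value from the repo
        self._cache: Dict[Tuple[str, str], bytes] = {}
        self.misses = 0

    def get(self, commit_sha: str, path: str) -> bytes:
        key = (commit_sha, path)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._loader(commit_sha, path)
        # No expiry logic: content at a given commit is immutable.
        return self._cache[key]


def fake_loader(commit_sha: str, path: str) -> bytes:
    # Stand-in for a real repository read.
    return f"{commit_sha}:{path}".encode("utf-8")


cache = CommitCache(fake_loader)
cache.get("abc123", "user-1/name")
cache.get("abc123", "user-1/name")
print(cache.misses)  # 1
```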

There were definitely cons, which we decided would be worth it for the strength of the benefits we’d be getting. A few of the cons we decided to live with:

  • comparatively slow, both reads and writes
  • size concerns, would shard
  • no native foreign keys

Some of these can be mitigated. For instance, read performance can be improved (with caveats) by having cloned repos for read that are updated as changes are made to the “golden” repo. To mitigate size concerns, and because there is no native concept of foreign keys, data can be sharded with no penalty to what can be expressed in the data.

What We Did

Once we decided we liked the sound of having Git-like semantics on our data, we went about looking for projects that might already be available that provided what we wanted. Not satisfied with what we found, our next step was to look for a suitable programmatic interface to Git. In the end we found a good solution there in a project named Dulwich, which is a pure Python implementation of Git interfaces.

With Dulwich as the core, we implemented a REST API providing the data semantics that we wished to expose.

In terms of modeling data, we took Git’s concept of Trees and Blobs, conceiving of Trees as objects and Blobs as attributes, with the content of the Blobs being the values of the attributes. The database itself exists as a Git “bare” repo. Data is expressed in JSON, where ultimately all values are encoded in Blobs and where Trees represent nesting levels of objects.
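The mapping described above can be sketched in a few lines of Python: nested JSON objects become tree paths, and leaf values become blob contents. This sketch makes two assumptions that the post does not confirm: that each leaf value is stored as its JSON encoding, and that the keys `user-1` and `user-2` are acceptable stand-ins (they are hypothetical). The `flatten` helper is illustrative and is not part of Dulwich or HeroDB.

```python
import json
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, bytes]:
    """Map a nested JSON object to {tree path: blob content}."""
    blobs: Dict[str, bytes] = {}
    for key, value in obj.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            # A nested object corresponds to a Tree (a directory).
            blobs.update(flatten(value, path))
        else:
            # A leaf attribute corresponds to a Blob (a file).
            # Assumption: values are stored JSON-encoded.
            blobs[path] = json.dumps(value).encode("utf-8")
    return blobs


# Hypothetical keys, for illustration only.
db = {"user-1": {"name": "Alice", "age": 18},
      "user-2": {"name": "Bob", "age": 22}}

for path, content in sorted(flatten(db).items()):
    print(path, content)
```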

The following simple example illustrates the JSON representation of a portion of our database realized in Git and what that looks like in a working tree.

Example JSON:

{“”: {“name”: “Alice”, “age”: 18}, “”: {“name”: “Bob”, “age”: 22}}

In cloned repo:

$ find .

$ cat
$ cat

What’s magical about representing your data this way is that it is very understandable and easy to work with when realized in a filesystem view of the repo (i.e., the working tree). The database can literally be interacted with by using the same Git and system tools that developers are used to using in their everyday work. The database is copied locally by cloning the repo. Navigating through the data is done via `cd` and `ls`, data can be searched using tools like `find` and `grep`, etc. Best of all, data can be modified, committed with an appropriate comment, and pushed back to production if need be.
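As a rough illustration of the filesystem view, the sketch below writes a nested object into a directory as one file per attribute and then reads it back, skipping the `.git` directory. It assumes JSON-encoded file contents and uses hypothetical keys and helper names (`write_tree`, `read_tree`); it works on a plain temporary directory rather than a real clone.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def write_tree(root: Path, obj: Dict[str, Any]) -> None:
    """Write nested objects as directories and leaf values as files."""
    for key, value in obj.items():
        target = root / key
        if isinstance(value, dict):
            target.mkdir(parents=True, exist_ok=True)
            write_tree(target, value)
        else:
            target.write_text(json.dumps(value), encoding="utf-8")


def read_tree(root: Path) -> Dict[str, Any]:
    """Rebuild the nested object from a working-tree-like directory."""
    result: Dict[str, Any] = {}
    for entry in sorted(root.iterdir()):
        if entry.name == ".git":
            continue  # repository metadata, not data
        if entry.is_dir():
            result[entry.name] = read_tree(entry)
        else:
            result[entry.name] = json.loads(entry.read_text(encoding="utf-8"))
    return result


# Hypothetical keys, for illustration only.
data = {"user-1": {"name": "Alice", "age": 18},
        "user-2": {"name": "Bob", "age": 22}}

with tempfile.TemporaryDirectory() as tmp:
    write_tree(Path(tmp), data)
    print(read_tree(Path(tmp)) == data)  # True
```

In a real clone, the write step would be followed by an ordinary `git commit` and `git push`.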

Interesting Directions to Evolve

Thinking about managing your data the same way you manage your source code leads to some interesting thoughts about new things that you can do easily or layer on top of these capabilities. A few that we’ve thought of or done in the last year:

  • Reproduce exactly in a test environment the state of the database at some point in time in production in order to reproduce a problem.
  • Discover the exact state of the database correlating to data events (by including commit sha).
  • Analyze the effect of specific changes to configuration on behavior of the platform over time.
  • Targeted undo (revert) of a specific configuration change that caused some ill effect.
  • Expose semantics in a product UI that map to familiar source control semantics: pull/branch/commit/push
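The idea of correlating data events with database state, from the list above, can be sketched as follows, assuming each event records the commit sha that was current when it was generated. The event fields (`type`, `commit_sha`) and the sha strings are hypothetical; the post does not describe the actual event format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def clicks_by_commit(events: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count click events per configuration version (commit sha)."""
    totals: Dict[str, int] = defaultdict(int)
    for event in events:
        if event["type"] == "click":
            totals[event["commit_sha"]] += 1
    return dict(totals)


# Hypothetical events, each tagged with the config commit that produced it.
events = [
    {"type": "impression", "commit_sha": "a1b2c3d"},
    {"type": "click", "commit_sha": "a1b2c3d"},
    {"type": "click", "commit_sha": "e4f5a6b"},
    {"type": "click", "commit_sha": "e4f5a6b"},
]

print(clicks_by_commit(events))  # {'a1b2c3d': 1, 'e4f5a6b': 2}
```

Grouping by sha in this way lets behavior before and after a specific configuration change be compared, and the sha itself identifies the exact database state to check out or revert.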

Why We Did It

The short answer to why we did it is that we considered history such an important aspect of the database. Building a notion of history into the database itself was the best way to ensure the ability to correlate data events like clicks on ads back to the configuration of monetary settings that drove the original impression, in an auditable fashion. Not finding the solution to our problem anywhere, we followed one of Yieldbot’s core principles and JFDI’ed.

The simplicity of the approach hit two strong notes for us. First was that the simplicity brought with it a kind of elegance. It is easy to understand how the database handles history and to reason about the form of the data in the database. We also immediately got functionality like audit logging built into the database for free. And ultimately for a technical team that at the time was four developers with our hands full building rich custom intent analytics, performance optimized ad serving, a rich javascript front-end to manage, explore and visualize our custom intent analytics, and a platform that scales out to a worldwide footprint, we could focus on our core mission of building that product.

We’re discussing our work with HeroDB later this week:

Yieldbot Tech Talks Meetup (Feb 7, 2013 @ 7PM)

Interested in joining the team? We’re hiring!

Yieldbot 2012 Review

Posted December 31st, 2012 by admin in Company News


2012 was a huge year of progress at Yieldbot. We started off the year by taking our second round of funding in February from true VCs that didn’t need to see “traction” before making their bet. With that investment we grew the company over the course of the year from 5 to 19 full-time employees in New York and Boston. Companies are about people first and we have put together a tremendous team of data scientists, developers, engineers, strategists and sales people who have joined us from companies like Criteo, Microsoft, Kayak and the Wall Street Journal.

After two years of development from that small team of 5 we launched our real-time consumer intent marketplace in May. It was worth the effort. The amount of paid clicks in the Yieldbot marketplace has doubled every two months since we launched. Doubling the size of your business 4 times in 8 months presented numerous scaling challenges that touched all parts of the business. Our team handled them incredibly well.

We have a good number of the world’s leading brands and many other marketers large and small extending their Search budgets into Yieldbot to buy real-time intent on a performance (Pay Per Click) basis. Speaking of performance, in many cases advertisers are seeing results from Yieldbot traffic that are as good as or better than what they see in Paid Search. As most industry observers know, this is heretofore unseen.

Best of all we truly unlocked a direct path for Search Marketing budgets to reach Premium Publishers and buy intent in real-time. This is an industry first and we consider it a monumental achievement in digital media. Yieldbot’s largest publisher partner is on a $2M 2013 run rate for new revenue - Search revenue. These are budgets they have never touched and their direct sales teams have never called on. We truly have created a new channel. That doesn’t happen very often.

In 2012 Publisher partners were stacking Yieldbot behind their sponsorships/direct sold impressions and ahead of exchange/network. That’s a great starting place but we aim to create much more value as our technology improves in 2013. We saw over 15M different consumer intentions across more than 2.7B page-views in 2012. The more data we capture the better we perform. This is one reason Yieldbot’s overall platform CTR (click-through rate) has gone up every month even as impression levels have skyrocketed. There are efficiencies created as markets get larger and those will benefit both Yieldbot advertisers and publishers.

Major initiatives around automation and artificial intelligence were also started late in 2012 that will make optimization of Yieldbot performance completely automated. From campaign set-up and launch through goal management, the use of first party data and ad server integration creates an opportunity to reshape what is possible with marketing technology and reduce the resources necessary to manage campaigns.

We head into 2013 fully aware that we have not accomplished anything close to our goals and we are still at the beginning of building our business. We have 2 new verticals launching Q1 and more growth to manage ahead of us. There is also pressure that comes from the sheer enormity of the opportunity in front of us. That’s a good thing. It was Billie Jean King that said “pressure is a privilege” and we’re privileged to be solving problems that bring the highest quality consumers, world’s top marketers and premium content publishers together in a way that delivers relevance and value to each simultaneously.

Should old acquaintance be forgot and never thought upon? Maybe. But if you did not make the acquaintance of Yieldbot in 2012 and you are a Search Marketer or Premium Publisher we hope you do in 2013. In the meantime, Happy New Year!

An Amazing Search Insider Summit with One Thing Missing

Posted December 19th, 2012 by admin in Events


I have recently returned from arguably the most in-depth, interesting and well-attended event involving the search industry: Mediapost’s Search Insider Summit. After spending the past nine months attending a number of other events and conferences and indoctrinating myself into this incredible community on behalf of Yieldbot, SIS clearly stood out. The content of the conference encompassed a number of the past, current and future trends of a business that is still incredibly young despite its size, both in sheer economics and in the volume of businesses big and small that are participating in it. Due to that dynamic combination of youth and size there are many exciting new things emerging as bright entrepreneurs with new ideas and the existing industry behemoths continue to innovate and bring new methodologies, technologies and ideas to the table.

So first a brief outline of a couple of the new opportunities discussed that were most interesting and then a dive into what was so obviously missing from the dialogue.

Creative – if there was one direction and point on which it seemed almost all of these experts agreed, it was that new creative formats, creative technologies and creative optimization methodologies will continue to be at the forefront of the industry’s innovation. As GYB (Google, Yahoo, Bing) and the other “traditional” platforms continue to experiment and innovate with SERPs that are more dynamic and visually pleasing, for the dual purposes of better engagement and the ability to attract more dollars from brands with higher levels of concern about image, there are interesting, well-represented companies doing cool stuff to address this need. One that comes to mind is Datapop, a real technology innovator in this space. Then there is the simple blocking and tackling of better text copy optimization being handled by folks like BoostCTR, less a technology and more a time-saving method for the drudgery of copy testing.

New Platforms – the other area where many of the most interesting discussions took place, and where the opportunities for industry growth seem most fertile, was new platforms. Topics ranged from the increasing growth of mobile and tablet search and its different nuances from desktop to the coming of more vertical search as embodied by Yelp, Amazon and others (surprisingly, none of which were represented at SIS, with the exception of the very cool Intent Media, but which were much talked about). Even really cool (but maybe a little scary from a privacy perspective) technologies like visual search from Xbox were hot topics. These new platforms create new markets and new audiences on top of which the best practices of the search industry are being built, and their rapid growth is representative of their promise.

And now the missing…

It was never clearer to me than at this great conference that content publishers (not to be confused with eCommerce publishers) are such a glaring afterthought to the leading innovators of the most successful and impactful part of the digital advertising industry… the search community. There was not a content publisher in the room nor was there a mention of one on the stage (except briefly by me). Yet I bet that if asked in a vacuum (and I did a bit of asking), most of this talented group knows that premium content publishers are hardcore SEO buffs, oftentimes buy Paid Search and are sitting on a treasure trove of first party data. When properly harvested (as we do at Yieldbot) this data can illuminate the “search-like” behaviors of web visitors in their sessions. Selling these visitors’ premium publisher intent in the currency of keywords to the very marketers that make up the search ecosystem (that was so well represented at SIS) represents an enormous opportunity for market expansion. Not just for those publishers to access search budgets but also for search marketers to find new ways of pinpointing the user that they want in real time as they are expressing interest in a specific category. Utilization of this real time data can (and does) yield in some cases results even better than traditional search itself all over the marketing funnel (from branding to conversion).

We heard and talked about the utilization of third party search data and third party site visitation in marketers’ and eCommerce platforms as a new data set for the search marketplace to leverage its methodologies in buying performance media. There is no question that there is a place for that and some have done quite well. But nowhere in this paradigm is there any mention of using FIRST PARTY DATA in REAL TIME within publisher content to make ad decisions on the same pubs from which that data is harvested. In that paradigm, publishers can participate at the very least on par in the search (and performance) digital marketplace, like all these old and new search platforms that marketers know and want more of.

So in the vein of a good search creative call to action we say: Come talk to us, content publishers! And next Search Insider Summit, join the conversation and participate in the marketplace that our friend Murthy from Adchemy proclaimed was (to paraphrase) the dawn of a new era in search, the most successful (digital) advertising platform ever. You can and should be a major voice in that room.

Data as a Currency

Posted October 8th, 2012 by admin in Insight


This past week was Advertising Week in New York and Yieldbot did a session with our friends @bitly during the AdMonsters OPS event titled “Data as a Currency.” The main portion of our presentation was the first public demonstration of Yieldbot’s publisher analytics product. Prior to that we led off with a brief overview of how we create currency with our data and by currency we mean real dollars to publishers and advertisers on our platform. Below is the presentation we gave. If you would like to learn more please email

Data as Currency - OPS from yieldbot

Driving Traffic – The Publisher Panacea

Posted September 27th, 2012 by admin in Insight


The dream of web banners and selling impressions to large brand budgets is over. The value of audience data has surpassed the value of the ad impression. With this backdrop the future for publishers is this simple. They will get one more chance to enter the click based economy on their own terms (meaning owning their media and data) or they will lose control of their own business and become a media tool of Google and Facebook.

The last few years of rapid change in the ad tech world, away from ad networks and towards ad exchanges, have been confusing for premium publishers. First, they turned to the idea of “Private Exchanges,” places where advertisers could come with their data and buy directly on the publisher inventory. The reality is there is no demand from advertisers because impressions are cheaper elsewhere. 18 months after being all the rage nobody talks about “Private Exchanges” anymore.

The new shiny object for Publishers is now Data Management Platforms or DMPs. While pubs seem to like the content optimization qualities of these platforms, there are real issues using DMPs effectively with their media. DMPs add complexity and publishers are not technologists or marketers. Most important, DMPs do not solve the underlying problems for publishers. Audience data is becoming commoditized and the value of their media on an impression basis continues to sink like a stone.

While Publishers have been fumbling, the simple Click Economy has grown to the neighborhood of $40-50 Billion in annual revenue. This advertising economy includes Search, Contextual, Email and Affiliate: everything that drives traffic, i.e., clicks, to marketers. Its newest entrants are Facebook, Twitter and Retargeting companies like Criteo – all of which are experiencing massive growth. In this economy two things stand out: the ad impressions are free, and the performance-based business model drives value higher to those with the best data intelligence.

The businesses that were built to sell impressions to brands - Yahoo, AOL, Microsoft - have been passed by these businesses that drive traffic. Even choruses of “the click is dead” from fearful self-interested impression supporters cannot stop the basic fact that the web has always been monetized one way or another through traffic arbitrage and always will be. To she who sends the most valuable traffic goes the spoils.

The inflection point is now.

What we’re seeing from Yahoo is representative of the change that Publishers must make in order to survive. Bringing in one of the sharpest minds on Search as CEO to help save the struggling web banner company should be a beacon to all publishers. Your business is a utility. You deliver valuable content. That includes advertising. The value of that advertising is based on the quality of your audience to the advertiser as measured by a performance metric. Your ability to increase the value is based on the relevance of the message.

If that sounds a lot like Search it should. Those are the core tenets of that marketplace. A market that started later but has grown to roughly twice the size of web banners. Publishers are paying dearly for missing the boat on that.

Think how much revenue the New York Times could be making if, instead of web banners, its digital revenue focus had been to deliver traffic and conversions to sites like Expedia, Amazon, eBay, Bankrate, Home Depot, and on and on and on. It certainly would have been larger than IAC the company that just acquired from them. IAC’s market value is 4x that of the New York Times. To give you another perspective, the Times will do roughly $300 million this year in digital revenues. IAC will do over $2 billion.

Quietly taking advantage of publisher fumbling is Google. Google has expanded its grip on not only the publisher media through its Ad Exchange and AdSense but also their data through Google Analytics and its DFP ad server. Most publishers will openly admit that Google knows how much money they make and more about their audience than they do. When you don’t know who is coming into your store and you don’t know how much they are paying or what they are leaving with, any business will die. That is exactly what is happening. Facebook is soon to take the same approach as it builds out its ad network on the back of all of its javascript that publishers have installed on their sites over the past few years.

The fact is that publishers are already driving valuable traffic; they are just not getting paid for it. A few years ago the New York Times said that 25% of its traffic leaves and goes to Google. Doing some back of the napkin math, that’s about 20 million exits a month to Google, and at Google’s estimated RPM of $80 that’s about $20 million a year in ad revenues the Times delivers to Google from intent the Times itself has generated. There needs to be an endcap.
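The napkin math above can be checked in a few lines of Python. The inputs are the figures quoted in the paragraph (20 million exits a month and an estimated $80 RPM, i.e., revenue per thousand); the variable names are just for illustration.

```python
exits_per_month = 20_000_000  # exits to Google per month, as quoted above
rpm_dollars = 80              # estimated revenue per thousand exits

monthly_revenue = exits_per_month // 1000 * rpm_dollars
annual_revenue = monthly_revenue * 12

print(f"${monthly_revenue:,} per month, ${annual_revenue:,} per year")
# $1,600,000 per month, $19,200,000 per year
```

That comes to roughly $19.2 million a year, consistent with the “about $20 million” figure.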

Better yet, there needs to be a publisher controlled marketplace where the true value of traffic from premium publishers is understood, captured and passed on to marketers. Where the data is transparent and the optimizations generate mutual benefits to the publisher, marketer and site visitor. This would create a new channel. The opportunity is massive and real. Some premium publishers are already doing it and experiencing incredible revenue growth. Let us know if you want to be one of them.
