
Introducing Pascalog


(Image shared under a Creative Commons Attribution-ShareAlike license by Flickr user Timitrius.)

Today, the dev team at Yieldbot is excited to announce plans to open source one of our prized internally developed technologies: Pascalog.

Technology often evolves in cycles rather than in a straight line, with old patterns resurfacing as newer innovations arrive.

For a while we were doing all of our analytics in Cascalog, and things were going great. As a Clojure DSL built on the Cascading API for Hadoop, Cascalog is a brilliant technology for processing large data sets efficiently with very tersely written code.

In fact, we even wrote about those experiences here and here.

But we found ourselves writing things like this:

(<- [!pub !country !region !city !kw !ref !url ?s]
    (rv-sq !pub !country !region !city !kw !ref !url ?pv-id ?c)
    (c/sum ?c :> ?s))

We thought there had to be a better way. When it dawned on us that Clojure, being a Lisp, has its roots in the 1960s, we realized that the next logical step would be an upgrade to the 1970s.

Wouldn’t we rather write something like:

program HelloWorld;
begin
   writeln('Hello, World!');
end.

So we immediately set about bringing the best of 1970s software development, Pascal, into the Big Data world of the 2010s, and Pascalog was born. (Who couldn’t love a language that wants you to end your programs with a “.”?)

This also fit well with internal discussions we were having at the time, in which we lamented the complexity of managing a Hadoop cluster and mused about the efficiencies we might gain by consolidating everything back into a single processing environment on a mainframe. That dream is on hold until we find a suitable hardware vendor, but that was certainly no reason to hold back Pascalog development.

Data is a readln() Away

In Pascalog we’ve done the heavy lifting. Because readln() is bound to a Cascading Tap, you read data the same way you have since your Turbo Pascal days.

It didn’t take us long to realize that you’d want to save the results of your calculations somewhere, so in a follow-on version we added the mapping of writeln() to an output Cascading Tap.

Configuring your input and output taps and mapping them to readln() and writeln() is as easy as configuring an INI file.

An upcoming version, due shortly, will also let you map the readln() of one Pascalog program to the writeln() of an upstream Pascalog program, so you can daisy-chain your Pascalog programs.

Why Pascal?

We made it sound above as though we jumped on the Pascal bandwagon right away, but in truth we considered several alternatives from the 1970s.

Of particular interest was the ability to write nested procedures. We’ve grown accustomed to nested functions in our Python development on other parts of the platform, and this lets us move between the two worlds more seamlessly (compared to, say, Fortran).

The goto statement is another great feature, ready to bail you out when you get a little too lost in your control flow. Using it well has become a lost art.

We did consider C, but couldn’t get over the prospect of calling the result “Clog”.

The Future

We’re furiously looking for a Pascal Meetup group where we can make a live presentation. If you know of one, please let us know!

We have a long list of features in mind to build, but we also want to hear back from the community.

Visit www.pascalog.org to get started! We’re looking forward to your pull requests. If you have live questions, there’s usually one of us hanging out on CompuServe under user ID [73217, 55].

 

by Rich Shea
