Make room for Functional Programming (2)
Quod in vita agimus, in aeternum resonat
Tags: development
This is part two of a two-part look at functional programming (FP). The first article examined some of the concepts and terms used. This piece puts them into practice in a worked example. Proponents of FP focus largely on its mathematical purity and its promise of reducing the bottlenecks inherent in the stored-program model of logic/memory proposed by John von Neumann (AKA “how computers work now”) and moving more towards models put forward by people like Alonzo Church and John Backus.
I say that this is fine and dandy (and largely beyond my intellectual capacity, frankly) but what’s also hugely promising is that FP techniques make regular, imperative, programming (AKA “how we write software now”) more efficient, easier to maintain, more elegant, reusable and component-based, better able to support systemic requirements and, if you believe in such things, more beautiful. Let’s get to the code.
The Code
Our employer, Gristle and Flint, wants to establish a more intimate relationship with its customers and their buying activity. As new on-line promotions are launched, we wish to track their take-up such that we can respond in the most appropriate way. We don’t know what the most appropriate way is yet (because we don’t really know very much about how customers behave) so the first thing is to produce a weekly report.
If we just wrote something, it might look (part pseudocode, part JavaScript) a bit like this:
function run_weekly_report()
{
    var db_connection = db.open("salesserver", "production_weblog");
    var web_activity_data = db_connection.execute("select * from webclickstable");
    var report_text = "";
    var people_clicked = 0;
    var people_bought = 0;
    while(!web_activity_data.eof)
    {
        if(web_activity_data["button_clicked"] == "offerbutton") people_clicked++;
        if(web_activity_data["button_clicked"] == "purchasebutton") people_bought++;
        web_activity_data.move_next(); // pseudocode: advance to the next row, otherwise the loop never ends
    }
    report_text = "Number of people who clicked on offer " + people_clicked + ". ";
    report_text += "Number of people who took up offer " + people_bought + ". ";
    // Multiply by 100 so the figure really is a percentage
    report_text += "Which is a hit rate of " + (100 * people_bought/people_clicked) + " %. ";
    send_email("product_manager@gristleandflint.com",report_text);
    db_connection.close();
}
Not beautiful, but there’s plenty of code around that’s much worse than this. And, for a simple app, that might never change, this may even be acceptable. If knowing the success rate of this promotion represents a market opportunity for Gristle and Flint (i.e. running this report allows promotional terms to be changed quickly but ceases to be of strategic value soon after) then we find that even the great Martin Fowler agrees this is the way to go, because it passes his Design Stamina Hypothesis.
But, alas, not much commercial software is truly here-today-gone-tomorrow opportunistic. It can seem that way at first, but useful software often becomes strategic before we know it, and suddenly we find ourselves having to start thinking about extending it, supporting it, testing it, scaling it, and so on. And here lies the root of all problems in software development, for it’s at this precise point that infinite possibilities arise. One developer might see the twenty lines of code above as being precisely what is required, no more, no less, whereas an architect might leap to design patterns, abstractions and reporting frameworks that cater for eventualities as yet undreamt of. Both are right and both are wrong; it all depends on when you ask the question. As Maximus Decimus Meridius said in Gladiator, “What we do in life echoes in eternity”. He also said “At my signal, unleash hell”. The code could be released as-is, or release could be delayed as it’s worked and reworked. Only as time passes will we really know its strategic importance. The more strategic it is, the more likely that a hasty deployment will unleash hell as we try to keep up with changes to it. The less strategic it is, the more likely that we introduce unnecessary complexity by overdesigning it. With just twenty lines of code any overhead is manageable, but at twenty thousand or two hundred thousand these differences become significant.
In this article, we’ll look at the FP features discussed last time to see how they allow early release of software but still provide for strategic change without too much of that architectural overhead.
The Prefactor Clean-up
First we need to clean that code up a bit.
In FP, a pure function is one whose result depends only on the arguments passed in, and which has no side effects. A quick look at our function shows that it’s tightly coupled to quite a lot of things that aren’t passed in, notably some database details: server name, database name and table name. The answer therefore is to pass in any dependencies. In the non-FP world we just call this Dependency Injection. If we do this we get:
function run_weekly_report(report_data)
{
    var report_text = "";
    var people_clicked = 0;
    var people_bought = 0;
    while(!report_data.eof)
    {
        if(report_data["button_clicked"] == "offerbutton") people_clicked++;
        if(report_data["button_clicked"] == "purchasebutton") people_bought++;
        report_data.move_next(); // pseudocode: advance to the next row
    }
    report_text = "Number of people who clicked on offer " + people_clicked + ". ";
    report_text += "Number of people who took up offer " + people_bought + ". ";
    // Multiply by 100 so the figure really is a percentage
    report_text += "Which is a hit rate of " + (100 * people_bought/people_clicked) + " %. ";
    send_email("product_manager@gristleandflint.com",report_text);
}
// Run the report
run_weekly_report(webdata);
Of course there still has to be a function, or configuration file, somewhere that creates the report_data record-set, but already run_weekly_report() has become easier to test because we can send it report data from anywhere we like, as long as the data has the right column names (which is, in effect, the contract for this function). It’s not an entirely pure function yet, because there’s still an external dependency for that email, but it is purer than it was.
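To make the testability point concrete, here is a minimal sketch of a fully pure variant. It assumes, purely for illustration, that the record-set can be stood in for by a plain array of row objects, and it returns the text rather than emailing it. The names build_report_text and fake_rows are hypothetical and not part of the report above.

```javascript
// Sketch only: rows are modelled as a plain array of objects rather than
// the database record-set used in the article (an assumption).
function build_report_text(rows)
{
    var people_clicked = 0;
    var people_bought = 0;
    for (var i = 0; i < rows.length; i++)
    {
        if (rows[i]["button_clicked"] == "offerbutton") people_clicked++;
        if (rows[i]["button_clicked"] == "purchasebutton") people_bought++;
    }
    // The result depends only on the argument; nothing is emailed or stored.
    return "Clicked: " + people_clicked + ". Bought: " + people_bought + ".";
}

// Hypothetical test data: no database or mail server required.
var fake_rows = [
    { button_clicked: "offerbutton" },
    { button_clicked: "offerbutton" },
    { button_clicked: "purchasebutton" }
];

console.log(build_report_text(fake_rows)); // Clicked: 2. Bought: 1.
```

Because the function touches nothing outside its arguments, calling it twice with the same rows always gives the same text, which is what makes it easy to test.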
One minor but significant point is the name of our function. It may be a weekly report today but if Gristle and Flint aspire to understand customers better over the long term (i.e. it underpins their strategy) then the information it provides is vital to making good business decisions. Tying the word weekly to it doesn’t feel right, and it’s a very IT name to use. Let’s give it a bit of branding and decouple the name from any sense of timing (so, for example, it would make just as much sense as a daily report). Let’s call it MarketWatch. This may sound somewhat specious in relation to functional programming, but it’s not. FP is all about focusing on composability not delivering-exactly-what-the-business-wants-in-the-exact-order-they-asked-for-it. A component named run_weekly_report() isn’t as semantically composable as market_watch() (even though technically it is). But we’re wandering into Domain Driven Design territory here so let’s leave that notion for another time.
The Stakeholder Situation 1
As a function, market_watch() is pretty fixed in what it does. It takes some data, performs two simple sum operations on it, and emails out a result. If business stakeholders want more involved and extensive reports, we could keep changing the guts of the function, but doing so in the conventional, reactive way would make it increasingly hard to change in future.
FP, though, gives us a way to inject report logic, just as we injected data, by using lambda functions. Last time we touched on three categories of lambda functions: find, fold and map. What we’re doing here is taking some data, identifying the actions a user took, and folding up the result into a total.
So the core business logic could be written like this:
function sum_clicks(click_data)
{
    var people_clicked = 0;
    var people_bought = 0;
    while(!click_data.eof)
    {
        if(click_data["button_clicked"] == "offerbutton") people_clicked++;
        if(click_data["button_clicked"] == "purchasebutton") people_bought++;
        click_data.move_next(); // pseudocode: advance to the next row
    }
    return { clicked : people_clicked, bought : people_bought };
}
// And we can test this in isolation
var click_results = sum_clicks(testwebdata);
alert(click_results.clicked); // prints number of people who clicked
alert(click_results.bought); // prints number of people who bought
alert(click_results.bought/click_results.clicked); // prints success rate
Now our function can be written, using a lambda function, like this:
function market_watch(report_data,report_logic)
{
    var report_text = "";
    var report_results = report_logic(report_data);
    report_text = "Number of people who clicked on offer " + report_results.clicked + ". ";
    report_text += "Number of people who took up offer " + report_results.bought + ". ";
    // Multiply by 100 so the figure really is a percentage
    report_text += "Which is a hit rate of " + (100 * report_results.bought/report_results.clicked) + " %. ";
    send_email("product_manager@gristleandflint.com",report_text);
}
// Run the report
market_watch(webdata,sum_clicks);
The contract for market_watch() is now that it takes some data and some logic to be run on that data. As long as the data has a field for button_clicked, and the logic returns an object with clicked and bought attributes, the rest is all changeable. If two weeks after launch the business needs MarketWatch to only show weekday activity we can simply create a new function:
function sum_weekday_clicks(click_data)
{
    var people_clicked = 0;
    var people_bought = 0;
    while(!click_data.eof)
    {
        if(click_data["day"] != "Saturday" && click_data["day"] != "Sunday")
        {
            if(click_data["button_clicked"] == "offerbutton") people_clicked++;
            if(click_data["button_clicked"] == "purchasebutton") people_bought++;
        }
        click_data.move_next(); // pseudocode: advance to the next row
    }
    return { clicked : people_clicked, bought : people_bought };
}
// Run the new report
market_watch(webdata,sum_weekday_clicks);
// Run the old report
market_watch(webdata,sum_clicks);
It’s still not as elegant as it could be, but you can see signs of composability starting to emerge.
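The fold described earlier can also be made explicit. The sketch below assumes the click data is available as a plain array of row objects (the record-set in the article is not) and leans on JavaScript’s built-in Array.prototype.filter and Array.prototype.reduce. The names count_clicks, is_weekday and sample_rows are hypothetical.

```javascript
// Fold a list of rows into { clicked, bought } totals.
// keep_row is an optional predicate deciding which rows count.
function count_clicks(rows, keep_row)
{
    var kept = keep_row ? rows.filter(keep_row) : rows;
    return kept.reduce(function(totals, row)
    {
        return {
            clicked: totals.clicked + (row.button_clicked == "offerbutton" ? 1 : 0),
            bought: totals.bought + (row.button_clicked == "purchasebutton" ? 1 : 0)
        };
    }, { clicked: 0, bought: 0 });
}

// The weekday rule becomes a small predicate instead of a second summing function.
function is_weekday(row)
{
    return row.day != "Saturday" && row.day != "Sunday";
}

var sample_rows = [
    { day: "Monday", button_clicked: "offerbutton" },
    { day: "Monday", button_clicked: "purchasebutton" },
    { day: "Saturday", button_clicked: "offerbutton" }
];

console.log(JSON.stringify(count_clicks(sample_rows)));             // {"clicked":2,"bought":1}
console.log(JSON.stringify(count_clicks(sample_rows, is_weekday))); // {"clicked":1,"bought":1}
```

Written this way, the counting logic exists once and the variation between reports is reduced to the predicate that is passed in.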
The Stakeholder Situation 2
One problem is that sum_clicks() and sum_weekday_clicks() are very similar, and we could be in danger of creating so many variations in business logic that any benefits of composability are lost under a sea of stakeholder-driven functions to maintain. What we need is the ability to create our summing function variations from the same root. In effect, a function clever enough to create summing functions to order. This is where closures come in.
This function
function click_summary(excluded_days)
{
    if (excluded_days == undefined) excluded_days = [];
    return function(click_data)
    {
        // Counters are declared here so each run of the report starts from zero
        var people_clicked = 0;
        var people_bought = 0;
        while(!click_data.eof)
        {
            // Count the row only if its day is NOT in the excluded list
            if(excluded_days.indexOf(click_data["day"]) == -1)
            {
                if(click_data["button_clicked"] == "offerbutton") people_clicked++;
                if(click_data["button_clicked"] == "purchasebutton") people_bought++;
            }
            click_data.move_next(); // pseudocode: advance to the next row
        }
        return { clicked : people_clicked, bought : people_bought };
    };
}
supersedes both sum_clicks() and sum_weekday_clicks(), doing the job of both and allowing for any days to be excluded, as required by the business.
// Run the old report
market_watch(webdata, click_summary());
// Run the new report
market_watch(webdata, click_summary( [ "Saturday", "Sunday" ] ) );
It’s not hard to see how, after further cleanup, click_summary() could be made quite sophisticated indeed.
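For anyone who wants to try the closure idea directly, here is a runnable sketch under one assumption: the click data is a plain array of row objects rather than a database record-set. The names make_click_summary, all_days, weekdays_only and rows are hypothetical.

```javascript
// Returns a summing function that remembers which days to skip.
function make_click_summary(excluded_days)
{
    var skip = excluded_days || [];
    return function(rows)
    {
        // Counters live inside the returned function so that each call
        // starts from zero.
        var totals = { clicked: 0, bought: 0 };
        for (var i = 0; i < rows.length; i++)
        {
            if (skip.indexOf(rows[i].day) != -1) continue; // excluded day
            if (rows[i].button_clicked == "offerbutton") totals.clicked++;
            if (rows[i].button_clicked == "purchasebutton") totals.bought++;
        }
        return totals;
    };
}

// Two summing functions made to order from the same root.
var all_days = make_click_summary();
var weekdays_only = make_click_summary(["Saturday", "Sunday"]);

var rows = [
    { day: "Friday", button_clicked: "offerbutton" },
    { day: "Sunday", button_clicked: "offerbutton" },
    { day: "Sunday", button_clicked: "purchasebutton" }
];

console.log(all_days(rows).clicked);      // 2
console.log(weekdays_only(rows).clicked); // 1
console.log(weekdays_only(rows).bought);  // 0
```

Each generated function carries its own copy of the excluded days, so the two reports cannot interfere with each other.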
The Refactor Cleanup
Over time we might find we are running a lot of different profiling models against our customer click data. The number of customers that read the offer and the number of customers that took it up only tell us so much, even after we can focus in on specific days of activity. There are referring sites, keyword matches in search engines, geographic location, landing pages, exit pages, previous visits, time on site, pages per visit and all manner of ways to slice and dice the data. At some point we’re also going to want to correlate this data with what we know about pre-existing customer activity elsewhere (spending patterns, loyalty, etc). Closure-generating functions certainly help, but as they get more sophisticated there’s a risk to readability. If our development team changes over time, we need a way to make the market_watch() source code more readable too. This helps maintain efficiency but also reduces the semantic gap between what’s going on in the code and what the business asks for. Communication issues lead to business frustration with IT, and that’s naturally something to be aware of around strategic initiatives like market_watch(). Currying is a useful technique here.
Here’s a simple extension to JavaScript that allows any function to be curried:
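Function.prototype.curry = function()
{
    var method = this;
    // Arguments supplied now, remembered for later
    var args = Array.prototype.slice.call(arguments);
    return function()
    {
        // Call the original with the remembered arguments followed by the new ones
        return method.apply(this, args.concat(Array.prototype.slice.call(arguments)));
    };
};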
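At this point it’s not crucial to understand the mechanics of currying, only what it can do for readability. Currying allows us to take a function like market_watch(some_data,some_logic) and return a function that represents a partial computation of it (strictly speaking, the helper above performs partial application, a close relative of currying, but the effect on readability is the same). So rather than write: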
market_watch(website_data, click_summary());
market_watch(website_data, page_summary());
market_watch(website_data, location_summary());
We can write:
market_report = market_watch.curry(website_data);
market_report(click_summary()); // report of click activity
market_report(page_summary()); // report of page access etc..
market_report(location_summary()); // report of where customers came from etc..
This creates an abstraction of market_watch() called market_report() that always uses the same data, without our having to write another function. Currying is really just a fancy form of closure, because it’s returning a version of market_watch() that retains access to its lexical state, but what it allows us to do is begin to create a domain-specific language (DSL) for (in this case) customer activity analysis.
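The same effect can be tried without extending Function.prototype, because JavaScript (from ES5 onwards) has a built-in Function.prototype.bind that fixes leading arguments. The sketch below is only an illustration: run_report is a hypothetical stand-in for market_watch() that returns its result instead of emailing it, and count_rows, first_day and website_data are made-up examples.

```javascript
// A stand-in for market_watch() that returns its result instead of emailing it.
function run_report(report_data, report_logic)
{
    return report_logic(report_data);
}

// Two toy pieces of report logic (hypothetical).
function count_rows(rows) { return rows.length; }
function first_day(rows) { return rows.length ? rows[0].day : null; }

var website_data = [ { day: "Monday" }, { day: "Tuesday" } ];

// Fix the first argument. bind's first parameter is the `this` value,
// which is not needed here, so null is passed.
var market_report = run_report.bind(null, website_data);

console.log(market_report(count_rows)); // 2
console.log(market_report(first_day));  // Monday
```

Whichever helper is used, the reader of the calling code sees only the part that varies, which is the report logic.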
The Giant Leap
Things are certainly slicker but market_watch() still has a couple of issues:
function market_watch(report_data,report_logic)
{
    var report_text = "";
    var report_results = report_logic(report_data);
    report_text = "Number of people who clicked on offer " + report_results.clicked + ". ";
    report_text += "Number of people who took up offer " + report_results.bought + ". ";
    // Multiply by 100 so the figure really is a percentage
    report_text += "Which is a hit rate of " + (100 * report_results.bought/report_results.clicked) + " %. ";
    send_email("product_manager@gristleandflint.com",report_text);
}
Although the business logic has been extracted from the guts of our function, the report formatting and the sending of the email are still, in effect, hard coded. If we remove them and apply an appropriate level of abstraction we can make the whole of market_watch() composable.
Continuations are a way to hook functions together to make this possible and, like lambda functions, involve passing more functions in:
function market_watch(report_data,report_logic,report_format,report_issue)
{
    var report_results = report_logic(report_data);
    // Format and issue only if the logic produced a result
    if(report_results) report_issue(report_format(report_results));
}
function report_format_basic(report_results)
{
    var report_text;
    report_text = "Number of people who clicked on offer " + report_results.clicked + ". ";
    report_text += "Number of people who took up offer " + report_results.bought + ". ";
    // Multiply by 100 so the figure really is a percentage
    report_text += "Which is a hit rate of " + (100 * report_results.bought/report_results.clicked) + " %. ";
    return report_text;
}
function issue_email(report)
{
    // Send the text that was passed in (the parameter is named report)
    send_email("product_manager@gristleandflint.com",report);
}
// Run the new report
market_watch(webdata,click_summary(),report_format_basic,issue_email);
// Which means a variation could easily be
market_watch(webdata,click_summary(),report_format_pdf,web_publish);
This example uses the continuation passing style to call market_watch() providing all the (potentially changeable) information required, leaving it only to coordinate getting the job done. Our path toward creating a DSL is all but complete. In a Ruby-like language, with softer syntax needs, that last line could easily be:
RUN market_watch REPORTING ON click_summary AGAINST webdata IN report_format_pdf ISSUED AS web_publish
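Here is a small runnable sketch of that composable market_watch(), assuming array-based data and swapping the email for an in-memory outbox so it can be run anywhere. Everything other than market_watch() itself (count_offer_clicks, format_plain, format_shouty, issue_to_outbox, outbox, rows) is a hypothetical stand-in.

```javascript
function market_watch(report_data, report_logic, report_format, report_issue)
{
    var report_results = report_logic(report_data);
    if (report_results) report_issue(report_format(report_results));
}

// Hypothetical stand-ins for the real logic, formatters and issuer.
function count_offer_clicks(rows)
{
    var clicked = 0;
    for (var i = 0; i < rows.length; i++)
        if (rows[i].button_clicked == "offerbutton") clicked++;
    return { clicked: clicked };
}
function format_plain(results) { return "Offer clicks: " + results.clicked; }
function format_shouty(results) { return "OFFER CLICKS = " + results.clicked + "!"; }

// Instead of sending email, collect issued reports in memory.
var outbox = [];
function issue_to_outbox(report) { outbox.push(report); }

var rows = [ { button_clicked: "offerbutton" }, { button_clicked: "purchasebutton" } ];

// Same coordinator, two different formats.
market_watch(rows, count_offer_clicks, format_plain, issue_to_outbox);
market_watch(rows, count_offer_clicks, format_shouty, issue_to_outbox);

console.log(outbox.join(" | ")); // Offer clicks: 1 | OFFER CLICKS = 1!
```

Swapping the formatter or the issuer changes what the report looks like and where it goes, while market_watch() itself stays untouched.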
The Business Critical
Our journey is nearly over. So far we’ve looked at how lambda functions, anonymous functions, closures, currying and continuations can help structure code in such a way as to make it look and feel like things the business talk about. The more we do this, the easier code is to understand and maintain, and the less of a gap there is between business and technology. Reducing the gap makes us agile and our code base more supple. The last step in this journey doesn’t come directly from the business. It’s how we wrap up all this elegant code to ensure it delivers a good service.
And we’re back to monads.
When we take our webdata and string together click_summary(), report_format_basic() and issue_email() to process it, things can go wrong in any of these functions. If webdata is corrupt then click_summary() will fail, meaning report_format_basic() will fail too. We could add a lot of defensive coding into the functions themselves but that goes against the philosophy of FP because it threatens their future composability. What we need is a universal way to handle exceptions that all our functions can adhere to and that won’t impact the others. Here’s a Maybe monad, so called because it maybe contains the expected result, but can safely deal with exceptions too.
function Maybe(contents, worked)
{
    this.contents = contents;
    this.worked = worked;
}
function Container(something) { return new Maybe(something, true); }
var Nothing = new Maybe(undefined, false);
Maybe.prototype.bind = function(f)
{
    if(this.worked) return f(this.contents);
    else return Nothing;
};
As you can see, the Maybe monad is just a function object that has two properties: contents (what it contains) and worked (whether the operation to generate contents worked or not). This isn’t the standard naming convention; I’ve used these names to make it more readable. Next we declare a function, Container, which wraps a value in a Maybe whose worked is true (it will hold the data as it moves through our market_watch() execution), and an instance of Maybe called Nothing containing (not surprisingly) a contents of nothing and a worked of false. Nothing is our universal we-hit-a-problem return value.
The Maybe function is then extended with a method called bind, which applies another function to contents, but only if worked is true; otherwise it simply returns Nothing. Bind is standard for monads and there are other methods, but to keep this example cleaner we can ignore these as we don’t need them for market_watch(). To run it we simply do this:
function market_watch(report_data,report_logic,report_format,report_issue)
{
    // Named result so that it does not shadow the function's own name
    var result = Container(report_data).bind(report_logic).bind(report_format);
    if(result.worked)
        report_issue(result.contents);
    else
        report_issue("There was a problem generating the report.");
}
// Run the report (the logic and format functions must each return a Maybe)
market_watch(webdata,click_summary(),report_format_basic,issue_email);
If any of the bound functions fails (i.e. returns Nothing), each subsequent bind will automatically pass this up the chain, with no side effects. The only rule is that, of course, each function must return a Maybe monad typed result or a Maybe monad typed Nothing. A compliant report_format_basic() might look like this:
function report_format_basic(summdata)
{
    // A report with zero purchases is still valid; only a missing or zero
    // click count (which would mean dividing by zero) counts as a failure
    if(summdata.clicked > 0 && summdata.bought >= 0)
    {
        var report_text = "";
        report_text += "Number of people who clicked on offer " + summdata.clicked + ". ";
        report_text += "Number of people who took up offer " + summdata.bought + ". ";
        report_text += "Which is a hit rate of " + (100 * summdata.bought/summdata.clicked) + " %. ";
        return Container(report_text);
    }
    else
        return Nothing;
}
Although it might seem like a bit of a hassle to make functions Monad-friendly, it does increase their composability substantially. That bind function can be very powerful. In this example all it does is pass a Container result or a Nothing up the chain, but it’s just a function so it can do whatever you want.
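To see the whole chain behave, here is a self-contained sketch that can be pasted into a JavaScript console. It repeats the Maybe definition from above and assumes array-based data. The steps summarise and format, the wrapper run, and the sample data good are hypothetical simplifications of click_summary(), report_format_basic() and market_watch().

```javascript
function Maybe(contents, worked)
{
    this.contents = contents;
    this.worked = worked;
}
function Container(something) { return new Maybe(something, true); }
var Nothing = new Maybe(undefined, false);
Maybe.prototype.bind = function(f) { return this.worked ? f(this.contents) : Nothing; };

// Every step takes a plain value and returns a Maybe.
function summarise(rows)
{
    if (!Array.isArray(rows)) return Nothing; // corrupt input
    var clicked = 0, bought = 0;
    for (var i = 0; i < rows.length; i++)
    {
        if (rows[i].button_clicked == "offerbutton") clicked++;
        if (rows[i].button_clicked == "purchasebutton") bought++;
    }
    return Container({ clicked: clicked, bought: bought });
}
function format(summary)
{
    if (summary.clicked == 0) return Nothing; // avoid dividing by zero
    return Container("Hit rate: " + (100 * summary.bought / summary.clicked) + " %");
}

// Chain the steps; a failure anywhere falls through to the error message.
function run(data)
{
    var result = Container(data).bind(summarise).bind(format);
    return result.worked ? result.contents : "There was a problem generating the report.";
}

var good = [
    { button_clicked: "offerbutton" },
    { button_clicked: "offerbutton" },
    { button_clicked: "purchasebutton" }
];
console.log(run(good));      // Hit rate: 50 %
console.log(run("garbage")); // There was a problem generating the report.
console.log(run([]));        // There was a problem generating the report.
```

Note that neither step checks whether the previous one succeeded. That decision lives in bind alone, which is what keeps the individual functions small and composable.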
Summary
So that’s a brief introduction to what Functional Programming is all about. As I said at the beginning, there is very much more to this topic than I have covered here, but I hope it piques enough interest for further exploration.
Many languages support FP techniques these days (C# got closures in 2.0 for example) so it’s not like you have to go pure FP with something like Haskell or Lisp to try ideas out. Not that I would necessarily advocate that anyway. In the world of corporate software development there’s only really one goal, and that is to make customers happy (or at the very least for IT not to be among the top five things the business thinks hold them back). FP techniques help close the gap between what the business asks for and what they really require, by allowing for safer refactoring and improving composability. Purists will also add that FP approaches are becoming necessary in a world of multi-threaded, multi-core computing, where imperative programming is likely to find its natural limits.
If both of these things are true then FP promises a great deal for the future because it does for the bottom-up view of the world what top-down approaches like SOA and EDA have been attempting for years, except that the results are immediately tangible because they are delivered by developers and don’t require the kind of cost and organisational overhead that architecture-led initiatives so often come with.
Further Reading
- Nice Introduction to Functional Programming on defmacro.org
- Dustin Diaz’s nice intro to Currying in JavaScript from Feb 2007
- Another look at Currying by Svend Tofte