Batch Processing and Grid Computing -- Commercial and Open Source Options


This article explores four alternatives for batch processing and grid computing among the many options available in both the Open Source and Commercial spaces.

Scheduling and load balancing are two closely knit concepts. Most people use the term “Load Balancing” in the context of network load balancing; the wiki article on the subject, for example, describes (network) load balancing in the context of a web server farm. But what options are available today if what you need is a grid for analytic/scientific computations? We do a lot of batch processing in the hedge fund and investment banking space, from (S)FTP file transfers to and from brokers (deal/execution submissions, allocations) to derivatives pricing, risk, stress and PnL calculations (real time, day-end, month-end). Even so, many firms still build their infrastructure from scratch.

Wikipedia offers a preliminary survey under two separate categories: “Job Scheduler” and “Grid Computing Software/Middle”.

The question remains: how do they measure up, and against what set of criteria?

Adding to the confusion, should you decide to build your own batch processing/grid infrastructure, many Open Source libraries are available. Many support scheduling but not load balancing, and vice versa. Others are simply too immature or lack a following, which you can tell from the number of downloads, the number of broken links and the absence of documentation. For example, NGrid supports load balancing but not scheduling. Quartz.NET, on the other hand, supports both scheduling and “Clustering”, but with specific limitations: the job must be coded in .NET, and it must implement the “IJob” interface (less restrictive than NGrid, where you need to subclass “GObject”).
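To make the distinction concrete, here is a minimal sketch of the load balancing half alone: it decides where a job runs, not when. It is written in Python for brevity rather than .NET. The `Job` base class is only a hypothetical analogue of an “IJob”-style contract, and `LeastLoadedBalancer`, `PriceBook` and the node names are invented for illustration; none of them come from NGrid or any other library.

```python
import heapq
from abc import ABC, abstractmethod


class Job(ABC):
    """Hypothetical job contract, loosely analogous to an IJob-style interface."""

    @abstractmethod
    def execute(self) -> str:
        ...


class PriceBook(Job):
    """Illustrative job: pretends to price one trading book."""

    def __init__(self, book: str) -> None:
        self.book = book

    def execute(self) -> str:
        return f"priced {self.book}"


class LeastLoadedBalancer:
    """Sends each job to the worker with the fewest jobs assigned so far."""

    def __init__(self, workers: list[str]) -> None:
        # Heap entries are (assigned_job_count, worker_name).
        self._heap = [(0, name) for name in workers]
        heapq.heapify(self._heap)

    def dispatch(self, job: Job) -> tuple[str, str]:
        count, worker = heapq.heappop(self._heap)
        result = job.execute()  # a real grid would run this on the remote worker
        heapq.heappush(self._heap, (count + 1, worker))
        return worker, result


balancer = LeastLoadedBalancer(["node-a", "node-b"])
assignments = [balancer.dispatch(PriceBook(b)) for b in ["rates", "fx", "credit"]]
```

Nothing here says when a job should start; that is the scheduler's concern, which is why a library can offer one capability without the other.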

The objective of this article is to explore what options we have from both Commercial and Open Source spaces.

We Don’t Need a Scheduler for Everything

Before we explore our options further, I want to establish that, while scheduling and load balancing are closely knit concepts, we do NOT need a scheduler for everything.

Fig 1. Real-time updates of derivatives sensitivities is an example where we don’t need a Scheduler

A “Market Data Feed Adapter/Server” may be listening on a socket (the Bloomberg Desktop API, for example) and publishing arriving ticks to a Message Bus that is accessible only to the firm’s applications within the intranet. A calculation grid monitoring the Message Bus picks up the newly published market data, runs its calculation, and publishes the result back to the Message Bus. Clients, desktop or web, subscribe to updates from the Message Bus. In this scenario you do NOT need a scheduler; what you need is a Message Bus, RabbitMQ for example. If your calculation grid is built in .NET, say, your primary concern should be integrating jobs implemented in different languages: unmanaged C++, Java, Perl scripts, .NET.
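The event-driven flow in Fig 1 can be sketched as follows. This is a toy, in-process stand-in for a real message bus such as RabbitMQ, not a client for it. The `InMemoryBus` class, the topic names, the tick fields and the linear sensitivity formula are all assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Callable


class InMemoryBus:
    """Toy publish/subscribe bus; a stand-in for a real broker such as RabbitMQ."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Deliver synchronously to every handler registered for the topic.
        for handler in list(self._subscribers[topic]):
            handler(message)


bus = InMemoryBus()
client_view: list[dict] = []  # what a desktop or web client would display


def on_tick(tick: dict) -> None:
    # Calculation grid: triggered by the arriving tick, not by a scheduler.
    # Hypothetical linear approximation: delta times the price move.
    pnl_change = tick["delta"] * (tick["price"] - tick["prev_price"])
    bus.publish("sensitivities", {"symbol": tick["symbol"], "pnl_change": pnl_change})


bus.subscribe("market_data", on_tick)               # grid listens for ticks
bus.subscribe("sensitivities", client_view.append)  # client listens for results

# The feed adapter publishes an arriving tick.
bus.publish("market_data", {"symbol": "XYZ", "price": 101.0, "prev_price": 100.0, "delta": 0.5})
```

Every step is triggered by the arrival of a message, so there is nothing for a scheduler to do.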

Fig 2. Day-end/Month-end processing is an example where we DO need a scheduler

A typical day-end batch includes mark-to-market, PnL and risk calculations, stress/scenario analysis, and aggregation of position-level data to different levels (book, account, strategy, country and so on). In this case, we need a scheduler.
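One reason a scheduler earns its keep here is ordering: some steps can only start after others finish. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to derive a valid run order. The dependency graph itself is an assumption for illustration, not something prescribed by any of the products discussed.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each job maps to the jobs that must finish first.
day_end_jobs = {
    "mark_to_market": set(),
    "pnl": {"mark_to_market"},
    "risk": {"mark_to_market"},
    "stress": {"risk"},
    "aggregation": {"pnl", "risk", "stress"},
}


def run_batch(jobs: dict[str, set[str]]) -> list[str]:
    """Visit jobs in dependency order and return the order used."""
    executed = []
    for name in TopologicalSorter(jobs).static_order():
        # A real scheduler would launch the job here and record its status.
        executed.append(name)
    return executed


order = run_batch(day_end_jobs)
```

With this graph, `mark_to_market` always runs first and `aggregation` last; the relative order of independent jobs such as `pnl` and `risk` is not fixed, and a grid could run them in parallel.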

Anatomy of a Complete Batch Processing Infrastructure with Load Balancing Capability

As mentioned, Wikipedia is a good starting point for getting a grasp of what tools are available today if you need batch processing and load balancing capability in your firm, or in the new application you will be building. In the following passage, we make a detailed comparison using these criteria:

  • Cost estimates and Time-to-Delivery (for the case where you decide to build your own Scheduler/Grid)
  • Platform Compatibility
  • Scheduling facilities
  • Load Balancing facilities
  • Built-in ERP adapters
  • Built-in ETL commands (And Open Source libraries available, from the perspective of a .NET developer, if you build your own)
  • Persistence of execution status/timestamps, execution parameters and execution result (Actual Data)
  • GUI (Desktop/Web/Mobile)
  • Alert/Notification
  • Support for Change Management and Security Audits (Common requirement in Enterprise Computing)
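Of these criteria, persistence is the easiest to underestimate when building your own. As a minimal sketch of what it involves, the following uses Python's built-in `sqlite3` module to record status, timestamps, parameters and result for each run. The `job_run` table, its columns, and the `record_run` and `aggregate_pnl` helpers are hypothetical names chosen for illustration.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")  # a real deployment would use a durable database
conn.execute(
    """
    CREATE TABLE job_run (
        id          INTEGER PRIMARY KEY,
        job_name    TEXT NOT NULL,
        status      TEXT NOT NULL,
        started_at  TEXT NOT NULL,
        finished_at TEXT,
        parameters  TEXT,  -- JSON-encoded execution parameters
        result      TEXT   -- JSON-encoded output (the actual data)
    )
    """
)


def record_run(job_name, parameters, func):
    """Run func(**parameters) and persist status, timestamps, inputs and output."""
    started = datetime.now(timezone.utc).isoformat()
    try:
        result, status = func(**parameters), "SUCCEEDED"
    except Exception as exc:  # record the failure instead of losing it
        result, status = {"error": str(exc)}, "FAILED"
    finished = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO job_run (job_name, status, started_at, finished_at, parameters, result) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (job_name, status, started, finished, json.dumps(parameters), json.dumps(result)),
    )
    conn.commit()
    return cur.lastrowid


def aggregate_pnl(book, pnl_by_position):
    # Illustrative job: roll position-level PnL up to book level.
    return {"book": book, "pnl": sum(pnl_by_position)}


run_id = record_run("aggregate_pnl", {"book": "rates", "pnl_by_position": [10, -4, 3]}, aggregate_pnl)
row = conn.execute("SELECT status, result FROM job_run WHERE id = ?", (run_id,)).fetchone()
```

Storing the parameters and the result alongside the status is what lets you rerun or audit a job later, which also bears on the change management and security audit criterion.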

Fig 3. The Comparison - “Open Source Stack” applies if you decide to build your own Batch Infrastructure & Grid. BMC, Applied Algo, and Schedulix are standalone applications.


Yellow boxes are modules you’d need to build – i.e. components that are not included with the application. Green boxes are “Nice to Haves”.


We have explored four options available today.

  • Build your own (we have also described the number of modules you need to build in order to “Connect-the-dots” and the number of Open Source libraries available, in particular Quartz.NET+RabbitMQ, from the perspective of a .NET developer. They also have the most polished GUI: parent-child jobs are displayed in a flow chart diagram).
  • BMC Control-M (Commercial Scheduler+Load Balancer; the most expensive, but with ERP adapters and built-in ETL commands).
  • Applied Algo ETL Suite (Commercial Scheduler+Load Balancer; best suited for anyone who does a lot of number crunching. Its persistence mechanism, which automatically stores processed data ranging from FTP transfers to the output of a time series analysis, is particularly geared toward quantitative/scientific analysis. Applied Algo ETL Suite also bundles the most).
  • Schedulix (Open Source Scheduler+Load Balancer, with a support contract available from independIT. It offers everything you need for general IT automation purposes, but has no built-in ERP adapters or ETL commands).

This article presented only four alternatives among the many options available in both the Open Source and Commercial spaces. Readers are welcome to submit additions via the comments below. Please, however, use the comments to make suggestions, not to market your product.

