Spring Batch 2.0 - Basic Concepts
There is always a healthy debate when talking about Java and batch processing. When I heard about Spring Batch, I had to try it out. On a previous project, many eons back, I did some batch processing in Java. What hurt me there (after a lot of optimizations) was a call to another person's module. His module happily loaded up an entity bean. You can guess where that ended. In the next release I went through the code and replaced the entity bean calls with ONE SQL update statement. That fixed things. I was processing 200k records in 15-20 minutes, with an extremely small memory footprint. I could have reduced even this further by tuning another module, but the performance was deemed good enough and we moved on.
What I personally took away from that experience was the need for a decent Java-based batch processing framework. Of course, having such a framework does not mean you should always use Java for batches. Sometimes, for bulk processing, doing the work in the database may be the right approach.
In this blog post I want to go over Spring Batch processing. We will start off with some definitions.
Job - A job represents your entire batch work. Each night you need to 1) collect all of the credit card transactions, 2) write them to a file and then 3) send that file to the settlement provider. Here I have defined three logical steps. In Spring Batch a job is made up of Steps, each Step being a unit of work.
Step - A step is an independent, sequential phase of a job, such as writing the settlement file. A job is made up of one or more steps.
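To make the Job/Step relationship concrete, here is a minimal plain-Java sketch. It does not use Spring Batch at all: SimpleJob, SimpleStep and LoggingStep are hypothetical classes written only for illustration, and the three step names simply mirror the credit card settlement example. The point is just that a job is an ordered sequence of steps, each one a unit of work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the Job/Step relationship.
// These are NOT Spring Batch's own interfaces or classes.
interface SimpleStep {
    String getName();

    void execute(List<String> log); // one unit of work
}

class SimpleJob {
    private final String name;
    private final List<SimpleStep> steps = new ArrayList<>();

    SimpleJob(String name) {
        this.name = name;
    }

    SimpleJob addStep(SimpleStep step) {
        steps.add(step);
        return this;
    }

    // A job is just an ordered list of steps; run them one after another.
    List<String> run() {
        List<String> log = new ArrayList<>();
        for (SimpleStep step : steps) {
            step.execute(log);
        }
        return log;
    }
}

// Hypothetical step that only records that it ran.
class LoggingStep implements SimpleStep {
    private final String name;

    LoggingStep(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void execute(List<String> log) {
        log.add(name + " done");
    }
}

class SettlementJobDemo {
    // The three logical steps of the nightly settlement job.
    static SimpleJob settlementJob() {
        return new SimpleJob("ccSettlementJob")
            .addStep(new LoggingStep("readTransactions"))
            .addStep(new LoggingStep("writeSettlementFile"))
            .addStep(new LoggingStep("sendToProvider"));
    }

    public static void main(String[] args) {
        System.out.println(settlementJob().run());
    }
}
```

In Spring Batch itself the framework runs the steps for you; this loop only illustrates the idea that the job owns an ordered list of steps.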
JobInstance - A logical run of the job that you have defined. Think of the Job as a class and the JobInstance as your, well, object. Our credit card settlement job runs 7 days a week at 11pm; each nightly run is a separate JobInstance.
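JobParameters - The parameters used to start a job. They also identify a JobInstance: the settlement job run for Jan 1st, 2008 is a different JobInstance from the one run for Jan 2nd, 2008.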
JobExecution - Every attempt to run a JobInstance results in a JobExecution. Say that, for some reason, the Jan 1st, 2008 CC settlement job failed. It is re-run and this time it succeeds. So we have one JobInstance but two attempts, and thus two JobExecutions. There is also the concept of a StepExecution, which represents a single attempt to run a Step within a Job.
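The JobInstance/JobExecution distinction is easiest to see with a small model. The sketch below is plain Java and is not the Spring Batch API: InMemoryExecutionStore, JobExecutionRecord and RunStatus are hypothetical, and a JobInstance is assumed to be identified by the job name plus a single run.date parameter. Re-running the failed Jan 1st, 2008 job adds a second execution to the same instance, while the Jan 2nd run creates a new instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT the Spring Batch API: job name + parameters identify
// a JobInstance, and every attempt to run that instance is a new execution.
enum RunStatus { COMPLETED, FAILED }

class JobExecutionRecord {
    final int attempt; // 1 for the first attempt, 2 for the re-run, ...
    final RunStatus status;

    JobExecutionRecord(int attempt, RunStatus status) {
        this.attempt = attempt;
        this.status = status;
    }
}

class InMemoryExecutionStore {
    // Key = identity of a JobInstance; value = all executions of that instance.
    private final Map<String, List<JobExecutionRecord>> executions = new HashMap<>();

    // Assumption: a single run.date parameter is enough to identify an instance.
    static String instanceKey(String jobName, String runDate) {
        return jobName + "{run.date=" + runDate + "}";
    }

    // Records one attempt; the same name + date reuses the existing instance.
    JobExecutionRecord recordAttempt(String jobName, String runDate, RunStatus status) {
        List<JobExecutionRecord> list =
            executions.computeIfAbsent(instanceKey(jobName, runDate), k -> new ArrayList<>());
        JobExecutionRecord execution = new JobExecutionRecord(list.size() + 1, status);
        list.add(execution);
        return execution;
    }

    int instanceCount() {
        return executions.size();
    }

    int executionCount(String jobName, String runDate) {
        List<JobExecutionRecord> list = executions.get(instanceKey(jobName, runDate));
        return list == null ? 0 : list.size();
    }
}

class InstanceVsExecutionDemo {
    public static void main(String[] args) {
        InMemoryExecutionStore store = new InMemoryExecutionStore();
        store.recordAttempt("ccSettlementJob", "2008-01-01", RunStatus.FAILED);    // first attempt fails
        store.recordAttempt("ccSettlementJob", "2008-01-01", RunStatus.COMPLETED); // re-run succeeds
        store.recordAttempt("ccSettlementJob", "2008-01-02", RunStatus.COMPLETED); // next night, new instance

        System.out.println("instances: " + store.instanceCount());
        System.out.println("executions for 2008-01-01: "
            + store.executionCount("ccSettlementJob", "2008-01-01"));
    }
}
```

Here the demo ends up with two instances, one of which has two executions. Spring Batch keeps this kind of metadata for you in the JobRepository.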
JobRepository - This is the persistent store for the metadata about our jobs: JobInstances, JobExecutions and StepExecutions. In this example I set up the repository to use an in-memory store. You can back it with a database if you want.
JobLauncher - As the name suggests, this object lets you launch a job with a given set of JobParameters.
Tasklet - Used for steps where you do not have item-based input and output processing (using readers and writers), for example a step that simply deletes a file or calls a stored procedure.
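A tasklet-style step can be sketched the same way. Again this is a hypothetical, simplified model rather than Spring Batch's own Tasklet interface: SimpleTasklet, TaskletResult, PurgeOldFilesTasklet and TaskletStepRunner are made up for illustration, and the "files" are just names in a queue so the sketch runs without touching a file system. It loosely follows the idea that a tasklet is invoked repeatedly until it signals that it is finished, with no reader or writer involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a tasklet (NOT the real Spring Batch interface):
// a piece of work that is called until it reports FINISHED.
enum TaskletResult { CONTINUABLE, FINISHED }

interface SimpleTasklet {
    TaskletResult execute();
}

// Example task: purge old settlement files, one per invocation.
class PurgeOldFilesTasklet implements SimpleTasklet {
    private final Deque<String> staleFiles;
    int deleted = 0;

    PurgeOldFilesTasklet(Deque<String> staleFiles) {
        this.staleFiles = staleFiles;
    }

    public TaskletResult execute() {
        if (staleFiles.isEmpty()) {
            return TaskletResult.FINISHED; // nothing to do
        }
        staleFiles.poll(); // pretend to delete the file
        deleted++;
        return staleFiles.isEmpty() ? TaskletResult.FINISHED : TaskletResult.CONTINUABLE;
    }
}

class TaskletStepRunner {
    // Calls the tasklet until it says it is finished; returns the number of calls.
    static int run(SimpleTasklet tasklet) {
        int calls = 0;
        TaskletResult result;
        do {
            result = tasklet.execute();
            calls++;
        } while (result == TaskletResult.CONTINUABLE);
        return calls;
    }

    public static void main(String[] args) {
        Deque<String> files = new ArrayDeque<>();
        files.add("settlement-2008-01-01.csv");
        files.add("settlement-2008-01-02.csv");
        PurgeOldFilesTasklet tasklet = new PurgeOldFilesTasklet(files);
        int calls = run(tasklet);
        System.out.println(calls + " calls, " + tasklet.deleted + " files deleted");
    }
}
```

Deleting one file per call is only a design choice for the sketch; a real tasklet could just as well do all of its work in a single call and report that it is finished.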
ItemReader - An abstraction for reading in one item of interest at a time so that it can be processed. In my credit card example, an item could be one card transaction retrieved from the database.
ItemWriter - An abstraction used to write out the final results of a batch. In the credit card example, this could be a provider-specific representation of each transaction that needs to go into a file, perhaps XML or a comma-separated flat file.
ItemProcessor - Very important. This is where you apply business logic to an item that has just been read: perform computations on it and perhaps calculate additional fields before passing it on to the writer, which writes it to the output file.
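Reader, processor and writer fit together in what Spring Batch calls chunk-oriented processing: items are read and processed one at a time, and the writer receives them in groups (chunks). The plain-Java sketch below illustrates that loop for the credit card example. The interfaces SketchReader, SketchProcessor and SketchWriter are hypothetical and only loosely shaped like Spring Batch's ItemReader, ItemProcessor and ItemWriter (the reader returns null when the input is exhausted, and the writer is handed a list of items). The masked "provider format" and the chunk size of 2 are assumptions made for this example.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical interfaces, loosely shaped like Spring Batch's ItemReader,
// ItemProcessor and ItemWriter; they are NOT the library's own types.
interface SketchReader<T> { T read(); }                  // returns null when input is exhausted
interface SketchProcessor<I, O> { O process(I item); }   // business logic on one item
interface SketchWriter<T> { void write(List<T> items); } // writes a whole chunk at once

class CardTransaction {
    final String cardNumber;
    final BigDecimal amount;

    CardTransaction(String cardNumber, BigDecimal amount) {
        this.cardNumber = cardNumber;
        this.amount = amount;
    }
}

class ChunkStep {
    // Read and process items one at a time; hand them to the writer in chunks.
    static <I, O> void run(SketchReader<I> reader, SketchProcessor<I, O> processor,
                           SketchWriter<O> writer, int chunkSize) {
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            chunk.add(processor.process(item));
            if (chunk.size() == chunkSize) {
                writer.write(chunk);
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(chunk); // final, partial chunk
        }
    }
}

class SettlementChunkDemo {
    // Hypothetical provider format: masked card number and amount, comma separated.
    static List<List<String>> runDemo() {
        Iterator<CardTransaction> source = Arrays.asList(
            new CardTransaction("4111111111111111", new BigDecimal("25.00")),
            new CardTransaction("5500000000000004", new BigDecimal("13.50")),
            new CardTransaction("340000000000009", new BigDecimal("99.99"))).iterator();

        SketchReader<CardTransaction> reader = () -> source.hasNext() ? source.next() : null;
        SketchProcessor<CardTransaction, String> processor = tx ->
            "****" + tx.cardNumber.substring(tx.cardNumber.length() - 4) + "," + tx.amount;
        List<List<String>> written = new ArrayList<>();
        SketchWriter<String> writer = items -> written.add(new ArrayList<>(items));

        ChunkStep.run(reader, processor, writer, 2);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

With three transactions and a chunk size of 2, the writer is called twice: once with two formatted lines and once with the remaining one. Writing in chunks rather than item by item is what lets a framework commit work in batches instead of once per record.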