The challenge I had was dealing with the heap size. Even with a FileChannel, I still had no way to hold the entire file in memory. So I came up with a pretty cool solution. It's simple, but it hit the sweet spot between holding a large object in memory and the cost of writing it out. I needed to do this because I was writing across a mounted drive, where writes were costly; with a local drive it wouldn't have been an issue.
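For context, the loop below assumes two FileChannels, goodChannel and badChannel, are already open. Here's a minimal sketch of that setup (the file names are placeholders, not from the original job; the imports also cover the ByteBuffer use in the loop):

import java.nio.ByteBuffer
import java.nio.channels.FileChannel

// Hypothetical setup: one channel for records that pass, one for rejects
FileChannel goodChannel = new FileOutputStream('good-records.txt').channel
FileChannel badChannel = new FileOutputStream('bad-records.txt').channel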
So I had a loop that looked like this:
StringBuffer good = new StringBuffer()
StringBuffer bad = new StringBuffer()

records.each { rec ->
    // flush the good buffer once it passes roughly 150K characters
    if (good.length() > 150000) {
        ByteBuffer buf = ByteBuffer.wrap(good.toString().getBytes())
        goodChannel.write(buf)
        good = new StringBuffer()
    }
    // same deal for the bad buffer
    if (bad.length() > 150000) {
        ByteBuffer buf = ByteBuffer.wrap(bad.toString().getBytes())
        badChannel.write(buf)
        bad = new StringBuffer()
    }
    // process the record; that produces chunk and decides goodRecord
    if (goodRecord) {
        good.append(chunk)
    } else {
        bad.append(chunk)
    }
}

// after the loop, write out whatever is left in case the buffers aren't empty
if (good.length() > 0) {
    ByteBuffer buf = ByteBuffer.wrap(good.toString().getBytes())
    goodChannel.write(buf)
}
if (bad.length() > 0) {
    ByteBuffer buf = ByteBuffer.wrap(bad.toString().getBytes())
    badChannel.write(buf)
}
This worked extremely well for large-file processing, given that I had to contend with the heap size. I wouldn't do this for most I/O problems, but for very large files it's really effective.