To follow the examples below, create a large CSV file using this Utility Java Program; it produces a CSV file that is approximately 350MB in size. If you want a file of a different size for testing purposes, tweak the for loop inside the utility code (line number 35), or use any other large file you already have.
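If you don't have the utility program handy, a minimal sketch along the following lines produces a CSV file of roughly comparable size. Note that the class name, column layout, and row count here are illustrative, not the original utility's; adjust the loop bound to change the resulting file size.

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LargeCsvFileGenerator {
    public static void main(String[] args) throws IOException {
        Path path = Path.of("large-file.csv");
        try (BufferedWriter writer = Files.newBufferedWriter(path)) {
            writer.write("id,name,email,city");
            writer.newLine();
            // roughly 50 bytes per row; ~7 million rows gives a file around 350MB
            for (int i = 1; i <= 7_000_000; i++) {
                writer.write(i + ",user" + i + ",user" + i + "@example.com,city" + i);
                writer.newLine();
            }
        }
        System.out.println("Generated " + Files.size(path) / 1024 / 1024 + "MB file");
    }
}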

First, let's look at the memory-related problems. If you try to load all of the file content into memory and the JVM doesn't have sufficient memory, it will throw java.lang.OutOfMemoryError. Consider the example below: it loads all the lines of the file into JVM memory before it starts using them.

public void loadFileIntoMemory(String fullyQualifiedPath) throws IOException {
    List<String> allLines = Files.readAllLines(Path.of(fullyQualifiedPath));
    allLines.forEach(System.out::println);
}

If you run the above program with a max heap of -Xmx256m, it will throw an OutOfMemoryError. So it is important to understand the memory implications and choose the right option to deal with them.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Before we look at the different options for processing large files with memory efficiency, here is a quick note on buffering.

Note: The default buffer size in BufferedReader is 8,192 characters (a char array of size 8192, i.e. 16KB). You can change it if you wish by passing the new size as the second argument when creating the BufferedReader via the constructor new BufferedReader(Reader, int), where the second argument is the character-array buffer size.
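For example, a BufferedReader with a 64KB buffer (the size here is just for illustration) can be created like this:

// second constructor argument is the internal char-array buffer size (in chars)
try (BufferedReader bufferedReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(fullyQualifiedPath)), 64 * 1024)) {
    bufferedReader.lines().forEach(System.out::println);
}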

The first option is to read the file line by line in a streaming fashion, using BufferedReader and its lines() method:

public void processFileUsingStreams(String fullyQualifiedPath) throws IOException {
    System.out.println("File size : " + Files.size(Path.of(fullyQualifiedPath)) / 1024 / 1024 + "MB");
    Runtime runtime = Runtime.getRuntime();
    System.out.println("BEFORE ---- Total memory : " + (runtime.maxMemory() / 1024 / 1024)
            + "MB , free memory : " + (runtime.freeMemory() / 1024 / 1024) + "MB");
    long start = System.currentTimeMillis();
    try (InputStream inputStream = new FileInputStream(fullyQualifiedPath);
         BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream))) {
        Stream<String> lines = bufferedReader.lines();
        lines.forEach(System.out::println);
    }
    long end = System.currentTimeMillis();
    System.out.println("AFTER ---- Total memory : " + (runtime.maxMemory() / 1024 / 1024)
            + "MB , free memory : " + (runtime.freeMemory() / 1024 / 1024) + "MB");
    System.out.println("Total processing time (ms) : " + (end - start));
}

When you run the above program, you might notice that memory usage stays well under control, because we read the file in a streaming fashion and discard each line once it has been read and printed.

Output

File size : 335MB
BEFORE ---- Total memory : 128MB , free memory : 125MB

... (lines printed to the console omitted)

AFTER ---- Total memory : 128MB , free memory : 55MB
Total processing time (ms) : 6312
File systems at the operating-system level are block-based, meaning that files and file parts are stored as blocks on a partition. This is why using an optimal buffer size in BufferedReader gives good performance.
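A quick, unscientific way to see this effect is to time the same read with different buffer sizes; the sizes below are arbitrary picks for illustration:

for (int bufferSize : new int[]{1024, 8 * 1024, 64 * 1024}) {
    long start = System.currentTimeMillis();
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(fullyQualifiedPath)), bufferSize)) {
        // drain the file without keeping any lines in memory
        while (reader.readLine() != null) { }
    }
    System.out.println(bufferSize + " chars : " + (System.currentTimeMillis() - start) + "ms");
}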

If you would like to add buffering at the stream level as well, you can use BufferedInputStream and pass the FileInputStream as the underlying input stream. Example code looks like below.

BufferedInputStream inputStream = new BufferedInputStream(new FileInputStream(fullyQualifiedPath));
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
Another option is to use java.util.Scanner, which also reads the file line by line:

public void processFileUsingScanner(String fullyQualifiedPath) throws IOException {
    System.out.println("File size : " + Files.size(Path.of(fullyQualifiedPath)) / 1024 / 1024 + "MB");
    Runtime runtime = Runtime.getRuntime();
    System.out.println("BEFORE ---- Total memory : " + (runtime.maxMemory() / 1024 / 1024)
            + "MB , free memory : " + (runtime.freeMemory() / 1024 / 1024) + "MB");
    long start = System.currentTimeMillis();
    try (Scanner scanner = new Scanner(new FileInputStream(fullyQualifiedPath))) {
        while (scanner.hasNextLine()) {
            System.out.println(scanner.nextLine());
        }
    }
    long end = System.currentTimeMillis();
    System.out.println("AFTER ---- Total memory : " + (runtime.maxMemory() / 1024 / 1024)
            + "MB , free memory : " + (runtime.freeMemory() / 1024 / 1024) + "MB");
    System.out.println("Total processing time (ms) : " + (end - start));
}

Output

File size : 335MB
BEFORE ---- Total memory : 128MB , free memory : 125MB

... (lines printed to the console omitted)

AFTER ---- Total memory : 128MB , free memory : 51MB
Total processing time (ms) : 10148
Although FileChannel belongs to the Java NIO package (non-blocking I/O), FileChannel always runs in blocking mode. The example below reads the file in fixed-size chunks through a FileChannel and a ByteBuffer.
public void processFileUsingFileStreamFileChannel(String fullyQualifiedPath) throws IOException {
    System.out.println("File size : " + Files.size(Path.of(fullyQualifiedPath)) / 1024 / 1024 + "MB");
    Runtime runtime = Runtime.getRuntime();
    System.out.println("BEFORE ---- Total memory : " + (runtime.maxMemory() / 1024 / 1024)
            + "MB , free memory : " + (runtime.freeMemory() / 1024 / 1024) + "MB");
    long start = System.currentTimeMillis();
    try (FileInputStream fileInputStream = new FileInputStream(fullyQualifiedPath)) {
        FileChannel fileChannel = fileInputStream.getChannel();
        ByteBuffer byteBuffer = ByteBuffer.allocate(8 * 1024);
        while (fileChannel.read(byteBuffer) != -1) {
            byteBuffer.flip();
            // use only the bytes actually read; the final read may fill the buffer partially
            String str = new String(byteBuffer.array(), 0, byteBuffer.limit());
            System.out.print(str);
            byteBuffer.clear();
        }
    }
    long end = System.currentTimeMillis();
    System.out.println("AFTER ---- Total memory : " + (runtime.maxMemory() / 1024 / 1024)
            + "MB , free memory : " + (runtime.freeMemory() / 1024 / 1024) + "MB");
    System.out.println("Total processing time (ms) : " + (end - start));
}

Output

File size : 335MB
BEFORE ---- Total memory : 128MB , free memory : 125MB

... (lines printed to the console omitted)

AFTER ---- Total memory : 128MB , free memory : 79MB
Total processing time (ms) : 4920

In the above example, we can obtain the FileChannel using RandomAccessFile as well; below is a sample code snippet for that.

RandomAccessFile randomAccessFile = new RandomAccessFile(fullyQualifiedPath, "r");
FileChannel fileChannel = randomAccessFile.getChannel();
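One advantage of the RandomAccessFile route is that you can position the channel anywhere in the file before reading. A minimal sketch follows; the starting offset is arbitrary, just to illustrate seeking:

try (RandomAccessFile randomAccessFile = new RandomAccessFile(fullyQualifiedPath, "r")) {
    FileChannel fileChannel = randomAccessFile.getChannel();
    // jump to the middle of the file before reading (illustrative offset)
    fileChannel.position(fileChannel.size() / 2);
    ByteBuffer byteBuffer = ByteBuffer.allocate(8 * 1024);
    while (fileChannel.read(byteBuffer) != -1) {
        byteBuffer.flip();
        System.out.print(new String(byteBuffer.array(), 0, byteBuffer.limit()));
        byteBuffer.clear();
    }
}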

Summary

All of the methods discussed here solve the memory-related issues, and the methods that use a buffer as intermediate in-memory storage provide good performance. Another important factor to keep in mind is that performance bottlenecks may come from the underlying disk I/O, and if you are reading a file from a Network File System (NFS), network speed may also contribute to the overall performance.

The source code for the above examples can be found at GitHub link for Java code.