Often I find myself writing Python scripts to iterate over the files in a directory for fun and profit. This is so common that I wrote a simple helper function in Auxly called walkfiles(). The performance of walkfiles() always felt kinda sluggish, especially when compared to blazing fast utilities like fd. Only recently did I learn about os.scandir() in the Python standard library, which is essentially a more performant os.listdir(). After running some benchmarks, the speed improvement from os.scandir() is impressive. Here are the results using Python 3.7.2 on a decently spec'd Windows 10 laptop:
- Recursively iterating through all 107,000 files under a directory using walkfiles() from Auxly 0.6.4, which was implemented using os.walk():
  - Five samples (seconds) = 9.96875, 9.796875, 9.421875, 9.453125, 9.8125
  - Average (seconds) = 9.690625
- Same test using the refactored walkfiles() implemented using os.scandir():
  - Five samples (seconds) = 2.359375, 1.90625, 1.859375, 1.71875, 1.734375
  - Average (seconds) = 1.915625
- Using fd to iterate over the same 107,000 files:
  - Five samples (seconds) = 0.46875, 0.5, 0.46875, 0.515625, 0.484375
  - Average (seconds) = 0.4875
- Using the same file set with a regex to iterate over only the 3,600 .txt files using walkfiles() from Auxly 0.6.4 (before refactor):
  - Five samples (seconds) = 3.5625, 3.0625, 3.109375, 3.109375, 3.078125
  - Average (seconds) = 3.184375
- Iterating over the 3,600 .txt files with the refactored walkfiles():
  - Five samples (seconds) = 2.109375, 1.953125, 1.78125, 1.796875, 1.8125
  - Average (seconds) = 1.890625
- Using fd to iterate over the 3,600 .txt files:
  - Five samples (seconds) = 0.03125, 0.03125, 0.0625, 0.078125, 0.046875
  - Average (seconds) = 0.05
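For anyone wanting to collect similar samples, here is a minimal sketch of a timing harness. It is not the script used for the numbers above: the helper names time_samples() and count_files() are made up for illustration, and using time.perf_counter() as the clock is an assumption.

```python
import os
import time


def count_files(root):
    """Count every file under root using os.walk() (hypothetical workload)."""
    return sum(len(files) for _, _, files in os.walk(root))


def time_samples(func, *args, runs=5):
    """Call func(*args) `runs` times and return the elapsed seconds of each call."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    # Time five walks of the current directory and report them like the list above.
    samples = time_samples(count_files, ".")
    print("Five samples (seconds) =", ", ".join(str(s) for s in samples))
    print("Average (seconds) =", sum(samples) / len(samples))
```

Keep in mind that the first run over a directory tree may be slower than later runs because of operating system caching, so it can be worth looking at the individual samples and not just the average.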
Wow, some big differences there. To summarize:
- Iterating over 107,000 files:
  - Auxly 0.6.4 walkfiles() = 9.690625 seconds
  - Auxly refactored walkfiles() = 1.915625 seconds
  - fd = 0.4875 seconds
- Iterating over 3,600 .txt files in the original 107,000:
  - Auxly 0.6.4 walkfiles() = 3.184375 seconds
  - Auxly refactored walkfiles() = 1.890625 seconds
  - fd = 0.05 seconds
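To put those averages in relative terms, the ratios can be worked out with a few lines of Python. The numbers are copied straight from the summary; the dictionary layout and the speedup() helper are just illustrative names.

```python
# Average timings in seconds, copied from the summary above.
averages = {
    "all 107,000 files": {"old": 9.690625, "new": 1.915625, "fd": 0.4875},
    "3,600 .txt files": {"old": 3.184375, "new": 1.890625, "fd": 0.05},
}


def speedup(slow, fast):
    """Return how many times faster the `fast` timing is than the `slow` one."""
    return slow / fast


for test, t in averages.items():
    print(
        f"{test}: refactored walkfiles() is {speedup(t['old'], t['new']):.2f}x "
        f"faster than 0.6.4, and fd is {speedup(t['new'], t['fd']):.2f}x "
        f"faster than the refactor"
    )
```

That works out to roughly a 5x improvement for the full walk and about 1.7x for the filtered .txt walk.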
The takeaway here is that os.scandir() is fast. Not fd fast, but still not bad. This refactored walkfiles() will be included in an upcoming Auxly release.
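For a feel of what an os.scandir() based walker can look like, here is a minimal sketch. To be clear, this is not the Auxly implementation: the function name walk_files() and its regex and recurse parameters are hypothetical. The general idea is that each os.DirEntry returned by os.scandir() can usually answer is_dir() and is_file() from information gathered during the directory scan, without an extra system call per file.

```python
import os
import re


def walk_files(root, regex=None, recurse=True):
    """Yield paths of files under root (hypothetical helper, not Auxly's API).

    If regex is given, only files whose name matches it are yielded.
    """
    pattern = re.compile(regex) if regex else None
    stack = [root]  # directories still waiting to be scanned
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                # Do not follow symlinked directories, to avoid cycles.
                if entry.is_dir(follow_symlinks=False):
                    if recurse:
                        stack.append(entry.path)
                elif entry.is_file():
                    if pattern is None or pattern.search(entry.name):
                        yield entry.path


if __name__ == "__main__":
    # Print every .txt file under the current directory.
    for path in walk_files(".", regex=r"\.txt$"):
        print(path)
```

This sketch uses an explicit stack in place of recursion so deep directory trees do not run into Python's recursion limit, and it is a generator so callers can start working on files before the whole tree has been scanned.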