Remove Duplicate Lines from a Large File
Design a system to remove duplicate lines from a very large file that cannot be loaded into memory all at once. The solution should efficiently handle files larger than available RAM using techniques like external merge sort.
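One way the external merge sort approach could look is sketched below. This is a minimal illustration under stated assumptions, not a reference solution: the function names `dedupe_large_file` and `_write_sorted_run` and the `max_lines_per_chunk` parameter are hypothetical, the input is assumed to be UTF-8 text, and the output is assumed to be acceptable in sorted order rather than the original order. The file is read in chunks that fit in memory, each chunk is sorted and written out as a "run", and the runs are then merged so that equal lines become adjacent and can be dropped in a single pass.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_sorted_run(lines, tmp_dir):
    """Sort one in-memory chunk, drop its duplicates, and spill it to a temp file."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "w", encoding="utf-8") as run:
        run.writelines(sorted(set(lines)))
    return path


def dedupe_large_file(src_path, dst_path, max_lines_per_chunk=1_000_000):
    """Write the distinct lines of src_path to dst_path (sorted, not original order)."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Phase 1: split the input into sorted runs that each fit in memory.
        run_paths = []
        with open(src_path, encoding="utf-8") as src:
            while True:
                # Make sure every line ends with "\n" so that a final line
                # without a newline compares equal to earlier copies of itself.
                chunk = [
                    line if line.endswith("\n") else line + "\n"
                    for line in islice(src, max_lines_per_chunk)
                ]
                if not chunk:
                    break
                run_paths.append(_write_sorted_run(chunk, tmp_dir))

        # Phase 2: k-way merge of the runs. Equal lines come out adjacent,
        # so remembering only the previous line is enough to skip duplicates.
        runs = [open(path, encoding="utf-8") for path in run_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                previous = None
                for line in heapq.merge(*runs):
                    if line != previous:
                        dst.write(line)
                        previous = line
        finally:
            for run in runs:
                run.close()
```

Memory use is bounded by one chunk during the first phase and by one buffered line per run during the merge. The sketch opens every run at once, so with a very large number of runs a real system would likely merge in several passes to stay under the operating system's open-file limit. If the original line order must be preserved, a different technique would be needed, for example tagging each line with its line number before sorting.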
Asked at: Microsoft
Question Timeline
Early January, 2020
Microsoft
Mid-level
Remove duplicate lines from a very large file that can't be loaded into memory all at once. The question is derived from the external merge sort concept.