
Find and Remove Duplicate Files in Linux

It might seem unnecessary to worry about duplicate files when you have terabytes of storage. However, if you care about file organization, you’ll want to avoid duplicates on your Linux system. You can find and remove duplicate files either via the command line or with a specialized desktop app.

Use the “Find” Command


In case you’re not familiar with this powerful command, you can learn about it in our guide. By combining find with other essential Linux commands, like xargs, we can get a list of duplicate files in a folder (and all its subfolders). The command first compares files by size, then checks their MD5 hashes, which act as fingerprints of a file’s contents: identical files always produce the same hash, and files with different contents are practically certain to produce different ones. To scan for duplicate files, open your console, navigate to the desired folder and type:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

This one-liner does the following:

find -not -empty -type f -printf "%s\n" – looks for regular files which are not empty and prints the size of each one in bytes.

sort -rn – sorts the file sizes numerically, in reverse (largest first) order.

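uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 – uniq -d keeps only the sizes that appear more than once; for each of those sizes, xargs runs find again to list every regular file of exactly that many bytes (the c suffix means bytes), separating the names with null characters so that spaces in filenames are handled safely.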

xargs -0 md5sum | sort – calculates the MD5 hash of each candidate file and sorts the output, so files with identical hashes end up on adjacent lines.

uniq -w32 --all-repeated=separate – compares only the first 32 characters of each line (the MD5 hash) and prints every line whose hash appears more than once, with a blank line between groups of duplicates.
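To see what each stage does without risking real data, you can run the pipeline in a throwaway directory. The sketch below is an illustration rather than part of the original one-liner: the sample file names are hypothetical, the stages are split across lines with one comment each, and it assumes GNU find, xargs and coreutils, since -printf, -print0 and uniq’s -w and --all-repeated options are GNU extensions. It also leaves out -n1, because -I already makes xargs run one find per input line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# The duplicate-finding pipeline, one stage per line.
find_dupes() {
  # Print the size in bytes of every non-empty regular file.
  find . -not -empty -type f -printf "%s\n" |
    # Sort the sizes numerically, largest first.
    sort -rn |
    # Keep only sizes shared by more than one file.
    uniq -d |
    # For each shared size, list files of exactly that many bytes.
    xargs -I{} find . -type f -size {}c -print0 |
    # Hash every candidate file.
    xargs -0 md5sum |
    # Sort so identical hashes sit on adjacent lines.
    sort |
    # Print lines whose first 32 characters (the hash) repeat.
    uniq -w32 --all-repeated=separate
}

# Throwaway directory with hypothetical sample files.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
mkdir sub
printf 'hello\n' > a.txt          # 6 bytes
printf 'hello\n' > sub/b.txt      # same contents as a.txt
printf 'world\n' > c.txt          # same size, different contents
printf 'unique file\n' > d.txt    # 12 bytes, no other file that size
: > empty.txt                     # skipped by -not -empty

find_dupes
```

In this sample, a.txt and sub/b.txt should be reported as one group. c.txt has the same size but different contents, so it drops out at the hashing stage, while d.txt never gets past the size check.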

Note that this command doesn’t automatically remove duplicates – it only outputs a list, and you can delete files manually if you want. If you prefer to manage your files in an application that offers more options at once, the next solution might suit you.
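If you do want to act on the list, one cautious approach is to keep the first file in each group and only print the others, so you can review them before deleting anything. The sketch below is an assumption of how that could be scripted, not something the one-liner provides: the function name list_extra_copies is hypothetical. It reads the pipeline’s grouped output, double-checks each candidate byte for byte with cmp against the group’s first file (an MD5 match is very strong evidence, but not a formal guarantee), and prints the paths that look safe to remove. It assumes filenames contain no newlines or backslashes, since md5sum escapes those and shifts the line layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: reads "hash  path" lines grouped by blank lines
# (the output of uniq --all-repeated=separate) and prints every path
# except the first in each group, but only after cmp confirms it is
# byte-for-byte identical to that first file. Nothing is deleted.
list_extra_copies() {
  local keep="" line path
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -z "$line" ]; then       # blank line: a new group starts
      keep=""
      continue
    fi
    path=${line:34}               # skip the 32-char hash and two spaces
    if [ -z "$keep" ]; then
      keep=$path                  # first file in the group is kept
    elif cmp -s -- "$keep" "$path"; then
      printf '%s\n' "$path"       # identical copy: candidate for removal
    fi
  done
}
```

After defining the function in your shell and reviewing its output, you could append | list_extra_copies | xargs -r -d '\n' rm -- to the one-liner to delete the extra copies; -r and -d are GNU xargs options.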

Employ dupeGuru

DupeGuru is a cross-platform application that comes in three editions: Standard (SE), Music and Picture. It’s designed to find duplicate files based on multiple criteria (file names, file size, MD5 hashes) and uses fuzzy matching to detect similar files. Windows and OS X users can download the installation files from the official website, and Ubuntu users can install dupeGuru from the developer’s PPA:

sudo add-apt-repository ppa:hsoft/ppa
sudo apt-get update
sudo apt-get install dupeguru


To search for duplicates, first add some folders by pressing the “+” button. Setting a folder state to “Reference” means that other folders’ contents are compared to it. Before clicking “Scan,” check the “View -> Preferences” dialog to ensure that everything is properly set up.


“Scan Type” varies across dupeGuru editions; in the Standard edition, you can compare files and folders by contents and filename. The Picture edition offers comparison by EXIF timestamp and “Picture blocks” – a time-consuming option that divides each picture into a grid and calculates the average color for every tile. In the Music edition, you can analyze “Fields,” “Tags” and “Audio content.” Some settings depend on the scan type: “Word weighting” and “Match similar words” work only when you search for file names. Likewise, “Filter Hardness” doesn’t apply when you perform a “Contents” scan.

DupeGuru can ignore small files and links (shortcuts) to a file, and lets you use regular expressions to further customize your query. You can also save search results to work on them later. Apple fans will love the fact that dupeGuru supports iPhoto and Aperture libraries and can manage iTunes libraries.


When dupeGuru finds duplicates, a new window opens with reference files colored in blue and their duplicates listed below. The toolbar displays basic information, and you can see more about every file if you select it and click the “Details” button.


You can manage duplicate files directly from dupeGuru – the “Actions” menu shows everything you can do. Select files by ticking the checkbox or clicking their name; you can select all or multiple files using keyboard shortcuts (hold Shift/Ctrl and click on desired files). If you’re interested in differences between duplicate files, toggle Delta Values. The results can be re-prioritized (so the files listed as dupes become references) and sorted according to various criteria like modification date and size. The official dupeGuru user guide is helpful and clearly written, so you can rely on it if you ever get stuck.

Naturally, it would be more practical if dupeGuru weren’t split into three editions – after all, most users love one-stop solutions. Still, if you don’t want to use the find command, dupeGuru provides a neat and quick way to eradicate dupes from your filesystem. Can you recommend some other tools for removing duplicate files? Do you prefer the command line for this task? Tell us in the comments.

The post Find and Remove Duplicate Files in Linux appeared first on Make Tech Easier.


