Duplicate finder
-
I download without regard for HDD space or ratio, so I have accrued a lot of dupes from these massive packs people put together, which often contain the same stuff.
I've used CCleaner for finding dupes, but now it's packed with spyware and top-heavy. This tool is a lot better and more precise, if people are looking to clean up their files like I just did:
https://github.com/qarmin/czkawka
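
For anyone wondering what an exact-duplicate scan boils down to, here is a rough Python sketch. It is only an illustration of the general idea (group by file size, then hash the files whose sizes collide), not how czkawka or any other tool mentioned here is actually implemented. The function names `file_digest` and `find_exact_duplicates` are made up for this example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root):
    """Return groups (sorted lists of paths) of byte-identical files under root."""
    # Pass 1: bucket by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(p) for p in by_hash.values() if len(p) > 1)
    return groups
```

Calling `find_exact_duplicates("D:/packs")` (a hypothetical path) returns lists of paths with identical content. It only reports them and deletes nothing, so you can still check each group by hand.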

-
@cp2000 I use Duplicate Cleaner Prime to find exact duplicates, but so many videos are uploaded in varying sizes and resolutions, sometimes with a couple of seconds trimmed at the start or end, that it can't find them all.
I've been using the idle time on my PC to run my ENTIRE collection through Handbrake to reduce file sizes and fix various playback issues, and now have everything shrunk down to 720 (or smaller if the original was smaller). Now that that's done, I'm tidying up my folders, putting everything into studio groups and doing the same with amateur content. (People are so bad at uploading OF collections that you inevitably end up with a dozen duplicates of some files!)
The final stage is cataloging everything with Stashdb so that it will all be searchable by studio, performers, date, tags etc. It makes finding the remaining duplicates much easier, as it creates its own preview images: you simply list videos by length and then scroll through looking for matching preview images, which appear side-by-side most of the time. Yesterday I managed to scroll through roughly 4000 HX videos and eliminate about 200 dupes, saving about 250 GB in about 20 minutes. It also generates perceptual hashes for every video, so it is good at recognising dupes and has its own duplicate cleaner, but I prefer to check manually.
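To give a rough idea of how perceptual-hash matching combined with sorting by length can work, here is a small Python sketch. It is not Stashdb's actual code or API. It assumes you already have, for each video, a duration in seconds and a 64-bit perceptual hash written as a hex string, and the thresholds (`max_distance`, `max_duration_gap`) are arbitrary values picked for illustration.

```python
def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hex-encoded hashes."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def candidate_duplicates(videos, max_distance=8, max_duration_gap=5.0):
    """Return pairs of names whose durations and perceptual hashes are close.

    videos: list of (name, duration_seconds, phash_hex) tuples.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    ordered = sorted(videos, key=lambda v: v[1])  # sort by length
    pairs = []
    for i, (name_a, dur_a, hash_a) in enumerate(ordered):
        for name_b, dur_b, hash_b in ordered[i + 1:]:
            if dur_b - dur_a > max_duration_gap:
                break  # everything further down the list is longer still
            if hamming_distance(hash_a, hash_b) <= max_distance:
                pairs.append((name_a, name_b))
    return pairs
```

The output is only a list of candidates. A small Hamming distance suggests two videos look alike, but trimmed or re-encoded copies may still fall outside the thresholds, so a manual check of the previews remains the final step.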
-
That all sounds like a very good method. I just don't want to reduce size. I did that in the past and regretted it. I just added more HDD space instead.
I do want to invest some time into proper organization though. I was doing well until my TB HDD got broken into 20TB / 14TB, then it all went to shit. I'm trying to find a good method, but just from cleaning out some of my duplicates so far, I've freed up half a TB.
I'm hoping AI software will eventually get better at organizing things, so that when you have an HDD completely filled with random files it can sort them appropriately. I know it's coming.
I need to look at the Stashdb thing. I got digiKam hoping it would give me a better interface, since I usually just work at the file level in Windows on the hard drives.
If anyone has better advice for all of this, please chime in. Thanks.