Storage solutions
-
I was just curious to hear about people's solutions for local storage with hard drives. What do people who have a lot of storage do? RAID drives? Backups? Some of these torrents are huge and I don't really want to delete things afterwards.
Do most people here just keep getting external hard drives?
I lost my original 2TB drive of porn, so now I have everything backed up. I guess there isn't really an easier option, but I don't know what others do, if anyone cares to share.
-
Server engineer here.
There are a few different options. Unfortunately there is no one perfect solution; they all have their pros and cons.
RAID
- Good for performance (not needed for this application)
- Inflexible (you need to rebuild the entire array if you want to change it, which means you need the same amount of free space somewhere else to offload the data while you do it; you're also stuck buying the same drives / sizes)
- Not great for power consumption (all drives must spin to access any data)
- Not great for wear (all drives wear out at the same rate)
- Not a substitute for backups (it only provides redundancy at the disk level; it doesn't protect you from failures in other hardware, accidental deletions, etc.)
- Not particularly efficient: you shouldn't use RAID 5 for large drives (4TB+), and even RAID 6 isn't great for larger drives, so then you need to look at RAID 10 (or 0+1), which only gives you half of your raw disk space, and you still need backups on top of that.
All in all RAID is not a great solution for (cold) data storage. I used to have 8x8TB + 8x4TB in RAID 6 for my data storage but I have moved away from that approach. Great while it works, shit when it inevitably starts to fail due to age.
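To put rough numbers on the efficiency point, here's a minimal Python sketch of usable capacity per RAID level. It assumes identical disks and ignores filesystem overhead, hot spares and vendor-specific layouts; the function name `usable_capacity_tb` is just for illustration.

```python
def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Rough usable capacity of an array of identical disks, ignoring filesystem overhead."""
    if level == "0":
        return disks * disk_tb              # striping: all space usable, no redundancy
    if level == "1":
        return disk_tb                      # every disk holds the same copy
    if level == "5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_tb        # one disk's worth of parity
    if level == "6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least 4 disks")
        return (disks - 2) * disk_tb        # two disks' worth of parity
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks // 2 * disk_tb         # mirrored pairs, then striped
    raise ValueError(f"unsupported RAID level: {level}")
```

For the 8x8TB + 8x4TB RAID 6 setup mentioned above, that works out to 48TB + 24TB = 72TB usable out of 96TB raw; the same 8x8TB disks in RAID 10 would give only 32TB.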
ZFS
Probably better than RAID for this, depending on how you set it up, but it can still be a pain to manage. I won't go too deep into this since it's probably too advanced/technical for most people here.
Separate drives (JBOD / mergerfs)
Much better for data storage.
- Only wears out individual disks when you actually use them (particularly important when you also seed from these drives), which also makes it better for power consumption
- Flexible: you can add and remove drives at will, of any type and size (so you can gradually upgrade and retire old disks), and you only have to offload a single disk's worth of data at a time (as opposed to the whole array). Your computer can still treat the whole thing as one big disk without you having to worry about what goes where (but you can control that if you want to)
- No inherent redundancy, you definitely still need backups
Commercial devices (Synology etc)
- Probably the best option for people who aren't very technical. Offers all of the above options in a user-friendly package with lots of features.
Note on backups
Ideally you want your backups in cold storage, synced periodically. This is better for disk lifespan (and you don't want the backup disk to wear out around the same time as the main disk) and better for power consumption, it lets you restore after accidental fuckups, and it also helps protect you from freak accidents (power surges, lightning strikes, ...). Having them connected but spun down is good; having them completely disconnected is better (but of course that requires manual interaction to sync).
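A minimal sketch of the "sync periodically" part in Python, assuming the backup disk is mounted as a normal folder (the `sync_to_backup` name and any paths you pass to it are hypothetical). It only copies new or changed files and never deletes anything from the backup, so a file you accidentally delete on the main disk can still be restored. Dedicated sync tools do the same job with far more safeguards.

```python
import shutil
from pathlib import Path
from typing import List

def sync_to_backup(source: Path, backup: Path) -> List[Path]:
    """One-way sync: copy new or changed files from source to backup.

    Nothing is ever deleted from the backup, so a file removed from the
    main disk by accident can still be restored from it.
    Returns the relative paths of the files that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = backup / rel
        s = src.stat()
        if dst.exists():
            d = dst.stat()
            # Heuristic: same size and (almost) same modification time means unchanged.
            # The 2 s slack covers filesystems with coarse timestamps such as FAT32.
            if d.st_size == s.st_size and abs(d.st_mtime - s.st_mtime) <= 2:
                continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel)
    return copied
```

You would run something like this (for example from a scheduled task) whenever the backup disk is connected, then spin it down or disconnect it again.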
Note on seeding
If you seed, designate 1 disk for it so you don't wear out your entire storage array.
I can go deeper into the subject if you have any questions but I don't want to overload you any more than I already have.
-
@Rapsey-0 Thank you so much, you just helped a lot with the whole RAID thing... I just bought a 2-drive RAID enclosure for my personal files. I like the RAID system because it is noticeably faster, but I'm going to need a third drive to back up my RAID files.
Question: I thought that if one of the two RAID drives fails, I can still access the data on the other one? Then I put in a fresh replacement and it rebuilds? Correct?
But I can still access the data if one drive fails?
For my fun files I just got a 6-drive stackable enclosure with each drive having a mirror, so essentially 3 hard drives with 3 mirrors. I went with this because I had thought about the wear on a RAID system, so it's interesting that you pointed that out.
Having one seeding disk is brilliant... I'm looking at seedboxes as a better option so that I don't seed from any of my local drives.
I'll have to look more into this JBOD / mergerfs. Is all data backed up on other drives, so that if one fails you haven't lost anything?
I'm mainly concerned about loss prevention. The deeper I dug into best practices, the more they said you essentially need THREE backups, with one offsite or in another location, but I don't want to use the cloud for any of this.
Cold storage on an extra drive has been my normal MO up to this point, but the data has just grown exponentially.
I want to keep one cold storage disk with periodic updates, possibly with some sort of interim secondary backup until each is synced. As for my new 2-disk 8TB RAID, I love it for my personal files and how quick it is, but I think I'm going to need a third cold storage disk with periodic updates.
Such great info from you thank you very much!!!
-
@cp2000 said in Storage solutions:
Question: I thought that if one of the two RAID drives fails, I can still access the data on the other one? Then I put in a fresh replacement and it rebuilds? Correct?
But I can still access the data if one drive fails?
It depends on the RAID level. For a 2-drive enclosure you only have two options:
RAID 0 aka striping
- Not "true" RAID, because RAID stands for Redundant Array of Independent/Inexpensive Disks and this one is not redundant
- Half the data is written to one disk, half the data to the other
- You get the combined storage space of both disks
- Read speed is 2x that of a single disk (because you read half of the file from one disk and half from the other, in parallel)
- Write speed is 2x that of a single disk (because you write half of the file to one disk and half to the other, in parallel)
- If one of the two disks fails, all data is lost
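A toy Python sketch of striping across two disks, to show why losing either disk loses everything: the data is chopped into chunks that alternate between the disks. The 4-byte chunk size is only for readability (real arrays use stripe units of tens or hundreds of KB), and `stripe` / `unstripe` are illustrative names.

```python
from typing import Tuple

def stripe(data: bytes, chunk: int = 4) -> Tuple[bytearray, bytearray]:
    """Toy RAID 0: cut data into chunks and alternate them between two disks."""
    disk0, disk1 = bytearray(), bytearray()
    for i in range(0, len(data), chunk):
        # Even-numbered chunks go to disk 0, odd-numbered chunks to disk 1.
        target = disk0 if (i // chunk) % 2 == 0 else disk1
        target.extend(data[i:i + chunk])
    return disk0, disk1

def unstripe(disk0: bytes, disk1: bytes, chunk: int = 4) -> bytes:
    """Reassemble the original data; this needs BOTH disks."""
    out = bytearray()
    for offset in range(0, max(len(disk0), len(disk1)), chunk):
        out.extend(disk0[offset:offset + chunk])
        out.extend(disk1[offset:offset + chunk])
    return bytes(out)
```

With `d0, d1 = stripe(b"ABCDEFGHIJ")`, `d0` holds `ABCDIJ` and `d1` holds `EFGH`; neither disk on its own contains a usable copy of the data.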
RAID 1 aka mirroring
- Both disks contain the full data (they are identical)
- You only get half of the total disk space (because the other half is an identical clone)
- Read speed can be up to 2x that of a single disk (because, depending on the implementation, you can read half of the file from one disk and half from the other, in parallel)
- Write speed is the same as a single disk (because the full file needs to be written to each disk, even though it's in parallel)
- If one of the two disks fails, you don't lose any data (just your 2x read speed advantage because now you are just using a single disk) and you can insert a new disk and have it rebuild from the other copy to regain redundancy
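The same kind of toy sketch for mirroring: every write goes to both disks, reads can be served by whichever disk is still alive, and a rebuild is simply a full copy of the survivor onto the replacement. The `Mirror` class and its methods are hypothetical, not any real controller's API.

```python
from typing import Dict, List, Optional

class Mirror:
    """Toy two-disk RAID 1. Each disk is modelled as a dict of block number -> data."""

    def __init__(self) -> None:
        self.disks: List[Optional[Dict[int, bytes]]] = [{}, {}]

    def write(self, block: int, data: bytes) -> None:
        # The same data is written to every working disk.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block: int) -> bytes:
        # Any working disk has a complete copy, so use the first one available.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise OSError("both disks have failed, data is lost")

    def fail(self, index: int) -> None:
        self.disks[index] = None

    def replace_and_rebuild(self, index: int) -> None:
        # A rebuild copies the entire surviving disk onto the new one,
        # which is why it takes hours on large drives.
        survivor = next(d for d in self.disks if d is not None)
        self.disks[index] = dict(survivor)
```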
However, one often overlooked issue with RAID is that you will typically use identical disks (needed for optimal performance), often even from the same factory batch if you bought them together. The disks will have experienced exactly the same workload over their lifespan under exactly the same operating conditions (temperature, vibrations, ...). The result: when one disk fails due to old age, the other one will likely be on its last legs as well.
This is, by the way, why RAID 5 & 6 are problematic with large drives. The rebuild process on those is more complex and can easily put all the remaining disks under heavy load for 48 hours. At that point the chance of one (RAID 5) or two (RAID 6) more disks failing during the rebuild, and taking the whole array with them, becomes a very real possibility.
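To illustrate why a RAID 5 rebuild hammers every remaining disk: with single parity, the parity block is the XOR of the data blocks in a stripe, and a missing block can only be recomputed by XOR-ing the matching block from every surviving disk. Doing that for every stripe means reading all remaining disks end to end. A simplified Python sketch, assuming a fixed parity position (real RAID 5 rotates parity across the disks, and RAID 6 adds a second, differently computed parity block that this doesn't cover); the function names are illustrative.

```python
from functools import reduce
from typing import List

def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """One stripe across the array: the data blocks plus a parity block (their XOR)."""
    return data_blocks + [xor_blocks(data_blocks)]

def rebuild_missing(stripe: List[bytes], missing: int) -> bytes:
    """Recompute the block of a failed disk from the same stripe on ALL surviving disks."""
    survivors = [block for i, block in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)
```

For example, `rebuild_missing(make_stripe([b"AAAA", b"BBBB", b"CCCC"]), 1)` gives back `b"BBBB"`, but only because blocks 0, 2 and 3 were all read.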
For my fun files I just got a 6-drive stackable enclosure with each drive having a mirror, so essentially 3 hard drives with 3 mirrors. I went with this because I had thought about the wear on a RAID system, so it's interesting that you pointed that out.
This should be fine in practice but you would be wise to replace failed disks immediately. If you put it off for a few months you are playing with fire. I speak from experience (RIP my Tumblr archive).
Having one seeding disk is brilliant... I'm looking at seedboxes as a better option so that I don't seed from any of my local drives.
Even better if it's an SSD. The random IO performed by torrents (as opposed to sequential reads/writes) puts a lot more strain on mechanical hard drives: the arm inside needs to constantly jump all over the place, instead of making the gradual, record-player-like movements of sequential access.
Seedboxes are also a good option. They don't hog your home bandwidth, and there's no risk of exposing your home IP if your VPN fails for some reason, etc. But they do tend to cost more per TB of storage than a DIY solution.
I'll have to look more into this JBOD / mergerfs. Is all data backed up on other drives, so that if one fails you haven't lost anything?
There is absolutely zero redundancy with JBOD / mergerfs. It is just a convenience feature that lets you access data spread across multiple disks as a single volume, without having to think about which file is on which disk and which disk still has space available. A typical implementation will simply start filling up the first disk, then once it's full start filling up the second disk, and so on.
Advantages are:
- You can simply add more disks as your storage needs grow
- Disks don't have to be identical; you can mix various sizes and even SSDs/HDDs
- Each disk will experience its own unique workload based on how often the files on it are accessed, so you don't get the "everything is near-death at the same time" problem
- If one disk starts showing early signs of failing, you only have to offload the data on that one disk to somewhere else in order to safeguard it (just like using separate individual disks - which this essentially is)
- If one disk does fail you only lose the data that was on that particular disk (again, just like using separate individual disks)
However you don't really get any of the RAID performance benefits (although you can access multiple files in parallel at increased speeds if those files happen to live on different disks). And, like I said, there is no redundancy whatsoever in this, so you need some kind of backup solution in addition to this.
JBOD stands for Just a Bunch Of Disks so that pretty much says it all.
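A rough Python model of a file-level pool with a fill-first placement policy like the one described above (real pooling tools usually let you pick from several placement policies; `Disk`, `place` and `pooled_view` are made-up names). Each file lives whole on exactly one disk, so losing a disk only removes the files that were on it from the merged view.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Disk:
    name: str
    capacity_gb: int
    files: Dict[str, int] = field(default_factory=dict)  # filename -> size in GB

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.files.values())

def place(pool: List[Disk], filename: str, size_gb: int) -> str:
    """Fill-first placement: a new file goes to the first disk with enough room."""
    for disk in pool:
        if disk.free_gb >= size_gb:
            disk.files[filename] = size_gb   # the whole file lives on this one disk
            return disk.name
    raise OSError(f"no single disk has {size_gb} GB free for {filename}")

def pooled_view(pool: List[Disk]) -> Set[str]:
    """What you see as 'one big disk': the union of the files on every disk."""
    return {name for disk in pool for name in disk.files}
```

Note that small files can still land on an earlier disk that isn't completely full, so "fill the first disk, then the second" is approximate.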
I'm mainly concerned about loss prevention. The deeper I dug into best practices, the more they said you essentially need THREE backups, with one offsite or in another location, but I don't want to use the cloud for any of this.
Multiple / offsite / offshore backups are something for companies who stand to lose millions from data loss. At that point you're in the territory of "but what if an airplane crashes on top of the data center?". There are many data loss scenarios that are extremely unlikely but still have a non-zero chance. It's always a tradeoff, so use common sense to decide what's best in your situation. If you live in an area that's prone to flooding, maybe don't keep all your copies in the basement. If losing your collection because your house burned down (or was burgled or hit by a tornado) would be a major heartache, maybe keep an extra copy somewhere else. For most people 1 backup copy will be enough to reduce the risk to a negligible level they can live with, 2 if you're paranoid. Anything more and we're talking about insurance against real-world disasters, not disk failures.
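A back-of-the-envelope way to see why each extra copy helps so much, under a big assumption: that copies fail independently of each other (which, as discussed above, identical disks from the same batch may not). If each copy has probability p of being lost within a given period, the chance of losing every one of n copies in that period is p^n. The p = 0.05 below is a purely hypothetical figure, not a measured failure rate.

```latex
P(\text{all } n \text{ copies lost}) = p^{n}, \qquad
p = 0.05:\quad p^{1} = 0.05,\quad p^{2} = 0.0025,\quad p^{3} = 0.000125
```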
Cold storage on an extra drive has been my normal MO up to this point, but the data has just grown exponentially.
I want to keep one cold storage disk with periodic updates, possibly with some sort of interim secondary backup until each is synced. As for my new 2-disk 8TB RAID, I love it for my personal files and how quick it is, but I think I'm going to need a third cold storage disk with periodic updates.
You could have a main JBOD for your everyday use and a backup JBOD in an external enclosure which you connect and sync periodically. That would largely eliminate the hassle of juggling too many drives to manage comfortably.
One final note: these days you can buy very large consumer drives (20TB+, even 30TB with enterprise drives). While there are definitely use cases for them, always bear in mind the old adage about putting all your eggs in one basket.
-
@Rapsey-0 What program do you use for JBOD? I've only used SyncBack for mirroring / backups.
Thanks for all the good info
My dual 8TB RAID is set up as RAID 1, so it keeps mirrored copies of my more important personal data.
-
@cp2000 JBOD is something that can be done through your motherboard BIOS (on most mid- to high-end ones) and most multi-disk storage enclosures also have it built-in. However doing it that way also has drawbacks, because it is done on a "hardware" level, below the filesystem. Your storage controller (whether it's the one in your motherboard or in the enclosure) doesn't have any notion of files. It just sees storage devices as having a bunch of 0's and 1's on them without any understanding of what the data means. This prevents it from providing some of the more advanced functionality you might want in your storage solution (file sync / duplication etc) and also means that a single file could potentially be split across more than 1 disk.
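To illustrate how a single file can end up split across disks with controller-level JBOD: the controller just concatenates the disks into one long run of blocks, and the filesystem on top places files wherever it likes in that run. A minimal Python sketch, assuming a simple linear concatenation (disk sizes in blocks, and the `locate` / `disks_holding` names, are hypothetical); real filesystems also fragment files, which this ignores.

```python
from typing import List, Set, Tuple

def locate(lba: int, disk_sizes: List[int]) -> Tuple[int, int]:
    """Map a logical block address on a concatenated volume to (disk index, block on that disk)."""
    for index, size in enumerate(disk_sizes):
        if lba < size:
            return index, lba
        lba -= size  # skip past this disk's blocks
    raise ValueError("address is beyond the end of the volume")

def disks_holding(first_lba: int, length: int, disk_sizes: List[int]) -> Set[int]:
    """Disks that hold a file stored in `length` consecutive logical blocks."""
    return {locate(block, disk_sizes)[0] for block in range(first_lba, first_lba + length)}
```

With two 1000-block disks, a 20-block file starting at block 990 ends up with 10 blocks on each disk, so losing either disk corrupts it.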
What you most likely want is a more high-level software solution. Personally I'm a Linux guy so my experience with Windows solutions is somewhat limited but as far as I know StableBit DrivePool is hands down the best option in that category. It is unfortunately not free (one-time $30 purchase) but then again, since you are on a torrent site you might not be averse to some piracy. I would highly recommend checking it out. You will probably find it's exactly what you're looking for.