OK, a very nerdy post, I admit, but in recently trying to manage lots of images for photogrammetry and game engines, it has become pretty clear that a lot of things break down. Here are some notes to help the "cheap" startup folks:
- If you have lots of binary data, like images and videos, that doesn't change a lot, then the cheapest way to keep it is in cloud buckets such as AWS S3 or Google Cloud Storage.
- Register (and verify) your domain name with Google Cloud so your bucket names are controlled by you. For instance, gs://tongfamily.com is owned by me, and you don't have to worry about someone else grabbing that name.
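- Turn object versioning on with the command gsutil versioning set on gs://tongfamily.com and then it costs more, since old versions are stored and billed too, but you can't accidentally delete stuff from your buckets. A deleted or overwritten object stays around as an older version until you really want it gone.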
- For things like Reality Capture or Unreal Engine, which store really big files, you have two choices: keep them in buckets as above, or run them under Git LFS control. With Git LFS, all the blobs you have are kept there, and you are going to have to modify your .gitignore to leave out things you don't really need, like build artifacts such as the compressed files that are used for display. This works well when you want to revert things.
- However, when you are doing this, remember that Git has practical limits, like how many files can go in a single check-in, so if you are deleting or creating thousands of files per commit, you need to slice those up into smaller commits.
- Git LFS is way more expensive than cloud buckets, at $5/month for 50GB of storage, but it is nice to have things version controlled. Also, if you have two tools, you can synchronize their outputs. It is also going to download lots of data if you are not careful, whereas with buckets you can control what you download.
- If you do not need this and just want to remember the output files, then Git LFS suffices.
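The "slice those up" advice above can be sketched in a few lines of Python. This is only a rough illustration: the helper names `batches` and `commit_plan`, the batch size of 1000, and the commit message format are all made up here, and the right batch size depends on whatever limit you are actually hitting.

```python
from typing import Iterator, List, Sequence, Tuple


def batches(paths: Sequence[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each holding at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


def commit_plan(paths: Sequence[str], batch_size: int = 1000) -> List[Tuple[str, List[str]]]:
    """Return (commit message, file list) pairs, one pair per planned commit.

    The batch size and message format are hypothetical; tune them to your repo.
    """
    chunks = list(batches(paths, batch_size))
    return [
        (f"Add assets, part {i} of {len(chunks)}", chunk)
        for i, chunk in enumerate(chunks, start=1)
    ]
```

Each pair in the plan could then be handed to `git add` and `git commit -m`, one batch at a time, so no single commit touches thousands of files.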
Net, net, what makes the most sense to me is that raw input files can live in a storage bucket, but if you are editing and changing things, then Git LFS, while more expensive, ensures you can revert.
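To put rough numbers on "more expensive", here is a back-of-the-envelope sketch. The Git LFS figure is the $5/month per 50GB from the notes above, and I am assuming it is sold in whole 50GB packs. The bucket price per GB is purely a placeholder assumption, not a quoted price, so check your provider's current price list. It also ignores egress and the extra storage that object versioning uses.

```python
import math

LFS_PACK_GB = 50           # from the notes above: 50GB per pack
LFS_PACK_USD = 5.0         # from the notes above: $5/month per pack
BUCKET_USD_PER_GB = 0.02   # ASSUMPTION: placeholder price, check your provider


def lfs_monthly_cost(size_gb: float) -> float:
    """Monthly Git LFS cost, assuming storage is bought in whole 50GB packs."""
    return math.ceil(size_gb / LFS_PACK_GB) * LFS_PACK_USD


def bucket_monthly_cost(size_gb: float, usd_per_gb: float = BUCKET_USD_PER_GB) -> float:
    """Monthly bucket storage cost at a flat (assumed) price per GB."""
    return size_gb * usd_per_gb


if __name__ == "__main__":
    for size in (50, 120, 1000):
        print(size, "GB:", lfs_monthly_cost(size), "vs", round(bucket_monthly_cost(size), 2))
```

Under these assumptions the gap grows with size, which is why parking the big raw inputs in a bucket and keeping only the files you actually edit in Git LFS is the combination that makes sense to me.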