Blockchain-based technology has many merits, but its ability to offer unparalleled levels of transparency and immutability is especially significant. Both qualities are generally limited in centralized approaches to data storage. How much you actually gain from distributed data hosting depends on what type of data you’re storing.
Science in general could take advantage of Ethereum’s blockchain-based technologies for distributed data storage. Making raw scientific data transparent could engender greater public trust: if anyone can audit the data, nothing would appear to be “hidden” from the public. Scientific information should be made available regardless of whether the data supports a popular opinion. Scientists ideally seek the truth, but the truth isn’t always good for business, which makes transparency a more trustworthy paradigm.
The immutability of blockchain-based storage would also guard against fraud or forgery through altered data sets. If a cryptographic hash of a file is stored on the blockchain’s public ledger, a permanent fingerprint of the file’s original state is preserved. Any change at all, no matter how minor, produces a wildly different hash, so tampering can easily be detected by rehashing the file and comparing the result with the recorded value.
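The tamper check described above can be illustrated with a minimal Python sketch. It assumes SHA-256 from the standard library’s `hashlib` as the hash function (Ethereum itself uses Keccak-256), and the sample data and the `fingerprint`/`verify` helpers are hypothetical. The `recorded` digest stands in for a value written to the ledger at upload time.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Hex SHA-256 digest of the data: the value one would record on the ledger."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded_digest: str) -> bool:
    """True only if the data still hashes to the digest recorded earlier."""
    return fingerprint(data) == recorded_digest


# Hypothetical data set; its digest is assumed to have been published at upload time.
original = b"sample_id,temperature_c\nA1,21.4\nA2,21.9\n"
recorded = fingerprint(original)

# A one-character edit to a single measurement.
tampered = original.replace(b"21.4", b"21.5")

print(fingerprint(original))
print(fingerprint(tampered))        # a completely different digest
print(verify(original, recorded))   # True
print(verify(tampered, recorded))   # False
```

Only the 64-character digest needs to live on-chain. The data set itself can be stored anywhere, because anyone holding a copy can rehash it and compare.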
Using a public blockchain specifically, such as Ethereum, could make interoperability between scientific institutions easier. That would increase efficiency and potentially lead to quicker scientific breakthroughs. When research facilities use proprietary systems, sharing data becomes more complicated.
As Ethereum continues to grow as a platform, archiving existing data and uploading new research should become easier. One issue with using Ethereum for this purpose is scalability, since raw scientific data can easily reach gigabytes. Ethereum doesn’t impose a fixed character limit on the data you can write to a transaction’s log; the practical ceiling is the block gas limit. Even so, storing large files directly on the blockchain quickly becomes prohibitively expensive in gas and would only bloat the blockchain. Gas is the unit that measures the computational work a transaction requires. The fee paid to the miner who includes your transaction in a block is the gas used multiplied by the gas price you offer. Therefore, it often isn’t practical to store raw data on the blockchain itself.
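A rough back-of-the-envelope calculation shows why on-chain storage is so expensive. The Python sketch below rests on two assumptions that are not from this article. First, it charges 20,000 gas per newly written 32-byte storage word, the `SSTORE` cost for filling an empty slot; later network upgrades changed parts of this schedule. Second, it uses a hypothetical block gas limit of 30,000,000. The gas price is likewise a hypothetical value.

```python
# Assumed cost parameters (illustrative only; real values depend on the fork).
GAS_PER_WORD = 20_000          # gas to write one new 32-byte storage word
WORD_BYTES = 32                # size of an EVM storage word
BLOCK_GAS_LIMIT = 30_000_000   # hypothetical gas limit per block


def storage_gas(num_bytes: int) -> int:
    """Gas to write num_bytes into contract storage, rounded up to whole words."""
    words = (num_bytes + WORD_BYTES - 1) // WORD_BYTES
    return words * GAS_PER_WORD


def blocks_needed(gas: int) -> int:
    """Minimum number of blocks this much gas would fill, if each block held nothing else."""
    return (gas + BLOCK_GAS_LIMIT - 1) // BLOCK_GAS_LIMIT


def fee_gwei(gas: int, gas_price_gwei: int) -> int:
    """Fee in gwei: gas used multiplied by the gas price (1 ether = 10**9 gwei)."""
    return gas * gas_price_gwei


one_gib = 2**30
gas = storage_gas(one_gib)
print(f"{gas:,} gas")                          # 671,088,640,000 gas
print(f"{blocks_needed(gas):,} full blocks")   # 22,370 full blocks
print(f"{fee_gwei(gas, 10) / 10**9:,.2f} ether at a hypothetical 10 gwei")  # 6,710.89 ether ...
```

Under these assumptions, a single gibibyte would monopolize more than 22,000 consecutive blocks. Because no single transaction can exceed the block gas limit, it would also have to be split across many transactions.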
That’s where Swarm comes in. Swarm is now available in the latest release of Geth (version 1.5, “let there bee light”), and it attempts to solve this scalability problem for large data files. To keep gas costs low and the blockchain lean, Swarm breaks a file into small chunks (currently 4 KB each) and spreads them across participating Swarm nodes. The chunks are organized into a tree whose single root hash identifies the whole file. Only that small hash needs to be recorded on the blockchain, and it allows the original file to be reassembled from the separate chunks. The Ethereum Name Service (ENS), which works much like the Domain Name System (DNS), can map a human-readable name to that hash. So the blockchain holds only small amounts of relevant information, while anyone offering up their computer as a Swarm node hosts the actual data.
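The chunk-and-hash idea can be sketched in Python. This is a simplified model, not Swarm’s implementation, and it rests on several assumptions. SHA-256 replaces Swarm’s own Keccak-256-based chunk hash. A Python `dict` stands in for the network of nodes. Each chunk carries an 8-byte length prefix (`span`) so that leaf chunks (file data) can be told apart from intermediate chunks (lists of child hashes). The `put`, `split`, and `join` helpers are hypothetical names. Because 128 hashes of 32 bytes fit in one 4 KB chunk, the references themselves are packed into chunks until a single root hash remains.

```python
import hashlib
import struct

CHUNK_SIZE = 4096                    # 4 KB chunks, as described in the text
HASH_SIZE = 32                       # SHA-256 digest length (stand-in hash)
BRANCHES = CHUNK_SIZE // HASH_SIZE   # 128 child hashes fit in one chunk


def put(store: dict, span: int, payload: bytes) -> bytes:
    """Store a span-prefixed chunk under its own hash (content addressing)."""
    chunk = struct.pack("<Q", span) + payload
    key = hashlib.sha256(chunk).digest()
    store[key] = chunk
    return key


def split(data: bytes, store: dict) -> bytes:
    """Chunk the data, store every chunk, and return the root hash of the tree."""
    # Leaf level: one (hash, span) pair per 4 KB slice of the file.
    level = []
    for i in range(0, len(data), CHUNK_SIZE):
        piece = data[i:i + CHUNK_SIZE]
        level.append((put(store, len(piece), piece), len(piece)))
    if not level:                    # empty file: a single empty leaf
        return put(store, 0, b"")
    # Pack up to BRANCHES child hashes into each parent until one root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), BRANCHES):
            group = level[i:i + BRANCHES]
            if len(group) == 1:      # lone trailing child: promote it unchanged
                parents.append(group[0])
                continue
            span = sum(s for _, s in group)
            payload = b"".join(h for h, _ in group)
            parents.append((put(store, span, payload), span))
        level = parents
    return level[0][0]


def join(store: dict, key: bytes) -> bytes:
    """Reassemble the data under a hash, verifying every chunk on the way."""
    chunk = store[key]               # KeyError if no node holds the chunk
    if hashlib.sha256(chunk).digest() != key:
        raise ValueError("chunk does not match its hash")
    span = struct.unpack("<Q", chunk[:8])[0]
    payload = chunk[8:]
    if span <= CHUNK_SIZE:           # leaf: payload is raw file data
        return payload
    # Intermediate chunk: payload is a run of 32-byte child hashes.
    return b"".join(
        join(store, payload[i:i + HASH_SIZE])
        for i in range(0, len(payload), HASH_SIZE)
    )


store = {}
data = b"sample row\n" * 100_000
root = split(data, store)
print(root.hex())                    # the only value that would need to go on-chain
print(join(store, root) == data)     # True
```

Every chunk is checked against the hash it was requested by, so a node that serves altered data is caught immediately. The root hash is the only value that would need to be recorded on-chain or registered with ENS.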
Storing raw scientific data in this manner guards against many threats. Human threats are mitigated first and foremost. Bad actors with personal agendas, or repressive governments, would find it far harder to whitewash or alter important data. Distributed file hosting would also remove the single points of failure typical of centralized storage, such as system crashes, physical damage, or intentional sabotage. A system like Swarm could even protect against natural disasters. For example, a sufficiently powerful coronal mass ejection (CME) from the sun could trigger a solar storm that disrupts or even destroys electronics across an entire continent. Copies held by Swarm nodes on other continents would survive.