We're developing a peer-to-peer database and search engine for decentralized media archives, called Sonar. All code is open source. Go check it out!
For about four months, we've been busy developing Sonar, our upcoming basis for decentralized media archives. At its core, Sonar is a peer-to-peer database and search engine. Sonar is based on the Dat protocol for the peer-to-peer exchange of data (think of BitTorrent and Git merged into one) and uses the Tantivy search engine for full-text indexing of content.
A decentralized database
Modern P2P protocols like Dat make it relatively easy to share large amounts of data between loosely connected peers. Exploring and searching through this data without falling back on centralized indexing servers is, however, still hard to do. We may have heaps of data we can access, but if we want to search through it or analyze it more closely, we are mostly on our own.
With Sonar, we're building a foundation for decentralized, yet easily explorable and searchable archives of media and other content. For now, we have focused on developing a solid database abstraction on top of hypercores that seamlessly plugs into a search index. It is based on the so-called Kappa architecture, where each peer keeps local indexes of the data it is interested in. We're actively developing several modules to make this scale and work well for our constraints (see e.g. kappa-core@experimental, kappa-sparse-indexer, kappa-record-db).
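To make the Kappa idea more concrete, here is a minimal sketch in plain Python. It is not Sonar's or kappa-core's actual API: the `Feed` and `TextIndex` classes and all field names are hypothetical stand-ins, with `Feed` playing the role of an append-only hypercore and `TextIndex` the role of a local view that each peer derives from the feeds it has.

```python
from collections import defaultdict


class Feed:
    """Append-only log owned by one peer (hypothetical stand-in for a hypercore)."""

    def __init__(self, key):
        self.key = key
        self.entries = []

    def append(self, record):
        self.entries.append(record)
        return len(self.entries) - 1  # sequence number of the new entry


class TextIndex:
    """Local view: an inverted index from word to (feed key, seq) references."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.cursors = defaultdict(int)  # per feed: next sequence number to index

    def update(self, feed):
        # Only process entries that were appended since the last update.
        start = self.cursors[feed.key]
        for seq in range(start, len(feed.entries)):
            for word in feed.entries[seq]["text"].lower().split():
                self.postings[word].add((feed.key, seq))
        self.cursors[feed.key] = len(feed.entries)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), set()))


alice = Feed("alice")
bob = Feed("bob")
alice.append({"text": "Lecture recording on archives"})
bob.append({"text": "Poster for the archives conference"})
bob.append({"text": "Conference schedule"})

index = TextIndex()
for feed in (alice, bob):
    index.update(feed)

results = index.search("archives")
```

The point of the pattern is that the feeds are the only source of truth. The index is derived data that any peer can throw away and rebuild, or extend incrementally as new entries arrive.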
When this is done, we will extend the system so that you can query the peers in your swarm and then selectively sync only the results, which effectively means that the search index itself is replicated across the P2P network. At the same time, we're adding a straightforward integration of the database with hyperdrives for storing media files attached to database records, as well as a runtime for bots that can do work on your behalf, like extracting metadata or full-text content from files.
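Since this part is still planned, the following is only a rough illustration of the "query first, then sync selectively" idea, not a description of how Sonar implements it. The `Peer` class, its `query` and `fetch` methods, and `search_and_sync` are all hypothetical names: remote peers answer a query with references only, and the local peer then downloads just the referenced records.

```python
class Peer:
    """A remote peer that holds full records and answers search queries (hypothetical)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # maps sequence number -> record text

    def query(self, word):
        # Return references to matching records, not the records themselves.
        return [
            (self.name, seq)
            for seq, text in sorted(self.records.items())
            if word.lower() in text.lower().split()
        ]

    def fetch(self, seq):
        return self.records[seq]


def search_and_sync(word, peers, local_store):
    """Ask every peer for matches, then download only the matching records."""
    for peer in peers:
        for ref in peer.query(word):
            if ref not in local_store:  # skip records we already hold
                local_store[ref] = peer.fetch(ref[1])
    return local_store


peers = [
    Peer("alice", {0: "Lecture recording", 1: "Protest poster scan"}),
    Peer("bob", {0: "Poster archive index", 1: "Meeting notes"}),
]
local = {}
search_and_sync("poster", peers, local)
```

After the call, `local` holds only the two records that mention "poster". Everything else stays on the remote peers until a later query asks for it.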
Sonar is the continuation of previous research in this field, which we started with Archipel. Archipel was based on the previous iteration of the Dat stack. It served well as a prototype of what a user-friendly app could look like. Sonar provides a much better basis, though, due to advancements in the Dat stack (Hypercore 8, Hyperdrive 10) and our own iterations on the technical needs of decentralized archives.
Sonar also includes an HTTP API server, and we intend to develop this further so that it serves as a bridge or proxy from the Dat-based peer-to-peer network onto the traditional HTTP-based web. This means that it's very much possible to use Sonar as a toolset where the peer-to-peer exchange is limited to a small set of computers that are used for collaborating on a collection of material, while the content is served over HTTP for a larger audience. This also reduces the privacy implications of peer-to-peer based tools: while all content being exchanged is encrypted, the network of peers interested in similar topics is currently exposed. There are likely solutions to this involving anonymity networks like Tor, but these still need to be developed.
Since starting with the Archipel prototype, our main focus has been on providing tools for social movements, initiatives, and collectors to create resilient collections of content. These are groups that often lack resources for complicated archiving processes, are often socially decentralized, and may be short-lived: people and groups that don't have institutional funding for long-term preservation. Think of all the WordPress blogs where people collect material like conference recordings, lectures, and posters, and which are often enough gone a few years later. With Sonar, we want to fix these problems while at the same time researching how the promise of seamless replication between loosely connected peers can be put into practice.
We're busy getting a first alpha of Sonar ready. If you are at 36C3 in Leipzig at the end of this year, we will hopefully have something to show. We'll be around 1Komona, and will post an update soon on when and where you can meet us.
Sonar is a young open source project, and we're looking for collaborators and anyone interested in contributing. Currently, docs are still lacking, but we're in the process of writing more comprehensive docs in the Sonar book. To get started, check out the main Sonar repository. For now, you'll need Node.js and npm. In the future, we'll also provide binary releases.
If things don't work or you're unsure how to get started, please open an issue.
If you have further questions, ideas, or use cases, please contact us. We'd love to talk.