Introducing Sonar

We're developing Sonar, a peer-to-peer database and search engine for decentralized media archives. All code is open source, so go check it out!

For about four months, we've been busy developing Sonar, our upcoming basis for decentralized media archives. At its core, Sonar is a peer-to-peer database and search engine. Sonar is based on the Dat protocol for the peer-to-peer exchange of data (think of BitTorrent and Git merged together) and uses the Tantivy search engine for full-text indexing of content.

A decentralized database

Modern P2P protocols like Dat make it relatively easy to share large amounts of data between loosely connected peers. Exploring and searching through this data without falling back on centralized indexing servers is, however, still hard to do. We may have heaps of data we can access, but if we want to search through it or analyze it more closely, we are mostly on our own.

With Sonar we're building a foundation for decentralized, yet easily explorable and searchable archives of media and other content. For now we have focused on developing a solid database abstraction on top of hypercores that seamlessly plugs into a search index. It is based on the so-called Kappa architecture, where each peer keeps local indexes of the data it's interested in. We're actively developing several modules to make this scale and work well within our constraints (see e.g. kappa-core@experimental, kappa-sparse-indexer, kappa-record-db).
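To make the Kappa idea more concrete, here is a minimal sketch in Python. It is not Sonar's code and does not use the kappa-core API; `AppendOnlyLog` and `SearchView` are hypothetical names made up for this illustration, with a plain list standing in for a hypercore and a tiny inverted index standing in for the search engine. The point it tries to show is that the log is the source of truth, while the index is a local, derived view that each peer can update incrementally.

```python
from collections import defaultdict


class AppendOnlyLog:
    """Hypothetical stand-in for a hypercore: an append-only list of records."""

    def __init__(self):
        self._entries = []

    def append(self, record):
        self._entries.append(record)
        return len(self._entries) - 1  # sequence number of the new entry

    def read_from(self, seq):
        # Yield (seq, record) pairs starting at the given sequence number.
        for i in range(seq, len(self._entries)):
            yield i, self._entries[i]


class SearchView:
    """Hypothetical local index derived from the logs a peer is interested in."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of (log name, seq)
        self.cursors = {}  # log name -> next sequence number to index

    def update(self, name, log):
        # Index only the entries that were appended since the last update.
        start = self.cursors.get(name, 0)
        for seq, record in log.read_from(start):
            for term in record["text"].lower().split():
                self.index[term].add((name, seq))
            self.cursors[name] = seq + 1

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


alice = AppendOnlyLog()
alice.append({"text": "Lecture recording on archives"})
alice.append({"text": "Conference poster"})

view = SearchView()
view.update("alice", alice)

alice.append({"text": "Second lecture"})
view.update("alice", alice)  # only the newly appended entry is indexed

results = view.search("lecture")
```

Because the view is derived entirely from the logs, it can be thrown away and rebuilt at any time, and different peers can keep different views over the same data.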

When this is done, we will extend the system so that you can query the peers in your swarm and then selectively sync only the results, which effectively means that the search index itself is replicated in the P2P network. At the same time, we're adding a straightforward integration of the database with hyperdrives for storing media files attached to database records, as well as a runtime for bots that can do work on your behalf, like extracting metadata or full-text content from files.
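As a rough sketch of what "query the swarm, then sync only the results" could look like, consider the following Python example. This is an assumption about the general shape of the idea, not a description of how Sonar implements or will implement it; `Peer` and `search_and_sync` are hypothetical names, and a plain scan over a dictionary stands in for each peer's local search index.

```python
class Peer:
    """Hypothetical peer holding some records and answering local queries."""

    def __init__(self, name, records):
        self.name = name
        self.records = dict(records)  # record id -> text

    def query(self, term):
        # Answer from local data only; a simple scan stands in for an index.
        term = term.lower()
        return [rid for rid, text in sorted(self.records.items())
                if term in text.lower().split()]

    def fetch(self, rid):
        return self.records[rid]


def search_and_sync(local, swarm, term):
    """Ask every peer for matches, then copy only the matching records
    that the local peer does not have yet."""
    synced = []
    for peer in swarm:
        for rid in peer.query(term):
            if rid not in local.records:
                local.records[rid] = peer.fetch(rid)
                synced.append((peer.name, rid))
    return synced


me = Peer("me", {})
bob = Peer("bob", {"b1": "Poster archive 2019", "b2": "Lecture notes"})
carol = Peer("carol", {"c1": "Lecture recording", "b2": "Lecture notes"})

synced = search_and_sync(me, [bob, carol], "lecture")
```

In this sketch the local peer ends up holding only the records that matched the query, and a record already received from one peer is not fetched again from another.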

Sonar is the continuation of previous research in this field, which we started with Archipel. Archipel was based on the previous iteration of the Dat stack. It served well as a prototype of what a user-friendly app could look like. Sonar provides a much better basis, though, due to advancements in the Dat stack (Hypercore 8, Hyperdrive 10) and our own iterations on the technical needs of decentralized archives.

Use cases

Sonar currently has a command-line interface, a basic user interface, and client libraries for JavaScript and Python. We intend to develop this into an easy-to-use toolset on top of which application-specific frontends and integrations can be built.

Sonar also includes an HTTP API server, and we intend to develop this further so that it serves as a bridge or proxy from the Dat-based peer-to-peer network onto the traditional HTTP-based web. This means that it's very much possible to use Sonar as a toolset where the peer-to-peer exchange is limited to a small set of computers that are used for collaborating on a collection of material, while the content is served over HTTP for a larger audience. This also reduces the privacy implications of peer-to-peer based tools: while all content being exchanged is encrypted, the network of peers interested in similar topics is currently exposed. There are likely solutions to this involving mixnets or anonymity networks like Tor, but these still need to be developed.

Since starting with the Archipel prototype, our main focus has been on providing tools for social movements, initiatives, and collectors to create resilient collections of content. These are groups that often lack resources for complicated archiving processes, are often socially decentralized, and may be short-lived; people and groups that don't have institutional funding for long-term preservation. Think of all those WordPress blogs where people collect material like conference recordings, lectures, and posters, and which are often enough gone a few years later. With Sonar, we want to fix these problems, while at the same time researching how the promise of seamless replication between loosely connected peers can be put into actual practice.

Going ahead

We want to thank the NLnet Foundation for the grant that allows us to develop Sonar, and the Prototype Fund, which funded our initial prototype (Archipel) that in many ways led us to where we are now.

We're busy getting a first alpha of Sonar ready. If you are at 36C3 in Leipzig at the end of this year, we will hopefully have something to show. We'll be around 1Komona, and will post an update soon on when and where you can meet us.

Contributing

Sonar is a young open source project and we're looking for collaborators and anyone interested in contributing. Currently, docs are still lacking, but we're in the process of writing up more comprehensive docs in the Sonar book. To get started, check out the main Sonar repository. For now you'll need Node.js and npm to run it. In the future, we'll also provide binary releases.

If things don't work or you're unsure how to get started, please open an issue.

If you have further questions, ideas, or use cases, please contact us. We'd love to talk.