A proposal to facilitate information sharing on the internet

This is motivated primarily by my own usage patterns for information on the internet, and by reasonable extrapolation to the needs and motivations of others.

Statement of the problem

My starting point is the observation that the current non-system of information distribution is inadequate for present needs, and is rapidly advancing to "hopelessly inadequate".

The existing "gopher" and "archie" technologies are making the problem worse, by multiplying redundant copies of, and indexes to, (possibly) identical documents, and by adding to the ever-growing tower of Babel of servers for information about other services.

The Proposed Solution

This is a modest proposal to address these problems. Scalability and simplicity are primary goals, and a strong secondary goal is to separate protocol (how it's done) from policy (what is done). Among the observations behind this proposal is that most of the information is either completely static (such as files) or changes relatively slowly (such as directories of files).

This is a client/server protocol, with a very simple client protocol and a server protocol that is only slightly more complex. For convenience, I'll call the servers share servers. All the tough parts are matters of policy, not basic protocol, so within the proposed scheme there is a very large space for exploration.

Data Types

There are two basic data types in the protocol: a document and a header. A document is just a collection of bits, to be passed verbatim from server to client. The header is a short, structured collection of data about the document. My goal is that the header be small and informative enough that interested parties will feel free to keep the header in its entirety, with the freedom to snap off and discard the attached document, secure in the knowledge that it can be retrieved from a server if need be. The minimum contents for the header will run something like this:

One special type of document used by the protocol is the directory document, which consists of a collection of headers for other documents.
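The header/document split can be sketched as a few Python data classes. The list of minimum header contents did not survive in this copy, so every header field below (`name`, `home`, `version`, `size`, `digest`) is a hypothetical choice, as is the use of SHA-256 for the digest; the directory document is modelled as a plain list of headers rather than as serialized bits.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Header:
    """Hypothetical minimal header; field names are illustrative assumptions."""
    name: str      # globally unique document name
    home: str      # the document's ultimate "home" server
    version: int   # lets the owner supersede older copies
    size: int      # length of the document in bytes
    digest: str    # hex-encoded SHA-256 of the document bits


@dataclass
class Document:
    header: Header
    body: bytes    # passed verbatim from server to client


def make_document(name: str, home: str, version: int, body: bytes) -> Document:
    """Build a document together with a header that describes it."""
    header = Header(name, home, version, len(body), hashlib.sha256(body).hexdigest())
    return Document(header, body)


@dataclass
class Directory:
    """A directory document: a collection of headers for other documents."""
    entries: list[Header] = field(default_factory=list)

    def add(self, doc: Document) -> None:
        # Only the header is recorded; the bits can be fetched later.
        self.entries.append(doc.header)
```

Because each `Header` carries its own size and digest, a party can keep `doc.header` and drop `doc.body`, expecting to re-fetch the bits from any server later.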

The Client Protocol

The client protocol has very few types of requests. Basically, they are:

The client can address his request for the information he wants to any known or suspected share server. The key point is that any host, or many hosts, may respond with an offer to provide the requested data: not only the host that nominally holds the document, and not only the one directly addressed.

The offer will probably contain some estimate of the quality of service likely to be provided, and the client is free to bias the evaluation of the offers based on the timeliness of the offer and other information. No matter who or how many respond with offers to provide the data, the client is free to pick and choose which to ask to provide the bulk data, and once provided, to ask others to authenticate the accuracy of the information provided.
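One possible client policy for choosing among offers, sketched in Python. The `Offer` fields and the scoring rule are assumptions for illustration: each offer is assumed to carry the server's own bandwidth estimate, and the client penalizes offers that were slow to arrive by adding that delay to the estimated transfer time.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    """A hypothetical offer from some share server to supply a document."""
    server: str
    est_bandwidth: float   # server's own estimate, bytes per second (assumed unit)
    reply_delay: float     # seconds between our request and this offer arriving


def score(offer: Offer, size: int) -> float:
    """Estimated seconds to obtain `size` bytes; lower is better.

    A slow reply is treated as extra latency on top of the claimed transfer time.
    """
    return offer.reply_delay + size / offer.est_bandwidth


def rank_offers(offers: list[Offer], size: int) -> list[Offer]:
    """Order offers best-first."""
    return sorted(offers, key=lambda o: score(o, size))
```

The best-ranked offer would be asked for the bulk data; the runners-up remain available to confirm its accuracy afterwards.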

The Share Server protocol

The server protocol has a few more elements. In addition to responding directly to client requests, the servers attempt to act as brokers to find a copy of the requested data, and as librarians to maintain a cache of frequently requested data.
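A share server's two extra roles, broker and librarian, might look like the following sketch. It assumes a single hypothetical `uphill` callable standing in for whatever forwarding the real protocol would use, and a least-recently-used cache of fixed size; the proposal leaves caching policy open, so LRU is only one plausible choice.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ShareServer:
    """Sketch of a share server acting as librarian (LRU cache) and broker.

    `uphill` is a hypothetical callable that asks a better-connected server
    for a document by name; it returns the bytes, or None if nobody has it.
    """

    def __init__(self, capacity: int,
                 uphill: Optional[Callable[[str], Optional[bytes]]] = None):
        self.capacity = capacity
        self.uphill = uphill
        self.home: dict[str, bytes] = {}                     # documents we are "home" for
        self.cache: OrderedDict[str, bytes] = OrderedDict()  # least recently used first

    def request(self, name: str) -> Optional[bytes]:
        if name in self.home:                  # we are the document's home
            return self.home[name]
        if name in self.cache:                 # librarian: serve a cached copy
            self.cache.move_to_end(name)
            return self.cache[name]
        if self.uphill is None:                # broker: nowhere left to ask
            return None
        body = self.uphill(name)               # broker: pass the request uphill
        if body is not None:
            self.cache[name] = body            # keep a copy for the next client
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False) # evict the least recently used entry
        return body
```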

The basic intent is that both requests for information and offers to provide it will cascade in a decentralized manner, much the same as internet routing information is exchanged now. Senescent information will gradually come to reside only on the original server (and die-hard mirrors) while any information that is in high demand will rapidly become ubiquitous.

Except for the ultimate "home" of any document, there will be no fixed points where the document is stored, and no fixed relationships between consumers and the immediate providers of data.

Local sites will not have to establish and manage explicit mirrors, or guess what will be of interest to local clients, and individual users will not be so motivated to hoard copies of "interesting" documents.

Authentication and Security of data

It should be obvious that under this system clients never know where their data comes from, so it's important that they verify that the data described in the header actually matches the data that arrives in the document.
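Verification on the client side can be as simple as recomputing a digest. This sketch assumes, hypothetically, that the header records the document's length and a hex-encoded SHA-256 digest; the proposal does not name a hash function.

```python
import hashlib


def verify(expected_size: int, expected_digest: str, body: bytes) -> bool:
    """Check that the bits that arrived match what the header promised."""
    if len(body) != expected_size:
        return False  # cheap check first: wrong length means wrong data
    return hashlib.sha256(body).hexdigest() == expected_digest
```

A matching digest only shows that the bits agree with the header; trusting the header itself is the job of the digital signatures suggested below.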

It's also important that the servers know and respect the wishes of whoever "owns" the document, so that documents can be deleted or superseded.

For both of these reasons, I suspect that share servers and clients should use digital signatures to establish the authenticity of the information they deal in.

Scenario

Suppose WWW.MCOM.COM announces a new version of Netscape, or the US government's department of spinach research (a small office) announces its spinach-based cancer cure. Either can expect about a million requests within the next few days.

You know what happens now. How would it work under the share protocol?

So, in the first few minutes after the big announcement, local share servers all over the world would be asked for a copy of that important spinach document. Not knowing anything about it, the local servers would pass the request on in the approximate "uphill" direction. "Uphill" servers, seeing a wave of requests, would pass the requests on and also request copies for themselves. As soon as their copies arrived, they could begin serving the requests themselves. By the time the BIG wave of requests crests, many, many local servers will have copies. The lonely PC in the spinach research office may have actually delivered only one copy.
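A toy simulation of this scenario, assuming a fixed tree (the home PC, one backbone server, regional servers, local servers) in which each server fetches and keeps a copy the first time it is asked. The topology, the names, and the keep-on-first-request rule are simplifications made for illustration; real servers would decide when to cache by policy.

```python
import random
from collections import Counter
from typing import Optional


class Node:
    """One share server in a fixed tree; `parent` is its "uphill" neighbour."""

    def __init__(self, name: str, parent: Optional["Node"] = None):
        self.name = name
        self.parent = parent
        self.has_copy = parent is None  # the root is the document's home

    def request(self, served: Counter) -> None:
        """Deliver one copy to the requester, fetching from uphill first if needed."""
        if not self.has_copy:
            self.parent.request(served)  # our own request travels uphill
            self.has_copy = True         # keep the copy that comes back
        served[self.name] += 1           # this server delivered one copy


def simulate(regions: int, locals_per_region: int, requests: int, seed: int = 0) -> Counter:
    """Send `requests` client requests to randomly chosen local servers."""
    rng = random.Random(seed)
    home = Node("home")
    backbone = Node("backbone", home)
    regional = [Node(f"region-{r}", backbone) for r in range(regions)]
    local = [Node(f"{reg.name}/local-{i}", reg)
             for reg in regional for i in range(locals_per_region)]
    served: Counter = Counter()
    for _ in range(requests):
        rng.choice(local).request(served)
    return served


if __name__ == "__main__":
    counts = simulate(regions=10, locals_per_region=50, requests=100_000)
    print("copies delivered by the home PC:", counts["home"])
    print("copies delivered by the backbone:", counts["backbone"])
```

With `simulate(10, 50, 100_000)`, the `"home"` entry in the returned counter is 1: only the backbone ever asks the home PC, and it asks once.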