At Home                                                 L. Donnerhacke
(private)                                                        Draft
                                                      February 2, 1997

              HDDB - Hierarchical Distributed Data Base
                      draft-hyper-j-hbbd-04.txt

Status
~~~~~~

Ideally, this draft will form the basis for an official Internet
protocol.

Abstract
~~~~~~~~

This draft proposes an enhanced DNS protocol for managing large,
freely defined data sets.  Scalability and fast access are the main
design criteria.

Contents
~~~~~~~~

TBW

Background
~~~~~~~~~~

Spring 1994, Berlin: Heiko Schlichtig wrote down a list of criteria
for a simple UseNet News Administration System (NAS) [1].  The main
idea of this concept was to store the data in TXT Resource Records of
DNS in order to exploit the scalability and decentralization of DNS.
These records were designed to store data such as group names,
archive locations, description lines, moderators and more.  Heiko
came to the conclusion that DNS is not capable of managing large
records and that the caching mechanism of DNS might be damaged,
resulting in DNS servers refusing to handle these queries.

Late autumn 1994, Jena: The local web administration and the
administrator of Religio [2] spent a number of dinners together,
because Religio had grown larger and larger ... unmanageable.  The
idea of a global keyword management system was born: a system of
searchable keywords which can easily be used to index all documents,
comparable to the systematic catalog used for books in a typical
library.  Librarians are still unable to provide such a list of
keywords, for three reasons:

 - Traditional indexing of books requires an extensible and
   consistent list in each library.  This list needs to be up to
   date.
 - Keywords must be available in the librarian's native language.
 - If such a list actually existed, some librarians might even lose
   their jobs.

So the Jena administration checked Hyper-G [3].
But Hyper-G is not scalable: it offers a worldwide database of all
possible links to any document.  Besides, Hyper-G was not available
as source code.  So the idea of a systematic catalog for every
Internet resource came up and was called Hyper-J.  HDDB is the main
part of Hyper-J.

Introduction
~~~~~~~~~~~~

HDDB has been invented in order to access huge sets of data indexed
by a key derived from a hierarchical structure.  The data is accessed
and stored by sets of servers, connected by the hierarchy of keys.
Access to any entry should require no more than the bare minimum
necessary to structure the information and the data itself.  Data are
authoritatively stored on distributed servers, including fallback to
secondary servers.

The main structural ideas are directly derived from DNS [4].  Thus,
similar records for storing this structural information are taken
from DNS, such as SOA, NS, TTL, SERIAL, CONTACT, RETRY and EXPIRE.
The reading direction of the hierarchy elements is inverted due to
cultural requirements; this way the application NAS fits the naming
structure of UseNet in a natural manner.

Unlike DNS, HDDB has an unlimited number of record types for each
hierarchy.  In order to guarantee correct and consistent usage of
self-defined record types, strict syntax checks are provided.  Such
syntax definitions are specified using regular expressions
(regexps) [5].

Each request contains a key mask and a type mask; both are regular
expressions.  Any request should be resolved by exactly one server.
If a server has several matching hierarchy delegations, it should
not resolve these delegations recursively; it should return regular
expressions instead.  On the other hand, the server should add as
many pieces of information as possible to the answer, including
those from its cache.  Thus, a resolver can use more closely located
servers or cached data for further requests.
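As an illustration only (not part of this draft), the request model
described above can be sketched in Python.  All record contents, the
delegation list and the resolve() function are assumptions for the
example; the draft does not define them:

```python
import re

# Illustrative record store for one server: (key, type) -> value.
# Keys read left to right, UseNet style (the inverted DNS order).
RECORDS = {
    ("de.comp.os.unix", "DESC"): "Unix discussion (German)",
    ("de.comp.os.unix", "MOD"):  "none",
    ("de.comp.lang.c",  "DESC"): "C programming (German)",
}

# Sub-hierarchies delegated to other servers, kept as regexps.
DELEGATIONS = [r"de\.sci\..*"]

def resolve(key_mask, type_mask):
    """Match the request's key mask and type mask against local
    records.  Delegations are returned as regexps for the resolver
    to follow itself; the server does not recurse into them."""
    key_re = re.compile(key_mask)
    type_re = re.compile(type_mask)
    answers = [(k, t, v) for (k, t), v in RECORDS.items()
               if key_re.fullmatch(k) and type_re.fullmatch(t)]
    # A real server would check whether a delegation can intersect
    # the key mask; here every delegation is simply referred back.
    referrals = list(DELEGATIONS)
    return answers, referrals

answers, referrals = resolve(r"de\.comp\..*", r"DESC")
# Both DESC records under de.comp match; the de.sci delegation is
# returned as a regexp referral rather than resolved recursively.
```

Note that the sketch deliberately returns the delegation unresolved,
matching the rule above that a server should hand back regular
expressions instead of recursing.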
That is why primary servers (SOA) only provide data to secondary
servers, and only secondary servers (NS) answer resolver requests.
This should reduce international traffic and result in quicker
response times.

Both the resolver and the server can set a maximum number of
returned answers.  All unresolved answers are concatenated into a
single regexp, returned as the last answer.  This summarizing regexp
MUST NOT match already transferred answers and SHOULD NOT match keys
or types known to be nonexistent.

Key hierarchies
~~~~~~~~~~~~~~~

In order to speed up search access to huge data sets, an indexing
key is used.  These keys are called first level keys.  They
reference data records.  If the set of first level keys is too
large, a second key will be provided.  This key to the first level
keys is called a second level key.  This method is applied
recursively until each key points to a small set of keys or data.
This way each set of data indexed by a key can be handled manually
on a single server.

This draft describes a key management system with the following
axioms:

 - Every 1st level key points to exactly one data record and vice
   versa.
 - Every n-level key points to exactly one hierarchy record
   consisting of m-level keys (m