KnowOS

From TunesWiki

Knowledge Operating System (KnowOS) is a project for a Dynamic system with persistent objects. It is written on Common Lisp by Mike Travers, JP Massar, and Jeff Shrager as an underlying framework tool for BioBike. See ILC 2005 Paper.


Contents

Intro to comments

Good intentions, but they repeat the usual bullshit about RDBMSs that come from OO-DBMS proponents.

Confusion: relations vs relationships

Citing from ILC 2005 Paper:

An Oracle instance is a virtual machine that allows users to interact with tables and relations as if these were persistent basic elements, and provides many of the above-described services to application programs that operate on tables and relations.

Tables are a 2D representation of relations, so speaking about tables and relations is nonsense. Probably, they are speaking about realations (tables) and relationships (one-to-one [1:1], many-to-one [M:1] and many-to-many [M:M]), represented in the database by foreign key-primary key referential integrity constraints.

On the flatness of relations

[..] and so that it becomes part of the persistent storage space - the files or tables. We feel that this is needlessly myopic; if one wants to work with complex, structured knowledge, one should not have to create it from flat representations [..]

I feel this is needlessly stupid. Again, tables are a 2D representation of relations; a relation is a variable which can assume as value sets of points in a n-dimensional discrete space (where n is the number of relation attributes), aka sets of tuples. What does it mean for a n-dimensional discrete space to be flat??

Note: Wikipedia has an article on flatness, in particular the section Flatness in mathematics seems relevant. Apparently (to me at least) there is no way that the concept can be stretched so that the phrase "relations are flat" can be meaningful. (Please, discuss this if you know better).

On chasing pointers in networks of complex objects

Whereas classical operating systems provide these services for simple data objects (files or tables), a KnowOS provides them for networks of complex objects.

.. and ..

Threading is the way that pointers are maintained between complex objects that have their own first-class status;

They miss completely the fact that the relational data model (rdm) was invented to overcame the deficiencies of hierarchical and network DBMSs (re-proposed in "modern" form as XML-DBMSs and OO-DBMSs). RDBMSs can cope with hierarchies and networks (graphs) better, in a more dynamic way, because they are not limited by a static representation of these structures as a web of pointers1.

The main difference is this: a RDBMS manages graphs in a declarative way, with relational algebra or calculi which are collection-oriented (aka data parallel or set-oriented); hierarchical and network DBMSs manage it procedurally2 iterating node by node, instead. Also the integrity of such graphs are mantained declaratively by RDBMSs by integrity constraints (again, to express these constraints you have all the power of relational algebra/calculi at your disposal).


  • 1) Codd's intent was to ban pointers from data management, on purpose (of course, I am speaking about the logical level, with which the rdm is concerned; at the physical (implementation) level pointers are admissible, if not necessary on conventional hardware).
  • 2) About XQuery see this discussion in TUNES vs the WWW.

On representing graphs with relations

Citing again from ILC 2005 Paper:

If one wants to work with complex, structured knowledge, one should not have to create it from flat representations [..]. Instead, complex objects should constitute the basic persistent storage entities that are the domain of the operating system.

rdm has no problems in representing, managing and assuring the integrity of such complex structures; on the contrary, it was invented exactly to overcame the deficiencies of hierarchical and network DBMSs (although SQL DBMSs, violating rdm theory, cause some problems in supporting graphs, above all admitting duplicates).

Supporting evidence

A pure mathematical theory of data

(Note: in a preceding version, this was point 4. cited in my e-mail Re: Google Summer of Code)

The following projects show clearly that rdm is a pure mathematical theory of data, with no connection whatsoever with a specific implementation; the equations:

relations = files

and

tuples = records

ARE NONSENSE.

Here program graphs (for program analysis) are represented as relations at the logical level and as (RO)BDDs (Reduced Ordered Binary Decision Diagrams) at the physical (implementation) level:

  1. GBDD - A package for representing relations with BDDs, from Regular Model Checking framework.
  2. Jedd: Java Extension for Decision Diagrams project and this example paper Jedd: A BDD-based Relational Extension of Java: whole program analyses based on BDDs; Jedd language abstracts BDDs as database-style relations and operations on relations, and provides Static type rules to ensure that relational operations are used correctly. Citing from this paper:
    In developing our approach, it soon became apparent that a simple strategy of providing a Java wrapper to interface with a BDD library was not a good solution, for many reasons. First, we found that the interface provided by the existing BDD libraries is very low level, and as we attempted to express several complex interrelated analyses, understanding and maintaining our code became difficult. Moreover, programming at such a low level was error prone, and errors in our code led to either the BDD library aborting, or worse, to incorrect results [shortcomings of procedural programming]. The implicit nature of the BDD representation made these errors difficult to track down. Furthermore, we found that it is quite difficult to match the memory management in Java with the reference-counter-based schemes employed in the BDD packages. Finally, we found that tuning a BDD-based algorithm requires profiling information about the size and shape of the underlying BDDs at each program step. We had previously developed some ad-hoc methods for visualizing this information, but a more automated approach was needed.
    Our solution, and the topic of this paper, was the development of: (1)Jedd, a language extension to Java, which provides a high-level way of programming BDD-based algorithms based on relations and operations on relations [advantages of declarative programming]; (2) an associated translator which automatically translates Jedd to Java code that efficiently interacts with back-end solvers; and (3) run-time support for memory management, debugging and profiling of BDD operations.
  3. bddbddb - A BDD-Based Deductive DataBase project: a tool for easily and efficiently specifying and querying program analyses, specified as programs in Datalog, a standard Declarative language for reasoning about relations.
    Also this example application paper Cloning Based Context Sensitive Pointer Alias Analysis Using Binary Decision Diagrams (.PDF) by John Whaley: this paper shows that pointer analysis, and many other queries and algorithms, can be described succinctly and declaratively using Datalog, a logic programming language [equivalent to Relational Algebra under some assumptions -- again, advantages of declarative programming].
  4. CrocoPat: A Tool for Simple and Efficient Relational Computation.

Other papers on graphs and trees in RDBMSs

  1. In PRACTICAL ISSUES IN DATABASE MANAGEMENT, by Fabian Pascal, Chapter 7, Climbing Trees in SQL: Data Hierarchies.
  2. TREE-STRUCTURED DATA: A RELATIONAL PERSPECTIVE by C. J. Date.
  3. Oracle documentation, from Oracle Life Sciences Platform (even biological applications, the same target as KnowOS) and Oracle Spatial, Locator, and Location-Based Services:
    1. Network Data Model White Paper (.PDF); don't be deceived by Oracle terminology: networks are represented as relations (nodes, links and paths relations).
    2. Graph Modeling and Analysis in Oracle (.PDF)
    3. Access and Analyze graphs in Oracle Spatial Network Data Model using Cytoscape (.PDF)
    4. Graph Data Representation in Oracle Database 10g: Case Studies in Life Sciences (.PDF)
  4. Graph Transformation in Relational Databases (.PDF) presented at GraBaTs'04.
  5. Storing and evaluating Horn-clause rules in a relational database (.pdf), about representing terms as trees in a RDBMS.
  6. Storage and Retrieval of First Order Logic Terms in a Database (.PDF) by Peter Gursky.
    Abstract. In this paper we present a storage method for sets of first order logic terms in a relational database using function symbols based indexing method of Discrimination trees. This is an alternative method to a published one, based on attribute indexing. This storage enables effective implementation of several retrieval operations: unification, generalization, instantation and variation of a given query term in the language of first order predicate calculus. In our solution each term has unique occurrence in the database. This is very useful when we need to store a large set of terms that have identical many subterms.
  7. Maintaining Transitive Closure of Graphs in SQL: notable, in this paper, the idea of Icremental Evaluation System (IES) that increases the expressive power of Relational Algebra, a pleasant theoretical result with pratical application (a good theory is pratical).
    ... Coming back to transitive closure, it cannot be expressed in relational databases using SQL without incremental evaluation but can be expressed in relational databases using SQL in the setting of an incremental evaluation system. In other words, avoidance of the cost of recomputation is not even the issue here, for the query is not even do-able in SQL without incremental evaluation in the first place! ...
  8. Incremental Maintenance of Shortest Distance and Transitive Closure in First Order Logic and SQL


--MaD70