Hierarchical to Relational Databases
I find it interesting that the ideas and conflicts from political history are often reflected in the history of computer design. This article looks at the transition from the hierarchical database model (Wikipedia, drawn 12/2015) to the relational model (Wikipedia, drawn 12/2015).
Political theories from ancient times through feudal Europe tended to hold that mankind needed to be organized and constrained by a hierarchical class structure. This structure had kings and emperors at the top. They ruled through a network of dukes and feudal lords. Merchants and manufacturers were seen as a troublesome middle class, with the bulk of mankind belonging to a working class or peasant class.
Classical liberals began questioning the legitimacy of this structure during a period called "The Enlightenment." The American and French Revolutions posed an existential threat to this structure.
The first computers were, inadvertently, designed around a hierarchical model. When you turn on a computer you are at a known state. Software engineers designed code to branch out from that known state.
Most computers organize files in a directory tree. The highest level of the tree is often called the "root." The root directory generally contains folders for software, users and other resources, which in turn have subdirectories. In Unix and DOS, you can navigate the tree with the "cd" command (change directory). You can navigate down the tree by typing "cd name" where "name" is the name of the subdirectory. You can go up the tree by typing "cd ..". The command "cd /" brings you to the root. You can navigate to a known file by typing its full "path" in the tree. On my new server, this file lives in the "/var/www/html/resources/" directory.
You can find a path between any two resources on a computer by moving up and down the directory tree. My home directory on this computer is "/home/kevin". The path from my home directory to the html directory is "../../var/www/html/": up two levels to the root, then down through var and www.
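The up-then-down walk between two directories can be checked with Python's standard posixpath module. This is only a small sketch; it assumes the html directory is "/var/www/html", as in the file path given earlier, and it works on path strings alone without touching a real file system.

```python
import posixpath

home = "/home/kevin"    # home directory named in the text
html = "/var/www/html"  # assumed location of the html directory

# relpath climbs from the start directory to the shared ancestor (here the root)
# and then descends to the target.
relative = posixpath.relpath(html, start=home)
print(relative)

# Joining the relative path back onto the start and normalizing
# should land on the target again.
round_trip = posixpath.normpath(posixpath.join(home, relative))
print(round_trip)
```

Each ".." in the result is one step up the tree, which is the same move as typing "cd ..".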
As a tree becomes complex, it becomes desirable to place links between the subdirectories. Microsoft Windows calls these links "shortcuts." Unix calls them "symlinks."
The beautiful SVG graphic below (which doesn't exist yet) shows a directory tree with links.
[Insert Beautiful Graphic Here]
The topology of a directory tree with symlinks is quite interesting. As mentioned, there is a known path between each point in the directory tree.
People who spend a great deal of time navigating through directory trees with symlinks realize that there is an unlimited number of paths between points in the directory and that there isn't a logical reason why any of the directories in the tree couldn't be treated as root. The data does not need to be stored in a hierarchy.
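Both observations can be tried out on a toy example. The sketch below is an illustration only: the directory names and the single link between "kevin" and "html" are made up, and the tree is treated as a plain graph in which every edge can be walked in either direction.

```python
from collections import deque

# Hypothetical directory tree: each directory lists its subdirectories.
tree = {
    "/": ["home", "var"],
    "home": ["kevin"],
    "var": ["www"],
    "www": ["html"],
    "kevin": [],
    "html": [],
}
links = [("kevin", "html")]  # hypothetical symlink between two directories


def neighbours(tree, links):
    """Build an undirected adjacency map from the tree edges plus the links."""
    adj = {name: set() for name in tree}
    for parent, children in tree.items():
        for child in children:
            adj[parent].add(child)
            adj[child].add(parent)
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def reroot(adj, root):
    """Breadth-first search from any directory; returns a parent map rooted there."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return parent


def simple_paths(adj, start, goal, visited=()):
    """List every path from start to goal that never revisits a directory."""
    visited = visited + (start,)
    if start == goal:
        return [list(visited)]
    found = []
    for nxt in sorted(adj[start]):
        if nxt not in visited:
            found.extend(simple_paths(adj, nxt, goal, visited))
    return found


adj = neighbours(tree, links)
rooted_at_kevin = reroot(adj, "kevin")
paths = simple_paths(adj, "kevin", "html")
print(rooted_at_kevin)
print(paths)
```

With one link there are already two routes from "kevin" to "html" that never repeat a directory. Allow a walk to go around the loop more than once and the number of routes has no limit. The reroot function shows that "kevin" serves as a root just as well as "/" does.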
The Wikipedia articles I referenced tell us that IBM, and others, were developing hierarchical databases in the 1960s. In 1969, an IBM programmer named Edgar Frank Codd (1923-2003) realized that the links between records in the database were more important than the hierarchy. Codd penned a theory using the terms "tuples" and "relations" [a "tuple" is an ordered list, and a "relation" is a set of tuples]. The term "relational database" refers to databases built with relations (aka tables, whose rows are the tuples) and links between the tables.
SQL (Structured Query Language) is one of many different query languages designed to access data in a relational database. SQL combines ideas from Edgar Codd with set theory, which was developed a century earlier. The data definition commands in SQL let you CREATE, ALTER and DROP tables. The data modification commands let you INSERT, UPDATE and DELETE rows in tables. You can access the data with the SELECT command.
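Here is a minimal sketch of those commands, run through Python's built-in sqlite3 module against a throwaway in-memory database. The "link" table and its columns are invented for the illustration. Note that the statement that removes a whole table is DROP TABLE, while DELETE removes rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk

# Data definition: create a table, then alter it by adding a column.
conn.execute("CREATE TABLE link (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("ALTER TABLE link ADD COLUMN url TEXT")

# Data modification: insert two rows, update one, delete the other.
conn.execute("INSERT INTO link (title, url) VALUES (?, ?)",
             ("Example", "http://example.com"))
conn.execute("INSERT INTO link (title, url) VALUES (?, ?)",
             ("Old", "http://example.org"))
conn.execute("UPDATE link SET title = ? WHERE title = ?",
             ("Example Site", "Example"))
conn.execute("DELETE FROM link WHERE title = ?", ("Old",))

# Data access: read back what is left.
rows = conn.execute("SELECT title, url FROM link").fetchall()
print(rows)

# DROP TABLE removes the table itself, not just its rows.
conn.execute("DROP TABLE link")
conn.close()
```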
There were better languages on the market when I began programming, but SQL became the ANSI standard in 1986. The language does the job, so we have to live with it.
Hierarchical v. Relational Design
It is tempting to talk about a conflict between hierarchical and relational database design, but I find this type of talk unproductive and believe that it is best to understand these two models as different perspectives of the same thing.
Data stored in a hierarchy can be transitioned into a relational database. Likewise, it is easy to express a hierarchy in a relational table. One can do this simply by adding a parent relation in the table.
For example, my web site irivers.com contains a very simple link directory. Each link belongs to a Link_Category. Each category has a unique name and includes a field called "parent" which points to its parent category. I call the topmost category "top."
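A parent field like this can be sketched in a few lines. This is not the actual schema behind the site: the table layout, the sample categories other than "top" and Shopping, and the choice to give "top" an empty parent are all assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link_category (name TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO link_category (name, parent) VALUES (?, ?)",
    [
        ("top", None),          # assumption: the topmost category has no parent
        ("Shopping", "top"),
        ("Books", "Shopping"),  # hypothetical category
        ("Travel", "top"),      # hypothetical category
    ],
)


def path_to_top(conn, name):
    """Follow the parent field from a category up to the topmost category."""
    path = []
    while name is not None:
        path.append(name)
        row = conn.execute(
            "SELECT parent FROM link_category WHERE name = ?", (name,)
        ).fetchone()
        name = row[0] if row else None
    return path


def children(conn, name):
    """List the categories whose parent field points at the given category."""
    rows = conn.execute(
        "SELECT name FROM link_category WHERE parent = ? ORDER BY name", (name,)
    )
    return [r[0] for r in rows]


books_path = path_to_top(conn, "Books")
top_children = children(conn, "top")
print(books_path)
print(top_children)
```

Walking up the tree follows the parent field, and walking down is an ordinary SELECT on that same field. The hierarchy is just one more column in a flat table.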
I chose this structure simply because it is an easy-to-program and intuitive way to navigate the directory. I simply created a directory hierarchy in a relational table. There is nothing special about "top." To be frank, I wish I could funnel users into the Shopping Directory. This directory has some affiliate advertisers who will pay me if people shop from the directory (hint, hint ... proceeds fund this site).
My little commercial example shows that, while relational concepts are more versatile than hierarchical design, there is not a conflict between the ideas. Adding symlinks to a hierarchy creates a relational design, and one can easily express hierarchies in relational tables.
Cause and Effect
Some things in history seem to create natural hierarchies.
The theory of evolution presents an extremely interesting hierarchy.
We each had parents who had parents who had parents—through the generations.
There must have been some point at which our ancestors split from other members of the Hominidae family, which must have split from other species.
When we begin classifying the animals and plants that live today along with the fossil record of things living in the past, it is possible to break down life on this planet into a hierarchical tree of life with many of the splits between species taking place eons ago. The theory of evolution is but one of many natural hierarchies that we can find in history.
I was browsing through open source projects on GitHub today. It is not uncommon for people working on a project to want to take the project in different directions. When this happens, they create a "fork" in the project. These forks create a natural hierarchical database.
Like symlinks, sometimes the forks merge back together. Sometimes minor forks evolve into completely different projects or form the base of competing companies. Many of the different Linux distributions are forks of earlier distributions, and Linux itself grew out of the Unix tradition.
Philosophers like to wax philosophic about chains of cause and effect.
When people record history, the information they record seems to fall into a very clear hierarchical chain of cause and effect ... just like the forks in Linux. As history becomes complex and people discuss multiple causes for events, the talk veers from a hierarchical discussion to a relational discussion.
As the conversation evolves, we attempt to clarify history by creating outlines. For example, were I to write a book on programming, I would have to choose a chapter to be the first chapter ... but is a chapter by chapter reading of history the best way to go?
An encyclopedia is a book that presents knowledge of the past in alphabetical order. Online encyclopedias don't even require an alphabetic component. Such an encyclopedia can be a loosely arranged collection of articles.
While we can use hierarchical structures to help understand things, and there even appear to be natural hierarchies in this universe such as the Tree of Life, I find it most productive to concentrate on the links between things and to see hierarchical structures as just one possible structure for links.
I believe that attempts to force things into hierarchies are destructive. Notably, I point to the efforts in Feudalism to force society into a rigid class structure as destructive.
Object Relational Conflict
The final part of this piece will just bring up the alleged conflict between object and relational database design.
I think I will mention that Shopping Directory one more time and finish this section tomorrow (today is 12/6/15).