A Knowledge Base Approach in Python

In this post I would like to introduce my PhD research project and some of its key concepts. Since this is still ongoing research, I would welcome your feedback, ideas, contributions, or even your own implementations.

Basic Concepts and Ideas behind DeepGraphs


DeepGraphs is an open source implementation of a distributed graph database based on RDF. It also implements RDFS and OWL-RL reasoning with a custom publish-subscribe mechanism for computing a distributed graph closure, and it has built-in support for SWRL (RIF and RuleML are also planned), at the moment limited to an integrity-constraints layer. Furthermore, to make this stack of technologies accessible to everyone, DeepGraphs is backed by a fragment of the English language (almost 100,000 words) as an interface for adding knowledge and rules and for the basic database functionalities.
The greatest accomplishment, which was also the inspiration for all this effort, is the question answering part. You can ask the database a question and it answers you. Not in a Google way (here are the first 10,000 results...), but in a (nearly) human way.
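
To make the idea concrete, here is a purely hypothetical sketch of what such an interface could look like in Python. The deepgraphs package, the client class, the endpoint and every method name below are illustrative assumptions of mine, not the project's actual API.

```python
# Purely hypothetical usage sketch: the deepgraphs package, the
# client class, the endpoint and all method names are illustrative
# assumptions, not the project's actual API.
from deepgraphs import DeepGraphs  # hypothetical package

db = DeepGraphs("http://localhost:8080")  # hypothetical endpoint

# Add knowledge and rules in plain English.
db.tell("Every student is a person.")
db.tell("Alice is a student.")

# Ask a question and get a direct answer, not a ranked result list.
print(db.ask("Is Alice a person?"))  # e.g. "Yes, Alice is a person."
```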

Reasoning

I am not going to discuss the basic reasoning concepts here, nor the resolution algorithm. A good place to start understanding the basics is, as always, Wikipedia, which outlines the reasoning process. Anyone with basic experience in Boolean algebra may already be familiar with the fundamental concepts of first-order logic and reasoning.

RDFS and OWL-RL

Since we have already said a bit about reasoning, it makes sense to talk about possible implementations, and of course it makes sense to start simple.
The two simplest approaches of interest within this project are RDFS (RDF Schema) and OWL 2 RL (the rule language profile), both of which admit rule-based reasoning procedures of polynomial complexity.
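
To see what this kind of reasoning produces in practice, here is a minimal example using the off-the-shelf rdflib and owlrl Python packages (not DeepGraphs itself) to materialise an RDFS closure; the EX namespace is made up.

```python
# Minimal RDFS-closure example with the rdflib and owlrl packages
# (pip install rdflib owlrl); the EX namespace is made up.
from rdflib import Graph, Namespace, RDF, RDFS
from owlrl import DeductiveClosure, RDFS_Semantics

EX = Namespace("http://example.org/")
g = Graph()

# Explicit knowledge: Student is a subclass of Person; alice is a Student.
g.add((EX.Student, RDFS.subClassOf, EX.Person))
g.add((EX.alice, RDF.type, EX.Student))

# Materialise the implicit triples entailed by the RDFS semantics.
DeductiveClosure(RDFS_Semantics).expand(g)

# The inferred triple (alice, rdf:type, Person) is now in the graph.
print((EX.alice, RDF.type, EX.Person) in g)  # True
```

Swapping RDFS_Semantics for OWLRL_Semantics gives the OWL 2 RL closure instead.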

TBox Closure


Well, having identified the basic reasoning tasks we implement, it is important to show how we perform them in DeepGraphs.
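
Pending the full write-up, here is a minimal, self-contained sketch of the core of a local (non-distributed) TBox closure: applying the transitivity of rdfs:subClassOf until a fixpoint is reached. The class names are invented for illustration, and this is not the actual DeepGraphs implementation.

```python
# Sketch of a local TBox closure: repeatedly apply the RDFS rule
# "if A subClassOf B and B subClassOf C then A subClassOf C" until
# no new axiom is derived (a fixpoint). Not the DeepGraphs code.
def tbox_closure(subclass_axioms):
    closure = set(subclass_axioms)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

axioms = {("Student", "Person"), ("Person", "HomoSapiens"),
          ("HomoSapiens", "Animal")}
print(tbox_closure(axioms))
# Also contains the derived ("Student", "Animal"), ("Person", "Animal"), ...
```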

The distributed version will be covered in more detail in an upcoming post, Distributed Knowledge Base Closure.

Integrity Constraints


Checking database satisfiability, soundness and completeness is not an easy task; even for simple logics like propositional logic, only exponential algorithms are known for checking these properties.
Let me explore a little what soundness and the other terms listed above mean in the context of a database.
A knowledge base is sound whenever every sentence that can be proved through its deduction rules, from its explicit and implicit knowledge, is actually true.
It is complete if every true statement can be derived from the semantics of the knowledge base; that is, there is a procedure by which any complicated statement can be analysed into simpler ones according to the semantics and axioms of the specific knowledge base.
Satisfiability is a closely related notion: a formula is satisfiable if there exists an interpretation that makes it true at least some of the time. This is really important, because if a formula is ALWAYS false, then it is unsatisfiable.
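
As a tiny concrete illustration, satisfiability of a propositional formula can be checked by brute force over all truth assignments, which is exactly why only exponential algorithms are known:

```python
# Brute-force satisfiability check: a formula is satisfiable iff
# some assignment of truth values to its variables makes it true.
from itertools import product

def satisfiable(formula, n_vars):
    return any(formula(*values)
               for values in product([True, False], repeat=n_vars))

print(satisfiable(lambda p: p and not p, 1))  # False: always false, unsatisfiable
print(satisfiable(lambda p, q: p or q, 2))    # True: e.g. p = True satisfies it
```
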
The above concepts are borrowed from logic, and some of you may already be familiar with them from an undergraduate course. BUT, they are really important in the real world.
Ultimately, in a dreamy land, instead of SQL or NoSQL databases, we would prefer to have knowledge bases.
Those dreamy artefacts would be capable of storing information in any schema or structure, storing rules, correspondences and relations between all of those, and then reasoning about them. The result would be an epistemic database that could decide on its own about its soundness and completeness.
What is more, if someone said that all students are people and that no student is an animal, the database would be consistent and everything would work fine.
If someone then said that all people are homo sapiens and that homo sapiens are animals, this would still work in almost any database. BUT, in our dreamy database there would be an error, because students would then be both animals and not animals, which is obviously a contradiction.
Integrity constraints are there to catch such contradictions and improve the overall health of the database.
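
Here is a minimal sketch of how such a check could work for the example above: close the subclass hierarchy and test each disjointness constraint against it. This illustrates the idea only, not the DeepGraphs implementation.

```python
# Sketch: catch the student/animal contradiction by closing the
# subclass hierarchy and checking it against disjointness
# constraints. Illustrative only, not the DeepGraphs code.
def closure(axioms):
    result = set(axioms)
    changed = True
    while changed:  # transitive closure of subClassOf
        changed = False
        for (a, b) in list(result):
            for (c, d) in list(result):
                if b == c and (a, d) not in result:
                    result.add((a, d))
                    changed = True
    return result

def violations(subclass_axioms, disjoint_pairs):
    closed = closure(subclass_axioms)
    # Deriving x subClassOf y while x and y are declared disjoint
    # makes x unsatisfiable: every x would be a y and not a y.
    return [(x, y) for (x, y) in disjoint_pairs
            if (x, y) in closed or (y, x) in closed]

axioms = {("Student", "Person"), ("Person", "HomoSapiens"),
          ("HomoSapiens", "Animal")}
# "All students are not animals" becomes a disjointness constraint.
print(violations(axioms, [("Student", "Animal")]))  # [('Student', 'Animal')]
```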

Epistemic Query Optimisation 


Since a distributed graph database is more vulnerable to changes, and its query responses can never be as fast as those of a local implementation, we focused on a new mechanism to overcome this. We observed that not all queries have the desired outcome, and there may be inconsistencies in the knowledge that prevent a query from executing properly. For that reason, we use the integrity constraints to check whether a query is valid against the knowledge, and only then is it executed.
To further optimise this, we created a "wise" epistemic agent whose sole purpose is to check the database's consistency against the constraints.
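
In sketch form, the idea looks like this; the agent and its interface are invented here for illustration and are not the actual DeepGraphs API.

```python
# Illustrative sketch of the "wise" epistemic agent: queries that
# touch classes already shown to be contradictory are rejected
# before execution. The interface is invented for illustration.
class EpistemicAgent:
    def __init__(self, unsatisfiable_classes):
        # Classes flagged as contradictory by the integrity check.
        self.unsatisfiable = set(unsatisfiable_classes)

    def validate(self, query_classes):
        # A query over an unsatisfiable class can never yield sound
        # answers, so it is rejected instead of being executed.
        return not (set(query_classes) & self.unsatisfiable)

agent = EpistemicAgent({"Student"})
print(agent.validate(["Person"]))   # True: safe to execute
print(agent.validate(["Student"]))  # False: rejected up front
```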


Conclusion 


This is just a brief outline of what is happening in this project. Hopefully, I will have more time in the future to keep posting updates on our newer accomplishments in this area.
You can always contribute on our GitHub repository, or via our page (under construction).
