Context Navigation

source: Doc/Backend/DBInterface.tex @ 189

Last change on this file since 189 was 189, checked in by jkraaijeveld, 13 years ago
Updated .pdf documentation. Model diagram is still outdated though.
File size: 12.4 KB

Line
1	\documentclass{article}
2	\title{Database Connection for the CPS Facilitator Tool}
3	\author{Jos Kraaijeveld}
4	\date{\today}
5	\usepackage{moreverb}
6	\usepackage{fullpage}
7	\usepackage{framed}
8	\usepackage{float}
9	\usepackage{multirow}
10	\usepackage[final]{pdfpages}
11	\usepackage{hyperref}
12	\begin{document}
13
14	\maketitle
15
16	\pagebreak
17
18	\tableofcontents
19
20	\pagebreak
21
22	\section{Introduction}
23	This document describes the database design and implementation for the CPS Facilitator Tool. The goal is to make
24	sure everyone can communicate with the RDF-based database without having to write queries. The DBInterface is
25	written in PHP and requires slight PHP knowledge before use. It is heavily based on initial work by Bas van Nuland. \\
26
27	If you use this framework, you can skip to the `How to use'-part of this document (\ref{howtouse}). If you are
28	planning to improve this framework, I recommend reading the RDF Primer\footnote{http://www.w3.org/TR/rdf-primer/} and
29	SPARQL Query Language for RDF\footnote{http://www.w3.org/TR/rdf-sparql-query/}. A full specification of the current
30	implementation can be found in section \ref{spec}. The areas I suggest improving upon first are described in section
31	\ref{futurework}.
32
33	\section{How to use}\label{howtouse}
34	\subsection{Initialization}
35	Before the database can be used, you should initialize a \textit{DatabaseInterface} object. This is the only
36	database class you will use. This is done calling the DatabaseInterface constructor.
37	\begin{verbatimtab}
38	$db = new DatabaseInterface();
39	\end{verbatimtab}
40
41	\subsection{Basic queries}
42	\subsubsection{Retrieving data}
43	Getting data is done by calling the get method of the DatabaseInterface class. The specification for this
44	method is as follows. \\ \\
45	\begin{tabular}{\| l \|\| r \|} \hline
46	\multicolumn{2}{\|c\|}{Function get()} \\ \hline
47	First argument: \$type & String \\ \hline
48	Second argument: \$arguments & Array \\ \hline
49	Return type: & Array/Variable \\ \hline
50	\end{tabular} \\ \\
51
52	The return type is an Array of objects with the type specified as the first argument. For instance, if
53	the first argument is ``user", the function will return an array of User objects. The second argument can be
54	left out if you want to retrieve all entries of a specific type. Below are a few examples showing the usage
55	of get(). A description of what arguments are possible per type is given in section \ref{arguments}.
56
57	\begin{figure}[H]
58	\caption{Retrieving all questions and echo their title.}
59	\begin{framed}
60	\begin{verbatimtab}
61	$questions = $db->get("question");
62	foreach ($questions as $question)
63	{
64	echo $question->title;
65	}
66	\end{verbatimtab}
67	\end{framed}
68	\end{figure}
69
70	\begin{figure}[H]
71	\caption{Retrieving all surveys containing questions with IDs q1 and q2, created by the user
72	Jos, printing all questions in those surveys}
73	\begin{framed}
74	\begin{verbatimtab}
75	//To get all surveys created by the user Jos, we need his UID.
76	//We first query the database for the User object belonging to Jos.
77	$userResults = $db->get("user", array("name" => "Jos"));
78	//Assuming there is a result, the UID is:
79	$josUID = $userResults[0]->uid;
80	//Now to get the requested surveys:
81	$surveys = $db->get("survey", array("questions" => array("q1", "q2"),
82	"creator" => $josUID));
83	//And to print all the questions in these surveys:
84	foreach($surveys as $survey)
85	{
86	echo "All the questions in " . $survey->name;
87	foreach($survey->questions as $question)
88	{
89	print_r($question);
90	}
91	}
92	\end{verbatimtab}
93	\end{framed}
94	\end{figure}
95
96	\subsubsection{Storing data}
97	Storing data is easy, as long as you stick to using the given PHP Classes. Retrieve these classes by using
98	the get() function, and pass them as a parameter to the set() function of the DatabaseInterface. This set()
99	function will determine what type the object is and store it at the correct location. The method overview is
100	as follows: \\ \\
101	\begin{tabular}{\| l \|\| r \|} \hline
102	\multicolumn{2}{\|c\|}{Function set()} \\ \hline
103	First argument: \$rToolObject & Variable \\ \hline
104	\end{tabular} \\ \\
105
106	Another important thing to note is that currently it will overwrite a previous object with the same UID in
107	the database. The following examples show how to create new objects and save them, as well as edit old
108	objects and save them.
109
110	\begin{figure}[H]
111	\caption{Creating a new Answer object and storing this in the database.}
112	\begin{framed}
113	\begin{verbatimtab}
114	//For this example, I choose a random question to answer
115	$questions = $db->get("question");
116	$question = $questions[2];
117	//Note two things:
118	//1 - If you pass 'null' as first argument when creating any object
119	// A new UID will be generated, indicating a new object.
120	//2 - Depending on the question, there can be multiple answers
121	// This means the values-argument (third argument) is an array.
122	$answer = new Answer(null, $question, array("12345", "four"));
123	//Save the answer in the database
124	$db->set($answer);
125	\end{verbatimtab}
126	\end{framed}
127	\end{figure}
128
129	\begin{figure}[H]
130	\caption{Getting a Survey object from the database and removing the first question. Also alter this
131	question.}
132	\begin{framed}
133	\begin{verbatimtab}
134	//Get the survey
135	$surveyResults = $db->get("survey", array("uid" => "b91d39e8667372e220bb861b3f94b5bd"));
136	$survey = surveyResults[0];
137	//Remove the question
138	$question = $survey->questions[0];
139	unset($survey->questions[0]);
140	//Change the question
141	$question->title = "New Title";
142	//Save the survey and question
143	$db->batchSet(array($survey, $question));
144	\end{verbatimtab}
145	\end{framed}
146	\end{figure}
147
148
149	\subsection{Arguments}\label{arguments}
150	The arguments you can supply when calling the get() function differ greatly per type. Unfortunately, there is no
151	way around this, so here is a complete overview of arguments per type. \\
152	\begin{tabular}{\| l \| c \| c \| l \|} \hline
153	\textbf{Type} & \textbf{Argument name} & \textbf{Argument type} & \textbf{Extra notes} \\ \hline \hline
154	\multirow{3}{*}{Answer} & uid & String & \\
155	& question & String & UID of the question \\
156	& values & Array of Strings & \\ \hline \hline
157	\multirow{4}{*}{AnswerSet} & uid & String & \\
158	& survey & String & UID of the Survey \\
159	& respondent & String & UID of the Respondent \\
160	& answers & Array of Strings & UIDs of the Answers \\ \hline \hline
161	\multirow{4}{*}{Application} & uid & String & \\
162	& title & String & \\
163	& description & String & \\
164	& style & String & \\ \hline \hline
165	\multirow{5}{*}{Question} & code & String & \\
166	& title & String & \\
167	& type & String & \\
168	& description & String & \\
169	& category & String & \\
170	& definedanswers & Array of Strings & String values of possible answers \\ \hline \hline
171	\multirow{3}{*}{Respondent/User}& uid & String & \\
172	& name & String & \\
173	& password & String & Use hashes to store passwords \\ \hline \hline
174	\multirow{6}{*}{Session} & uid & String & \\
175	& title & String & \\
176	& creator & String & UID of the User \\
177	& datetime & String & Unix Timestamp. No way to search on intervals (yet). \\
178	& applications & Array of Strings & UIDs of the Applications \\
179	& surveys & Array of Strings & UIDs of the Surveys \\
180	& answersets & Array of Strings & UIDs of the AnswerSets \\ \hline \hline
181	\multirow{4}{*}{Survey} & uid & String & \\
182	& title & String & \\
183	& description & String & \\
184	& creator & String & UID of the user that created this survey \\
185	& questions & Array of Strings & UIDs of the Questions \\ \hline \hline
186	\end{tabular} \\
187
188	Whenever an Array has to be supplied, it will match for results that satisfy \textit{all} the constraints given
189	in the array.
190
191	\section{Specification}\label{spec}
192	\subsection{Models}
193	The framework heavily relies on the Object Oriented Programming paradigm, and only allows you to create, edit
194	and store data through instances of precreated classes. All these classes inherit from a global
195	ResearchToolObject class. A UML class diagram of the model classes can be found in Appendix \ref{models}.
196	The most important thing to note is that references to other objects are not evaluated immediately. Rather,
197	these classes with references have to override the evaluate() function, which in turn tries to query the
198	database and resolve those UIDs to model objects. This gives us two advantages: initial queries do not scale
199	exponentially because of the 'lazy' evaluation, and the boolean return value notifies us when there exists an
200	incorrect reference. By evaluating before saving an object, we ensure that there cannot exist invalid values in
201	the database.
202	\subsection{Connectors}
203	Although the front end developer only uses one general DatabaseInterface, the different files for the different
204	datatypes are accessed by seperate connectors. A connector has to implement the IConnector interface, which
205	mainly enforces the get() and set() methods to ensure a connector can perform as expected. The rest of the
206	methods are implemented in the baseclass Connector, which should be extended. In every connector,
207	the get() and set() methods are different. A get() method builds the querystring based on the given arguments,
208	then retrieves the data and performs other necessary queries (like retrieving a set of fields of unspecified
209	length) to ensure the PHP DataModel is complete. A set() method has to store the data accordingly.
210
211	\subsection{PHPDoc}
212	PHPDoc for the database classes can be found at
213	\href{http://svn.tbm.tudelft.nl/TBM-CPS/RESEARCHTOOL/Doc/Backend/PHPDoc/index.html}{SVN}.
214
215	\section{Future Work}\label{futurework}
216	\subsection{Adhere to Open-Closed Principle}
217	At the moment, \textit{the DatabaseInterface.php} class violates the Open-Closed principle. Every new connector,
218	four lines need to be added in this class. This is unwanted if new RDF datatypes are added. The same holds for
219	the method createArguments in this class: extra cases have to be added if more types of arguments become possible.
220	Adhering to the Open-Closed principle will allow for easier extention and maintenance of the database-related classes.
221
222	\subsection{Deal with invalid or missing values}
223	This goes in two parts. Firstly, the database now assumes that given an entry, all the fields for its type are there and
224	filled in. This is unwanted since it can cause issues with backwards compatability of certain .rdf files. Secondly, it
225	assumes that fields containing a UID to a different entry actually point to a valid entry. For instance, a Survey query will
226	try to get a User object as its creator value from the database as well, assuming it exists. It will break if this user object
227	does not exist.
228
229	\subsection{Optimize queries}
230	Queries as they are at the moment can be a bottleneck in performance if the sets get really large. This is because of a
231	couple of things. For one, there are duplicate queries being executed: if a Session object is retrieved, it also retrieves
232	all the corresponding AnswerSets, which in turn retrieves the corresponding Questions. However, the Session also
233	retrieves the Surveys related to it, which afterwards also retrieve the same questions. There has been no benchmarking
234	of performance so I have no clue at what size of the .rdf files this will become an issue.
235
236	I have a gut feeling that the SPARQL queries can be optimized some as well. This is regarding entries with zero or more
237	elements of a specific type, like Survey. Now, there is a second query for getting all the IDs for all the questions, but
238	there must be a way to introduce this into the first query without getting the exponential growth of the RDF 'different
239	tree options'.
240
241
242	\appendix
243	\pagebreak
244	\section{Class diagram of the models}\label{models}
245	\begin{figure}
246	\includepdf[scale=1.22]{DBModels.pdf}
247	\end{figure}
248
249	\end{document}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: