1 | \documentclass{article}
|
---|
2 | \title{Database Connection for the CPS Facilitator Tool}
|
---|
3 | \author{Jos Kraaijeveld}
|
---|
4 | \date{\today}
|
---|
5 | \usepackage{moreverb}
|
---|
6 | \usepackage{fullpage}
|
---|
7 | \usepackage{framed}
|
---|
8 | \usepackage{float}
|
---|
9 | \usepackage{multirow}
|
---|
10 | \usepackage[final]{pdfpages}
|
---|
11 | \usepackage{hyperref}
|
---|
12 | \begin{document}
|
---|
13 |
|
---|
14 | \maketitle
|
---|
15 |
|
---|
16 | \pagebreak
|
---|
17 |
|
---|
18 | \tableofcontents
|
---|
19 |
|
---|
20 | \pagebreak
|
---|
21 |
|
---|
22 | \section{Introduction}
|
---|
23 | This document describes the database design and implementation for the CPS Facilitator Tool. The goal is to make
|
---|
24 | sure everyone can communicate with the RDF-based database without having to write queries. The DBInterface is
|
---|
25 | written in PHP and requires slight PHP knowledge before use. It is heavily based on initial work by Bas van Nuland. \\
|
---|
26 |
|
---|
27 | If you use this framework, you can skip to the `How to use'-part of this document (\ref{howtouse}). If you are
|
---|
28 | planning to improve this framework, I recommend reading the RDF Primer\footnote{http://www.w3.org/TR/rdf-primer/} and
|
---|
29 | SPARQL Query Language for RDF\footnote{http://www.w3.org/TR/rdf-sparql-query/}. A full specification of the current
|
---|
30 | implementation can be found in section \ref{spec}. The areas I suggest improving upon first are described in section
|
---|
31 | \ref{futurework}.
|
---|
32 |
|
---|
33 | \section{How to use}\label{howtouse}
|
---|
34 | \subsection{Initialization}
|
---|
35 | Before the database can be used, you should initialize a \textit{DatabaseInterface} object. This is the only
|
---|
36 | database class you will use. This is done calling the DatabaseInterface constructor.
|
---|
37 | \begin{verbatimtab}
|
---|
38 | $db = new DatabaseInterface();
|
---|
39 | \end{verbatimtab}
|
---|
40 |
|
---|
41 | \subsection{Basic queries}
|
---|
42 | \subsubsection{Retrieving data}
|
---|
43 | Getting data is done by calling the get method of the DatabaseInterface class. The specification for this
|
---|
44 | method is as follows. \\ \\
|
---|
45 | \begin{tabular}{| l || r |} \hline
|
---|
46 | \multicolumn{2}{|c|}{Function get()} \\ \hline
|
---|
47 | First argument: \$type & String \\ \hline
|
---|
48 | Second argument: \$arguments & Array \\ \hline
|
---|
49 | Return type: & Array/Variable \\ \hline
|
---|
50 | \end{tabular} \\ \\
|
---|
51 |
|
---|
52 | The return type is an Array of objects with the type specified as the first argument. For instance, if
|
---|
53 | the first argument is ``user", the function will return an array of User objects. The second argument can be
|
---|
54 | left out if you want to retrieve all entries of a specific type. Below are a few examples showing the usage
|
---|
55 | of get(). A description of what arguments are possible per type is given in section \ref{arguments}.
|
---|
56 |
|
---|
57 | \begin{figure}[H]
|
---|
58 | \caption{Retrieving all questions and echo their title.}
|
---|
59 | \begin{framed}
|
---|
60 | \begin{verbatimtab}
|
---|
61 | $questions = $db->get("question");
|
---|
62 | foreach ($questions as $question)
|
---|
63 | {
|
---|
64 | echo $question->title;
|
---|
65 | }
|
---|
66 | \end{verbatimtab}
|
---|
67 | \end{framed}
|
---|
68 | \end{figure}
|
---|
69 |
|
---|
70 | \begin{figure}[H]
|
---|
71 | \caption{Retrieving all surveys containing questions with IDs q1 and q2, created by the user
|
---|
72 | Jos, printing all questions in those surveys}
|
---|
73 | \begin{framed}
|
---|
74 | \begin{verbatimtab}
|
---|
75 | //To get all surveys created by the user Jos, we need his UID.
|
---|
76 | //We first query the database for the User object belonging to Jos.
|
---|
77 | $userResults = $db->get("user", array("name" => "Jos"));
|
---|
78 | //Assuming there is a result, the UID is:
|
---|
79 | $josUID = $userResults[0]->uid;
|
---|
80 | //Now to get the requested surveys:
|
---|
81 | $surveys = $db->get("survey", array("questions" => array("q1", "q2"),
|
---|
82 | "creator" => $josUID));
|
---|
83 | //And to print all the questions in these surveys:
|
---|
84 | foreach($surveys as $survey)
|
---|
85 | {
|
---|
86 | echo "All the questions in " . $survey->name;
|
---|
87 | foreach($survey->questions as $question)
|
---|
88 | {
|
---|
89 | print_r($question);
|
---|
90 | }
|
---|
91 | }
|
---|
92 | \end{verbatimtab}
|
---|
93 | \end{framed}
|
---|
94 | \end{figure}
|
---|
95 |
|
---|
96 | \subsubsection{Storing data}
|
---|
97 | Storing data is easy, as long as you stick to using the given PHP Classes. Retrieve these classes by using
|
---|
98 | the get() function, and pass them as a parameter to the set() function of the DatabaseInterface. This set()
|
---|
99 | function will determine what type the object is and store it at the correct location. The method overview is
|
---|
100 | as follows: \\ \\
|
---|
101 | \begin{tabular}{| l || r |} \hline
|
---|
102 | \multicolumn{2}{|c|}{Function set()} \\ \hline
|
---|
103 | First argument: \$rToolObject & Variable \\ \hline
|
---|
104 | \end{tabular} \\ \\
|
---|
105 |
|
---|
106 | Another important thing to note is that currently it will overwrite a previous object with the same UID in
|
---|
107 | the database. The following examples show how to create new objects and save them, as well as edit old
|
---|
108 | objects and save them.
|
---|
109 |
|
---|
110 | \begin{figure}[H]
|
---|
111 | \caption{Creating a new Answer object and storing this in the database.}
|
---|
112 | \begin{framed}
|
---|
113 | \begin{verbatimtab}
|
---|
114 | //For this example, I choose a random question to answer
|
---|
115 | $questions = $db->get("question");
|
---|
116 | $question = $questions[2];
|
---|
117 | //Note two things:
|
---|
118 | //1 - If you pass 'null' as first argument when creating any object
|
---|
119 | // A new UID will be generated, indicating a new object.
|
---|
120 | //2 - Depending on the question, there can be multiple answers
|
---|
121 | // This means the values-argument (third argument) is an array.
|
---|
122 | $answer = new Answer(null, $question, array("12345", "four"));
|
---|
123 | //Save the answer in the database
|
---|
124 | $db->set($answer);
|
---|
125 | \end{verbatimtab}
|
---|
126 | \end{framed}
|
---|
127 | \end{figure}
|
---|
128 |
|
---|
129 | \begin{figure}[H]
|
---|
130 | \caption{Getting a Survey object from the database and removing the first question. Also alter this
|
---|
131 | question.}
|
---|
132 | \begin{framed}
|
---|
133 | \begin{verbatimtab}
|
---|
134 | //Get the survey
|
---|
135 | $surveyResults = $db->get("survey", array("uid" => "b91d39e8667372e220bb861b3f94b5bd"));
|
---|
136 | $survey = surveyResults[0];
|
---|
137 | //Remove the question
|
---|
138 | $question = $survey->questions[0];
|
---|
139 | unset($survey->questions[0]);
|
---|
140 | //Change the question
|
---|
141 | $question->title = "New Title";
|
---|
142 | //Save the survey and question
|
---|
143 | $db->batchSet(array($survey, $question));
|
---|
144 | \end{verbatimtab}
|
---|
145 | \end{framed}
|
---|
146 | \end{figure}
|
---|
147 |
|
---|
148 |
|
---|
149 | \subsection{Arguments}\label{arguments}
|
---|
150 | The arguments you can supply when calling the get() function differ greatly per type. Unfortunately, there is no
|
---|
151 | way around this, so here is a complete overview of arguments per type. \\
|
---|
152 | \begin{tabular}{| l | c | c | l |} \hline
|
---|
153 | \textbf{Type} & \textbf{Argument name} & \textbf{Argument type} & \textbf{Extra notes} \\ \hline \hline
|
---|
154 | \multirow{3}{*}{Answer} & uid & String & \\
|
---|
155 | & question & String & UID of the question \\
|
---|
156 | & values & Array of Strings & \\ \hline \hline
|
---|
157 | \multirow{4}{*}{AnswerSet} & uid & String & \\
|
---|
158 | & survey & String & UID of the Survey \\
|
---|
159 | & respondent & String & UID of the Respondent \\
|
---|
160 | & answers & Array of Strings & UIDs of the Answers \\ \hline \hline
|
---|
161 | \multirow{4}{*}{Application} & uid & String & \\
|
---|
162 | & title & String & \\
|
---|
163 | & description & String & \\
|
---|
164 | & style & String & \\ \hline \hline
|
---|
165 | \multirow{5}{*}{Question} & code & String & \\
|
---|
166 | & title & String & \\
|
---|
167 | & type & String & \\
|
---|
168 | & description & String & \\
|
---|
169 | & category & String & \\
|
---|
170 | & definedanswers & Array of Strings & String values of possible answers \\ \hline \hline
|
---|
171 | \multirow{3}{*}{Respondent/User}& uid & String & \\
|
---|
172 | & name & String & \\
|
---|
173 | & password & String & Use hashes to store passwords \\ \hline \hline
|
---|
174 | \multirow{6}{*}{Session} & uid & String & \\
|
---|
175 | & title & String & \\
|
---|
176 | & creator & String & UID of the User \\
|
---|
177 | & datetime & String & Unix Timestamp. No way to search on intervals (yet). \\
|
---|
178 | & applications & Array of Strings & UIDs of the Applications \\
|
---|
179 | & surveys & Array of Strings & UIDs of the Surveys \\
|
---|
180 | & answersets & Array of Strings & UIDs of the AnswerSets \\ \hline \hline
|
---|
181 | \multirow{4}{*}{Survey} & uid & String & \\
|
---|
182 | & title & String & \\
|
---|
183 | & description & String & \\
|
---|
184 | & creator & String & UID of the user that created this survey \\
|
---|
185 | & questions & Array of Strings & UIDs of the Questions \\ \hline \hline
|
---|
186 | \end{tabular} \\
|
---|
187 |
|
---|
188 | Whenever an Array has to be supplied, it will match for results that satisfy \textit{all} the constraints given
|
---|
189 | in the array.
|
---|
190 |
|
---|
191 | \section{Specification}\label{spec}
|
---|
192 | \subsection{Models}
|
---|
193 | The framework heavily relies on the Object Oriented Programming paradigm, and only allows you to create, edit
|
---|
194 | and store data through instances of precreated classes. All these classes inherit from a global
|
---|
195 | ResearchToolObject class. A UML class diagram of the model classes can be found in Appendix \ref{models}.
|
---|
196 | The most important thing to note is that references to other objects are not evaluated immediately. Rather,
|
---|
197 | these classes with references have to override the evaluate() function, which in turn tries to query the
|
---|
198 | database and resolve those UIDs to model objects. This gives us two advantages: initial queries do not scale
|
---|
199 | exponentially because of the 'lazy' evaluation, and the boolean return value notifies us when there exists an
|
---|
200 | incorrect reference. By evaluating before saving an object, we ensure that there cannot exist invalid values in
|
---|
201 | the database.
|
---|
202 | \subsection{Connectors}
|
---|
203 | Although the front end developer only uses one general DatabaseInterface, the different files for the different
|
---|
204 | datatypes are accessed by seperate connectors. A connector has to implement the IConnector interface, which
|
---|
205 | mainly enforces the get() and set() methods to ensure a connector can perform as expected. The rest of the
|
---|
206 | methods are implemented in the baseclass Connector, which should be extended. In every connector,
|
---|
207 | the get() and set() methods are different. A get() method builds the querystring based on the given arguments,
|
---|
208 | then retrieves the data and performs other necessary queries (like retrieving a set of fields of unspecified
|
---|
209 | length) to ensure the PHP DataModel is complete. A set() method has to store the data accordingly.
|
---|
210 |
|
---|
211 | \subsection{PHPDoc}
|
---|
212 | PHPDoc for the database classes can be found at
|
---|
213 | \href{http://svn.tbm.tudelft.nl/TBM-CPS/RESEARCHTOOL/Doc/Backend/PHPDoc/index.html}{SVN}.
|
---|
214 |
|
---|
215 | \section{Future Work}\label{futurework}
|
---|
216 | \subsection{Adhere to Open-Closed Principle}
|
---|
217 | At the moment, \textit{the DatabaseInterface.php} class violates the Open-Closed principle. Every new connector,
|
---|
218 | four lines need to be added in this class. This is unwanted if new RDF datatypes are added. The same holds for
|
---|
219 | the method createArguments in this class: extra cases have to be added if more types of arguments become possible.
|
---|
220 | Adhering to the Open-Closed principle will allow for easier extention and maintenance of the database-related classes.
|
---|
221 |
|
---|
222 | \subsection{Deal with invalid or missing values}
|
---|
223 | This goes in two parts. Firstly, the database now assumes that given an entry, all the fields for its type are there and
|
---|
224 | filled in. This is unwanted since it can cause issues with backwards compatability of certain .rdf files. Secondly, it
|
---|
225 | assumes that fields containing a UID to a different entry actually point to a valid entry. For instance, a Survey query will
|
---|
226 | try to get a User object as its creator value from the database as well, assuming it exists. It will break if this user object
|
---|
227 | does not exist.
|
---|
228 |
|
---|
229 | \subsection{Optimize queries}
|
---|
230 | Queries as they are at the moment can be a bottleneck in performance if the sets get really large. This is because of a
|
---|
231 | couple of things. For one, there are duplicate queries being executed: if a Session object is retrieved, it also retrieves
|
---|
232 | all the corresponding AnswerSets, which in turn retrieves the corresponding Questions. However, the Session also
|
---|
233 | retrieves the Surveys related to it, which afterwards also retrieve the same questions. There has been no benchmarking
|
---|
234 | of performance so I have no clue at what size of the .rdf files this will become an issue.
|
---|
235 |
|
---|
236 | I have a gut feeling that the SPARQL queries can be optimized some as well. This is regarding entries with zero or more
|
---|
237 | elements of a specific type, like Survey. Now, there is a second query for getting all the IDs for all the questions, but
|
---|
238 | there must be a way to introduce this into the first query without getting the exponential growth of the RDF 'different
|
---|
239 | tree options'.
|
---|
240 |
|
---|
241 |
|
---|
242 | \appendix
|
---|
243 | \pagebreak
|
---|
244 | \section{Class diagram of the models}\label{models}
|
---|
245 | \begin{figure}
|
---|
246 | \includepdf[scale=1.22]{DBModels.pdf}
|
---|
247 | \end{figure}
|
---|
248 |
|
---|
249 | \end{document}
|
---|