\documentclass[conference,letterpaper]{IEEEtran}
\usepackage{graphicx}
%\usepackage{multirow}
%\usepackage{ragged2e}
\usepackage{algpseudocode}
\usepackage[cmex10]{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{fancyvrb}
\usepackage{pstricks,pst-node}
\usepackage[pdftitle={On-demand RDF to Relational Query Translation in Samizdat RDF Store},
pdfauthor={Dmitry Borodaenko},
pdfkeywords={Semantic Web, RDF, relational databases, query language, Samizdat},
pdfborder={0 0 0}]{hyperref}
%\urlstyle{rm}
\emergencystretch=8pt
\interdisplaylinepenalty=2500
%
\begin{document}
%
\title{On-demand RDF to Relational Query Translation in Samizdat RDF Store}
%
\author{\IEEEauthorblockN{Dmitry Borodaenko}
\IEEEauthorblockA{Belarusian State University of Informatics and Radioelectronics\\
6 Brovki st., Minsk, Belarus\\
Email: angdraug@debian.org}}

\maketitle

\begin{abstract}
This paper presents an algorithm for on-demand translation of RDF queries
that allows any relational data structure to be mapped to the RDF model,
and queries to be performed over a combination of mapped relational data
and arbitrary RDF triples with performance comparable to that of
relational systems. Query capabilities implemented by the algorithm
include optional and negative graph patterns, nested sub-patterns, and
limited RDFS and OWL inference backed by database triggers.
\end{abstract}

\section{Introduction}
\label{introduction}

% motivation for the proposed solution
A wide range of solutions that map relational data to the RDF data model
has accumulated to date~\cite{triplify}. Several factors make the
integration of RDF and relational data important for the adoption of the
Semantic Web. The first, shared with RDF stores based on a triples
table, is the wide availability of mature relational database
implementations which have seen decades of improvements in reliability,
scalability, and performance. The second is the fact that most of the
structured data available online is backed by relational databases. This
data is not likely to be replaced by pure RDF stores in the near future,
so it has to be mapped in one way or another to become available to RDF
agents. Finally, a properly normalized and indexed application-specific
relational database schema allows a DBMS to optimize complex queries in
ways that are not possible for a tree of joins over a single triples
table~\cite{sp2b}.

% what is unique about the proposed solution
In the Samizdat open publishing engine, most of the data fits into the
relational model, with the exception of reified RDF statements which are
used in the collaborative decision making
process~\cite{samizdat-collreif} and require a more generic triple
store. The need for a generic RDF store with performance on par with a
relational database is the primary motivation behind the design of the
Samizdat RDF storage module, which differs from both triples table based
RDF stores and relational to RDF mapping systems. Unlike the former,
Samizdat can run optimized SQL queries over application-specific tables;
unlike the latter, it is not limited by the relational database schema
and can fall back, within the same query, to a triples table for RDF
predicates that are not mapped to the relational model.
% structure of the paper
The following sections of this paper describe: the targeted relational
data, the database triggers required for RDFS and OWL inference, the
query translation algorithm, the update request execution algorithm,
details of the algorithm's implementation in Samizdat, analysis of its
performance, comparison with related work, and an outline of future
work.

\section{Relational Data}
\label{relational-data}

% formal definition of data targeted for storage
The Samizdat RDF storage module does not impose restrictions on the
underlying relational database schema beyond the requirements of the SQL
standard. Any legacy database may be adapted for RDF access while
retaining backwards compatibility with existing SQL queries. The
adaptation process involves adding attributes, foreign keys, tables, and
triggers to the database to enable RDF query translation and to support
optional features of the Samizdat RDF store, such as statement
reification and inference for {\em rdfs:sub\-Class\-Of}, {\em
rdfs:sub\-Property\-Of}, and {\em owl:Transitive\-Property\/} rules.

The following database schema changes are required in all cases:
\begin{itemize}
\item create an {\em rdfs:Resource\/} superclass table with an
autogenerated primary key;
\item replace primary keys of mapped subclass tables with foreign keys
referencing the {\em rdfs:Resource\/} table (existing foreign keys may
need to be updated to reflect this change);
\item register {\em rdfs:subClassOf\/} inference database triggers to
update the Resource table and maintain foreign key integrity on all
changes in mapped subclass tables.
\end{itemize}

The following changes may be necessary to support optional RDF mapping
features:
\begin{itemize}
\item register database triggers for other cases of {\em
rdfs:sub\-Class\-Of\/} entailment;
\item create a triples table (required to represent non-relational RDF
data and RDF statement reification);
\item add subproperty qualifier attributes referencing the property
URIref entry in the {\em rdfs:Resource\/} table for each attribute
mapped to a superproperty;
\item create transitive closure tables, register {\em
owl:TransitivePro\-perty\/} inference triggers.
\end{itemize}

\section{Inference and Database Triggers}
\label{inference-triggers}

The Samizdat RDF storage module implements entailment rules for the
following RDFS predicates and OWL classes: {\em rdfs:sub\-Class\-Of},
{\em rdfs:sub\-Property\-Of}, {\em owl:Transitive\-Property}. Database
triggers are used to minimize the impact of RDFS and OWL inference on
query performance.

{\em rdfs:subClassOf\/} inference triggers are invoked on every insert
into and delete from a subclass table. When a tuple without a primary
key is inserted,\footnote{Insertion into a subclass table with an
explicit primary key is used in two-step resource insertion during
execution of an RDF update command (described in
section~\ref{update-execution}).} a template tuple is inserted into the
superclass table and the generated primary key is added to the new
subclass tuple. Delete operations are cascaded to all subclass and
superclass tables.

{\em rdfs:subPropertyOf\/} inference is performed during query
translation, with the help of a stored procedure that returns the
attribute value when the subproperty qualifier attribute is set, and
NULL otherwise.

{\em owl:TransitiveProperty\/} inference uses a separate transitive
closure table for each relational attribute mapped to a transitive
property. Transitive closure tables are maintained by triggers invoked
on each insert, update, and delete operation involving such an
attribute.
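For illustration, the closure maintenance performed by these triggers
can be sketched in Ruby (the implementation language of the module).
This is an illustrative rendition of the update algorithm formalized in
\figurename~\ref{transitive-closure}; it assumes at most one $\tau$-arc
per subject, as is the case for a single-valued relational attribute,
whereas the production version is implemented in procedural SQL inside
the triggers themselves:

\begin{Verbatim}[fontsize=\scriptsize]
require 'set'

# closure is a Set of [s, o] pairs representing the transitive
# closure of G-tau; op is :insert, :update, or :delete; s and
# o_new are the subject and the new object of the affected arc.
def update_closure(closure, op, s, o_new)
  # refuse to create a cycle in G-tau
  return if o_new == s or closure.include?([o_new, s])
  if op != :insert  # remove arcs obsoleted by the old arc of s
    sources = [s] + closure.select { |x, y| y == s }.map(&:first)
    targets = closure.select { |x, y| x == s }.map(&:last)
    sources.product(targets).each { |arc| closure.delete(arc) }
  end
  if op != :delete  # add arcs implied by the new arc [s, o_new]
    sources = [s] + closure.select { |x, y| y == s }.map(&:first)
    targets = [o_new] + closure.select { |x, y| x == o_new }.map(&:last)
    sources.product(targets).each { |arc| closure.add(arc) }
  end
end

c = Set.new
update_closure(c, :insert, :a, :b)  # c: {[:a,:b]}
update_closure(c, :insert, :b, :c)  # c: {[:a,:b], [:b,:c], [:a,:c]}
\end{Verbatim}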
Formally, the transitive closure update algorithm is presented in
\figurename~\ref{transitive-closure}. The input to the algorithm is:
\begin{itemize}
\item a directed labeled graph $G = \langle N, A \rangle$ where $N$ is a
set of nodes representing RDF resources and $A$ is a set of arcs
$a = \langle s, p, o \rangle$ representing RDF triples;
\item a transitive property $\tau$;
\item a subgraph $G_\tau \subseteq G$ such that:
\begin{equation}
a_\tau = \langle s, p, o \rangle \in G_\tau \iff a_\tau \in G \, \wedge \, p = \tau \, ;
\end{equation}
\item a graph $G_\tau^+$ containing the transitive closure of $G_\tau$;
\item an update operation $\omega \in \{insert, update, delete\}$ and
its parameters $a_{old} = \langle s_\omega, \tau, o_{old} \rangle$,
$a_{new} = \langle s_\omega, \tau, o_{new} \rangle$ such that:
\begin{equation}
G_\tau' = (G_\tau \setminus \{ a_{old} \}) \cup \{ a_{new} \} \, .
\end{equation}
\end{itemize}
The algorithm transforms $G_\tau^+$ into the transitive closure of
$G_\tau'$. The algorithm assumes that $G_\tau$ is, and should remain,
acyclic.

\begin{figure}
\begin{algorithmic}[1]
\If {$o_{new} = s_\omega$ or $\langle o_{new}, \tau, s_\omega \rangle \in G_\tau^+$}
\State stop \Comment refuse to create a cycle in $G_\tau$
\EndIf
\State $G_\tau \gets G_\tau'$ \Comment apply $\omega$
\If {$\omega \in \{update, delete\}$}
\State $G_\tau^+ \gets G_\tau^+ \setminus \{ \langle s, \tau, o \rangle \mid (s = s_\omega \, \vee \, \langle s, \tau, s_\omega \rangle \in G_\tau^+) \, \wedge \, \langle s_\omega, \tau, o \rangle \in G_\tau^+ \}$
\Comment remove obsolete arcs from $G_\tau^+$
\EndIf
\If {$\omega \in \{insert, update\}$} \Comment add new arcs to $G_\tau^+$
\State $G_\tau^+ \gets G_\tau^+ \cup \{ \langle s_\omega, \tau, o \rangle \mid o = o_{new} \, \vee \, \langle o_{new}, \tau, o \rangle \in G_\tau^+ \}$
\State $G_\tau^+ \gets G_\tau^+ \cup \{ \langle s, \tau, o \rangle \mid \langle s, \tau, s_\omega \rangle \in G_\tau^+ \, \wedge \, \langle s_\omega, \tau, o \rangle \in G_\tau^+ \}$
\EndIf
\end{algorithmic}
\caption{Update transitive closure}
\label{transitive-closure}
\end{figure}

\section{Query Pattern Translation}
\label{query-translation}

The class structure of the Samizdat RDF storage module is as follows.
The external API is provided by the {\tt RDF} class. The RDF storage
configuration described in section~\ref{relational-data} is encapsulated
in the {\tt RDFConfig} class. The concrete syntax of
Squish~\cite{samizdat-rel-rdf,squish} and SQL is abstracted into
{\tt SquishQuery} and its subclasses. The query pattern translation
algorithm is implemented by the {\tt SqlMapper} class.
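A schematic Ruby outline of this class structure (only the names
mentioned in this paper are shown; method bodies, further {\tt
SquishQuery} subclasses, and auxiliary classes are elided):

\begin{Verbatim}[fontsize=\scriptsize]
class RDFConfig; end      # relational mapping configuration

class RDF                 # external API of the storage module
  def initialize(config)  # config is an RDFConfig
    @config = config
  end
end

class SquishQuery; end    # Squish and SQL concrete syntax
class SquishSelect < SquishQuery; end

class SqlMapper           # query pattern translation
  private
  def label_pattern_components; end
  def map_predicates; end
  def define_relation_aliases; end
  def transform; end
  def generate_tables_and_conditions; end
end
\end{Verbatim}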
% prerequisites
The input to the algorithm is as follows:
\begin{itemize}
\item mappings $M = \langle M_{rel}, M_{attr}, M_{sub}, M_{trans} \rangle$
where $M_{rel}: P \to R$, $M_{attr}: P \to \Phi$, $M_{sub}: P \to S$,
$M_{trans}: P \to T$; $P$ is a set of mapped RDF properties, $R$ is a
set of relations, $\Phi$ is a set of relation attributes, $S \subset P$
is a subset of RDF properties that have configured subproperties,
$T \subset R$ is a set of transitive closures (as described in
sections~\ref{relational-data} and \ref{inference-triggers});
\item a graph pattern $\Psi = \langle \Psi_{nodes}, \Psi_{arcs} \rangle
= \Pi \cup N \cup \Omega$, where $\Pi$, $N$, and $\Omega$ are the main
(``must bind''), negative (``must not bind''), and optional (``may
bind'') graph patterns respectively, such that $\Pi$, $N$, and $\Omega$
share no arcs, and $\Pi$, $\Pi \cup N$, and $\Pi \cup \Omega$ are
connected graphs;\footnote{Arcs with the same subject, object, and
predicate but different bind mode are treated as distinct.}
\item a global filter condition $F_g \in \mathbb{F}$ and local filter
conditions $F_c: \Psi_{arcs} \to \mathbb{F}$, where $\mathbb{F}$ is the
set of all literal conditions expressible in the query language syntax.
\end{itemize}

For example, consider the following Squish query and its graph pattern
$\Psi$ presented in \figurename~\ref{graph-pattern}.

\begin{Verbatim}[fontsize=\scriptsize]
SELECT ?msg
WHERE (rdf::predicate ?stmt dc::relation)
      (rdf::subject ?stmt ?msg)
      (rdf::object ?stmt ?tag)
      (dc::date ?stmt ?date)
      (s::rating ?stmt ?rating FILTER ?rating >= :threshold)
EXCEPT (dct::isPartOf ?msg ?parent)
OPTIONAL (dc::language ?msg ?original_lang)
         (s::isTranslationOf ?msg ?translation)
         (dc::language ?translation ?translation_lang)
LITERAL ?original_lang = :lang OR ?translation_lang = :lang
GROUP BY ?msg
ORDER BY max(?date) DESC
\end{Verbatim}

\begin{figure}
\centering
\psset{unit=3.8mm,labelsep=0.2pt}
\begin{pspicture}[showgrid=false](0,0)(23,12)
\footnotesize
\rput(2.5,5.5){\ovalnode{msg}{\sl ?msg}}
\rput(10,8){\ovalnode{stmt}{\sl ?stmt}}
\rput(2.5,8){\ovalnode{rel}{\it dc:relation}}
\rput(5,10.5){\ovalnode{tag}{\sl ?tag}}
\rput(14,10.5){\ovalnode{date}{\sl ?date}}
\rput(17,8){\ovalnode{rating}{\sl ?rating}}
\rput(14,5.5){\ovalnode{parent}{\sl ?parent}}
\rput(8,1){\ovalnode{origlang}{\sl ?original\_lang}}
\rput(11.2,3.3){\ovalnode{trans}{\sl ?translation}}
\rput(19.2,1){\ovalnode{translang}{\sl ?translation\_lang}}
\ncline{<-}{msg}{stmt} \aput{:U}(0.4){\it rdf:subject}
\ncline{<-}{rel}{stmt} \aput{:U}{\it rdf:predicate}
\ncline{<-}{tag}{stmt} \aput{:U}{\it rdf:object}
\ncline{->}{stmt}{date} \aput{:U}{\it dc:date}
\ncline{->}{stmt}{rating} \aput{:U}{\it s:rating}
\ncline{->}{msg}{parent} \aput{:U}(0.6){\it dct:isPartOf}
\ncline{->}{msg}{origlang} \aput{:U}(0.6){\it dc:language}
\ncline{<-}{msg}{trans} \aput{:U}(0.65){\it s:isTranslationOf}
\ncline{->}{trans}{translang} \aput{:U}(0.6){\it dc:language}
\psccurve[curvature=0.75 0.1 0,linestyle=dashed,showpoints=false]%
(0.3,5)(0.3,10)(3,11.3)(20,9.5)(20,7)(8.5,7)(2.5,4.5)
\rput(18.8,10){$\Pi$}
\rput(16.5,5.5){$N$}
\rput(12.5,1.5){$\Omega$}
\end{pspicture}
\caption{Graph pattern $\Psi$ for the example query}
\label{graph-pattern}
\end{figure}

The output of the algorithm is a join expression $F$ and a condition
$W$, ready for composition into the {\tt FROM} and {\tt WHERE} clauses
of an SQL {\tt SELECT} statement.
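Before turning to the algorithm's steps, the three subpatterns of the
example query can be written down as plain Ruby data (a hypothetical
encoding with one {\tt [predicate, subject, object]} entry per arc,
mirroring the Squish triple syntax, filter conditions omitted; this is
not the module's internal representation):

\begin{Verbatim}[fontsize=\scriptsize]
PATTERN = {
  :main => [      # must bind
    %w{rdf::predicate ?stmt dc::relation},
    %w{rdf::subject   ?stmt ?msg},
    %w{rdf::object    ?stmt ?tag},
    %w{dc::date       ?stmt ?date},
    %w{s::rating      ?stmt ?rating}],
  :negative => [  # must not bind
    %w{dct::isPartOf ?msg ?parent}],
  :optional => [  # may bind
    %w{dc::language ?msg ?original_lang},
    %w{s::isTranslationOf ?msg ?translation},
    %w{dc::language ?translation ?translation_lang}]
}
\end{Verbatim}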
In the algorithm description below, $\mathrm{id}(r)$ denotes the primary
key of relation $r \in R$, and $\rho(n)$ denotes the value of
$\mathrm{id}(Resource)$ for a non-variable node $n \in \Psi_{nodes}$
where such a value is known during query translation.\footnote{E.g.,
Samizdat uses the {\em site-ns/resource-id\/} notation for internal
resource URIrefs.}

% the algorithm
Key steps of the query pattern translation algorithm correspond to the
following private methods of {\tt SqlMapper}:

{\tt label\_pattern\_components}: Label every connected component of
$\Pi$, $N$, and $\Omega$ with different colors $K$ such that
$K_\Pi: \Pi_{nodes} \to \mathbb{K}$, $K_N: N_{nodes} \to \mathbb{K}$,
$K_\Omega: \Omega_{nodes} \to \mathbb{K}$,
$K(n) = K_\Pi(n) \cup K_N(n) \cup K_\Omega(n)$. The two-pass connected
component labeling algorithm~\cite{shapiro} is used, with a special case
that excludes nodes present in $\Pi$ from neighbour lists while labeling
$N$ and $\Omega$. The special case ensures that parts of $N$ and
$\Omega$ which are only connected through a node in $\Pi$ are labeled
with different colors; a sketch in Ruby is given below.

{\tt map\_predicates}: Map each arc $c = \langle s, p, o \rangle \in
\Psi_{arcs}$ to the relational data model according to $M$: define a
mapping $M_{attr}^{pos}: \Psi_{arcs} \times \Psi_{nodes} \to \Phi$ such
that $M_{attr}^{pos}(c, s) = \mathrm{id}( M_{rel}(p) )$ and
$M_{attr}^{pos}(c, o) = M_{attr}(p)$; replace each unmapped arc with its
reification and map the resulting arcs in the same manner;\footnote{$M$
is expected to map reification properties to the triples table.} for
each arc labeled with a subproperty predicate, add an arc mapped to the
subproperty qualifier attribute.

For each node $n \in \Psi_{nodes}$, find its adjacent arcs
$\Psi_{arcs}^n = \{\langle s, p, o \rangle \in \Psi_{arcs} \mid n \in
\{s, o\}\}$ and determine its bind mode $\beta_{node}: \Psi_{nodes} \to
\{ \Pi, N, \Omega \}$ such that
$\beta_{node}(n) = \max\{\beta_{arc}(c) \mid c \in \Psi_{arcs}^n\}$,
where $\beta_{arc}(c)$ reflects which of the graph patterns
$\{ \Pi, N, \Omega \}$ contains arc $c$, and the order of precedence
used by $\max$ is $\Pi > N > \Omega$.

{\tt define\_relation\_aliases}: Map each node in $\Psi$ to one or more
relation aliases $a \in \mathbb{A}$ according to the algorithm described
in \figurename~\ref{define-relation-aliases}. The algorithm produces a
mapping $C_a: \Psi_{arcs} \to \mathbb{A}$, which links every arc in
$\Psi$ to an alias, and mappings $A = \langle A_{rel}, A_{node},
A_\beta, A_{filter} \rangle$ where $A_{rel}: \mathbb{A} \to R$,
$A_{node}: \mathbb{A} \to \Psi_{nodes}$,
$A_\beta: \mathbb{A} \to \{ \Pi, N, \Omega \}$,
$A_{filter}: \mathbb{A} \to \mathbb{F}$, which record the relation,
node, bind mode, and filter condition for each alias.
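Returning to the labeling step, {\tt label\_pattern\_components} can be
sketched in Ruby as follows (an illustrative re-implementation that
substitutes union-find for the two-pass equivalence table
of~\cite{shapiro}; the {\tt excluded} argument implements the special
case described above):

\begin{Verbatim}[fontsize=\scriptsize]
# Label one subpattern; nodes in `excluded' (those of the main
# pattern, when labeling N or Omega) are dropped from neighbour
# lists, so parts of N or Omega connected to each other only
# through a main pattern node receive different colors.
def label_components(nodes, arcs, excluded = [])
  parent = {}    # union-find over provisional colors
  find = lambda { |k|
    parent[k] == k ? k : (parent[k] = find.call(parent[k])) }
  label = {}
  nodes.each do |n|
    seen = arcs.select { |s, _, o| [s, o].include?(n) }
               .flat_map { |s, _, o| [s, o] }
               .reject { |m| m == n or excluded.include?(m) }
               .map { |m| label[m] }.compact
               .map { |k| find.call(k) }.uniq
    if seen.empty?    # start a new color
      label[n] = parent.size
      parent[label[n]] = label[n]
    else              # merge equivalent colors
      label[n] = seen.min
      seen.each { |k| parent[k] = label[n] }
    end
  end
  nodes.map { |n| [n, find.call(label[n])] }.to_h
end
\end{Verbatim}

For the example pattern, labeling $\Omega$ with the nodes of $\Pi$
excluded yields two colors: one for {\sl ?original\_lang\/} and another
for {\sl ?translation\/} and {\sl ?translation\_lang\/} (compare the
separate subquery in the translated SQL below).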
\begin{figure}
\begin{algorithmic}[1]
\ForAll {$n \in \Psi_{nodes}$}
\ForAll {$c = \langle s, p, o \rangle \in \Psi_{arcs} \mid s = n \, \wedge \, C_a(c) = \emptyset$}
\If {$\exists c' = \langle s', p', o' \rangle \mid n \in \{s', o'\} \, \wedge \, C_a(c') \not= \emptyset \, \wedge \, M_{rel}(p') = M_{rel}(p)$}
\State $C_a(c) \gets C_a(c')$
\Comment Reuse the alias assigned to an arc adjacent to $n$ and mapped to the same relation
\Else \Comment Create a new alias
\State $a \gets \max(\mathbb{A}) + 1$; $\mathbb{A} \gets \mathbb{A} \cup \{ a \}$; $C_a(c) \gets a$
\State $A_{node}(a) \gets n$, $A_{filter}(a) \gets \emptyset$
\If {$M_{trans}(p) = \emptyset$} \Comment Use the base relation
\State $A_{rel}(a) \gets M_{rel}(p)$
\State $A_\beta(a) \gets \beta_{node}(n)$
\Else \Comment Use the transitive closure
\State $A_{rel}(a) \gets M_{trans}(p)$
\State $A_\beta(a) \gets \beta_{arc}(c)$
\State \Comment Use the arc's bind mode instead of the node's
\EndIf
\EndIf
\EndFor
\EndFor
\ForAll {$c \in \Psi_{arcs}$}
\State $A_{filter}( C_a(c) ) \gets A_{filter}( C_a(c) ) \cup F_c(c)$
\State \Comment Add the arc's filter to the linked alias's filters
\EndFor
\end{algorithmic}
\caption{Define relation aliases}
\label{define-relation-aliases}
\end{figure}

{\tt transform}: Define bindings $B: \Psi_{nodes} \to \mathbb{B}$, where
$\mathbb{B} = \{\{ \langle a, f \rangle \mid a \in \mathbb{A}, f \in \Phi \}\}$,
of graph pattern nodes to sets of pairs of relation aliases and
attributes, such that
\begin{equation}
\begin{split}
\langle a, f \rangle \in B(n) \iff &\exists c \in \Psi_{arcs}^n : \\
&C_a(c) = a \,\wedge\, M_{attr}^{pos}(c, n) = f \, .
\end{split}
\end{equation}
Transform the graph pattern $\Psi$ into a relational query graph
$Q = \langle \mathbb{A}, J \rangle$ where the nodes $\mathbb{A}$ are the
relation aliases defined earlier and the edges
$J = \{ \langle b_1, b_2, n \rangle \mid b_1 = \langle a_1, f_1 \rangle
\in B(n), b_2 = \langle a_2, f_2 \rangle \in B(n), a_1 \not= a_2 \}$ are
join conditions, as sketched below.

Ground non-variable nodes according to the algorithm defined in
\figurename~\ref{ground-non-variable-nodes}. Record the set of grounded
nodes $G \subseteq \Psi_{nodes}$ such that
\begin{equation}
\begin{split}
n \in G \iff &n \in F_g \,\vee\, \exists \langle b_1, b_2, n \rangle \in J \\
&\vee\, \exists b \in B(n) \, \exists a \in \mathbb{A} : b \in A_{filter}(a) \, .
\end{split}
\end{equation}

\begin{figure}
\begin{algorithmic}[1]
\State pick any $b = \langle a, f \rangle \in B(n)$
\If {$n$ is an internal resource and $\rho(n) = i$}
\State $A_{filter}(a) \gets A_{filter}(a) \cup (b = i)$
\ElsIf {$n$ is a query parameter or a literal}
\State $A_{filter}(a) \gets A_{filter}(a) \cup (b = n)$
\ElsIf {$n$ is a URIref}
\Comment Add a join to a URIref tuple in the Resource relation
\State $\mathbb{A} \gets \mathbb{A} \cup \{ a_r \}$; $A_{node}(a_r) \gets n$; $A_{rel}(a_r) \gets Resource$; $A_\beta(a_r) \gets \beta_{node}(n)$
\State $B(n) \gets B(n) \cup \langle a_r, \mathrm{id}(Resource) \rangle$; $J \gets J \cup \{ \langle b, \langle a_r, \mathrm{id}(Resource) \rangle, n \rangle \}$
\State $A_{filter}(a_r) \gets A_{filter}(a_r) \cup ( \langle a_r, literal \rangle = \mathit{false} \wedge \langle a_r, uriref \rangle = \mathit{true} \wedge \langle a_r, label \rangle = n )$
\EndIf
\end{algorithmic}
\caption{Ground non-variable nodes}
\label{ground-non-variable-nodes}
\end{figure}

Transformation of the example query presented above results in the
relational query graph shown in \figurename~\ref{join-graph}.
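The derivation of the join conditions $J$ from the bindings $B$ can be
sketched in Ruby as follows (illustrative; every pair of bindings of the
same node with distinct aliases yields one equality edge):

\begin{Verbatim}[fontsize=\scriptsize]
# bindings maps each node to its [alias, attribute] pairs
def join_conditions(bindings)
  bindings.flat_map do |node, pairs|
    pairs.combination(2).
      reject { |(a1, _), (a2, _)| a1 == a2 }.
      map { |b1, b2| [b1, b2, node] }
  end
end

# the bindings of ?msg in the example query:
j = join_conditions('?msg' => [[:a, :subject], [:c, :id], [:d, :id]])
# yields a.subject = c.id, a.subject = d.id, and c.id = d.id;
# the last one is redundant, and the minimum tree cover built in
# the next step drops such edges.
\end{Verbatim}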
\begin{figure}
\centering
\psset{unit=3.8mm,labelsep=0.2pt}
\begin{pspicture}[showgrid=false](0,0)(23,13)
\footnotesize
\rput(1,6){\circlenode{b}{\vphantom{Ij}b}}
\rput(6.7,6){\circlenode{a}{\vphantom{Ij}a}}
\rput(12.8,6){\circlenode{c}{\vphantom{Ij}c}}
\rput(2,11){\circlenode{d}{\vphantom{Ij}d}}
\rput(1,1){\circlenode{g}{\vphantom{Ij}g}}
\rput(22,11){\circlenode{f}{\vphantom{Ij}f}}
\rput(20,1){\circlenode{e}{\vphantom{Ij}e}}
\ncline{-}{b}{a} \aput{:U}(0.4){a.id = b.id} \bput{:U}(0.35){?stmt}
\ncline{-}{a}{c} \aput{:U}{a.subject = c.id} \bput{:U}{?msg}
\ncline{-}{d}{a} \aput{:U}{a.subject = d.id} \bput{:U}(0.4){?msg}
\ncline{-}{g}{a} \aput{:U}(0.43){a.predicate = g.id} \bput{:U}{\it dc:relation}
\ncline{-}{c}{f} \aput{:U}{c.part\_of\_subproperty = f.id} \bput{:U}{\it s:isTranslationOf}
\ncline{-}{c}{e} \aput{:U}{c.part\_of = e.id} \bput{:U}{?translation}
\pspolygon[linestyle=dashed,linearc=0.8](0.1,0.1)(0.1,11.9)(14.5,11.9)(14.5,0.1)
\rput(13.8,1){$P_1$}
\end{pspicture}
\caption{Relational query graph $Q$ for the example query}
\label{join-graph}
\end{figure}

{\tt generate\_tables\_and\_conditions}: Produce an ordered connected
minimum edge-disjoint tree cover $P$ of the relational query graph $Q$
such that $\forall P_i \in P$ \,
$\forall j = \langle b_{j1}, b_{j2}, n_j \rangle \in P_i$ \,
$\forall k = \langle b_{k1}, b_{k2}, n_k \rangle \in P_i$:
\begin{gather}
K(n_j) \cap K(n_k) \not= \emptyset \, , \\
\beta_{node}(n_j) = \beta_{node}(n_k) = \beta_{tree}(P_i) \, ,
\end{gather}
starting with $P_1$ such that $\beta_{tree}(P_1) = \Pi$ (it follows from
the definitions of $\Psi$ and {\tt transform} that $P_1$ is the only
such tree and covers all join conditions
$\langle b_1, b_2, n \rangle \in J$ such that $\beta_{node}(n) = \Pi$).

Encode $P_1$ as the root inner join. Encode all other trees with at
least one edge as subqueries. Left-join the subqueries, along with the
aliases representing roots of zero-length trees, into the join
expression $F$.

For each $P_i$ such that $\beta_{tree}(P_i) = N$, find a binding
$b = \langle a, f \rangle \in P_i$ such that $a \in P_1 \cap P_i$ and
add a ($b$ {\tt IS NULL}) condition to $W$. For each non-grounded node
$n \not\in G$ such that $b = \langle a, f \rangle \in B(n) \, \wedge \,
a \in P_1$, add a ($b$ {\tt IS NOT NULL}) condition to $W$ if
$\beta_{node}(n) = \Pi$, or a ($b$ {\tt IS NULL}) condition if
$\beta_{node}(n) = N$. Add $F_g$ to $W$.

Translation of the example query presented earlier results in the
following SQL:

\begin{Verbatim}[fontsize=\scriptsize]
SELECT DISTINCT a.subject, max(b.published_date)
FROM Statement AS a
INNER JOIN Resource AS b ON (a.id = b.id)
INNER JOIN Resource AS c ON (a.subject = c.id)
INNER JOIN Message AS d ON (a.subject = d.id)
INNER JOIN Resource AS g ON (a.predicate = g.id)
  AND (g.literal = 'false' AND g.uriref = 'true'
  AND g.label = 'http://purl.org/dc/elements/1.1/relation')
LEFT JOIN (
  SELECT e.language AS _field_b, c.id AS _field_a
  FROM Message AS e
  INNER JOIN Resource AS f
    ON (f.literal = 'false' AND f.uriref = 'true' AND f.label =
    'http://www.nongnu.org/samizdat/rdf/schema#isTranslationOf')
  INNER JOIN Resource AS c ON (c.part_of_subproperty = f.id)
    AND (c.part_of = e.id)
) AS _subquery_a ON (c.id = _subquery_a._field_a)
WHERE (b.published_date IS NOT NULL)
  AND (a.object IS NOT NULL)
  AND (a.rating IS NOT NULL)
  AND (c.part_of IS NULL)
  AND (a.rating >= ?)
  AND (d.language = ? OR _subquery_a._field_b = ?)
GROUP BY a.subject
ORDER BY max(b.published_date) DESC
\end{Verbatim}

\section{Update Command Execution}
\label{update-execution}

An update command uses the same graph pattern structure as a query, and
additionally defines a set $\Delta \subset \Psi_{nodes}$ of variables
representing new RDF resources and a mapping
$U: \Psi_{nodes} \to \mathbb{L}$ of variables to literal values.

Execution of an update command starts with query pattern translation
using the algorithm described in section~\ref{query-translation}. The
variables $\Psi$, $A$, $Q$, etc.\ produced by pattern translation are
used in the subsequent stages as described below:
\begin{enumerate}
% node values
\item Construct a node values mapping $V: \Psi_{nodes} \to \mathbb{L}$
using the algorithm defined in \figurename~\ref{node-values}. Record
the resources inserted into the database during this stage in
$\Delta_{new} \subseteq \Psi_{nodes}$ (it follows from the algorithm
definition that $\Delta \subseteq \Delta_{new}$).

\begin{figure}
\begin{algorithmic}[1]
\ForAll {$n \in \Psi_{nodes}$}
\If {$n$ is an internal resource and $\rho(n) = i$}
\State $V(n) \gets i$
\ElsIf {$n$ is a query parameter or a literal}
\State $V(n) \gets n$
\ElsIf {$n$ is a variable}
\If {$\nexists c = \langle n, p, o \rangle \in \Psi_{arcs}$}
\State \Comment If found only in object position
\State $V(n) \gets U(n)$
\Else
\If {$n \not\in \Delta$}
\State $V(n) \gets \mathrm{SquishSelect}(n, \Psi^{n*})$
\EndIf
\If {$V(n) = \emptyset$}
\State Insert $n$ into the $Resource$ relation
\State $V(n) \gets \rho(n)$
\State $\Delta_{new} \gets \Delta_{new} \cup \{n\}$
\EndIf
\EndIf
\ElsIf {$n$ is a URIref}
\State Select $n$ from the $Resource$ relation, insert if missing
\State $V(n) \gets \rho(n)$
\EndIf
\EndFor
\end{algorithmic}
\caption{Determine node values. $\Psi^{n*}$ is the subgraph of $\Psi$
reachable from $n$. $\mathrm{SquishSelect}(n, \Psi)$ finds a mapping of
variable $n$ that satisfies pattern $\Psi$.}
\label{node-values}
\end{figure}

% data assignment
\item For each alias $a \in \mathbb{A}$, find the subset of the graph
pattern $\Psi_{arcs}^a \subseteq \Psi_{arcs}$ such that
$c \in \Psi_{arcs}^a \iff C_a(c) = a$, select a key node $k$ such that
$\exists c = \langle k, p, o \rangle \in \Psi_{arcs}^a$, and collect a
map $D_a: \Phi \to \mathbb{L}$ of fields to values such that
$\forall c = \langle s, p, o \rangle \in \Psi_{arcs}^a :
D_a(M_{attr}^{pos}(c, o)) = V(o)$. If $k \in \Delta_{new}$ and
$A_{rel}(a) \not= Resource$, transform $D_a$ into an SQL {\tt INSERT}
into $A_{rel}(a)$ with an explicit primary key assignment
$\mathrm{id}(A_{rel}(a)) \gets V(k)$. Otherwise, transform $D_a$ into an
{\tt UPDATE} statement on the tuple in $A_{rel}(a)$ for which
$\mathrm{id}(A_{rel}(a)) = V(k)$.

% iterative assertions
\item Execute the SQL statements produced in the previous stage inside
a single transaction, in an order that resolves their mutual references.
\end{enumerate}

\section{Implementation}

The algorithms described in the previous sections are implemented by the
Samizdat RDF storage module, which is used as the primary means of data
access in the Samizdat open publishing system. The module is written in
the Ruby programming language, supported by several triggers written in
procedural SQL. The module and the whole Samizdat engine are available
under the GNU General Public License.

Samizdat exposes all RDF resources underpinning the structure and
content of the site. An HTTP request with the URL of any internal
resource yields a page with detailed information about the resource and
its relations with other resources.
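A hypothetical usage sketch of the module's external API (constructor
and method names are illustrative rather than the module's exact
interface):

\begin{Verbatim}[fontsize=\scriptsize]
store = RDF.new(RDFConfig.new('rdf.yaml'))

# run a Squish query; returns rows of variable bindings
rows = store.select(%{
  SELECT ?msg
  WHERE (dc::date ?msg ?date)
  ORDER BY ?date DESC})
\end{Verbatim}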
Furthermore, Samizdat provides a graphical interface that allows users
to compose arbitrary Squish queries.\footnote{The complexity of user
queries is limited to a configurable maximum number of triples in the
graph pattern to prevent abuse.} Queries may be published so that other
users can modify and reuse them; the results of a query may be accessed
either as plain HTML or as an RSS feed.

\section{Evaluation of Results}
\label{evaluation}
%\enlargethispage{-1ex}

Samizdat performance was measured using the Berlin SPARQL Benchmark
(BSBM)~\cite{bsbm}, with the following variations: a functional
equivalent of the BSBM test driver was implemented in Ruby and Squish
(instead of Java and SPARQL); the test platform included an Intel Core 2
Duo (instead of Quad) clocked at the same frequency, and 2GB of memory
(instead of 8GB). In this environment, Samizdat was able to process
25287 complete query mixes per hour (QMpH) on a dataset with 1M triples,
and achieved 18735 QMpH with 25M triples, in both cases exceeding the
figures for all RDF stores reported in~\cite{bsbm}.

In production, Samizdat was able to serve, without congestion, peak
loads of up to 5K hits per hour for a site with a dataset of 100K
triples in a shared VPS environment. Regeneration of the site frontpage
on the same dataset executes 997 Squish queries and completes in 7.7s,
which is comparable to RDBMS-backed content management systems.

\section{Comparison with Related Work}
\label{related-work}

As mentioned in section~\ref{introduction}, there exists a wide range of
solutions for relational to RDF mapping. Besides Samizdat, the approach
based on automatic on-demand translation of RDF queries into SQL is also
implemented by Federate~\cite{federate}, D2RQ~\cite{d2rq}, and
Virtuoso~\cite{virtuoso}.

While being one of the first solutions to provide on-demand relational
to RDF mapping, Samizdat remains one of the most advanced in terms of
query capabilities. Its single largest drawback is the lack of SPARQL
compatibility; at the same time, in some regards it exceeds the
capabilities of other solutions.

The alternative that is closest to Samizdat in terms of query
capabilities is Virtuoso RDF Views: it is the only other
relational-to-RDF mapping solution that provides partial RDFS and OWL
inference, aggregation, and an update language. Still, there are
substantial differences between these two projects. First of all, the
Samizdat RDF store is a small module (1000 lines of Ruby and 200 lines
of SQL) that can be used with a variety of RDBMSes, while Virtuoso RDF
Views is tied to its own RDBMS. Virtuoso does not support implicit
statement reification, although its design is compatible with this
feature. Finally, Virtuoso relies on SQL unions for queries with
unspecified predicates and for RDFS and OWL inference. While allowing
for greater flexibility than the database triggers described in
section~\ref{inference-triggers}, the iterative union operation has a
considerable impact on query performance.

\section{Future Work}
\label{future-work}

Since the SPARQL Recommendation has been published by W3C~\cite{sparql},
SPARQL support has been at the top of the Samizdat RDF store to-do list.
SPARQL syntax is considerably more expressive than Squish and will
require some effort to implement in Samizdat, but, since the design of
the implementation separates the syntactic layer from the query
translation logic, the algorithms described in this paper can be used to
translate SPARQL patterns to SQL with minimal changes.
The most substantial changes are expected to be required for the
explicit grouping of optional graph patterns and the associated filter
scope issues~\cite{cyganiak}.

The Samizdat RDF store should be made more adaptable to a wider variety
of problem domains. The query translation algorithm should be augmented
to translate an ambiguously mapped query (including queries with
unspecified predicates) into a union of alternative interpretations. The
mapping of the relational schema should be generalized, including
support for multi-part keys and more generic stored procedures for
reification and inference. A standard RDB2RDF mapping should be
implemented when W3C publishes a specification to that end.

\section{Conclusions}

The on-demand RDF to relational query translation algorithm described in
this paper utilizes existing relational databases to their full
potential, including indexing, transactions, and procedural SQL, to
provide efficient access to RDF data. The implementation of this
algorithm in the Samizdat RDF storage module has been tried in a
production environment and has demonstrated how Semantic Web
technologies can be introduced into an application serving thousands of
users without imposing additional requirements on hardware resources.

\vspace{1ex}

% ---- Bibliography ----
%
\begin{thebibliography}{19}

\bibitem {triplify}
Auer, S., Dietzold, S., Lehmann, J., Hellmann, S., Aumueller, D.:
Triplify -- Light-Weight Linked Data Publication from Relational
Databases. WWW 2009, Madrid, Spain (2009)\\
\url{http://www.informatik.uni-leipzig.de/~auer/publication/triplify.pdf}

\bibitem {bsbm}
Bizer, C., Schultz, A.:
The Berlin SPARQL Benchmark. International Journal On Semantic Web and
Information Systems (IJSWIS), Volume 5, Issue 2 (2009)\\
\url{http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/}

\bibitem {d2rq}
Bizer, C., Seaborne, A.:
D2RQ -- Treating non-RDF databases as virtual RDF graphs. In: ISWC 2004
(posters)\\
\url{http://www.wiwiss.fu-berlin.de/bizer/D2RQ/spec/}
\bibitem {samizdat-rel-rdf}
Borodaenko, D.:
Accessing Relational Data with RDF Queries and Assertions (April 2004)\\
\url{http://samizdat.nongnu.org/papers/rel-rdf.pdf}

\bibitem {samizdat-collreif}
Borodaenko, D.:
Model for Collaborative Decision Making Based on RDF Reification (April
2004)\\
\url{http://samizdat.nongnu.org/papers/collreif.pdf}

\bibitem {cyganiak}
Cyganiak, R.:
A relational algebra for SPARQL. Technical Report HPL-2005-170, HP Labs
(2005)\\
\url{http://www.hpl.hp.com/techreports/2005/HPL-2005-170.html}

\bibitem {virtuoso}
Erling, O., Mikhailov, I.:
RDF support in the Virtuoso DBMS. In: Proceedings of the 1st Conference
on Social Semantic Web, volume P-113 of GI-Edition -- Lecture Notes in
Informatics (LNI), ISSN 1617-5468. Bonner K\"{o}llen Verlag (2007)\\
\url{http://virtuoso.openlinksw.com/dav/wiki/Main/VOSArticleRDF}

\bibitem {squish}
Miller, L., Seaborne, A., Reggiori, A.:
Three Implementations of SquishQL, a Simple RDF Query Language. In:
Horrocks, I., Hendler, J. (Eds.) ISWC 2002. LNCS, vol. 2342, pp.
423-435. Springer, Heidelberg (2002)\\
\url{http://ilrt.org/discovery/2001/02/squish/}

\bibitem {federate}
Prud'hommeaux, E.:
RDF Access to Relational Databases (2003)\\
\url{http://www.w3.org/2003/01/21-RDF-RDB-access/}

\bibitem {sparql}
Prud'hommeaux, E., Seaborne, A.:
SPARQL Query Language for RDF. W3C Recommendation (January 2008)\\
\url{http://www.w3.org/TR/rdf-sparql-query/}

\bibitem {shapiro}
Shapiro, L., Stockman, G.:
Computer Vision, pp. 69-73. Prentice-Hall (2002)\\
\url{http://www.cse.msu.edu/~stockman/Book/2002/Chapters/ch3.pdf}

\bibitem {sp2b}
Schmidt, M., Hornung, T., K\"{u}chlin, N., Lausen, G., Pinkel, C.:
An Experimental Comparison of RDF Data Management Approaches in a SPARQL
Benchmark Scenario. In: A. Sheth et al. (Eds.) ISWC 2008. LNCS, vol.
5318, pp. 82-97. Springer, Heidelberg (2008)\\
\url{http://www.informatik.uni-freiburg.de/~mschmidt/docs/sp2b\_exp.pdf}

\end{thebibliography}

\end{document}