-
Notifications
You must be signed in to change notification settings - Fork 4
Expand file tree
/
Copy pathdatasetd.html
More file actions
223 lines (221 loc) · 15.8 KB
/
datasetd.html
File metadata and controls
223 lines (221 loc) · 15.8 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
<!DOCTYPE html>
<html>
<head>
<title>Dataset Project</title>
<link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'>
<link rel="stylesheet" href="https://caltechlibrary.github.io/css/site.css">
</head>
<body>
<header>
<a href="http://library.caltech.edu" title="link to Caltech Library Homepage"><img src="https://caltechlibrary.github.io/assets/liblogo.gif" alt="Caltech Library logo"></a>
</header>
<nav>
<ul>
<li><a href="/">Home</a></li>
<li><a href="../">README</a></li>
<li><a href="../LICENSE">LICENSE</a></li>
<li><a href="../INSTALL.html">INSTALL</a></li>
<li><a href="../user_manual.html">User Manual</a></li>
<li><a href="../about.html">About</a></li>
<li><a href="../search.html">Search</a></li>
<li><a href="https://github.com/caltechlibrary/dataset">GitHub</a></li>
</ul>
</nav>
<section>
<h1 id="datasetd">Datasetd</h1>
<h2 id="overview">Overview</h2>
<p><em>datasetd</em> is a minimal web service intended to run on
localhost port 8485. It presents one or more dataset collections as a
web service. It features a subset of functionallity available with the
dataset command line program. <em>datasetd</em> does support
multi-process/asynchronous update to a dataset collection.</p>
<p><em>datasetd</em> is notable in what it does not provide. It does not
provide user/role access restrictions to a collection. It is not
intended to be a standalone web service on the public internet or local
area network. It does not provide support for search or complex
querying. If you need these features I suggest looking at existing
mature NoSQL data management solutions like Couchbase, MongoDB, MySQL
(which now supports JSON objects) or Postgres (which also support JSON
objects). <em>datasetd</em> is a simple, miminal service.</p>
<p>NOTE: You could run <em>datasetd</em> could be combined with a front
end web service like Apache 2 or NginX and through them provide access
control based on <em>datasetd</em>’s predictable URL paths. That would
require a robust understanding of the front end web server, it’s access
control mechanisms and how to defend a proxied service. That is beyond
the skope of this project.</p>
<h2 id="configuration">Configuration</h2>
<p><em>datasetd</em> can make one or more dataset collections visible
over HTTP. The dataset collections hosted need to be avialable on the
same file system as where <em>datasetd</em> is running.
<em>datasetd</em> is configured by reading a “settings.json” file in
either the local directory where it is launch or by a specified
directory on the command line to a appropriate JSON settings.</p>
<p>The “settings.json” file has the following structure</p>
<div class="sourceCode" id="cb1"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"host"</span><span class="fu">:</span> <span class="st">"localhost:8485"</span><span class="fu">,</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"dsn_url"</span><span class="fu">:</span> <span class="st">"mysql://DB_USER:DB_PASSWORD</span><span class="er">\@</span><span class="st">DB_NAME"</span><span class="fu">,</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">"collections"</span><span class="fu">:</span> <span class="ot">[</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">"dataset"</span><span class="fu">:</span> <span class="st">"<PATH_TO_DATASET_COLLECTION>"</span><span class="fu">,</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">"keys"</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> <span class="dt">"create"</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> <span class="dt">"read"</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> <span class="dt">"update"</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">"delete"</span><span class="fu">:</span> <span class="kw">false</span><span class="fu">,</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="dt">"attach"</span><span class="fu">:</span> <span class="kw">false</span><span class="fu">,</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="dt">"retrieve"</span><span class="fu">:</span> <span class="kw">false</span><span class="fu">,</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> <span class="dt">"prune"</span><span class="fu">:</span> <span class="kw">false</span><span class="fu">,</span></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> <span class="dt">"frame-read"</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a> <span class="dt">"frame-write"</span><span class="fu">:</span> <span class="kw">false</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span></code></pre></div>
<p>In the “collections” object the “<COLLECTION_ID>” is a string which
will be used as the start of the path in the URL. The “dataset”
attribute sets the path to the dataset collection made available at
“<PATH_TO_DATASET_COLLECTION>”. For each collection you can allow the
following sub-paths for JSON object interaction “keys”, “create”,
“read”, “update” and “delete”. JSON document attachments are supported
by “attach”, “retrieve”, “prune”. If any of these attributes are missing
from the settings they are assumed to be set to false.</p>
<p>The sub-paths correspond to their counter parts in the dataset
command line tool. By varying the settings of these you can support read
only collections, drop off collections or function as a object store
running behind a web application.</p>
<h2 id="running-datasetd">Running datasetd</h2>
<p><em>datasetd</em> runs as a HTTP service and as such can be exploited
in the same manner as other services using HTTP. You should only run
<em>datasetd</em> on localhost on a trusted machine. If the machine is a
multi-user machine all users can have access to the collections exposed
by <em>datasetd</em> regardless of the file permissions they may in
their account.</p>
<p>Example: If all dataset collections are in a directory only allowed
access to be the “web-data” user but another users on the machine have
access to curl they can access the dataset collections based on the
rights of the “web-data” user by access the HTTP service. This is a
typical situation for most localhost based web services and you need to
be aware of it if you choose to run <em>datasetd</em>.</p>
<p><em>datasetd</em> should NOT be used to store confidential, sensitive
or secret information.</p>
<h2 id="supported-features">Supported Features</h2>
<p><em>datasetd</em> provides a limitted subset of actions supportted by
the standard datset command line tool. It only supports the following
actions</p>
<ul>
<li>collections (return a list of collections available)</li>
<li>collection (return the codemeta for a collection)</li>
<li>keys (return the list of keys in a collection)</li>
<li>has-keys (return the list of keys in a collection)</li>
<li>object (CRUD operations on a JSON document via REST calls)</li>
<li>frames (return a list of frames available in a collection)</li>
<li>has-frame (return a true if frame exists, false otherwise)</li>
<li>frame (CRUD operations on a frame via REST calls)</li>
<li>frame-objects (get a frame’s list of objects)</li>
<li>frame-keys (get a frame’s list of keys)</li>
<li>attachments (list attachments for a JSON document)</li>
<li>attachment (CRUD operations on attachment via REST calls)</li>
</ul>
<p>Each of theses “actions” can be restricted in the configuration (
i.e. “settings.json” file) by setting the value to “false”. If the
attribute for the action is not specified in the JSON settings file then
it is assumed to be “false”.</p>
<h2 id="use-case">Use case</h2>
<p>In this use case a dataset collection called “recipes.ds” has been
previously created and populated using the command line tool.</p>
<p>If I have a settings file for “recipes” based on the collection
“recipes.ds” and want to make it read only I would make the attribute
“read” set to true and if I want the option of listing the keys in the
collection I would set that true also.</p>
<div class="sourceCode" id="cb2"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"host"</span><span class="fu">:</span> <span class="st">"localhost:8485"</span><span class="fu">,</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"collections"</span><span class="fu">:</span> <span class="fu">{</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> <span class="dt">"recipes"</span><span class="fu">:</span> <span class="fu">{</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"dataset"</span><span class="fu">:</span> <span class="st">"recipes.ds"</span><span class="fu">,</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">"keys"</span><span class="fu">:</span> <span class="kw">true</span><span class="fu">,</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a> <span class="dt">"read"</span><span class="fu">:</span> <span class="kw">true</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="fu">}</span></span></code></pre></div>
<p>I would start <em>datasetd</em> with the following command line.</p>
<pre class="shell"><code> datasetd settings.json</code></pre>
<p>This would display the start up message and log output of the
service.</p>
<p>In another shell session I could then use curl to list the keys and
read a record. In this example I assume that “waffles” is a JSON
document in dataset collection “recipes.ds”.</p>
<pre class="shell"><code> curl http://localhost:8485/recipies/read/waffles</code></pre>
<p>This would return the “waffles” JSON document or a 404 error if the
document was not found.</p>
<p>Listing the keys for “recipes.ds” could be done with this curl
command.</p>
<pre class="shell"><code> curl http://localhost:8485/recipies/keys</code></pre>
<p>This would return a list of keys, one per line. You could show all
JSON documents in the collection be retrieving a list of keys and
iterating over them using curl. Here’s a simple example in Bash.</p>
<pre class="shell"><code> for KEY in $(curl http://localhost:8485/recipes/keys); do
curl "http://localhost/8485/recipe/read/${KEY}"
done</code></pre>
<p>Add a new JSON object to a collection.</p>
<pre class="shell"><code> KEY="sunday"
curl -X POST -H 'Content-Type:application/json' \
"http://localhost/8485/recipe/create/${KEY}" \
-d '{"ingredients":["banana","ice cream","chocalate syrup"]}'</code></pre>
<h2 id="end-points">End points</h2>
<p>The following end points are planned for <em>datasetd</em> in version
2.</p>
<ul>
<li><code>/collections</code> returns a list of available
collections.</li>
<li><code>/collection/<COLLECTION_ID></code> with an HTTP GET
returns the codemeta document describing the collection.</li>
</ul>
<p>The following end points are per collection. They are available for
each collection where the settings are set to true. The end points are
generally RESTful so one end point will often map to a CRUD style
operations via http methods POST to create an object, GET to “read” or
retrieve an object, a PUT to update an object and DELETE to remove
it.</p>
<p>The terms “<COLLECTION_ID>” and “<KEY>” refer to the collection path,
the string representing the “key” to a JSON document. For attachment
then a base filename is used to identify the attachment associate with a
“key” in a collection.</p>
<ul>
<li><code>/<COLLECTION_ID>/keys</code> returns a list of keys
available in the collection</li>
<li><code>/<COLLECTION_ID>/has-key/<KEY></code> returns true
if a key is found for a JSON document or false otherwise</li>
<li><code>/<COLLECTION_ID>/object/<KEY></code> performs CRUD
operations on a JSON document, a GET retrieves the JSON document, a POST
creates it, PUT updates it and DELETE removes it.</li>
<li><code>/<COLLECTION_ID>/attachments/<KEY></code> returns
a list of attachments assocated with the JSON document</li>
<li><code>/<COLLECTION_ID>/attachment/<KEY>/<FILENAME></code>
allows you to perform CRUD operations on an attachment. Create is done
with a POST, read (retrieval) is done wiht a GET, replacement is done
with a PUT and deleting an attachment (pruning) is done with a DELETE
http method.</li>
<li><code>/<COLLECTION_ID>/frames</code> list the frames defined
for a collection</li>
<li><code>/<COLLECTION_ID>/has-frame/<FRAME_NAME></code>
returns true if frame is defined otherwise false</li>
<li><code>/<COLLECTION_ID>/frame/<FRAME_NAME></code> a GET
will return the frame definition, a POST will create a frame, a DELETE
will remove a frame, a PUT without a body will cause the frame to be
refreshed and a PUT with an array of keys will cause the frame to be
reframed</li>
<li><code>/<COLLECTION_ID>/frame-objects/<FRAME_NAME></code>
will return a list of the frame’s objects</li>
<li><code>/<COLLECTION_ID>/frame-keys/<FRAME_NAME></code>
will return a list of keys in the frame</li>
</ul>
</section>
<footer>
<span>© 2022 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span>
<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address>
<span><a href="mailto:library@caltech.edu">Email Us</a></span>
<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span>
</footer>
</body>
</html>