27 commits
ff7c14c Update repo path in README (LaChapeliere, Oct 9, 2020)
ba3fd57 Update repo path in CONTRIBUTING (LaChapeliere, Oct 9, 2020)
28ded97 Add short introduction to README (LaChapeliere, Oct 9, 2020)
5d55c3d Update website plan (LaChapeliere, Oct 9, 2020)
97b2570 About page inspired by main repo readme (LaChapeliere, Oct 9, 2020)
af6a930 Add support page redirecting to each repo (LaChapeliere, Oct 9, 2020)
debd6ad Change syntax of internal links (LaChapeliere, Oct 9, 2020)
3ae59fd Update install page (LaChapeliere, Oct 9, 2020)
9318342 Update structure to add pages to define each microservice (LaChapeliere, Oct 9, 2020)
28acfd4 Add and update page titles (LaChapeliere, Oct 9, 2020)
b2f67bd Moving python package info, no update of content (LaChapeliere, Oct 9, 2020)
074ea96 Removing outdated structure (LaChapeliere, Oct 9, 2020)
0e9178e Change structure again (LaChapeliere, Oct 9, 2020)
9e198f4 Cleaning (LaChapeliere, Oct 9, 2020)
cbce315 Add page for microservice description (LaChapeliere, Oct 9, 2020)
d8592a5 First draft of microservices description (LaChapeliere, Oct 9, 2020)
e8e789f Split structure for python package (LaChapeliere, Oct 9, 2020)
cac9ac6 Split info and clean (LaChapeliere, Oct 9, 2020)
d7b5756 Draft for description of Database microservice (LaChapeliere, Oct 14, 2020)
324c9e5 Draft for Data type microservice description (LaChapeliere, Oct 16, 2020)
67e0def Draft for Histogram microservice description (LaChapeliere, Oct 16, 2020)
6c20fab Draft for t-SNE microservice description (LaChapeliere, Oct 16, 2020)
787ff54 Draft for Projection microservice description (LaChapeliere, Oct 16, 2020)
ac24586 Draft for PCA microservice description (LaChapeliere, Oct 16, 2020)
d8888a6 Add storage info for t-SNE and PCA (LaChapeliere, Oct 16, 2020)
f1f9f2b Draft for Model builder microservice description (LaChapeliere, Oct 17, 2020)
c816126 Draft for spark additional info (LaChapeliere, Oct 17, 2020)
6 changes: 3 additions & 3 deletions CONTRIBUTING.md
@@ -14,7 +14,7 @@ When creating an issue for a bug or other undefined/unwanted behaviour remember

## Pull Request process

1. [Fork](https://github.com/riibeirogabriel/learningOrchestra/fork) the repository
1. [Fork](https://github.com/learningOrchestra/docs/fork) the repository
2. [Clone](https://git-scm.com/docs/git-clone) your fork to your local environment
3. Navigate into the project root directory
4. Create a new [branch](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging), the branch should be named according to what feature/fix you're implementing
@@ -26,7 +26,7 @@ When creating an issue for a bug or other undefined/unwanted behaviour remember
7. Create a Pull Request

Remember to describe what feature or fix you're implementing in the Pull Request window.
In the Pull Request window remember to include a quick summary of what the committed code does and how it is an improvement.
In the Pull Request window remember to include a quick summary of what the committed code does and how it is an improvement.

After the Pull Request the repository owner will review your request.\
Be patient, if they require you to make changes to your request, do so.
@@ -37,4 +37,4 @@ Don't be rude, use crude language, or harass other users.

## License
The repository is currently licensed under GNU General Public License v3.0.
By contributing to the project you agree that your contributions will be licensed under the same license and provisions.
By contributing to the project you agree that your contributions will be licensed under the same license and provisions.
20 changes: 9 additions & 11 deletions README.md
@@ -2,16 +2,14 @@

# learningOrchestra Docs

To make changes clone the repo:

`git clone https://github.com/learningOrchestra/learningOrchestra-docs.git`

`cd learningOrchestra-docs`

Install mkdocs:

This repository contains the files to generate the user documentation of [learningOrchestra](https://github.com/learningOrchestra/learningOrchestra). The content of the documentation is created manually and can be found in [the docs folder](https://github.com/learningOrchestra/docs/tree/main/docs). The documentation website is created with [MkDocs](https://www.mkdocs.org/).

To make changes please read the [contributing guide](https://github.com/learningOrchestra/docs/blob/main/CONTRIBUTING.md) then:
1. Clone the repo
`git clone https://github.com/learningOrchestra/docs.git`
2. Move inside the repo
`cd docs`
3. Install mkdocs
`pip install mkdocs`

Run:

4. Run MkDocs
`mkdocs serve`
13 changes: 13 additions & 0 deletions docs/about.md
@@ -0,0 +1,13 @@
# About learningOrchestra

Nowadays, **data science relies on a wide range of computer science skills**, from data management to algorithm design, from code optimization to cloud infrastructures. Data scientists are expected to have expertise in these diverse fields, especially when working in small teams or for academia.

This situation can constitute a barrier to the actual extraction of new knowledge from collected data,
which is why the last two decades have seen more efforts to facilitate and streamline the development of
data mining workflows. The tools created can be sorted into two categories: **high-level** tools facilitate
the building of **automatic data processing pipelines** (e.g. [Weka](https://www.cs.waikato.ac.nz/ml/weka/))
while **low-level** ones support the setup of appropriate physical and virtual infrastructure (e.g. [Spark](https://spark.apache.org/)).

However, this landscape is still missing a tool that **encompasses all steps and needs of a typical data science project**. This is where learningOrchestra comes in.

Read our [first research monograph](https://drive.google.com/file/d/1ZDrTR58pBuobpgwB_AOOFTlfmZEY6uQS/view) (under construction) to know more about the research behind the project.
43 changes: 43 additions & 0 deletions docs/database-python.md
@@ -0,0 +1,43 @@
## Database API

### read_resume_files

```python
read_resume_files(pretty_response=True)
```
* `pretty_response`: returns an indented `string` for visualization (default: `True`; returns a `dict` if `False`)

### read_file

```python
read_file(filename, skip=0, limit=10, query={}, pretty_response=True)
```

* `filename`: name of the file
* `skip`: number of rows to skip in pagination (default: `0`)
* `limit`: number of rows to return in pagination (default: `10`, maximum: `20` rows per request)
* `query`: MongoDB query to apply (default: empty query)
* `pretty_response`: returns an indented `string` for visualization (default: `True`; returns a `dict` if `False`)
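
A minimal usage sketch (the client object `database_api` is assumed here to stand for an already-initialised `learning-orchestra-client` object, and the filename and query values are purely illustrative):

```python
# Sketch only: `database_api` stands for an already-initialised client object from
# the learning-orchestra-client package; filename and query values are illustrative.
result = database_api.read_file(
    "titanic_training",        # a file previously inserted in the database
    skip=0,                    # start at the first row
    limit=10,                  # at most 20 rows per request
    query={"Survived": "1"},   # MongoDB-style filter; field and value are examples
    pretty_response=False,     # return a dict instead of an indented string
)
print(result)
```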

### create_file

```python
create_file(filename, url, pretty_response=True)
```

* `filename`: name of the file to be created
* `url`: URL of the CSV file
* `pretty_response`: returns an indented `string` for visualization (default: `True`; returns a `dict` if `False`)

### delete_file

```python
delete_file(filename, pretty_response=True)
```

* `filename`: name of the file to be deleted
* `pretty_response`: returns an indented `string` for visualization (default: `True`; returns a `dict` if `False`)
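
Putting these calls together, a hedged end-to-end sketch (again assuming an already-initialised client object `database_api`; the URL and filename are examples, not part of the documented API):

```python
# Sketch only: `database_api` is an assumed, already-initialised client object from
# the learning-orchestra-client package; the URL and filename below are examples.
csv_url = "https://example.com/titanic_training.csv"   # hypothetical CSV location

database_api.create_file("titanic_training", csv_url)  # download and insert the dataset
print(database_api.read_resume_files())                # list stored files (indented string)
database_api.delete_file("titanic_training")           # remove the dataset again
```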
10 changes: 5 additions & 5 deletions docs/database-api.md → docs/database-rest.md
@@ -1,12 +1,12 @@
# Database API
# Database API

The **Database API** microservice creates a level of abstraction through a REST API.
The **Database API** microservice creates a level of abstraction through a REST API.

Using MongoDB, datasets are downloaded in CSV format and parsed into JSON documents, where the primary key for each document is the `filename` field contained in the JSON body of the POST request.

## GUI tool to handle database files

There are GUI tools for handling database files. For example, [NoSQLBooster](https://nosqlbooster.com) can interact with the MongoDB instance used by the database and perform several tasks that the `learning-orchestra-client` package does not cover, such as schema visualization and extracting and downloading files to formats such as CSV and JSON.

You can also easily browse all inserted files and visualize each row of a given file. To use this tool, connect to the URL `cluster_ip:27017` with the credentials:

@@ -91,7 +91,7 @@ Returns an array of metadata files from the database, where each file contains a
* `F1` - F1 score of the model
* `accuracy` - Accuracy of the model prediction
* `classificator` - Initials of the classifier used
* `filename` - Name of the file
* `filename` - Name of the file
* `fit_time` - Time taken for the model to be fit during training

## List file content
@@ -112,7 +112,7 @@ The first row in the query is always the metadata file.
`POST CLUSTER_IP:5000/files`

Insert a CSV into the database using the POST method; the JSON must be contained in the body of the HTTP request.
The following fields are required:
The following fields are required:

```json
{
12 changes: 12 additions & 0 deletions docs/datatype-python.md
@@ -0,0 +1,12 @@
## Data type handler API

### change_file_type

```python
change_file_type(filename, fields_dict, pretty_response=True)
```

* `filename`: name of the file
* `fields_dict`: dictionary with `field`:`number` or `field`:`string` pairs, setting the new type of each field
* `pretty_response`: returns an indented `string` for visualization
(default: `True`; returns a `dict` if `False`)
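
A hedged usage sketch (the client object `data_type_api` is assumed to stand for an already-initialised `learning-orchestra-client` object; the filename and field names are illustrative):

```python
# Sketch only: `data_type_api` is an assumed, already-initialised client object from
# the learning-orchestra-client package; the filename and field names are examples.
data_type_api.change_file_type(
    "titanic_training",                     # a file previously inserted in the database
    {"Age": "number", "Name": "string"},    # fields_dict: field name -> "number" or "string"
    pretty_response=False,                  # return a dict instead of an indented string
)
```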
File renamed without changes.
15 changes: 15 additions & 0 deletions docs/histogram-python.md
@@ -0,0 +1,15 @@

## Histogram API

### create_histogram

```python
create_histogram(filename, histogram_filename, fields,
pretty_response=True)
```

* `filename`: name of the file to compute the histogram from
* `histogram_filename`: name of the file in which the resulting histogram is stored
* `fields`: list of fields to include in the histogram
* `pretty_response`: returns an indented `string` for visualization
(default: `True`; returns a `dict` if `False`)
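
A hedged usage sketch (the client object `histogram_api` is assumed to stand for an already-initialised `learning-orchestra-client` object; the filenames and field are illustrative):

```python
# Sketch only: `histogram_api` is an assumed, already-initialised client object from
# the learning-orchestra-client package; the filenames and field below are examples.
histogram_api.create_histogram(
    "titanic_training",         # source file to compute the histogram from
    "titanic_histogram_age",    # name under which the resulting histogram is stored
    ["Age"],                    # fields to include in the histogram
    pretty_response=True,       # indented string output (default)
)
```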
File renamed without changes.
31 changes: 18 additions & 13 deletions docs/index.md
@@ -1,18 +1,23 @@
# learningOrchestra Docs
# learningOrchestra user documentation

**learningOrchestra** is a distributed processing tool that facilitates and streamlines iterative processes in a Data Science project pipeline like:
learningOrchestra aims to facilitate the development of complex data mining workflows by **seamlessly interfacing different data science tools and services**. From a single interoperable Application Programming Interface (API), users can **design their analytical pipelines and deploy them in an environment with the appropriate capabilities**.

* Data Gathering
* Data Cleaning
* Model Building
* Validating the Model
* Presenting the Results
learningOrchestra is designed for data scientists from both engineering and academic backgrounds, so that they can **focus on the discovery of new knowledge** in their data rather than on library or maintenance issues.

With learningOrchestra, you can:
learningOrchestra is organised into interoperable microservices. They offer access to third-party libraries, frameworks and software to **gather data**, **clean data**, **train machine learning models**, **tune machine learning models**, **evaluate machine learning models** and **visualize data and results**.

* load a dataset from an URL (in CSV format).
* accomplish several pre-processing tasks with datasets.
* create highly customised model predictions against a specific dataset by providing their own pre-processing code.
* build prediction models with different classifiers simultaneously using a spark cluster transparently.
The current version of learningOrchestra offers 7 microservices:
- The **Database API is the central microservice**. It holds all the data, including the analysis results.
- The **Data type API is a preprocessing microservice** dedicated to changing the type of data fields.
- The **Projection, Histogram, t-SNE and PCA APIs are data exploration microservices**. They map the data into new representation spaces so it can be visualized. They can be used on the raw data as well as on the intermediate and final results of the analysis pipeline.
- The **Model builder API is the main analysis microservice**. It includes some preprocessing features and machine learning features to train models, evaluate models and predict information using trained models.

And so much more! Check the [Usage](https://learningorchestra.github.io/docs/usage/) section for more.
The microservices can be called from any computer, including one that is not part of the cluster on which learningOrchestra is deployed. learningOrchestra provides two options to access its features: a **microservice REST API** and a **Python package**.

Use this documentation to [learn more about the learningOrchestra project](about.md), [learn how to install and deploy learningOrchestra on a cluster](install.md), learn how to use the [REST APIs](rest-apis.md) and [Python package](python-package.md) to access learningOrchestra microservices, or [find options to get support](support.md).

You can also visit the repositories of the learningOrchestra project:
- [learningOrchestra](https://github.com/learningOrchestra/learningOrchestra) for the definition of the microservices and the REST APIs,
- [learningOrchestra-python-client](https://github.com/learningOrchestra/learningOrchestra-python-client) for the Python package,
- [docs](https://github.com/learningOrchestra/docs) for the content of the present documentation, and
- [learningOrchestra.github.io](https://github.com/learningOrchestra/learningOrchestra.github.io) for the code of the present website.
70 changes: 70 additions & 0 deletions docs/install.md
@@ -0,0 +1,70 @@
# Install and deploy learningOrchestra on a cluster

:bell: This documentation assumes that the users are familiar with a number of advanced computer science concepts. We have tried to link to learning resources to support beginners, as well as introduce some of the concepts in the [last section](#concepts). But if something is still not clear, don't hesitate to [ask for help](support.md).

## Setting up your cluster

learningOrchestra operates from a [cluster](#what-is-a-cluster) of Docker [containers](#what-is-a-container).

All your hosts must operate under Linux distributions and have [Docker Engine](https://docs.docker.com/engine/install/) installed.

Configure your cluster in [swarm mode](https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/). Install [Docker Compose](https://docs.docker.com/compose/install/) on your manager instance.

You are ready to deploy! :tada:

## Deploy learningOrchestra

Clone the main learningOrchestra repository on your manager instance.
- Using HTTP protocol, `git clone https://github.com/learningOrchestra/learningOrchestra.git`
- Using SSH protocol, `git clone git@github.com:learningOrchestra/learningOrchestra.git`
- Using GitHub CLI, `gh repo clone learningOrchestra/learningOrchestra`

Move to the root of the directory, `cd learningOrchestra`.

Deploy with `sudo ./run.sh`. The deployment should take around a dozen minutes.

### Interrupt learningOrchestra

Run `docker stack rm microservice`.

### Check cluster status

To check the deployed microservices and machines of your cluster, open `CLUSTER_IP:80` in a browser, where *CLUSTER_IP* is the external IP of a machine in your cluster.

The same can be done to check the Spark cluster state at `CLUSTER_IP:8080`.

## Install-and-deploy questions

###### My computer runs on Windows/OSX, can I still use learningOrchestra?

You can use the microservices that run on a cluster where learningOrchestra is already deployed, but you **cannot deploy learningOrchestra** itself.

###### I have a single computer, can I still use learningOrchestra?

Theoretically, you can, if your machine has 12 GB of RAM, a quad-core processor and 100 GB of disk space. However, a single machine won't be able to cope with the computing demands of a real-life-sized dataset.

###### What happens if learningOrchestra is killed while using a microservice?

If your cluster fails while a microservice is processing data, the task may be lost. Some failures might corrupt the database.

If no processing was in progress when your cluster failed, learningOrchestra will automatically re-deploy and reboot the affected microservices.

###### What happens if my instances lose their connection to each other?

If the connection between cluster instances is lost, learningOrchestra will try to re-deploy the microservices from the lost instances on the remaining active instances of the cluster.

## Concepts

###### What is a container?

Containers are software units that package code together with everything needed to run it, so that the code can run easily in any environment. They also isolate the code from the rest of the machine. They are [often compared to shipping containers](https://www.ctl.io/developers/blog/post/docker-and-shipping-containers-a-useful-but-imperfect-analogy).

###### What is a cluster?

A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. (From [Wikipedia](https://en.wikipedia.org/wiki/Computer_cluster))

###### What are microservices?

Microservices - also known as the microservice architecture - are an architectural style that structures an application as a collection of services that are highly maintainable and testable, loosely coupled, independently deployable, organized around business capabilities, and owned by a small team.

[An overview of microservice architecture](https://medium.com/hashmapinc/the-what-why-and-how-of-a-microservices-architecture-4179579423a9)
38 changes: 0 additions & 38 deletions docs/installation.md

This file was deleted.
