This repository was archived by the owner on Jun 9, 2023. It is now read-only.
Changes from all commits (23 commits)
5 changes: 4 additions & 1 deletion .gitignore
@@ -113,5 +113,8 @@ data
nohup.out

# hide wa config
-*nlc2cmd/remote/config.json
+clai/server/plugins/nlc2cmd/remote/config.json

+# hide local gitbot stuff config
+clai/server/plugins/gitbot/config.json
+!clai/server/plugins/gitbot/rasa/data
2 changes: 1 addition & 1 deletion README.md
@@ -220,7 +220,7 @@ As before, CLAI skill will not execute without your permission unless `auto` mod

## :robot: Want to build your own skills?

-[`fixit`](clai/server/plugins/fix_bot)   [`nlc2cmd`](clai/server/plugins/nlc2cmd)   [`helpme`](clai/server/plugins/helpme)   [`howdoi`](clai/server/plugins/howdoi)   [`man page explorer`](clai/server/plugins/manpage_agent)   [`ibmcloud`](clai/server/plugins/ibmcloud)
+[`fixit`](clai/server/plugins/fix_bot)   [`nlc2cmd`](clai/server/plugins/nlc2cmd)   [`helpme`](clai/server/plugins/helpme)   [`howdoi`](clai/server/plugins/howdoi)   [`man page explorer`](clai/server/plugins/manpage_agent)   [`ibmcloud`](clai/server/plugins/ibmcloud)   [`tellina`](clai/server/plugins/tellina)   [`dataxplore`](clai/server/plugins/dataxplore)   [`gitbot`](clai/server/plugins/gitbot)

Project CLAI is intended to rekindle the spirit of AI softbots by providing a plug-and-play framework and simple interface abstractions over Bash and its underlying operating system. Developers can access the command line through a simple `sense-act` API for rapid prototyping of newer and more complex AI capabilities.

Binary file modified clai/emulator/run.gif
Binary file modified clai/emulator/stop.gif
12 changes: 8 additions & 4 deletions clai/server/README.md
@@ -30,7 +30,7 @@ CLAI comes with a set of orchestrators to help you get the best out of the Orche

> [`threshold_orchestrator`](orchestration/patterns/threshold_orchestrator) This is similar to the `max_orchestrator` but it maintains thresholds specific to each skill, and updates them according to how the end user reacts to them.

-> [`bandit_orchestrator`](orchestration/patterns/bandit_orchestrator) This learns user preferences using contextual bandits.
+> [`bandit_orchestrator`](orchestration/patterns/rltk_bandit_orchestrator) This learns user preferences using contextual bandits.

These are housed in the [orchestration/patterns/](orchestration/patterns) folder under packages with the same name. Follow them as examples to build your own favorite orchestration pattern.

@@ -197,7 +197,7 @@ current_state_pre.command.suggested_command = clear

> **Note:** The feedback is recorded in the next action since one may want to look at the follow-up to see whether the user is using a suggestion, i.e. the feedback may not always be directly tied to the user response on `y/n/e` during the current pre-process stage. This is especially the case when skills -- such as the [`nlc2cmd skill`](plugins/nlc2cmd) -- do not suggest a command that can be used directly.

-Check out the `bandit_orchestrator` for an [example](orchestration/patterns/bandit_orchestrator/bandit_orchestrator.py#L82).
+Check out the `bandit_orchestrator` for an [example](orchestration/patterns/rltk_bandit_orchestrator/rltk_bandit_orchestrator.py).

### Save and Load

@@ -218,6 +218,10 @@ Check out the `threshold_orchestrator` for an example of [maintaining state](orc

## Related Publications and Links

-> Upadhyay, S., Agarwal, M., Bounneffouf, D., & Khazaeni, Y. (2019).
-A Bandit Approach to Posterior Dialog Orchestration Under a Budget.
+> A Bandit Approach to Posterior Dialog Orchestration Under a Budget.
+Sohini Upadhyay, Mayank Agarwal, Djallel Bounneffouf, Yasaman Khazaeni.
+NeurIPS 2018 Conversational AI Workshop.

> A Unified Conversational Assistant Framework for Business Process Automation.
Yara Rizk, Abhisekh Bhandwalder, Scott Boag, Tathagata Chakraborti, Vatche Isahagian, Yasaman Khazaeni,
Falk Pollock, and Merve Unuvar. AAAI 2020 Workshop on Intelligent Process Automation.
14 changes: 0 additions & 14 deletions clai/server/orchestration/patterns/bandit_orchestrator/README.md

This file was deleted.

10 changes: 0 additions & 10 deletions clai/server/orchestration/patterns/bandit_orchestrator/config.yml

This file was deleted.

51 changes: 0 additions & 51 deletions clai/server/orchestration/patterns/bandit_orchestrator/install.sh

This file was deleted.

@@ -0,0 +1,39 @@
# Bandit-based Orchestration

> :warning: :warning: This orchestration pattern is developed on top of IBM Research's
internal `rltk` toolkit for reward-based learning and **will not run on general machines**.
You are welcome to develop with your own favorite ML platform until `rltk`
becomes open source.

This is an illustration of an orchestration pattern that learns based on user feedback
using contextual bandits. The context is given by the active skills and their corresponding
self-reported confidences, while the reward is either received:

+ directly if the user accepts a suggestion with a `y/n` response
(e.g. for the `howdoi` or `man page explorer` skills); or
+ indirectly if they execute a command that follows the suggestion closely
(e.g. for the `nlc2cmd` or `fixit` skills).
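
One way to realize the indirect reward in the second case is to compare the executed
command against the suggestion via string similarity, gated by a threshold such as the
`reward_match_threshold` in the skill config. A minimal sketch; the helper name and the
use of `difflib` are illustrative assumptions, not the pattern's actual implementation:

```python
from difflib import SequenceMatcher

def indirect_reward(suggested: str, executed: str, threshold: float = 0.7) -> float:
    """Return 1.0 if the executed command closely matches the suggestion,
    else 0.0 (hypothetical analogue of reward_match_threshold matching)."""
    ratio = SequenceMatcher(None, suggested, executed).ratio()
    return 1.0 if ratio >= threshold else 0.0
```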

An orchestration layer that can adapt to user interactions over time allows you to
develop CLIs that are personalized to the needs of individual users or user types,
as well as deal with miscalibrated confidences of skills.
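
Since `rltk` is internal, the selection loop can be illustrated with a generic
contextual Thompson-sampling bandit, where the context vector carries the skills'
self-reported confidences. The class and its interface are assumptions for
illustration, not the `rltk` API:

```python
import numpy as np

class ThompsonBandit:
    """Per-arm Bayesian linear bandit (Thompson sampling). Each arm is a
    skill (plus a NOOP arm); the context is the confidence vector."""

    def __init__(self, num_actions: int, context_size: int, noise: float = 0.25):
        # Per-arm precision matrix A and reward-weighted context sum b
        self.A = [np.eye(context_size) for _ in range(num_actions)]
        self.b = [np.zeros(context_size) for _ in range(num_actions)]
        self.noise = noise

    def select(self, context: np.ndarray) -> int:
        # Sample a weight vector from each arm's posterior; pick the best score
        scores = []
        for A, b in zip(self.A, self.b):
            cov = np.linalg.inv(A)
            theta = np.random.multivariate_normal(cov @ b, self.noise * cov)
            scores.append(float(theta @ context))
        return int(np.argmax(scores))

    def update(self, action: int, context: np.ndarray, reward: float) -> None:
        # Standard Bayesian linear-regression update for the chosen arm
        self.A[action] += np.outer(context, context)
        self.b[action] += reward * context
```

The orchestrator would build `context` from the active skills' confidences, call
`select` to pick a skill, and later call `update` with the direct or indirect reward.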

Bandits - and Reinforcement Learning based agents in general - require an initial
phase of exploration which can adversely affect the end-user experience. To bypass
this phase, the bandits can be warm-started with a particular profile. Four profiles
are included in the package:

- `max-orchestrator`: Starts the bandit orchestrator as a max orchestrator. This behavior
then changes over time with the user's behavior.
- `ignore-clai`: Ignores CLAI altogether and treats each command as a native Bash command.
- `ignore-skill`: Ignores a particular skill while retaining `max-orchestrator`
behavior for the rest.
- `prefer-skill`: Prefers one skill over another; useful in scenarios where a user
prefers one skill from a pool of skills with overlapping domains.
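
These profiles can be emulated by seeding a linear bandit's per-arm statistics with
pseudo-observations before any real interaction. A hypothetical sketch: the helper
name, the `(A, b)` parameterization, and the assumption that arm 0 is the NOOP action
are all illustrative, not the package's actual warm-start code:

```python
import numpy as np

def warm_start_priors(profile, num_actions, skill_index=None, strength=50.0):
    """Build per-arm (A, b) pseudo-observation statistics so a linear
    bandit's initial behavior matches the requested profile."""
    A = [np.eye(num_actions) for _ in range(num_actions)]
    b = [np.zeros(num_actions) for _ in range(num_actions)]
    for arm in range(num_actions):
        ctx = np.zeros(num_actions)
        ctx[arm] = 1.0  # pseudo-context: this arm's own confidence dimension
        if profile == "max-orchestrator":
            reward = 1.0                         # score tracks raw confidence
        elif profile == "ignore-clai":
            reward = 1.0 if arm == 0 else 0.0    # arm 0 assumed to be NOOP
        elif profile == "ignore-skill":
            reward = 0.0 if arm == skill_index else 1.0
        elif profile == "prefer-skill":
            reward = 2.0 if arm == skill_index else 1.0
        else:
            raise ValueError(f"unknown warm-start profile: {profile}")
        A[arm] += strength * np.outer(ctx, ctx)
        b[arm] += strength * reward * ctx
    return A, b
```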

| Warm-start behavior | Preview |
| ----- | ----- |
| `max-orchestrator` | <img src="https://www.dropbox.com/s/t0s9l066ntfd5v4/max-orchestrator.png?raw=1" /> |
| `ignore-clai` | <img src="https://www.dropbox.com/s/ji8t8mraav9xszh/noop.png?raw=1" /> |
| `ignore-nlc2cmd` | <img src="https://www.dropbox.com/s/a28s965vit3fshj/ignore-nlc2cmd.png?raw=1" /> |
| `prefer-manpage-over-nlc2cmd` | <img src="https://www.dropbox.com/s/meho56ix1srfe9j/manpage-over-nlc2cmd.png?raw=1" /> |
@@ -0,0 +1,9 @@
{
"noop_confidence": 0.1,
"warm_start": true,
"warm_start_config": {
"type": "max-orchestrator",
"kwargs": {}
},
"reward_match_threshold": 0.7
}
@@ -0,0 +1,10 @@
# Config file using the contextual/thompson pattern and providing its parameters
# This configuration disables logging of the bandit's activity

pattern: contextual/thompson
num_actions: 10
context_size: 10

# Number of actions is set to a maximum of 10. This means a maximum of 10 installed skills
# (including a NOOP action) are supported.
# Context size should be equal to the number of actions
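
The constraints in the comments above can be enforced when the config is loaded. A
sketch under assumptions: the helper name is hypothetical, and the parsed YAML is
represented as a plain dict:

```python
def validate_bandit_config(cfg: dict) -> dict:
    """Check the constraints documented in the bandit config comments."""
    # At most 10 installed skills (including the NOOP action) are supported
    assert 1 <= cfg["num_actions"] <= 10, "num_actions must be in 1..10"
    # Context size must equal the number of actions
    assert cfg["context_size"] == cfg["num_actions"], \
        "context_size must equal num_actions"
    return cfg

validate_bandit_config(
    {"pattern": "contextual/thompson", "num_actions": 10, "context_size": 10}
)
```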
@@ -0,0 +1,38 @@
#!/usr/bin/env bash

echo "==============================================================="
echo ""
echo " Phase 1: Installing necessary tools"
echo ""
echo "==============================================================="

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
FRAMEWORK_DIR="${DIR}/framework"

if [ -d "${FRAMEWORK_DIR}" ]; then
rm -rf "${FRAMEWORK_DIR}"
fi

mkdir -p "${FRAMEWORK_DIR}"


echo " >> Cloning framework libraries"
echo "==============================================================="

cd "${FRAMEWORK_DIR}" || exit 1

# Download and install RLTK library into the rltk folder and uncomment the
# bottom two lines


echo " >> Installing RLTK library"
echo "==============================================================="

# cd "${FRAMEWORK_DIR}/rltk"
# python3 -m pip install -q --user .


echo " >> Installing python dependencies"
echo "==============================================================="

# Use an absolute path (requirements.txt sits next to this script) so the
# install works regardless of the directory we cd'd into above
python3 -m pip install -r "${DIR}/requirements.txt"