Let's rework PAPR's architecture!

### Rough problem statement

Right now, our architecture is very much like a waterfall.
An event on GitHub triggers a linear cascade of steps that
eventually fires off PAPR to run for that specific event.

This has severe limitations:
- the monolithic architecture makes it harder to try out
  locally and thus harder to contribute to
- reliance on multiple chained infrastructures (CI bus,
  Jenkins, OpenStack) results in a higher fault rate, since
  a failure in any one of them breaks the whole chain
- there is no easy way to scale horizontally for HA

Other issues plaguing the current architecture:
- the queue is not easily visible or public, so it's hard to
  tell what's going on without manually checking the
  internal Jenkins queue (the Homu queue is somewhat visible,
  but could be much better)
- there is no job prioritization; it's purely FIFO. Ideally,
  we want to be able to apply a set of rules for how jobs
  should be prioritized, e.g. the `auto` and `try` branches
  before PRs, PRs with certain labels before others, etc.
- the combination of GHPRB and Homu is confusing and creates
  an inconsistent user experience

### Rough solution

We split up the architecture into multiple small services:

1. Homu/PAPR Scheduler (PAPRQ): proposed to run in CentOS CI
2. PAPR Workers: need either OpenStack or Docker/Kubernetes.
   Placing jobs onto workers is a bin-packing problem; could
   we e.g. reuse Kubernetes Jobs?
   https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
3. PAPR Publishers

The scheduler receives events from GitHub and queues up the
jobs. It understands .papr.yml and splits it into
individual jobs, each representing one testsuite. This allows
e.g. workers that only support containerized workloads to
still participate in the pool. It also allows container work
to be weighted differently from VM/cluster work.
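A minimal sketch of the splitting step, assuming the `.papr.yml` has already been parsed into one dict per YAML document (one per testsuite), that each testsuite sets `context`, and that each declares exactly one of `container`, `host` or `cluster`. The `Job` type and `split_testsuites` helper are hypothetical and not existing PAPR code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Assumption: each parsed .papr.yml document requests exactly one of these
# environment kinds via a top-level key of the same name.
ENV_KINDS = ("container", "host", "cluster")


@dataclass
class Job:
    repo: str
    commit: str
    context: str  # GitHub status context, e.g. "f26-sanity"
    kind: str     # one of ENV_KINDS; lets PAPRQ match jobs to worker types
    suite: Dict[str, Any] = field(repr=False)  # the raw testsuite definition


def split_testsuites(repo: str, commit: str,
                     suites: List[Dict[str, Any]]) -> List[Job]:
    """Turn each parsed testsuite into an independently schedulable job."""
    jobs = []
    for suite in suites:
        kinds = [k for k in ENV_KINDS if k in suite]
        if len(kinds) != 1:
            raise ValueError(
                f"testsuite {suite.get('context')!r} must use exactly one of "
                f"{ENV_KINDS}, found {kinds}")
        jobs.append(Job(repo=repo, commit=commit, context=suite["context"],
                        kind=kinds[0], suite=suite))
    return jobs
```

Because each job carries its `kind`, a container-only worker can ask only for `container` jobs, and PAPRQ can weight `host`/`cluster` jobs differently.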

Workers periodically query PAPRQ for available jobs (with
forced polls on events, if that's implementable). PAPRQ
prioritizes jobs according to a set of rules.
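A minimal sketch of rule-based prioritization, using the examples from the problem statement (`auto`/`try` branches before PRs, labeled PRs before other PRs) and falling back to FIFO within one priority level. The `JobQueue` class, its `pop_for` worker query and the `priority` label are all hypothetical; the `capabilities` set models workers that only support some job kinds.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass
class QueuedJob:
    context: str
    kind: str                       # "container", "host" or "cluster"
    branch: Optional[str] = None    # set for branch pushes, e.g. "auto"
    pr_labels: FrozenSet[str] = frozenset()


# A rule returns a priority (lower runs first) or None if it doesn't apply.
# Rules are tried in order and the first match wins.
Rule = Callable[[QueuedJob], Optional[int]]

DEFAULT_RULES: List[Rule] = [
    lambda j: 0 if j.branch in ("auto", "try") else None,
    lambda j: 1 if "priority" in j.pr_labels else None,  # hypothetical label
]
DEFAULT_PRIORITY = 2  # everything else, e.g. unlabeled PRs


class JobQueue:
    def __init__(self, rules: Optional[List[Rule]] = None):
        self._rules = DEFAULT_RULES if rules is None else rules
        self._heap: List[Tuple[int, int, QueuedJob]] = []
        self._seq = itertools.count()  # FIFO tie-breaker within one priority

    def _priority(self, job: QueuedJob) -> int:
        for rule in self._rules:
            p = rule(job)
            if p is not None:
                return p
        return DEFAULT_PRIORITY

    def push(self, job: QueuedJob) -> None:
        heapq.heappush(self._heap, (self._priority(job), next(self._seq), job))

    def pop_for(self, capabilities: FrozenSet[str]) -> Optional[QueuedJob]:
        """Return the highest-priority job this worker can run, or None."""
        skipped = []
        found = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[2].kind in capabilities:
                found = entry[2]
                break
            skipped.append(entry)
        for entry in skipped:  # put back jobs this worker can't run
            heapq.heappush(self._heap, entry)
        return found
```

The linear scan in `pop_for` keeps the sketch short; a real queue would probably keep one heap per job kind.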

### What it would entail

1. The largest piece of work will be to enhance (fork?) Homu
to also handle PR events and add them to its queue. This
naturally resolves the confusing UX, and makes
optimizations like https://github.com/servo/homu/pull/54
trivial to implement.

E.g. `@rh-atomic-bot retry` will actually know whether to
retry testing the PR or retry testing the merge.

It also allows for more sophisticated syntax, like
`@rh-atomic-bot retry f26-sanity`.
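A sketch of how the extended retry syntax could be parsed, assuming PAPRQ knows whether the PR is currently being tested as a merge on the `auto` branch. The `Retry` type, the `parse_retry` helper and the `merge_in_progress` flag are hypothetical, and the grammar (`retry` optionally followed by testsuite contexts) is only inferred from the `retry f26-sanity` example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

BOT = "rh-atomic-bot"
# Hypothetical grammar: "@rh-atomic-bot retry [context ...]" on its own line.
RETRY_RE = re.compile(rf"^@{re.escape(BOT)}\s+retry\b(?P<args>.*)$",
                      re.MULTILINE)


@dataclass
class Retry:
    target: str          # "merge" (auto branch) or "pr" (PR head)
    contexts: List[str]  # empty list means "retry every testsuite"


def parse_retry(comment: str, merge_in_progress: bool) -> Optional[Retry]:
    """Return the retry request in a PR comment, or None if there is none."""
    m = RETRY_RE.search(comment)
    if m is None:
        return None
    # Because PAPRQ owns both PR and merge testing, it can pick the target.
    return Retry(target="merge" if merge_in_progress else "pr",
                 contexts=m.group("args").split())
```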

2. Teach PAPR to connect to PAPRQ for jobs. PAPR would then
either be a long-running service that polls, or be started
periodically by an external service (e.g. Jenkins, OCP).
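A sketch of the two worker modes, assuming a PAPRQ API that lets a worker claim a job matching its capabilities and report a final status. `PaprqClient`, `claim`, `report`, `run_once` and `run_forever` are hypothetical names; `execute` stands in for PAPR's existing test-running logic.

```python
import time
from typing import Any, Callable, Dict, FrozenSet, Optional, Protocol

Job = Dict[str, Any]


class PaprqClient(Protocol):
    """Hypothetical PAPRQ client interface; nothing like it exists yet."""

    def claim(self, capabilities: FrozenSet[str]) -> Optional[Job]: ...
    def report(self, job_id: str, status: str) -> None: ...


def run_once(client: PaprqClient, capabilities: FrozenSet[str],
             execute: Callable[[Job], bool]) -> bool:
    """Claim and run at most one job; return True if a job was run.

    This is the mode for a worker started periodically by e.g. Jenkins/OCP.
    """
    job = client.claim(capabilities)
    if job is None:
        return False
    try:
        status = "success" if execute(job) else "failure"
    except Exception:
        status = "error"  # execute() crashed; report it apart from a failure
    client.report(job["id"], status)
    return True


def run_forever(client: PaprqClient, capabilities: FrozenSet[str],
                execute: Callable[[Job], bool], idle_seconds: float = 30.0,
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Long-running mode: drain the queue, then back off while it is empty."""
    while True:
        if not run_once(client, capabilities, execute):
            sleep(idle_seconds)
```

Keeping `run_once` separate means the same code serves both deployment options; only the outer loop differs.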

3. This can come later. Rather than the workers publishing
to e.g. S3 themselves, do what Cockpit does and
stream logs and updates back to PAPRQ itself. This allows us
to (1) have publicly visible streaming logs, and (2) keep
all the secrets in PAPRQ and only require workers to have a
single token.
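A minimal in-memory sketch of the PAPRQ side of log streaming, assuming workers authenticate with one shared token and viewers tail logs by offset. `LogStore`, `append` and `read` are hypothetical; a real service would persist chunks and expose them over HTTP.

```python
import hmac
from collections import defaultdict
from typing import Dict, List, Tuple


class LogStore:
    """Sketch of PAPRQ-side log collection (hypothetical API).

    Workers hold only a single shared token and can only append; anyone can
    read, which is what makes the logs publicly streamable. Publishing
    secrets (e.g. S3 credentials) would never leave PAPRQ.
    """

    def __init__(self, worker_token: str):
        self._token = worker_token.encode()
        self._logs: Dict[str, List[str]] = defaultdict(list)

    def append(self, token: str, job_id: str, chunk: str) -> None:
        # Constant-time comparison so the token can't be guessed by timing.
        if not hmac.compare_digest(token.encode(), self._token):
            raise PermissionError("invalid worker token")
        self._logs[job_id].append(chunk)

    def read(self, job_id: str, offset: int = 0) -> Tuple[str, int]:
        """Return chunks from `offset` on, plus the offset to resume from."""
        chunks = self._logs.get(job_id, [])
        return "".join(chunks[offset:]), len(chunks)
```

A public log viewer would call `read` repeatedly, passing back the returned offset, to follow a running job.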

Let's finalize this design and split the work up amongst team
members so that everyone understands how it works and can
help manage it.

### Risks

* Contributions may block on the Servo/Homu team; getting review time is hard

Sub-alternative: a sidecar/wrapper for Homu. PAPR intercepts
GitHub events and forwards them to Homu as well, but also
builds its own state. (Investigate organization-wide GitHub
events.)
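A sketch of the sidecar idea, assuming PAPR receives GitHub webhooks first and re-sends each one unchanged to Homu. `HomuSidecar` and the `forward` callable are hypothetical; the payload fields used (`action`, `pull_request.number`, `pull_request.head.sha`) are those of GitHub's `pull_request` webhook event.

```python
from typing import Any, Callable, Dict


class HomuSidecar:
    """Forward every webhook to Homu unchanged, and also track PR state.

    `forward` is a hypothetical callable that re-POSTs the event to Homu.
    """

    def __init__(self, forward: Callable[[str, Dict[str, Any]], None]):
        self._forward = forward
        self.pr_heads: Dict[int, str] = {}  # PR number -> head SHA to test

    def handle(self, event_type: str, payload: Dict[str, Any]) -> None:
        self._forward(event_type, payload)  # Homu keeps working as before
        if event_type != "pull_request":
            return
        pr = payload["pull_request"]
        if payload["action"] in ("opened", "reopened", "synchronize"):
            self.pr_heads[pr["number"]] = pr["head"]["sha"]
        elif payload["action"] == "closed":
            self.pr_heads.pop(pr["number"], None)
```

Forwarding before updating local state keeps Homu's behavior unchanged even if PAPR's own bookkeeping has a bug.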

### Transition plan

We can take down per-PR testing while keeping testing on the `auto` branch running.

### Alternatives

- Customize Jenkins (integrate better with CentOS CI). What
  would the relationship with GHPRB be there?
- Hop on http://prow.k8s.io/ with Origin
- Rely on Travis more

### Other discussion

Standard test roles vs PAPR:
- PAPR describes more things, and handles tasks like
  provisioning more declaratively
- Could we have stdtest in upstream git?

Colin: Could PAPR run stdtest?

Jonathan: The problem is that the tests live in a separate git
repo. Unless the upstream repo also holds the stdtest definition?
