The Agents that Crossref operates for Event Data. Each Agent talks to a data source (usually an API), builds Evidence Records from what it finds, and sends them to the Percolator.
This codebase includes all Agents that Crossref operates, so all Agents share a single version number. Before this repository existed, each Agent had its own codebase and its own versioning.
Most Agents run on a schedule:
lein run start-schedule «agent-name»*
Some Agents, such as Wikipedia and Twitter, run as daemons:
lein run run-daemons «agent-name»*
Run a one-off cycle of the scheduled job:
lein run run-schedule-once «agent-name»*
Current Agent names, grouped by how they run:
Schedule
- hypothesis
- newsfeed
- reddit-links
- stackexchange
- twitter (update rules on schedule)
- carberry (FOR TESTING ONLY!)
- f1000
Daemon
- twitter (ingest stream)
- wikipedia
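The Schedule and Daemon lists above determine which `lein run` subcommand starts each Agent, and `twitter` appears in both. As an illustration, here is a hypothetical Bash helper (not part of this repository; the name `agent_commands` is invented) that encodes those lists so a deployment script could pick the right subcommand(s) for an Agent name:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repository: print the lein subcommand(s)
# that run a given Agent, following the Schedule and Daemon lists in this README.
agent_commands() {
  case "$1" in
    hypothesis|newsfeed|reddit-links|stackexchange|f1000)
      echo "start-schedule" ;;
    carberry)
      # carberry is listed as FOR TESTING ONLY.
      echo "start-schedule" ;;
    twitter)
      # twitter is in both lists: rules are updated on a schedule,
      # while the stream is ingested by a daemon.
      echo "start-schedule run-daemons" ;;
    wikipedia)
      echo "run-daemons" ;;
    *)
      echo "unknown agent: $1" >&2
      return 1 ;;
  esac
}
```

A deployment script could loop over `$(agent_commands twitter)` and start one `lein run <subcommand> twitter` process per subcommand. The helper has to be kept in sync by hand whenever an Agent is added or moved between lists.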
The following environment variables must be set:
- AGENT_CHECKPOINT_S3_BUCKET_NAME
- AGENT_CHECKPOINT_S3_KEY
- AGENT_CHECKPOINT_S3_REGION_NAME
- AGENT_CHECKPOINT_S3_SECRET
- GLOBAL_ARTIFACT_URL_BASE, e.g. https://artifact.eventdata.crossref.org
- GLOBAL_JWT_SECRETS
- GLOBAL_KAFKA_BOOTSTRAP_SERVERS
- GLOBAL_STATUS_TOPIC
- PERCOLATOR_INPUT_EVIDENCE_RECORD_TOPIC
- PERCOLATOR_ROBOTS_CACHE_REDIS_HOST
- PERCOLATOR_ROBOTS_CACHE_REDIS_PORT
- TWITTER_GNIP_PASSWORD
- TWITTER_GNIP_RULES_URL
- TWITTER_GNIP_USERNAME
- TWITTER_POWERTRACK_ENDPOINT
- F1000_DUMP_PATH
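A missing variable is easier to diagnose before an Agent starts than partway through a run. The following is a minimal sketch, assuming Bash. It is not part of this repository, and the function name `check_env` is hypothetical. It only reports variables from the list above that are unset or empty in the current shell. It does not validate their values.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of this repository.
# Variable names are copied from the list in this README.
required_vars=(
  AGENT_CHECKPOINT_S3_BUCKET_NAME
  AGENT_CHECKPOINT_S3_KEY
  AGENT_CHECKPOINT_S3_REGION_NAME
  AGENT_CHECKPOINT_S3_SECRET
  GLOBAL_ARTIFACT_URL_BASE
  GLOBAL_JWT_SECRETS
  GLOBAL_KAFKA_BOOTSTRAP_SERVERS
  GLOBAL_STATUS_TOPIC
  PERCOLATOR_INPUT_EVIDENCE_RECORD_TOPIC
  PERCOLATOR_ROBOTS_CACHE_REDIS_HOST
  PERCOLATOR_ROBOTS_CACHE_REDIS_PORT
  TWITTER_GNIP_PASSWORD
  TWITTER_GNIP_RULES_URL
  TWITTER_GNIP_USERNAME
  TWITTER_POWERTRACK_ENDPOINT
  F1000_DUMP_PATH
)

# Print each unset or empty variable to stderr; return 1 if any are missing.
check_env() {
  local missing=0
  local name
  for name in "${required_vars[@]}"; do
    # ${!name:-} is Bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_env && lein run start-schedule hypothesis` starts the scheduled Agent only if every variable is set. The variables still need to be exported so that the JVM started by `lein` can see them.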