Add ORCID registration task to article publication flow #494

Open
Copilot wants to merge 3 commits into master from copilot/add-orcid-registration-task
Copilot AI commented Mar 11, 2026

What does this PR do?

Adds ORCID registration to the article publication flow. After a document is saved successfully in try_register_documents, the data for each author who has an ORCID is sent to the orcid-push service via POST /works. ORCID push failures are logged but do not block document registration.

Controlled by two environment variables:

  • ENABLE_ORCID_PUSH: turns the feature on or off (default: false)
  • ORCID_PUSH_URL: URL of the service (e.g. https://orcid-push.scielo.org/)
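
A minimal sketch of how these two variables could be read, assuming the toggle is enabled only by the literal string true (case-insensitive) and the URL defaults to an empty string. The helper names orcid_push_enabled and orcid_push_url are hypothetical and not taken from the PR.

```python
import os


def orcid_push_enabled(environ=os.environ):
    """Return True only when ENABLE_ORCID_PUSH is the string "true" (any case)."""
    return environ.get("ENABLE_ORCID_PUSH", "false").lower() == "true"


def orcid_push_url(environ=os.environ):
    """Return the configured service URL, or an empty string when unset."""
    return environ.get("ORCID_PUSH_URL", "")
```

Under this parsing, values such as 1 or yes leave the feature off, so a deployment would need to set ENABLE_ORCID_PUSH=true exactly.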

Where could the review start?

airflow/dags/operations/orcid_operations.py: new module with post_data (retries via tenacity) and register_orcid (uses build_payload from packtools to generate payloads from the XML).

Next, the integration in airflow/dags/operations/sync_kernel_to_website_operations.py, around lines 737-745.

How could this be tested manually?

  1. Set ENABLE_ORCID_PUSH=true and ORCID_PUSH_URL=https://orcid-push.scielo.org/
  2. Run the sync_kernel_to_website DAG with documents whose XML includes authors with ORCIDs
  3. Check in the logs that the POST requests to /works were made
  4. To test the toggle, set ENABLE_ORCID_PUSH=false and confirm that no request is made

Any context you would like to give?

Main changes:

  • operations/orcid_operations.py: post_data() with exponential retry (5 attempts, based on the pattern in requester.py from scms-upload); register_orcid(), which parses the XML, generates payloads via packtools, and posts one for each author
  • sync_kernel_to_website_operations.py: call to register_orcid after document.save(), wrapped in an isolated try/except
  • requirements.txt: packtools 2.9.5 → 4.16.1, for access to packtools.sps.formats.orcid.payload_builder.build_payload
  • docker-compose.yml / docker-compose-dev.yml: new env vars ENABLE_ORCID_PUSH, ORCID_PUSH_URL
  • tests/test_orcid_operations.py: 14 unit tests covering retry, toggle, errors, and multiple payloads
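
The isolated try/except around the ORCID call can be sketched as follows. This is an illustration only: save_then_push and its parameters are hypothetical stand-ins, with fetch_xml and register passed in as plain callables so the example runs without Airflow, Kernel, or packtools.

```python
import logging


def save_then_push(document, document_id, fetch_xml, register):
    """Save the document, then attempt the ORCID push as a best-effort step."""
    document.save()
    try:
        register(document_id, fetch_xml(document_id))
    except Exception as exc:
        # The push is a side effect: log the failure and keep the saved document.
        logging.error(
            "Could not register ORCID for document '%s': %s",
            document_id, exc,
        )
    return document
```

Because the save happens before the try block, an outage of the push service cannot undo or prevent document registration.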

Screenshots

N/A

What are the relevant tickets?

N/A

References

Original prompt

This section details the original issue you should resolve

<issue_title>Add an ORCID registration task to the article publication flow</issue_title>
<issue_description>### Task description

Add an ORCID registration task to the article publication flow, making a request to https://orcid-push.scielo.org/

Service documentation:
https://github.com/lepidus/orcid-profile-publication-service

Use packtools to build the payload

Use https://github.com/scieloorg/scms-upload/blob/97c1b2720902f17caf4b0036fbb478fc4021eadd/core/utils/requester.py#L39 to make the request

Possible places to insert the request code:
https://github.com/scieloorg/opac-airflow/blob/master/airflow/dags/operations/sync_kernel_to_website_operations.py#L82

Note:
Create an environment variable to turn the feature on / off
Create the variables mentioned in
https://github.com/lepidus/orcid-profile-publication-service
</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits March 11, 2026 17:23
- Create operations/orcid_operations.py with post_data and register_orcid
- Integrate ORCID registration in try_register_documents after document save
- Add ENABLE_ORCID_PUSH and ORCID_PUSH_URL environment variables
- Update packtools to 4.16.1 for build_payload support
- Add unit tests for ORCID operations

Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add ORCID registration task to article publication flow" to "Add ORCID registration task to article publication flow" Mar 11, 2026
@robertatakenaka marked this pull request as ready for review March 11, 2026 17:42
Copilot AI review requested due to automatic review settings March 11, 2026 17:42
Copilot AI left a comment

Pull request overview

Adds an ORCID “push” side-effect to the article publication flow so that, after a document is saved to OPAC, author/work data derived from the article XML is sent to the orcid-push service (POST /works) behind an environment-variable toggle.

Changes:

  • Bumps packtools version to access build_payload for ORCID payload generation.
  • Introduces operations/orcid_operations.py with a retrying HTTP POST helper and register_orcid() that builds and posts payloads per author ORCID.
  • Integrates register_orcid() into try_register_documents() after document.save(), plus adds unit tests and docker-compose env wiring.
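
The per-author posting step might look roughly like the sketch below. It assumes the payloads have already been built, that the endpoint is the configured base URL plus /works, and that a failure for one author should not stop the remaining posts. The function name push_payloads, the post callable, and the payload shape are all hypothetical.

```python
def push_payloads(payloads, post, base_url):
    """POST each payload to <base_url>/works and return how many succeeded."""
    url = base_url.rstrip("/") + "/works"
    sent = 0
    for payload in payloads:
        try:
            post(url, payload)
            sent += 1
        except Exception:
            # Assumption: one author's failure should not block the others.
            continue
    return sent
```

Stripping the trailing slash before appending /works keeps the URL well-formed whether or not ORCID_PUSH_URL ends with a slash.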

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.

Summary per file:

| File | Description |
| --- | --- |
| requirements.txt | Updates packtools dependency to a newer version used by ORCID payload building. |
| docker-compose.yml | Adds ENABLE_ORCID_PUSH / ORCID_PUSH_URL env passthrough for local runs. |
| docker-compose-dev.yml | Same env passthrough for dev compose setup. |
| airflow/dags/operations/orcid_operations.py | New ORCID push module with tenacity retry and payload generation via packtools. |
| airflow/dags/operations/sync_kernel_to_website_operations.py | Calls ORCID registration after saving the article. |
| airflow/tests/test_orcid_operations.py | Adds unit tests covering toggle behavior, posting, and error handling. |

Comment on lines +737 to +740
try:
    if fetch_document_xml:
        document_xml = fetch_document_xml(document_id)
        register_orcid(document_id, document_xml)

Copilot AI Mar 11, 2026

This adds a second fetch_document_xml(document_id) call for every saved article. The XML was already fetched earlier inside article_factory (same file) to build fields from SPS_Package, so this doubles Kernel traffic and adds latency per document. Consider fetching the XML once in try_register_documents and reusing it for both article_factory and register_orcid (or refactor article_factory to return/cache the XML).
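
One way to avoid the duplicate fetch without changing the signature of article_factory is to wrap the fetcher in a small cache and pass the wrapped callable to both consumers. The sketch below is an assumption about how that could be done. fetch_once is a hypothetical helper, and creating one wrapper per try_register_documents call would keep the cache from growing across DAG runs.

```python
def fetch_once(fetch_document_xml):
    """Wrap a fetcher so repeated requests for the same document hit Kernel once."""
    cache = {}

    def cached(document_id):
        if document_id not in cache:
            cache[document_id] = fetch_document_xml(document_id)
        return cache[document_id]

    return cached
```

Failed fetches are not cached, since an exception raised by the wrapped function propagates before anything is stored.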

Comment on lines +34 to +36
    stop=stop_after_attempt(5),
)
def post_data(url, data=None, headers=None, timeout=10, verify=True):

Copilot AI Mar 11, 2026

With timeout=10 and stop_after_attempt(5), a degraded ORCID Push endpoint can block the publication flow for roughly 50s per author (5 attempts × 10s timeout, plus exponential waits) even though failures are meant to be non-blocking. Consider reducing timeout/attempts for this best-effort side effect and/or adding a total time cap (e.g., stop-after-delay) so ORCID push outages don’t significantly slow down the DAG run.

Suggested change
-     stop=stop_after_attempt(5),
- )
- def post_data(url, data=None, headers=None, timeout=10, verify=True):
+     stop=stop_after_attempt(3),
+ )
+ def post_data(url, data=None, headers=None, timeout=5, verify=True):
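
The total time cap mentioned in this comment can be illustrated with a standard-library sketch that stops on whichever limit is reached first, attempts or elapsed time. This is not the PR's code: call_with_budget and its defaults are hypothetical, and with tenacity the same idea would normally be expressed by combining stop conditions, for example stop_after_attempt(3) | stop_after_delay(15).

```python
import time


def call_with_budget(func, max_attempts=3, max_total_seconds=15.0,
                     base_wait=1.0, clock=time.monotonic, sleep=time.sleep):
    """Retry func() with exponential backoff, bounded by attempts and elapsed time."""
    start = clock()
    last_error = None
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            last_error = exc
        wait = base_wait * (2 ** attempt)
        out_of_attempts = attempt == max_attempts - 1
        # Give up early if the next wait would exceed the total time budget.
        over_budget = clock() - start + wait > max_total_seconds
        if out_of_attempts or over_budget:
            break
        sleep(wait)
    raise last_error
```

The clock and sleep parameters exist only so the behavior can be exercised in tests without real delays.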

Comment on lines +17 to +20
logger = logging.getLogger(__name__)

ENABLE_ORCID_PUSH = os.environ.get("ENABLE_ORCID_PUSH", "false").lower() == "true"
ORCID_PUSH_URL = os.environ.get("ORCID_PUSH_URL", "")

Copilot AI Mar 11, 2026

This module introduces logger while other operations/* modules consistently use Logger = logging.getLogger(__name__) (e.g., operations/docs_utils.py:22, operations/check_website_operations.py:22). For consistency across the codebase, consider renaming this to Logger and updating references in this module.

@@ -0,0 +1,229 @@
import os

Copilot AI Mar 11, 2026

Unused import: os is imported but not used in this test module. Removing it keeps the test file clean and avoids failing CI if linting is enabled later.

Suggested change
- import os

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

        return

    try:
        xml_tree = et.XML(document_xml)

Copilot AI Mar 11, 2026

XML is parsed with et.XML(document_xml) using the default parser. Elsewhere in this repo XML parsing uses an XMLParser(no_network=True) (e.g., docs_utils.py) to avoid network/entity resolution risks. Consider using an explicit secure parser here (e.g., no_network=True and disabling entity resolution) when parsing document_xml to mitigate XXE/SSRF vectors.

Suggested change
- xml_tree = et.XML(document_xml)
+ parser = et.XMLParser(no_network=True, resolve_entities=False)
+ xml_tree = et.XML(document_xml, parser=parser)

Comment on lines +741 to +744
except Exception as exc:
    logging.error(
        "Could not register ORCID for document '%s': %s",
        document_id, exc,

Copilot AI Mar 11, 2026

This except Exception logs only the exception message, which can make diagnosing ORCID push failures difficult (no traceback). Since this is intentionally non-blocking, consider using logging.exception(...) here (or including exc_info=True) so the stack trace is captured in logs.

Suggested change
- except Exception as exc:
-     logging.error(
-         "Could not register ORCID for document '%s': %s",
-         document_id, exc,
+ except Exception:
+     logging.exception(
+         "Could not register ORCID for document '%s'",
+         document_id,
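
The difference between the two logging calls can be checked with a small standard-library sketch. The helper capture_log, the logger names, and the simulated error are made up for illustration. The helper logs the same caught exception both ways and returns what the handler wrote.

```python
import io
import logging


def capture_log(use_exception):
    """Log a caught error with exception() or error() and return the handler output."""
    stream = io.StringIO()
    log = logging.getLogger("orcid_demo_%s" % use_exception)
    log.propagate = False
    handler = logging.StreamHandler(stream)
    log.addHandler(handler)
    try:
        raise ValueError("payload rejected")
    except Exception as exc:
        if use_exception:
            # exception() logs at ERROR level and appends the traceback.
            log.exception("Could not register ORCID for document '%s'", "doc-1")
        else:
            log.error("Could not register ORCID for document '%s': %s", "doc-1", exc)
    finally:
        log.removeHandler(handler)
    return stream.getvalue()
```

Only the exception() variant includes the traceback, which is the detail the comment argues is needed to diagnose non-blocking failures.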


Development

Successfully merging this pull request may close these issues.

Add an ORCID registration task to the article publication flow
