Add deployment command documentation for Mexico collection publication year update #24

Draft
Copilot wants to merge 3 commits into beta from copilot/fix-publication-year-filter

Copilot AI commented Oct 14, 2025

Problem

The Mexico collection in SciELO Analytics displays incorrect publication years in the document filters, including future years such as 2026, 2027, and 2250. As the screenshot shows, the "Publishing year" filter spans 2012 to 2250, which points to inconsistent data in the index.

[Screenshot: Publishing year filter showing incorrect dates]

While these incorrect publication years have already been corrected in the ArticleMeta database, the PublicationStats ElasticSearch index has not been updated to reflect these changes. This causes the Analytics interface to continue showing outdated and incorrect data.

Affected Articles

Several articles were identified with incorrect publication years:

  • Tecnología y ciencias del agua v12n1, v12n2, v12n3 - Previously showing 2026
  • Acta médica Grupo Ángeles v21n4 - Previously showing 2027
  • Investigación en educación médica v13n50 - Previously showing 2250

Solution

This PR provides comprehensive deployment documentation with the exact command needed to update the PublicationStats ElasticSearch index from the corrected ArticleMeta data.

Command for Deployment Team

The main command to execute on the production server:

publicationstats_loaddata --collection mex --doc_type article --differential

Parameters:

  • --collection mex - Targets the Mexico collection specifically
  • --doc_type article - Updates article documents (where publication_year field is stored)
  • --differential - Efficient mode that compares ArticleMeta and PublicationStats indexes, updating only the documents with differences

Alternative Options

For scenarios requiring additional cleanup:

# With deletion of obsolete documents
publicationstats_loaddata --collection mex --doc_type article --differential --delete

# With detailed logging for monitoring
publicationstats_loaddata --collection mex --doc_type article --differential --logging_level INFO
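
To make the variants above easier to review before anyone runs them, the command line can be assembled programmatically. The following is a minimal sketch: `build_loaddata_command` is a hypothetical helper, not part of the publication_stats repository, and it only uses the flags already quoted in this description. It builds and prints the command without executing it.

```python
import shlex


def build_loaddata_command(collection, doc_type, differential=True,
                           delete=False, logging_level=None):
    """Return the argv list for publicationstats_loaddata.

    Hypothetical helper: only flags quoted in this PR description are used.
    """
    argv = ["publicationstats_loaddata",
            "--collection", collection,
            "--doc_type", doc_type]
    if differential:
        argv.append("--differential")
    if delete:
        argv.append("--delete")
    if logging_level:
        argv += ["--logging_level", logging_level]
    return argv


if __name__ == "__main__":
    # Print the main command so it can be checked before it is run.
    print(shlex.join(build_loaddata_command("mex", "article")))
```

Returning a list rather than a single string keeps each argument separate, so the result could be passed to `subprocess.run` without shell quoting concerns if the team chooses to automate the step.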

Files Added

  1. COMANDO_DEPLOY_MEX.txt - Quick reference text file with commands in Portuguese for the deployment team
  2. DEPLOYMENT_COMMAND_MEX.md - Comprehensive documentation including:
    • Problem description and context
    • Command syntax with parameter explanations
    • Alternative command options
    • Environment requirements
    • List of affected articles
    • Post-execution verification steps

Verification

After executing the command, the deployment team should verify:

  1. Access https://analytics.scielo.org/?journal=2007-5057&collection=mex
  2. Confirm the "Publishing year" filter shows realistic date ranges
  3. Verify that future years (2250, 2026, etc.) no longer appear
  4. Check that the document count (685 documents) remains consistent
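
The check for future years can also be scripted rather than done by eye. The sketch below assumes the year aggregations arrive as records with `key` and `count` fields, as in the Thrift client output quoted in the original issue further down. `Agg` and `future_years` are hypothetical names introduced for illustration, and the sample rows are copied from that quoted output.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for the Thrift `aggs` records shown in the issue.
Agg = namedtuple("Agg", ["key", "count"])


def future_years(aggs, current_year=None):
    """Return (year, count) pairs dated after current_year, sorted by year."""
    if current_year is None:
        current_year = date.today().year
    flagged = [(int(a.key), a.count) for a in aggs if int(a.key) > current_year]
    return sorted(flagged)


# A few rows taken from the output quoted in the issue.
sample = [Agg("2021", 79259), Agg("2029", 4), Agg("2250", 1), Agg("1920", 6)]

if __name__ == "__main__":
    for year, count in future_years(sample, current_year=2025):
        print(f"{year}: {count} document(s) still dated in the future")
```

An empty result for the Mexico collection after the update would be consistent with steps 2 and 3 above.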

Technical Details

The publicationstats_loaddata command is a console script entry point (defined in setup.py) that executes the main() function in processing/loaddata.py. The differential mode works by:

  1. Loading all document IDs from ArticleMeta for the specified collection
  2. Loading all document IDs from the ElasticSearch index
  3. Comparing the two sets to identify differences
  4. Updating only the documents that have changed in ArticleMeta

This approach is efficient and safe, minimizing the risk of data loss while ensuring the ElasticSearch index reflects the current state of ArticleMeta.
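
The comparison described in the steps above can be sketched with plain set operations. This is an illustration, not the code in processing/loaddata.py: `differential_plan` is a hypothetical function, and the sketch assumes each identifier carries a last-processed date so that a document corrected in ArticleMeta shows up as a difference. Whether the real script builds identifiers this way is not confirmed by this PR.

```python
def differential_plan(articlemeta_ids, index_ids):
    """Compare two collections of identifiers.

    Returns (to_load, to_remove):
      to_load   - identifiers present in ArticleMeta but not in the index
      to_remove - identifiers present in the index but not in ArticleMeta
    """
    source = set(articlemeta_ids)
    indexed = set(index_ids)
    return source - indexed, indexed - source


if __name__ == "__main__":
    # Hypothetical identifiers of the form "<document id>_<processing date>".
    articlemeta = ["doc-a_2025-10-10", "doc-b_2024-01-01"]
    index = ["doc-a_2023-05-05", "doc-b_2024-01-01"]
    to_load, to_remove = differential_plan(articlemeta, index)
    print("load:", sorted(to_load))
    print("remove:", sorted(to_remove))
```

Under that assumption, an unchanged document appears in both sets and is skipped, which is what keeps the differential run small.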

Original prompt

This section details the original issue you should resolve

<issue_title>Publication year filter with wrong values - Mexico Collection</issue_title>
<issue_description>### Problem description
The Collection's representatives reported a problem with the values shown in the "publication year" filter for some journals. Some documents from these journals had the wrong publication year, and this has already been corrected in ArticleMeta. In SciELO Analytics, however, the problem persists (see the figure below).

Image

In short, it was found that the loaddata.py script, available at processing/loaddata.py in the PublicationStats application, must be run to update the index that populates the Analytics filters. Something like publicationstats_loaddata on the application server's command line should solve the problem.

Steps to reproduce the problem

  1. Go to the page ...
  2. Click the SciELO Analytics link in the menu on the right (https://analytics.scielo.org/?journal=2007-5057&collection=mex)
  3. In the "Document filters" vertical bar, note that the date range runs from 2012 to 2250 (see the Figure in the Attachments section).
  4. Set the filter's maximum value to any year before 2250 (for example, 2100) and click apply
  5. Note that the number of documents dropped from 685 to 684. This means there is still one article in the Mexico collection whose publication year is 2250; this persisted even after SciELO ArticleMeta was updated

Attachments

Affected articles and journals

Procedures already analyzed

  • The SciELO Analytics application was investigated in more depth, and it was found that a Thrift client called PublicationStats is what populates the filters on its screens. More specifically, there are two repositories, https://github.com/scieloorg/publicationstatsapi and https://github.com/scieloorg/publication_stats, which are, respectively, a pyramid-based query API and the PublicationStats application, which uses an ElasticSearch index. This is believed to be where the problem lies and where the data is held. Here is a query for the year filter information:
from publicationstats.client import ThriftClient as PublicationStatsThriftClient

pst = PublicationStatsThriftClient()
pst.client.document_publication_years()

# Output
[
	aggs(key='2021', count=79259),
	aggs(key='2022', count=73255),
	aggs(key='2020', count=72064),
	aggs(key='2023', count=69551),
	...,
	aggs(key='1920', count=6),
	aggs(key='1925', count=6),
	aggs(key='1923', count=4),
	aggs(key='2029', count=4),
	aggs(key='1921', count=2),
	aggs(key='2028', count=2),
	aggs(key='2032', count=2),
	aggs(key='2323', count=2),
	aggs(key='2923', count=2),
	aggs(key='2120', count=1),
	aggs(key='2211', count=1),
	aggs(key='2250', count=1),
	aggs(key='2320', count=1),
	aggs(key='2921', count=1),
	aggs(key='2924', count=1),
	aggs(key='2925', count=1),
	aggs(key='3023', count=1),
	aggs(key='3035', count=1)
]
  • Note that publication year 2250 and other inconsistent years are present. From this point, it would be necessary to find out which procedure changes the values returned by this application (that is, which source code updates the PublicationStats ElasticSearch index). It was found that the loaddata.py script, available at processing/loaddata.py, must be run to update the index with the data held in ArticleMeta. Something like publicationstats_loaddata on this application's server command line should solve the problem.

Environment used

N/A</issue_description>

<agent_instructions>Consider the issue description, and consider that mex is the collection and that the script is def main():

parser = argparse.ArgumentParser(
    description="Load SciELO Network data no analytics production"
)

parser.add_argument(
    'issns',
    nargs='*',
    help='ISSN\'s separated by spaces'
)

parser.add_argument(
    '--collection',
    '-c',
    help='Collection Acronym'
)

parser.add_argument(
    '--from_...

Fixes #23



Copilot AI and others added 2 commits October 14, 2025 18:27
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix publication year filter values in SciELO Analytics" to "Add deployment command documentation for Mexico collection publication year update" Oct 14, 2025
Copilot AI requested a review from robertatakenaka October 14, 2025 18:31