Exception when optimizing a package in Airflow #1127

@robertatakenaka

Problem description

I got this when using the packtools optimiser:

[2026-01-09 21:58:44,242] {taskinstance.py:901} INFO - Executing <Task(ShortCircuitOperator): optimize_package_task_id> on 2026-01-09T20:31:57.066931+00:00
[2026-01-09 21:58:44,250] {standard_task_runner.py:54} INFO - Started process 4137959 to run task
[2026-01-09 21:58:44,348] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'sync_documents_to_kernel', 'optimize_package_task_id', '2026-01-09T20:31:57.066931+00:00', '--job_id', '405724', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/sync_documents_to_kernel.py', '--cfg_path', '/tmp/tmpp8qn7kqa']
[2026-01-09 21:58:44,352] {standard_task_runner.py:78} INFO - Job 405724: Subtask optimize_package_task_id
[2026-01-09 21:58:44,750] {logging_mixin.py:112} INFO - Running %s on host %s <TaskInstance: sync_documents_to_kernel.optimize_package_task_id 2026-01-09T20:31:57.066931+00:00 [running]> airflow-778f6d87b4-t6sx2
[2026-01-09 21:58:44,979] {utils.py:792} INFO - Generating new SciELO Publishing Package /tmp/tmpa1r6o4ki/2026-01-05-13-30-59-496018_bjos_v24.zip
[2026-01-09 21:58:44,980] {utils.py:803} INFO - Optimizing XML file 1677-3225-bjos-24-e256503.xml [0/1]
[2026-01-09 21:58:45,778] {taskinstance.py:1150} ERROR - image has wrong mode
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 158, in execute
    condition = super(ShortCircuitOperator, self).execute(context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/sync_documents_to_kernel.py", line 95, in optimize_package
    _sps_package, new_sps_zip_dir
  File "/usr/local/airflow/dags/operations/sync_documents_to_kernel_operations.py", line 136, in optimize_sps_pkg_zip_file
    preserve_files=False
  File "/usr/local/lib/python3.7/site-packages/packtools/utils.py", line 807, in optimise
    new_package_file_path, xml_filename, zipped_filenames
  File "/usr/local/lib/python3.7/site-packages/packtools/utils.py", line 714, in _optimise_to_zipfile
    optimised_xml = xml_web_optimiser.get_xml_file()
  File "/usr/local/lib/python3.7/site-packages/packtools/utils.py", line 636, in get_xml_file
    image_filename, self._add_assets_thumbnails
  File "/usr/local/lib/python3.7/site-packages/packtools/utils.py", line 600, in _get_optimised_image_with_filename
    return add_image(image_filename)
  File "/usr/local/lib/python3.7/site-packages/packtools/utils.py", line 570, in _add_assets_thumbnails
    thumbnail_bytes = web_image_generator.get_thumbnail_bytes()
  File "/usr/local/lib/python3.7/site-packages/packtools/utils.py", line 421, in get_thumbnail_bytes
    self._image_object.thumbnail(self.thumbnail_size)
  File "/usr/local/lib/python3.7/site-packages/PIL/Image.py", line 2619, in thumbnail
    im = self.resize(size, resample, box=box, reducing_gap=reducing_gap)
  File "/usr/local/lib/python3.7/site-packages/PIL/Image.py", line 2193, in resize
    return self._new(self.im.resize(size, resample, box))
ValueError: image has wrong mode
[2026-01-09 21:58:45,801] {taskinstance.py:1194} INFO - Marking task as FAILED. dag_id=sync_documents_to_kernel, task_id=optimize_package_task_id, execution_date=20260109T203157, start_date=20260109T215843, end_date=20260109T215845
[2026-01-09 21:58:48,871] {local_task_job.py:102} INFO - Task exited with return code 1

Steps to reproduce the problem

  1. Go to the page ...
  2. Click the link ...
  3. Scroll the page to ...
  4. Observe the error shown

Expected behavior

...

Screenshots or videos

n/a

Attachments

scieloorg/ajuda#15

Environment used

opac-airflow v1.0.0-rc84

Diagnosis

Error analysis

The error ValueError: image has wrong mode is raised by PIL/Pillow when it tries to resize an image to generate the thumbnail. It usually means the image is in a mode that the resampling filter in use does not support. Candidate modes include P (palette/indexed), CMYK, LA, and I (32-bit integers), as well as 16-bit modes such as I;16; the exact set depends on the Pillow version.

The failing call is in packtools/utils.py, line 421:

self._image_object.thumbnail(self.thumbnail_size)

Root cause

Pillow cannot resize with BICUBIC, LANCZOS, or other high-quality resampling filters when the image is in certain modes, so the image has to be converted first. The suspected modes are P, CMYK, I, F, and LA; the mode of the image that actually failed has not been confirmed yet.
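
Which modes actually fail depends on the installed Pillow version, so it may be worth probing rather than assuming. The following is a minimal sketch: the function name `probe_thumbnail_modes` and the list of modes are illustrative assumptions, not part of packtools. It builds a small blank image in each mode and records whether `thumbnail` raises:

```python
from PIL import Image

# Modes to probe; an illustrative selection, not an exhaustive list.
CANDIDATE_MODES = ("1", "L", "P", "RGB", "RGBA", "CMYK", "LA", "I", "F", "I;16")


def probe_thumbnail_modes(modes=CANDIDATE_MODES, size=(64, 64), thumb=(16, 16)):
    """Return {mode: None if thumbnail() worked, else the error message}."""
    results = {}
    for mode in modes:
        try:
            img = Image.new(mode, size)
            img.thumbnail(thumb)  # same call that fails in the traceback
            results[mode] = None
        except Exception as exc:  # e.g. ValueError: image has wrong mode
            results[mode] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    for mode, error in probe_thumbnail_modes().items():
        print(f"{mode:>5}: {error or 'ok'}")
```

Running this inside the Airflow container (same Python and Pillow as the failing task) should show which modes are unsafe in that environment.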


Solution

1. Fix in packtools (if you maintain the fork)

In packtools/utils.py, convert the image to RGB or RGBA before calling thumbnail:

def get_thumbnail_bytes(self):
    img = self._image_object

    # Normalize the mode so that resize/thumbnail can handle it
    if img.mode in ("P", "CMYK", "I", "F", "1"):
        img = img.convert("RGB")
    elif img.mode in ("LA", "PA"):
        img = img.convert("RGBA")

    # thumbnail() resizes in place, so the rest of the method must
    # use img rather than self._image_object
    img.thumbnail(self.thumbnail_size)
    # ... rest of the code
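
Since the rest of `get_thumbnail_bytes` is not shown here, the same idea can be sketched as a standalone helper. This is an assumption-laden illustration, not the packtools implementation: `make_thumbnail_bytes`, its `fmt` parameter, and the two mode sets are hypothetical names, and the mode lists simply mirror the ones proposed above.

```python
import io

from PIL import Image

# Modes taken from the fix proposed above; whether each one really fails
# depends on the Pillow version in use.
TO_RGB = {"P", "CMYK", "I", "F", "1"}
TO_RGBA = {"LA", "PA"}


def make_thumbnail_bytes(img, size, fmt="PNG"):
    """Return encoded thumbnail bytes without modifying `img`."""
    if img.mode in TO_RGB:
        work = img.convert("RGB")
    elif img.mode in TO_RGBA:
        work = img.convert("RGBA")
    else:
        work = img.copy()  # thumbnail() resizes in place, so work on a copy
    work.thumbnail(size)
    buf = io.BytesIO()
    work.save(buf, format=fmt)
    return buf.getvalue()


if __name__ == "__main__":
    source = Image.new("CMYK", (400, 200))
    data = make_thumbnail_bytes(source, (100, 100))
    print(Image.open(io.BytesIO(data)).size, source.size)
```

Working on a converted copy keeps the original image object untouched, which matters if other code still needs the full-size image. Note that converting a P image to RGB drops any palette transparency.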

2. Workaround in the Airflow pipeline (without changing packtools)

If packtools cannot be changed right now, you can pre-process the package's images before calling the optimizer, in sync_documents_to_kernel_operations.py:

import io
import logging
import os
import zipfile

from PIL import Image

UNSAFE_MODES = {"P", "CMYK", "I", "F", "1", "LA", "PA"}
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff")

def normalize_images_in_zip(zip_path: str, output_path: str) -> None:
    """Convert images with problematic modes before optimizing."""
    with zipfile.ZipFile(zip_path, "r") as zin, \
         zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zout:

        for item in zin.infolist():
            data = zin.read(item.filename)
            ext = os.path.splitext(item.filename)[1].lower()

            if ext in IMAGE_EXTENSIONS:
                try:
                    img = Image.open(io.BytesIO(data))
                    if img.mode in UNSAFE_MODES:
                        # Re-save in the original format so the bytes still
                        # match the file name referenced by the XML
                        fmt = img.format
                        target_mode = "RGBA" if img.mode in ("LA", "PA") else "RGB"
                        img = img.convert(target_mode)
                        buf = io.BytesIO()
                        img.save(buf, format=fmt)
                        data = buf.getvalue()
                except Exception as e:
                    # Log but do not abort; let packtools handle the file
                    logging.warning(f"Could not normalize {item.filename}: {e}")

            zout.writestr(item, data)

Then, in the main flow, call it before optimize_sps_pkg_zip_file:

normalized_zip = zip_path.replace(".zip", "_normalized.zip")
normalize_images_in_zip(zip_path, normalized_zip)
optimize_sps_pkg_zip_file(normalized_zip, ...)

Diagnosing the affected package

To identify which image caused the problem in the package 2026-01-05-13-30-59-496018_bjos_v24.zip:

from PIL import Image
import zipfile, io

def check_image_modes(zip_path):
    """Print the mode and size of every image in the package."""
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            if name.lower().endswith((".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif")):
                try:
                    img = Image.open(io.BytesIO(z.read(name)))
                    print(f"{name}: mode={img.mode}, size={img.size}")
                except Exception as e:
                    print(f"{name}: ERROR - {e}")

check_image_modes("/path/to/2026-01-05-13-30-59-496018_bjos_v24.zip")
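
Listing modes shows candidates, but it does not confirm which file actually breaks. A possible extension, sketched below, attempts the same `thumbnail` call on every image and reports only the failures. The function name `find_failing_images` and the default `thumb_size` are assumptions chosen for illustration; the real thumbnail size used by packtools may differ.

```python
import io
import zipfile

from PIL import Image

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff")


def find_failing_images(zip_path, thumb_size=(256, 256)):
    """Return [(name, mode, error)] for images that cannot be thumbnailed."""
    failures = []
    with zipfile.ZipFile(zip_path) as z:
        for name in z.namelist():
            if not name.lower().endswith(IMAGE_EXTENSIONS):
                continue
            mode = None  # stays None if the file cannot even be opened
            try:
                img = Image.open(io.BytesIO(z.read(name)))
                mode = img.mode
                img.thumbnail(thumb_size)  # the call that fails in the task
            except Exception as exc:
                failures.append((name, mode, f"{type(exc).__name__}: {exc}"))
    return failures
```

An empty result would suggest the failure depends on something other than the image mode alone, for example a different Pillow version in the task's environment.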

Recommendation

The most robust fix is in packtools (option 1), because it solves the problem at its root and benefits all future packages. The pipeline workaround is useful as an emergency measure to unblock processing now.

What is your situation with packtools: do you maintain your own fork, or do you depend on the official release?
