
Training problems #6

@AllenReder

Description


Thanks for your excellent work!
I am running into some problems while trying to train.
GPUs: 2 x 80 GB A800

train_qwen2p5_3b_stage1.sh:

#!/bin/bash
export PYTHONPATH=$PYTHONPATH:$(pwd)
NNODES=1
NODE_RANK=0
MASTER_ADDR=127.0.0.1  # or localhost
MASTER_PORT=12345

# MODIFY HERE: please prepare the env related variables
PR1_PATH="./"
CHECKPOINT_PATH="./outputs" # directory to save the checkpoint
RUN_NAME="qwen2p5_stage1" # describe what your experiment is about

# Default Setting
OUTPUT_DIR="${CHECKPOINT_PATH}/${RUN_NAME}" # path to save the output
SRC_PATH="${OUTPUT_DIR}/src" # path to backup the source code

export LOG_DIR="${OUTPUT_DIR}/logs" # path to save the log
export WANDB_PROJECT="LENS" # project name in wandb
export WANDB_TAGS="qwen2p5_stage1" # tags for the experiment in wandb
export WANDB_MODE=offline 

if [ ! -d "${SRC_PATH}" ]; then
    mkdir -p "${SRC_PATH}"
fi

# backup the source code
cp -r "${PR1_PATH}/src" "${SRC_PATH}"
mkdir -p "${LOG_DIR}"

# run the training
torchrun \
    --nproc_per_node="2" \
    --nnodes="${NNODES}" \
    --node_rank="${NODE_RANK}" \
    --master_addr="${MASTER_ADDR}" \
    --master_port="${MASTER_PORT}" \
    ${PR1_PATH}/src/open_r1/grpo_vllm_sam_stage1.py \
    --deepspeed ${PR1_PATH}/configs/zero3.json \
    --output_dir "${OUTPUT_DIR}" \
    --model_name_or_path ./pretrained/Qwen/Qwen2.5-VL-3B-Instruct \
    --max_prompt_length 2048 \
    --max_completion_length 768 \
    --per_device_train_batch_size 8 \
    --gradient_accumulation_steps 64 \
    --num_generations 8 \
    --logging_steps 1 \
    --bf16 True \
    --gradient_checkpointing true \
    --attn_implementation flash_attention_2 \
    --report_to wandb \
    --max_pixels 1000000 \
    --num_train_epochs 25 \
    --run_name ${RUN_NAME} \
    --save_steps 100 \
    --reward_funcs "pr1_grounding" "pr1_grounding_format" \
    --save_only_model true \
    --system_prompt_template "default" \
    --question_template "pr1_grounding" \
    --train_sample_size 500000000000 \
    --skip_special_tokens false \
    --answer_template "default" \
    --if_freeze_llm true   \
    --learning_rate 3e-5 \
    --num_of_query 64 \
    --warmup_steps 150 \
    --lr_scheduler_type "cosine" \
    --if_use_qwen_connector true \
    --coord_norm_type "qwen2p5vl"
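For reference, the settings above imply a fairly large effective batch. A quick sketch of the arithmetic (assuming standard HF Trainer / TRL-style GRPO batch semantics; this is not verified against this repo's trainer):

```python
# Hedged sketch: effective batch size implied by the flags above,
# assuming the usual Trainer accounting (per-device batch x processes x accumulation).
per_device_train_batch_size = 8
num_gpus = 2
gradient_accumulation_steps = 64

effective_batch = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch)  # 1024 samples per optimizer step

# If, as in TRL-style GRPO, each prompt is expanded into num_generations completions,
# the number of unique prompts per optimizer step would be:
num_generations = 8
unique_prompts = effective_batch // num_generations
print(unique_prompts)   # 128 unique prompts per optimizer step
```

If memory is the bottleneck, note that lowering `per_device_train_batch_size` while raising `gradient_accumulation_steps` keeps the effective batch unchanged.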
  1. Even with a smaller batch size, the two 80 GB cards still cannot train. Does stage1 require more GPU memory?
  2. Some of the terminal output concerns me:
You are using a model of type qwen2_5_vl to instantiate a model of type qwen2_vl. This is not supported for all configurations of models and can yield errors.
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with `model.to('cuda')`.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen2_5_VisionTransformerPretrainedModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`

Regarding the model-loading message and the flash_attn warnings: are these normal?
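The last warning itself points at one likely cause: the model is loaded without an explicit dtype, so Flash Attention 2 sees float32 weights. A minimal sketch of the load-time kwargs that would address it (these are standard `from_pretrained` arguments, not taken from this repo's code; the dtype is passed as a string so the sketch needs no torch import):

```python
# Hypothetical sketch, not the repo's actual loading code:
# pass an explicit dtype so Flash Attention 2 is not initialized on float32 weights.
load_kwargs = {
    "torch_dtype": "bfloat16",                  # avoids the "without specifying a torch dtype" warning
    "attn_implementation": "flash_attention_2", # matches the --attn_implementation flag above
}

# Usage (assumes a transformers version with Qwen2.5-VL support and a GPU):
# from transformers import Qwen2_5_VLForConditionalGeneration
# model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
#     "./pretrained/Qwen/Qwen2.5-VL-3B-Instruct", **load_kwargs)
```

The "model of type qwen2_5_vl to instantiate a model of type qwen2_vl" message usually means the checkpoint is being loaded through the older Qwen2-VL class rather than a Qwen2.5-VL-specific one; whether that is intentional here would depend on the repo's code.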
