
How to reduce the training loss #240

@ColeSu-n


For training, I used a Common Voice dataset of 148,000 Thai audio clips with their corresponding transcripts, and trained for 50 epochs. However, the loss has stayed in the range of 54% and has not decreased for a long time. Is there any way to reduce the loss?

This is my training configuration:

```yaml
base_model: pretrained_models/Spark-TTS-0.5B/LLM
# Automatically upload checkpoint and final model to HF
# hub_model_id: username/custom_model_name

trust_remote_code: true

strict: false

datasets:
  - path: .
    data_files: ["./output_prompt/th.jsonl"]
    type: completion
    #type: json

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/out

sequence_len: 2048
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 2
num_epochs: 50
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0005

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 50
xformers_attention: false
flash_attention: false

warmup_steps: 50
evals_per_epoch: 1
save_steps: 500
debug:
deepspeed:
weight_decay: 0.0
```
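To put the batch settings in perspective, here is a rough calculation of how much data each optimizer step sees. It is a sketch under assumptions the issue does not state: training runs on a single GPU, and `th.jsonl` holds one line per audio clip (about 148,000 examples). The step counts are upper bounds, because `sample_packing` merges several short examples into one 2048-token sequence and so reduces the number of steps.

```python
import math

# Values copied from the configuration above.
micro_batch_size = 2
gradient_accumulation_steps = 8
sequence_len = 2048
num_epochs = 50
val_set_size = 0.05

# Assumptions (not stated in the issue): one GPU, one jsonl line per clip.
num_gpus = 1
num_examples = 148_000

# Sequences that contribute to a single optimizer step.
effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per step: every sequence filled to sequence_len.
max_tokens_per_step = effective_batch * sequence_len

# Approximate size of the training split after holding out val_set_size.
train_examples = round(num_examples * (1 - val_set_size))

# Upper bound on steps: ignores sample packing, which lowers the real count.
max_steps_per_epoch = math.ceil(train_examples / effective_batch)
max_total_steps = max_steps_per_epoch * num_epochs

print(f"effective batch: {effective_batch} sequences")
print(f"max tokens per step: {max_tokens_per_step}")
print(f"train examples: ~{train_examples}")
print(f"max steps per epoch: {max_steps_per_epoch}")
print(f"max total steps: {max_total_steps}")
```

If training runs on several GPUs, set `num_gpus` accordingly: the effective batch grows and the step counts shrink in proportion.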

This is the training log:

```
74%|{'loss': 3.3971, 'grad_norm': 6.500714302062988, 'learning_rate': 7.698310311972412e-05, 'memory/max_mem_active(gib)': 14.37, 'memory/max_mem_allocated(gib)': 14.37, 'memory/device_mem_reserved(gib)': 41.93, 'epoch': 37.13}
```
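The learning rate and loss in the log line can be put in context with a short calculation. This sketch assumes a transformers-style cosine schedule that decays from `learning_rate` to zero (the 50 warmup steps are ignored as a negligible fraction of the run) and that the reported loss is the mean token-level cross-entropy in nats; neither assumption is confirmed by the log itself, and `cosine_lr` is a hypothetical helper written for this example.

```python
import math

# Values copied from the configuration and the log line above.
peak_lr = 0.0005
num_epochs = 50
logged_epoch = 37.13
logged_lr = 7.698310311972412e-05
logged_loss = 3.3971


def cosine_lr(peak: float, progress: float) -> float:
    """Hypothetical helper: cosine decay from peak (progress=0) to 0 (progress=1).

    Assumes no minimum learning rate and ignores the linear warmup phase.
    """
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


# Fraction of training completed, measured in epochs.
progress = logged_epoch / num_epochs
expected_lr = cosine_lr(peak_lr, progress)

# Assumption: loss is mean cross-entropy per token in nats, so exp() gives perplexity.
perplexity = math.exp(logged_loss)

print(f"training progress: {progress:.1%}")
print(f"expected lr: {expected_lr:.3e} (logged {logged_lr:.3e})")
print(f"logged lr as fraction of peak: {logged_lr / peak_lr:.1%}")
print(f"perplexity: {perplexity:.1f}")
```

Under these assumptions the expected value lands within about 1% of the logged learning rate, which means that by epoch 37 the optimizer is already running at roughly 15% of its peak rate, and a loss of 3.3971 corresponds to a perplexity of about 30.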
