automation · Independent · ✓ Verified

Agentic GRPO Longhorizon

Fixes GRPO training collapse in long-horizon, multi-tool agents. A lightweight joint PRM-Lite + LATA approach achieves +37% over vanilla GRPO on the τ-bench airline domain (50 tasks, multi-turn).

Pricing

Free

Marketplace

Independent

Category

automation
