New metric assesses how AI is getting better at completing long tasks — but some researchers are wary of long-term ...