sft-gsm8k-test-train
Tables
results
id, model_name, task_name, sample_id, prompt_actual, prompt_full, gold_answer, is_correct, extracted_answers, stop_reasons, response_1
Many rows
summaries
id, task_name, model_tag, total_examples, correct, accuracy, no_answer_count, stop_reason_counts, duration_human, pass_k, temperature, top_p, max_tokens, error, model
463 rows
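
The two tables appear to be related: `results` holds one row per sample, while `summaries` holds per-run totals such as `total_examples`, `correct`, and `accuracy`. The sketch below shows how those totals might be recomputed from `results` as a consistency check. It is a sketch under several assumptions not confirmed by the listing: that the database is SQLite, that `is_correct` is stored as 0/1, and that `accuracy` equals `correct / total_examples`. The function name `recompute_accuracy`, the model name `demo-model`, and the inserted rows are hypothetical, and only a subset of the listed columns is created.

```python
import sqlite3


def recompute_accuracy(conn):
    """Aggregate per-sample rows in `results` into per-(model, task) totals.

    Assumes `is_correct` is stored as 0/1 (an assumption, not confirmed).
    """
    query = """
        SELECT model_name,
               task_name,
               COUNT(*)        AS total_examples,
               SUM(is_correct) AS correct
        FROM results
        GROUP BY model_name, task_name
        ORDER BY model_name, task_name
    """
    rows = []
    for model_name, task_name, total, correct in conn.execute(query):
        rows.append(
            {
                "model_name": model_name,
                "task_name": task_name,
                "total_examples": total,
                "correct": correct,
                "accuracy": correct / total,
            }
        )
    return rows


# Hypothetical in-memory stand-in for the real database, using a subset
# of the columns listed for `results`.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results ("
    "id INTEGER PRIMARY KEY, model_name TEXT, task_name TEXT, "
    "sample_id INTEGER, is_correct INTEGER)"
)
conn.executemany(
    "INSERT INTO results (model_name, task_name, sample_id, is_correct) "
    "VALUES (?, ?, ?, ?)",
    [
        ("demo-model", "gsm8k", 0, 1),
        ("demo-model", "gsm8k", 1, 0),
        ("demo-model", "gsm8k", 2, 1),
        ("demo-model", "gsm8k", 3, 1),
    ],
)

recomputed = recompute_accuracy(conn)
print(recomputed)
```

Against the real database, the recomputed totals could be compared with the matching `summaries` row. How `results.model_name` maps to `summaries.model_tag` or `summaries.model` is not stated in the listing and would need to be checked first.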