developer_uid: chai_backend_admin
submission_id: function_gehof_2025-12-19
model_name: abtest_kimi
model_group:
status: torndown
timestamp: 2025-12-22T04:01:17+00:00
num_battles: 5802
num_wins: 3131
celo_rating: 1321.16
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: abtest_kimi
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-21
win_ratio: 0.5396415029300241
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
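The formatter entry above only lists the template strings. A minimal sketch of how they might be combined into a single prompt follows. The assembly order (memory, prompt, chat turns, then the response header), the `build_prompt` helper, and the sample names and messages are all assumptions for illustration and are not taken from this record.

```python
# Template strings copied from the formatter entry in this record.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(fmt, memory, prompt, bot_name, user_name, turns):
    """Hypothetical helper: join the templates in an assumed order.

    `turns` is a list of (is_bot, message) pairs, oldest first.
    """
    parts = [
        fmt["memory_template"].format(memory=memory),
        fmt["prompt_template"].format(prompt=prompt),
    ]
    for is_bot, message in turns:
        if is_bot:
            parts.append(fmt["bot_template"].format(bot_name=bot_name, message=message))
        else:
            parts.append(fmt["user_template"].format(user_name=user_name, message=message))
    # The response header leaves the bot's line open for the model to complete.
    parts.append(fmt["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Sample values below are made up for the example.
text = build_prompt(
    formatter,
    memory="Aria is a cheerful guide.",
    prompt="A chat in a mountain village.",
    bot_name="Aria",
    user_name="Sam",
    turns=[(False, "Hi there"), (True, "Welcome to the village!")],
)
print(text)
```

Because `stopping_words` is `['\n']` in the generation parameters, a prompt ending in an open `{bot_name}:` line would yield a single-line reply under this assumed layout.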
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 6.114629745483398s
Received healthy response to inference request in 3.286226749420166s
Received healthy response to inference request in 2.220568895339966s
Received healthy response to inference request in 2.6928999423980713s
Received healthy response to inference request in 2.6630542278289795s
Received healthy response to inference request in 2.484837770462036s
Received healthy response to inference request in 2.2836990356445312s
Received healthy response to inference request in 2.1256885528564453s
Received healthy response to inference request in 4.113373756408691s
Received healthy response to inference request in 3.2390544414520264s
10 requests
0 failed requests
5th percentile: 2.1683847069740296
10th percentile: 2.211080861091614
20th percentile: 2.271073007583618
30th percentile: 2.4244961500167848
40th percentile: 2.5917676448822022
50th percentile: 2.6779770851135254
60th percentile: 2.911361742019653
70th percentile: 3.2532061338424683
80th percentile: 3.4516561508178714
90th percentile: 4.313499355316162
95th percentile: 5.214064550399778
99th percentile: 5.934516706466675
mean time: 3.122403311729431
Pipeline stage StressChecker completed in 32.59s
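The percentile and mean figures reported by StressChecker are consistent with linear interpolation between order statistics over the ten response times logged above. The sketch below recomputes them from those values; the `percentile` helper is written for this example and is not the pipeline's own code, and the interpolation method is inferred from the numbers rather than stated in the log.

```python
import math

# Response times in seconds, copied from the ten "healthy response" log lines.
latencies = [
    6.114629745483398,
    3.286226749420166,
    2.220568895339966,
    2.6928999423980713,
    2.6630542278289795,
    2.484837770462036,
    2.2836990356445312,
    2.1256885528564453,
    4.113373756408691,
    3.2390544414520264,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


mean_time = sum(latencies) / len(latencies)

for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

The printed values may differ from the logged ones in the last few decimal places because of floating-point rounding order.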
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.60s
Shutdown handler de-registered
function_gehof_2025-12-19 status is now deployed due to DeploymentManager action
function_gehof_2025-12-19 status is now inactive due to auto deactivation removed underperforming models
function_gehof_2025-12-19 status is now torndown due to DeploymentManager action