developer_uid: chai_backend_admin
submission_id: function_fomub_2025-12-19
model_name: abtest_kimi
model_group:
status: torndown
timestamp: 2025-12-22T05:21:18+00:00
num_battles: 681782
num_wins: 332763
celo_rating: 1335.85
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: abtest_kimi
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-21
win_ratio: 0.4880783006884899
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
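Note on the formatter field: the sketch below shows one way the templates above might be combined into a single model input. The template strings are copied from the record. The function name `build_prompt`, its arguments, and the ordering (memory, then prompt, then chat turns, then the response header) are assumptions for illustration and are not confirmed by this log. Truncation to `max_input_tokens` is not modeled.

```python
# Templates copied from the `formatter` field of this submission record.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: join the templates in an assumed order.

    `turns` is a list of (speaker, message) pairs where speaker is
    either "bot" or "user".
    """
    parts = [
        FORMATTER["memory_template"].format(memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                FORMATTER["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                FORMATTER["user_template"].format(user_name=user_name, message=message)
            )
    # The response header ends with "{bot_name}:" so the model continues
    # the bot's line.
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    memory="Bot is a friendly guide.",
    prompt="A short chat.",
    turns=[("user", "hi"), ("bot", "hello")],
    bot_name="Bot",
    user_name="Ann",
)
print(example)
```

Since `stopping_words` contains a newline, generation would presumably stop at the end of the bot's first line, which is consistent with the single-line `bot_template`.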
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.243706226348877s
Received healthy response to inference request in 3.318645715713501s
Received healthy response to inference request in 4.172079563140869s
Received healthy response to inference request in 3.303558588027954s
Received healthy response to inference request in 3.029221534729004s
Received healthy response to inference request in 5.250255346298218s
Received healthy response to inference request in 3.2037577629089355s
Received healthy response to inference request in 3.444490909576416s
Received healthy response to inference request in 2.711178779602051s
10 requests
1 failed requests
5th percentile: 2.8542980194091796
10th percentile: 2.9974172592163084
20th percentile: 3.1688505172729493
30th percentile: 3.2317216873168944
40th percentile: 3.2796176433563233
50th percentile: 3.3111021518707275
60th percentile: 3.368983793258667
70th percentile: 3.6627675056457516
80th percentile: 4.387714719772339
90th percentile: 6.7362473249435375
95th percentile: 13.423211228847489
99th percentile: 18.772782351970676
mean time: 5.178706955909729
%s, retrying in %s seconds...
Received healthy response to inference request in 4.010415554046631s
Received healthy response to inference request in 3.8979313373565674s
Received healthy response to inference request in 6.658357381820679s
Received healthy response to inference request in 3.1486685276031494s
Received healthy response to inference request in 2.6604983806610107s
Received healthy response to inference request in 3.925124168395996s
Received healthy response to inference request in 3.835969924926758s
Received healthy response to inference request in 2.537853479385376s
Received healthy response to inference request in 2.785212278366089s
Received healthy response to inference request in 2.560060739517212s
10 requests
0 failed requests
5th percentile: 2.5478467464447023
10th percentile: 2.5578400135040282
20th percentile: 2.640410852432251
30th percentile: 2.7477981090545653
40th percentile: 3.003286027908325
50th percentile: 3.4923192262649536
60th percentile: 3.8607544898986816
70th percentile: 3.906089186668396
80th percentile: 3.942182445526123
90th percentile: 4.275209736824035
95th percentile: 5.466783559322354
99th percentile: 6.420042617321014
mean time: 3.6020091772079468
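Note on the summary statistics: the percentile and mean values reported for the second batch of 10 requests can be reproduced from the logged latencies with linear interpolation between order statistics. That the StressChecker computes them this way is an assumption; the log does not state the method, and the `percentile` helper below is a hypothetical stand-in written for this check.

```python
# Latencies (seconds) copied from the ten healthy responses of the second batch.
latencies = [
    4.010415554046631,
    3.8979313373565674,
    6.658357381820679,
    3.1486685276031494,
    2.6604983806610107,
    3.925124168395996,
    3.835969924926758,
    2.537853479385376,
    2.785212278366089,
    2.560060739517212,
]


def percentile(values, q):
    """Hypothetical helper: q-th percentile via linear interpolation."""
    data = sorted(values)
    rank = (len(data) - 1) * q / 100.0  # fractional index into sorted data
    lower = int(rank)
    upper = min(lower + 1, len(data) - 1)
    frac = rank - lower
    return data[lower] + (data[upper] - data[lower]) * frac


for q in (5, 50, 90):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

Under this assumption the results agree with the logged 5th, 50th, and 90th percentiles and the mean to within floating-point rounding.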
Pipeline stage StressChecker completed in 91.00s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
function_fomub_2025-12-19 status is now deployed due to DeploymentManager action
function_fomub_2025-12-19 status is now inactive due to auto deactivation removed underperforming models
function_fomub_2025-12-19 status is now torndown due to DeploymentManager action