developer_uid: chai_backend_admin
submission_id: function_bumub_2025-12-05
model_name: function_bumub_2025-12-05
model_group:
status: torndown
timestamp: 2025-12-12T18:28:40+00:00
num_battles: 5958
num_wins: 3487
celo_rating: 1354.67
family_friendly_score: 0.571
family_friendly_standard_error: 0.006999414261207862
submission_type: function
display_name: function_bumub_2025-12-05
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-05
win_ratio: 0.5852635112453843
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
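
To make the formatter entry above easier to read, here is a minimal sketch of how its templates could be combined into one prompt string. The template strings are copied from the record; the `build_prompt` helper, the order in which the pieces are joined, and the sample persona and messages are assumptions for illustration only, not the platform's actual code.

```python
# Formatter templates copied from the record above.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Join the templates into one prompt (hypothetical helper; ordering is assumed)."""
    parts = [
        FORMATTER["memory_template"].format(memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    # Each turn is a (speaker, message) pair; speaker is "bot" or "user".
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                FORMATTER["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                FORMATTER["user_template"].format(user_name=user_name, message=message)
            )
    # The response template ends with "{bot_name}:" so the model continues as the bot.
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Made-up sample data, purely for illustration.
example = build_prompt(
    memory="Ada is a friendly librarian.",
    prompt="A quiet library.",
    turns=[("user", "Hi!"), ("bot", "Hello, Sam."), ("user", "Any book tips?")],
    bot_name="Ada",
    user_name="Sam",
)
print(example)
```

Because `stopping_words` in `generation_params` is `['\n']` and the response template ends with the bot name and a colon, a completion under this layout would presumably stop at the end of a single reply line.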
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.1401920318603516s
Received healthy response to inference request in 4.151013612747192s
Received healthy response to inference request in 3.9080843925476074s
Received healthy response to inference request in 2.573502779006958s
Received healthy response to inference request in 3.4448721408843994s
Received healthy response to inference request in 0.4100027084350586s
Received healthy response to inference request in 0.3489682674407959s
Received healthy response to inference request in 0.4215700626373291s
Received healthy response to inference request in 0.5414001941680908s
10 requests
1 failed requests
5th percentile: 0.3764337658882141
10th percentile: 0.40389926433563234
20th percentile: 0.419256591796875
30th percentile: 0.5054511547088623
40th percentile: 1.7606617450714113
50th percentile: 2.856847405433655
60th percentile: 3.2620640754699703
70th percentile: 3.583835816383362
80th percentile: 3.9566702365875246
90th percentile: 5.74617273807525
95th percentile: 12.924388802051528
99th percentile: 18.666961653232576
mean time: 3.9042211055755613
%s, retrying in %s seconds...
Received healthy response to inference request in 2.8598194122314453s
Received healthy response to inference request in 5.459525108337402s
Received healthy response to inference request in 3.132108211517334s
Received healthy response to inference request in 3.4598045349121094s
Received healthy response to inference request in 3.0217769145965576s
Received healthy response to inference request in 3.2114455699920654s
Received healthy response to inference request in 0.5749447345733643s
Received healthy response to inference request in 0.42566442489624023s
Received healthy response to inference request in 0.9075667858123779s
Received healthy response to inference request in 1.3409790992736816s
10 requests
0 failed requests
5th percentile: 0.492840564250946
10th percentile: 0.5600167036056518
20th percentile: 0.8410423755645752
30th percentile: 1.2109554052352904
40th percentile: 2.25228328704834
50th percentile: 2.9407981634140015
60th percentile: 3.065909433364868
70th percentile: 3.1559094190597534
80th percentile: 3.261117362976074
90th percentile: 3.659776592254638
95th percentile: 4.559650850296018
99th percentile: 5.279550256729126
mean time: 2.439363479614258
Pipeline stage StressChecker completed in 67.13s
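
The percentile and mean figures in the second StressChecker run (the one with 0 failed requests) appear consistent with linear interpolation over the sorted latencies. The sketch below recomputes them from the ten logged response times. The `percentile` helper is a hypothetical stand-in written for this illustration; the log does not say which library or method the checker uses.

```python
import math

# Response times (seconds) logged for the second run, in log order.
latencies = [
    2.8598194122314453,
    5.459525108337402,
    3.132108211517334,
    3.4598045349121094,
    3.0217769145965576,
    3.2114455699920654,
    0.5749447345733643,
    0.42566442489624023,
    0.9075667858123779,
    1.3409790992736816,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")

mean_time = sum(latencies) / len(latencies)
print(f"mean time: {mean_time}")
```

Under this assumption the 5th, 50th and 95th percentiles and the mean agree with the logged values to within floating-point rounding. The first run cannot be recomputed the same way from the log alone, because the duration of its one failed request is not printed.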
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.61s
Shutdown handler de-registered
function_bumub_2025-12-05 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 3949.68s
Shutdown handler de-registered
function_bumub_2025-12-05 status is now inactive due to auto deactivation removed underperforming models
function_bumub_2025-12-05 status is now torndown due to DeploymentManager action