developer_uid: chai_evaluation_service
submission_id: function_bosum_2025-12-14
model_name: richard
model_group:
status: torndown
timestamp: 2025-12-17T18:01:16+00:00
num_battles: 10638
num_wins: 5298
celo_rating: 1256.41
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-14
win_ratio: 0.49802594472645234
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
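The win_ratio above follows directly from the battle counts in this record; a minimal check in plain Python, with the values copied from the fields above:

```python
# Quick check that win_ratio matches num_wins / num_battles from the record above.
num_battles = 10638
num_wins = 5298
win_ratio = num_wins / num_battles
print(win_ratio)  # 0.49802594472645234, matching the reported value
```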
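The generation_params above are standard sampling settings. As a hedged illustration, they could be mapped onto vLLM's SamplingParams as sketched below; the choice of vLLM is an assumption, since the log does not name the serving backend, and max_input_tokens would govern prompt truncation rather than sampling:

```python
# Hedged sketch: mapping the recorded generation_params onto vLLM's SamplingParams.
# Using vLLM here is an assumption; the log does not state which serving stack is used.
from vllm import SamplingParams

params = SamplingParams(
    temperature=1.0,
    top_p=1.0,
    min_p=0.0,
    top_k=40,
    presence_penalty=0.0,
    frequency_penalty=0.0,
    stop=["\n"],      # stopping_words in the record
    max_tokens=64,    # max_output_tokens in the record
    best_of=8,
)
# max_input_tokens=1024 would instead cap the prompt length before generation.
```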
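The formatter templates above describe how each inference prompt is assembled from the bot's memory, the scenario prompt, and the chat history. Below is a minimal sketch of that assembly, assuming plain str.format substitution; the build_prompt helper and the sample conversation are illustrative only, not part of the submission:

```python
# Minimal sketch: assembling a prompt from the submission's formatter templates.
# Assumes plain str.format substitution; build_prompt and the sample data are illustrative.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}

def build_prompt(memory, prompt, turns, bot_name):
    """Concatenate memory, scenario prompt, chat turns, and the response header."""
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == bot_name:
            parts.append(formatter["bot_template"].format(bot_name=speaker, message=message))
        else:
            parts.append(formatter["user_template"].format(user_name=speaker, message=message))
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)

print(build_prompt(
    memory="richard is a helpful assistant.",
    prompt="A casual chat.",
    turns=[("User", "Hi there!"), ("richard", "Hello!"), ("User", "How are you?")],
    bot_name="richard",
))
```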
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.4937713146209717s
Received healthy response to inference request in 1.889636754989624s
Received healthy response to inference request in 1.9347727298736572s
Received healthy response to inference request in 2.6929776668548584s
Received healthy response to inference request in 2.9724090099334717s
Received healthy response to inference request in 2.506251335144043s
Received healthy response to inference request in 1.876828908920288s
Received healthy response to inference request in 2.000131130218506s
Received healthy response to inference request in 3.5107343196868896s
10 requests
1 failed request
5th percentile: 1.8825924396514893
10th percentile: 1.8883559703826904
20th percentile: 1.9257455348968506
30th percentile: 1.9805236101150512
40th percentile: 2.303803253173828
50th percentile: 2.5996145009994507
60th percentile: 2.8047502040863037
70th percentile: 3.1288177013397216
80th percentile: 3.497163915634155
90th percentile: 5.169594812393182
95th percentile: 12.634467029571516
99th percentile: 18.60636480331421
mean time: 4.297685241699218
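The percentile and mean figures above can be reproduced from the per-request latencies with numpy. A minimal sketch, assuming numpy's default linear interpolation and that the single timed-out request is still counted at its measured duration (roughly the 20 s read timeout); neither assumption is stated in the log:

```python
# Minimal sketch of how the StressChecker statistics could be derived from
# per-request latencies. Assumes numpy's default linear interpolation and that
# the timed-out request is included at its measured duration; the 20.0 value
# is a placeholder, since the actual duration of the failed request is not logged.
import numpy as np

healthy = [
    3.4937713146209717, 1.889636754989624, 1.9347727298736572,
    2.6929776668548584, 2.9724090099334717, 2.506251335144043,
    1.876828908920288, 2.000131130218506, 3.5107343196868896,
]
failed = [20.0]  # placeholder duration for the request that hit the read timeout

latencies = np.array(healthy + failed)
for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {np.percentile(latencies, p)}")
print(f"mean time: {latencies.mean()}")
```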
%s, retrying in %s seconds...
Received healthy response to inference request in 2.2921884059906006s
Received healthy response to inference request in 3.9361562728881836s
Received healthy response to inference request in 2.451815605163574s
Received healthy response to inference request in 3.3956313133239746s
Received healthy response to inference request in 3.6995716094970703s
Received healthy response to inference request in 1.755258321762085s
Received healthy response to inference request in 2.554330348968506s
Received healthy response to inference request in 2.6452081203460693s
Received healthy response to inference request in 2.4800548553466797s
Received healthy response to inference request in 3.0222930908203125s
10 requests
0 failed requests
5th percentile: 1.996876859664917
10th percentile: 2.238495397567749
20th percentile: 2.4198901653289795
30th percentile: 2.4715830802917482
40th percentile: 2.5246201515197755
50th percentile: 2.5997692346572876
60th percentile: 2.7960421085357665
70th percentile: 3.134294557571411
80th percentile: 3.4564193725585937
90th percentile: 3.7232300758361814
95th percentile: 3.8296931743621823
99th percentile: 3.9148636531829832
mean time: 2.8232507944107055
Pipeline stage StressChecker completed in 73.57s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.85s
Shutdown handler de-registered
function_bosum_2025-12-14 status is now deployed due to DeploymentManager action
function_bosum_2025-12-14 status is now inactive due to auto-deactivation of underperforming models
function_bosum_2025-12-14 status is now torndown due to DeploymentManager action