developer_uid: chai_evaluation_service
submission_id: function_torub_2025-12-14
model_name: richard
model_group:
status: inactive
timestamp: 2025-12-14T11:57:38+00:00
num_battles: 8003
num_wins: 4041
celo_rating: 1256.38
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-14
win_ratio: 0.5049356491315756
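As a sanity check, the win_ratio above is just num_wins divided by num_battles:

```python
num_battles = 8003
num_wins = 4041

win_ratio = num_wins / num_battles
print(f"win_ratio: {win_ratio}")  # matches the 0.5049356491315756 reported above
```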
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
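The formatter templates above are what assemble the final prompt string sent to the model. A minimal sketch of how they might be applied (the `build_prompt` helper and the turn structure are assumptions for illustration, not the service's actual code):

```python
formatter = {
    'memory_template': '### Instruction:\n{memory}\n',
    'prompt_template': '### Input:\n{prompt}\n',
    'bot_template': '{bot_name}: {message}\n',
    'user_template': '{user_name}: {message}\n',
    'response_template': '### Response:\n{bot_name}:',
}

def build_prompt(memory, prompt, turns, bot_name):
    # turns: list of (speaker_name, message, is_bot) tuples -- an assumed shape
    parts = [
        formatter['memory_template'].format(memory=memory),
        formatter['prompt_template'].format(prompt=prompt),
    ]
    for name, message, is_bot in turns:
        template = formatter['bot_template'] if is_bot else formatter['user_template']
        # str.format ignores unused keyword arguments, so both names can be passed
        parts.append(template.format(bot_name=name, user_name=name, message=message))
    parts.append(formatter['response_template'].format(bot_name=bot_name))
    return ''.join(parts)

example = build_prompt(
    memory='richard is a helpful companion.',
    prompt='A chat between a user and richard.',
    turns=[('User', 'Hello!', False), ('richard', 'Hi there.', True)],
    bot_name='richard',
)
print(example)
```

Note that `response_template` ends without a trailing newline, leaving the prompt open at `richard:` for the model to complete; the `'\n'` stopping word in generation_params then terminates the reply at the end of that single line.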
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.649742841720581s
Received healthy response to inference request in 3.834329128265381s
Received healthy response to inference request in 10.286126375198364s
Received healthy response to inference request in 3.1924824714660645s
Received healthy response to inference request in 3.10188627243042s
Received healthy response to inference request in 2.422548294067383s
Received healthy response to inference request in 2.222346544265747s
Received healthy response to inference request in 3.6152842044830322s
Received healthy response to inference request in 2.6565611362457275s
10 requests
1 failed requests
5th percentile: 2.3124373316764832
10th percentile: 2.4025281190872194
20th percentile: 2.6043039321899415
30th percentile: 2.6545156478881835
40th percentile: 2.923756217956543
50th percentile: 3.147184371948242
60th percentile: 3.3616031646728515
70th percentile: 3.680997681617737
80th percentile: 5.124688577651979
90th percentile: 11.268567514419551
95th percentile: 15.689552640914908
99th percentile: 19.22634074211121
mean time: 5.409184503555298
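The percentile figures above appear to include the timed-out request alongside the nine healthy timings. With linear interpolation between closest ranks, and the failed request's latency backed out of the reported mean (an assumption about how the stress checker counts failures), the reported values can be reproduced:

```python
# The nine healthy latencies logged above, in seconds
healthy = [2.649742841720581, 3.834329128265381, 10.286126375198364,
           3.1924824714660645, 3.10188627243042, 2.422548294067383,
           2.222346544265747, 3.6152842044830322, 2.6565611362457275]
mean_time = 5.409184503555298

# Assumption: the one failed request (read timeout ~20s) is counted in the
# stats; its latency can be backed out from the reported mean of 10 requests.
failed = 10 * mean_time - sum(healthy)

def percentile(sorted_xs, p):
    """Linear interpolation between closest ranks (numpy's default method)."""
    idx = (p / 100) * (len(sorted_xs) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (idx - lo) * (sorted_xs[hi] - sorted_xs[lo])

samples = sorted(healthy + [failed])
for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {percentile(samples, p)}")
print(f"mean time: {sum(samples) / len(samples)}")
```

The backed-out failed latency comes to roughly 20.1s, consistent with the 20-second read timeout plus overhead, which is why the tail percentiles jump so sharply above the 80th.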
%s, retrying in %s seconds...
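The unhealthy response, the 20-second read timeout, and the "retrying in %s seconds..." line suggest a retry loop around each inference request. A hedged sketch of that behaviour (the function name, attempt count, and delay are assumptions, not the service's actual code):

```python
import time

def with_retries(fn, max_attempts=3, delay=1.0):
    """Call fn(), retrying on failure and logging in the style seen above."""
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            result = fn()
        except Exception as exc:
            print("Received unhealthy response to inference request!")
            if attempt == max_attempts:
                raise
            print(f"{exc}, retrying in {delay} seconds...")
            time.sleep(delay)
        else:
            elapsed = time.monotonic() - start
            print(f"Received healthy response to inference request in {elapsed}s")
            return result
```

Under this reading, the timed-out request is logged as unhealthy, the checker waits and retries, and the second batch of ten requests below completes with zero failures.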
Received healthy response to inference request in 4.091832399368286s
Received healthy response to inference request in 2.8564653396606445s
Received healthy response to inference request in 4.057412147521973s
Received healthy response to inference request in 2.6252074241638184s
Received healthy response to inference request in 4.172095537185669s
Received healthy response to inference request in 4.535449028015137s
Received healthy response to inference request in 3.355637550354004s
Received healthy response to inference request in 3.2654285430908203s
Received healthy response to inference request in 3.262927532196045s
Received healthy response to inference request in 3.310399055480957s
10 requests
0 failed requests
5th percentile: 2.72927348613739
10th percentile: 2.833339548110962
20th percentile: 3.181635093688965
30th percentile: 3.2646782398223877
40th percentile: 3.2924108505249023
50th percentile: 3.3330183029174805
60th percentile: 3.636347389221191
70th percentile: 4.0677382230758665
80th percentile: 4.107885026931763
90th percentile: 4.208430886268616
95th percentile: 4.371939957141876
99th percentile: 4.502747213840484
mean time: 3.5532854557037354
Pipeline stage StressChecker completed in 93.22s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.61s
Shutdown handler de-registered
function_torub_2025-12-14 status is now deployed due to DeploymentManager action
function_torub_2025-12-14 status is now inactive due to auto-deactivation of underperforming models