developer_uid: chai_evaluation_service
submission_id: function_halet_2025-12-15
model_name: richard
model_group:
status: torndown
timestamp: 2025-12-18T01:41:21+00:00
num_battles: 8963
num_wins: 4610
celo_rating: 1303.27
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-17
win_ratio: 0.5143367176168694
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
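The battle statistics in the record above are internally consistent: `win_ratio` is simply `num_wins / num_battles`. A quick check, using the values copied from the record:

```python
# Consistency check on the submission record above:
# win_ratio should equal num_wins / num_battles.
num_battles = 8963
num_wins = 4610

win_ratio = num_wins / num_battles
print(win_ratio)  # ~0.51434, matching the recorded win_ratio
```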
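The `formatter` dict defines templates for each part of the conversation. The following is an illustrative sketch (not the service's actual code) of how those templates could be assembled into a single prompt string; the `build_prompt` helper and its `turns` argument are assumptions for the example:

```python
# Templates copied verbatim from the submission record above.
formatter = {
    'memory_template': '### Instruction:\n{memory}\n',
    'prompt_template': '### Input:\n{prompt}\n',
    'bot_template': '{bot_name}: {message}\n',
    'user_template': '{user_name}: {message}\n',
    'response_template': '### Response:\n{bot_name}:',
}

def build_prompt(memory, prompt, turns, bot_name):
    """Assemble a prompt. `turns` is a list of (speaker, message, is_bot).

    This helper is hypothetical; it only illustrates how the five
    templates above plausibly compose into one string.
    """
    parts = [
        formatter['memory_template'].format(memory=memory),
        formatter['prompt_template'].format(prompt=prompt),
    ]
    for name, message, is_bot in turns:
        if is_bot:
            parts.append(formatter['bot_template'].format(bot_name=name, message=message))
        else:
            parts.append(formatter['user_template'].format(user_name=name, message=message))
    # The response template leaves the prompt hanging at "{bot_name}:" so the
    # model completes the bot's next message.
    parts.append(formatter['response_template'].format(bot_name=bot_name))
    return ''.join(parts)

example = build_prompt(
    memory="Be kind.",
    prompt="Greeting",
    turns=[("Alice", "hi", False), ("richard", "hello", True)],
    bot_name="richard",
)
print(example)
```

Note how `stopping_words: ['\n']` in the generation params pairs naturally with these newline-terminated templates: generation stops at the end of the bot's single-line reply.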
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.099116086959839s
Received healthy response to inference request in 2.3141095638275146s
Received healthy response to inference request in 2.4014124870300293s
Received healthy response to inference request in 2.599626302719116s
Received healthy response to inference request in 2.583409309387207s
Received healthy response to inference request in 1.8658080101013184s
Received healthy response to inference request in 1.8855435848236084s
Received healthy response to inference request in 2.5595595836639404s
Received healthy response to inference request in 2.376695394515991s
10 requests
1 failed request
5th percentile: 1.874689018726349
10th percentile: 1.8835700273513794
20th percentile: 2.2283963680267336
30th percentile: 2.3579196453094484
40th percentile: 2.391525650024414
50th percentile: 2.480486035346985
60th percentile: 2.569099473953247
70th percentile: 2.5882744073867796
80th percentile: 2.699524259567261
90th percentile: 4.806880307197565
95th percentile: 12.491819298267346
99th percentile: 18.6397704911232
mean time: 4.186203861236573
%s, retrying in %s seconds...
Received healthy response to inference request in 1.9970805644989014s
Received healthy response to inference request in 2.2232799530029297s
Received healthy response to inference request in 2.5518605709075928s
Received healthy response to inference request in 1.606931209564209s
Received healthy response to inference request in 2.9091365337371826s
Received healthy response to inference request in 2.586606025695801s
Received healthy response to inference request in 1.9723610877990723s
Received healthy response to inference request in 2.558572292327881s
Received healthy response to inference request in 2.649152994155884s
Received healthy response to inference request in 2.636986017227173s
10 requests
0 failed requests
5th percentile: 1.7713746547698974
10th percentile: 1.9358180999755858
20th percentile: 1.9921366691589355
30th percentile: 2.155420136451721
40th percentile: 2.4204283237457274
50th percentile: 2.555216431617737
60th percentile: 2.569785785675049
70th percentile: 2.6017200231552122
80th percentile: 2.639419412612915
90th percentile: 2.6751513481140137
95th percentile: 2.7921439409255977
99th percentile: 2.8857380151748657
mean time: 2.3691967248916628
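The summary statistics for the second StressChecker run can be reproduced from the ten healthy latencies logged above. The sketch below uses a plain linear-interpolation percentile (which appears to match these numbers, e.g. NumPy's default method); the helper function is illustrative, not the checker's actual code:

```python
# The ten healthy latencies (seconds) from the second StressChecker run,
# copied from the log above.
latencies = [
    1.9970805644989014, 2.2232799530029297, 2.5518605709075928,
    1.606931209564209, 2.9091365337371826, 2.586606025695801,
    1.9723610877990723, 2.558572292327881, 2.649152994155884,
    2.636986017227173,
]

def percentile(data, p):
    """Linear-interpolation percentile over sorted data (illustrative)."""
    s = sorted(data)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (k - lo) * (s[hi] - s[lo])

print(f"50th percentile: {percentile(latencies, 50)}")  # ~2.5552, as logged
print(f"mean time: {sum(latencies) / len(latencies)}")  # ~2.3692, as logged
```

The first run's mean of 4.19s is pulled up by the single timed-out request; this second run, with 0 failed requests, shows the steady-state latency profile.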
Pipeline stage StressChecker completed in 68.83s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.70s
Shutdown handler de-registered
function_halet_2025-12-15 status is now deployed due to DeploymentManager action
function_halet_2025-12-15 status is now inactive due to auto deactivation removed underperforming models
function_halet_2025-12-15 status is now torndown due to DeploymentManager action
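The last three lines trace the submission's lifecycle: deployed by the DeploymentManager, auto-deactivated as an underperforming model, then torn down. A minimal sketch of that state sequence follows; the states and reasons come from the log, but the transition table and `apply_transition` helper are assumptions for illustration, not the DeploymentManager's actual logic:

```python
# Illustrative state machine for the status transitions seen in the log.
# Which transitions are legal is an assumption; only the observed sequence
# (deployed -> inactive -> torndown) is taken from the log itself.
ALLOWED_TRANSITIONS = {
    "deployed": {"inactive", "torndown"},
    "inactive": {"deployed", "torndown"},
    "torndown": set(),  # assumed terminal
}

def apply_transition(status, new_status):
    """Return new_status if the move is allowed, else raise ValueError."""
    if new_status not in ALLOWED_TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status

status = "deployed"                               # DeploymentManager action
status = apply_transition(status, "inactive")     # auto deactivation
status = apply_transition(status, "torndown")     # DeploymentManager action
print(status)  # torndown
```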