developer_uid: chai_evaluation_service
submission_id: function_hefut_2025-12-14
model_name: richard
model_group:
status: inactive
timestamp: 2025-12-14T08:56:50+00:00
num_battles: 8174
num_wins: 4092
celo_rating: 1256.37
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-14
win_ratio: 0.5006116956202593
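The reported win_ratio is simply num_wins over num_battles; a quick sanity check against the counts above (a minimal sketch, using only the values logged in this record):

```python
# Sanity-check the reported win_ratio against the raw counts in this record.
num_battles = 8174
num_wins = 4092

win_ratio = num_wins / num_battles
assert abs(win_ratio - 0.5006116956202593) < 1e-12  # matches the logged value
```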
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
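The formatter dict above describes how a conversation is flattened into a single prompt string. A sketch of that assembly, assuming the usual slot-filling semantics of the templates; the memory text, prompt text, and the names "Alice"/"Richard" below are made up for illustration:

```python
# Assemble a prompt from the formatter templates logged above.
# The conversation content and speaker names are hypothetical examples.
formatter = {
    'memory_template': '### Instruction:\n{memory}\n',
    'prompt_template': '### Input:\n{prompt}\n',
    'bot_template': '{bot_name}: {message}\n',
    'user_template': '{user_name}: {message}\n',
    'response_template': '### Response:\n{bot_name}:',
}

def build_prompt(memory, prompt, turns, bot_name):
    # Memory and scenario prompt come first, then the chat turns,
    # then the response template that cues the model to speak as the bot.
    parts = [formatter['memory_template'].format(memory=memory),
             formatter['prompt_template'].format(prompt=prompt)]
    for speaker, message in turns:
        if speaker == bot_name:
            parts.append(formatter['bot_template'].format(bot_name=speaker, message=message))
        else:
            parts.append(formatter['user_template'].format(user_name=speaker, message=message))
    parts.append(formatter['response_template'].format(bot_name=bot_name))
    return ''.join(parts)

text = build_prompt(
    memory='Richard is a helpful assistant.',
    prompt='Alice meets Richard.',
    turns=[('Alice', 'Hi there!'), ('Richard', 'Hello, Alice.')],
    bot_name='Richard',
)
print(text)
```

Note that with stopping_words set to ['\n'] in generation_params, the model's reply is cut at the first newline, matching the one-line-per-turn layout these templates produce.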
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.352416515350342s
Received healthy response to inference request in 4.442734718322754s
Received healthy response to inference request in 2.6374752521514893s
Received healthy response to inference request in 2.6971471309661865s
Received healthy response to inference request in 2.6679069995880127s
Received healthy response to inference request in 3.5049684047698975s
Received healthy response to inference request in 3.367689371109009s
Received healthy response to inference request in 4.078703880310059s
Received healthy response to inference request in 2.236013889312744s
10 requests
1 failed request
5th percentile: 2.288395071029663
10th percentile: 2.340776252746582
20th percentile: 2.58046350479126
30th percentile: 2.6587774753570557
40th percentile: 2.685451078414917
50th percentile: 3.0324182510375977
60th percentile: 3.4226009845733643
70th percentile: 3.6770890474319455
80th percentile: 4.1515100479125975
90th percentile: 6.008710575103754
95th percentile: 13.05560193061827
99th percentile: 18.69311501502991
mean time: 4.808754944801331
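The first batch's mean of ~4.81 s covers all 10 requests, including the one that hit the 20 s read timeout. Backing that request's recorded duration out of the mean (simple arithmetic on the logged values; no assumptions beyond the mean including all 10 requests):

```python
# Nine healthy response times from the first stress batch, as logged above.
healthy = [2.352416515350342, 4.442734718322754, 2.6374752521514893,
           2.6971471309661865, 2.6679069995880127, 3.5049684047698975,
           3.367689371109009, 4.078703880310059, 2.236013889312744]
mean_time = 4.808754944801331  # logged mean over all 10 requests

# Recover the tenth (failed) request's recorded duration from the mean.
failed_duration = 10 * mean_time - sum(healthy)
print(round(failed_duration, 2))
```

The recovered value lands just above 20 s, i.e. the read timeout plus a little overhead, which also explains the inflated 95th/99th percentiles in this batch.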
%s, retrying in %s seconds...
Received healthy response to inference request in 2.6536715030670166s
Received healthy response to inference request in 3.3977577686309814s
Received healthy response to inference request in 2.7963335514068604s
Received healthy response to inference request in 4.155343294143677s
Received healthy response to inference request in 1.8905775547027588s
Received healthy response to inference request in 3.631511688232422s
Received healthy response to inference request in 2.308520555496216s
Received healthy response to inference request in 2.5216047763824463s
Received healthy response to inference request in 3.2265028953552246s
Received healthy response to inference request in 3.195927858352661s
10 requests
0 failed requests
5th percentile: 2.0786519050598145
10th percentile: 2.26672625541687
20th percentile: 2.4789879322052
30th percentile: 2.6140514850616454
40th percentile: 2.739268732070923
50th percentile: 2.9961307048797607
60th percentile: 3.2081578731536866
70th percentile: 3.2778793573379517
80th percentile: 3.4445085525512695
90th percentile: 3.683894848823547
95th percentile: 3.9196190714836114
99th percentile: 4.108198449611664
mean time: 2.977775144577026
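The percentile figures in these batches are consistent with linear interpolation over the sorted sample (the default 'linear' method in numpy.percentile); that is an inference from the numbers, not something the log states. A pure-Python sketch, checked against the second batch, where all 10 timings are known:

```python
# Ten healthy response times from the second stress batch, as logged above.
times = [2.6536715030670166, 3.3977577686309814, 2.7963335514068604,
         4.155343294143677, 1.8905775547027588, 3.631511688232422,
         2.308520555496216, 2.5216047763824463, 3.2265028953552246,
         3.195927858352661]

def percentile(data, q):
    # Linear interpolation between the two nearest order statistics,
    # matching numpy.percentile's default behaviour.
    xs = sorted(data)
    pos = (q / 100) * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

assert abs(percentile(times, 50) - 2.9961307048797607) < 1e-9
assert abs(percentile(times, 95) - 3.9196190714836114) < 1e-9
assert abs(sum(times) / len(times) - 2.977775144577026) < 1e-9
```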
Pipeline stage StressChecker completed in 80.63s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.55s
Shutdown handler de-registered
function_hefut_2025-12-14 status is now deployed due to DeploymentManager action
function_hefut_2025-12-14 status is now inactive due to auto deactivation removed underperforming models