developer_uid: chai_evaluation_service
submission_id: function_narit_2025-12-14
model_name: richard
model_group:
status: inactive
timestamp: 2025-12-14T07:27:12+00:00
num_battles: 5902
num_wins: 2943
celo_rating: 1256.36
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-13
win_ratio: 0.4986445272788885
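As a sanity check (not part of the original log), the reported `win_ratio` is simply `num_wins / num_battles` from the record above:

```python
# Sanity check: win_ratio should equal num_wins / num_battles.
# Values are taken from the submission record above.
num_battles = 5902
num_wins = 2943

win_ratio = num_wins / num_battles
print(win_ratio)  # close to the logged 0.4986445272788885
```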
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
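A minimal sketch of how the formatter templates above might assemble a prompt. The concatenation order (memory, prompt, chat turns, response header) and the `build_prompt` helper are assumptions for illustration; the actual assembly logic is not shown in this log.

```python
# Formatter templates copied from the submission record above.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}

def build_prompt(memory, prompt, turns, bot_name):
    """Assemble a prompt from the templates.

    turns: list of (speaker_name, is_bot, message) tuples.
    NOTE: the ordering here is an assumption, not taken from the log.
    """
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for name, is_bot, message in turns:
        template = formatter["bot_template"] if is_bot else formatter["user_template"]
        # str.format ignores unused keyword arguments, so both names can be passed.
        parts.append(template.format(bot_name=name, user_name=name, message=message))
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)

example = build_prompt(
    memory="Richard is a helpful assistant.",
    prompt="A chat between Richard and a user.",
    turns=[("User", False, "Hello!")],
    bot_name="Richard",
)
print(example)
```

With `stopping_words: ['\n']` in the generation params, decoding would stop at the end of the single completed bot line.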
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.688263416290283s
Received healthy response to inference request in 4.059533596038818s
Received healthy response to inference request in 3.447800397872925s
Received healthy response to inference request in 3.303880453109741s
Received healthy response to inference request in 4.29269003868103s
Received healthy response to inference request in 1.8563244342803955s
Received healthy response to inference request in 2.3948757648468018s
Received healthy response to inference request in 1.9359488487243652s
Received healthy response to inference request in 5.358246803283691s
10 requests
1 failed request
5th percentile: 1.8921554207801818
10th percentile: 1.9279864072799682
20th percentile: 2.3030903816223143
30th percentile: 3.031179046630859
40th percentile: 3.3902324199676515
50th percentile: 3.568031907081604
60th percentile: 3.8367714881896973
70th percentile: 4.129480528831482
80th percentile: 4.505801391601563
90th percentile: 6.833523392677302
95th percentile: 13.472268044948562
99th percentile: 18.783263766765597
mean time: 5.04485764503479
%s, retrying in %s seconds...
Received healthy response to inference request in 3.604468822479248s
Received healthy response to inference request in 4.466591835021973s
Received healthy response to inference request in 3.053299903869629s
Received healthy response to inference request in 7.770157337188721s
Received healthy response to inference request in 2.374810218811035s
Received healthy response to inference request in 6.526210784912109s
Received healthy response to inference request in 8.56856656074524s
Received healthy response to inference request in 2.9172842502593994s
Received healthy response to inference request in 9.318123817443848s
Received healthy response to inference request in 6.977404594421387s
10 requests
0 failed requests
5th percentile: 2.618923532962799
10th percentile: 2.863036847114563
20th percentile: 3.026096773147583
30th percentile: 3.4391181468963623
40th percentile: 4.121742630004883
50th percentile: 5.496401309967041
60th percentile: 6.70668830871582
70th percentile: 7.215230417251587
80th percentile: 7.929839181900024
90th percentile: 8.6435222864151
95th percentile: 8.980823051929473
99th percentile: 9.250663664340973
mean time: 5.557691812515259
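The percentile figures above can be reproduced from the per-request times. A sketch for the second StressChecker batch (0 failed requests, so all ten samples appear in the log), assuming linear-interpolation percentiles (the same convention as NumPy's default), implemented here in pure Python:

```python
def percentile(samples, p):
    """Linear-interpolation percentile (numpy's default convention)."""
    s = sorted(samples)
    rank = (p / 100) * (len(s) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (rank - lo) * (s[hi] - s[lo])

# Per-request response times (seconds) from the second batch above.
times = [
    3.604468822479248, 4.466591835021973, 3.053299903869629,
    7.770157337188721, 2.374810218811035, 6.526210784912109,
    8.56856656074524, 2.9172842502593994, 9.318123817443848,
    6.977404594421387,
]

for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {percentile(times, p)}")
print(f"mean time: {sum(times) / len(times)}")
```

Note that in the first batch the high-end percentiles (90th and above) exceed every logged healthy time because the one failed request (the ~20 s read timeout) is included in the sample.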
Pipeline stage StressChecker completed in 108.93s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.92s
Shutdown handler de-registered
function_narit_2025-12-14 status is now deployed due to DeploymentManager action
function_narit_2025-12-14 status is now inactive due to auto deactivation removed underperforming models