developer_uid: chai_evaluation_service
submission_id: function_pefes_2025-12-13
model_name: richard
model_group:
status: inactive
timestamp: 2025-12-13T17:46:53+00:00
num_battles: 5467
num_wins: 2699
celo_rating: 1289.22
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-13
win_ratio: 0.4936894091823669
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
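The formatter templates above can be read as pieces of one prompt string. The following is a minimal sketch of how they might be combined, not the service's actual code: the function `build_prompt`, the `turns` structure, and the ordering (memory, then prompt, then chat turns, then response header) are assumptions made for illustration. The template strings themselves are copied from the `formatter` entry.

```python
# Template strings copied from the `formatter` metadata entry.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: join the templates in an assumed order."""
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    # `turns` is an assumed structure: a list of (speaker, message) pairs,
    # where speaker is either "bot" or "user".
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                formatter["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                formatter["user_template"].format(user_name=user_name, message=message)
            )
    # The response header ends with "{bot_name}:" so the model continues the bot's line.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    memory="M",
    prompt="P",
    turns=[("user", "hi"), ("bot", "hello")],
    bot_name="Bot",
    user_name="User",
)
```

Because `stopping_words` in `generation_params` is `['\n']`, generation after the final `{bot_name}:` would stop at the first newline, which yields a single-line reply.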
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.020528793334961s
Received healthy response to inference request in 3.7057645320892334s
Received healthy response to inference request in 2.850085735321045s
Received healthy response to inference request in 2.656290054321289s
Received healthy response to inference request in 3.7429850101470947s
Received healthy response to inference request in 1.913618564605713s
Received healthy response to inference request in 3.6670336723327637s
Received healthy response to inference request in 2.475975751876831s
Received healthy response to inference request in 4.322496175765991s
10 requests
1 failed requests
5th percentile: 1.9617281675338745
10th percentile: 2.009837770462036
20th percentile: 2.384886360168457
30th percentile: 2.6021957635879516
40th percentile: 2.7725674629211428
50th percentile: 3.2585597038269043
60th percentile: 3.6825260162353515
70th percentile: 3.7169306755065916
80th percentile: 3.8588872432708743
90th percentile: 5.900058722496027
95th percentile: 12.999090182781202
99th percentile: 18.67831535100937
mean time: 4.745289993286133
%s, retrying in %s seconds...
Received healthy response to inference request in 4.3224523067474365s
Received healthy response to inference request in 2.0140395164489746s
Received healthy response to inference request in 3.5700228214263916s
Received healthy response to inference request in 3.2021613121032715s
Received healthy response to inference request in 3.4757440090179443s
Received healthy response to inference request in 3.324091911315918s
Received healthy response to inference request in 3.2267117500305176s
Received healthy response to inference request in 3.2558364868164062s
Received healthy response to inference request in 4.096298694610596s
Received healthy response to inference request in 4.611887216567993s
10 requests
0 failed requests
5th percentile: 2.5486943244934084
10th percentile: 3.0833491325378417
20th percentile: 3.2218016624450683
30th percentile: 3.2470990657806396
40th percentile: 3.2967897415161134
50th percentile: 3.399917960166931
60th percentile: 3.513455533981323
70th percentile: 3.7279055833816526
80th percentile: 4.1415294170379635
90th percentile: 4.351395797729492
95th percentile: 4.481641507148742
99th percentile: 4.585838074684143
mean time: 3.509924602508545
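The percentile and mean figures for the second StressChecker run can be reproduced from the ten logged latencies. The sketch below assumes linear interpolation between order statistics (the same convention as NumPy's default percentile method); the log does not say how the values were computed, and the helper name `percentile` is illustrative only.

```python
import math


def percentile(values, q):
    """Percentile by linear interpolation between the two nearest ranks (assumed method)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Latencies in seconds, copied from the ten healthy responses of the second run.
latencies = [
    4.3224523067474365, 2.0140395164489746, 3.5700228214263916,
    3.2021613121032715, 3.4757440090179443, 3.324091911315918,
    3.2267117500305176, 3.2558364868164062, 4.096298694610596,
    4.611887216567993,
]

summary = {q: percentile(latencies, q) for q in (5, 50, 95)}
mean_time = sum(latencies) / len(latencies)
```

Under that assumption the results agree with the logged 5th, 50th, and 95th percentiles and with the logged mean time. In the first run, the mean of 4.745 s over 10 requests is consistent with the one failed request counting at roughly the 20 s read timeout.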
Pipeline stage StressChecker completed in 85.11s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
function_pefes_2025-12-13 status is now deployed due to DeploymentManager action
function_pefes_2025-12-13 status is now inactive due to auto deactivation removed underperforming models