developer_uid: chai_evaluation_service
submission_id: function_maros_2025-12-13
model_name: richard
model_group:
status: inactive
timestamp: 2025-12-13T21:47:10+00:00
num_battles: 6669
num_wins: 3317
celo_rating: 1256.33
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-13
win_ratio: 0.49737591842855
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
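The generation_params above describe plain ancestral sampling with a top_k of 40; top_p 1.0 and min_p 0.0 leave those filters effectively off, penalties are disabled, best_of requests 8 candidates, '\n' is the only stop word, and output is capped at 64 tokens. As a hedged illustration only (not Chaiverse code), the temperature / top_k / top_p / min_p knobs are conventionally applied to one decoding step's logits roughly like this:

```python
# Illustrative sketch of conventional top_k / top_p / min_p filtering.
# filter_logits and the toy logits are made up for illustration.
import numpy as np

def filter_logits(logits, temperature=1.0, top_k=40, top_p=1.0, min_p=0.0):
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)

    # top_k: keep only the k highest-scoring tokens.
    if 0 < top_k < logits.size:
        kth = np.sort(logits)[-top_k]
        logits = np.where(logits < kth, -np.inf, logits)

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top_p (nucleus): keep the smallest set of tokens whose mass reaches top_p.
    if top_p < 1.0:
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        probs[order[cum > top_p][1:]] = 0.0   # zero everything past the nucleus

    # min_p: drop tokens whose probability is below min_p * max probability.
    if min_p > 0.0:
        probs[probs < min_p * probs.max()] = 0.0

    return probs / probs.sum()

# Toy 6-token vocabulary, just to show the effect of top_k=3.
print(filter_logits([2.0, 1.5, 0.3, -1.0, -2.0, -5.0], top_k=3))
```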
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
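The formatter templates determine how memory, prompt, and chat history are flattened into the text the model sees. A minimal sketch of that assembly follows; build_prompt, the sample conversation, and the exact placement of prompt_template relative to the chat turns are assumptions, not the Chaiverse implementation.

```python
# Hypothetical assembly of a prompt from the formatter templates above.
FORMATTER = {
    'memory_template': '### Instruction:\n{memory}\n',
    'prompt_template': '### Input:\n{prompt}\n',
    'bot_template': '{bot_name}: {message}\n',
    'user_template': '{user_name}: {message}\n',
    'response_template': '### Response:\n{bot_name}:',
}

def build_prompt(memory, prompt, turns, bot_name):
    parts = [FORMATTER['memory_template'].format(memory=memory),
             FORMATTER['prompt_template'].format(prompt=prompt)]
    for speaker, message in turns:
        if speaker == bot_name:
            parts.append(FORMATTER['bot_template'].format(bot_name=speaker, message=message))
        else:
            parts.append(FORMATTER['user_template'].format(user_name=speaker, message=message))
    # Response template ends with "{bot_name}:" so generation continues the bot's turn.
    parts.append(FORMATTER['response_template'].format(bot_name=bot_name))
    return ''.join(parts)

print(build_prompt(
    memory='Richard is a terse assistant.',
    prompt='A short chat.',
    turns=[('User', 'Hi there!'), ('Richard', 'Hello.'), ('User', 'How are you?')],
    bot_name='Richard',
))
```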
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.8852498531341553s
Received healthy response to inference request in 2.383301258087158s
Received healthy response to inference request in 2.0578644275665283s
Received healthy response to inference request in 2.136171579360962s
Received healthy response to inference request in 2.341097593307495s
Received healthy response to inference request in 2.4755945205688477s
Received healthy response to inference request in 1.8448891639709473s
Received healthy response to inference request in 2.4353039264678955s
Received healthy response to inference request in 2.2977542877197266s
10 requests
1 failed request
5th percentile: 1.8630514740943909
10th percentile: 1.8812137842178345
20th percentile: 2.023341512680054
30th percentile: 2.112679433822632
40th percentile: 2.2331212043762205
50th percentile: 2.319425940513611
60th percentile: 2.3579790592193604
70th percentile: 2.3989020586013794
80th percentile: 2.4433620452880858
90th percentile: 4.239901280403131
95th percentile: 12.179281699657421
99th percentile: 18.530786035060885
mean time: 3.997588872909546
%s, retrying in %s seconds...
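The StressChecker output above follows a simple pattern: a batch of 10 timed inference requests against the submission's endpoint with a 20 s read timeout, per-request health logging, summary percentiles, and a retry of the batch after a failure. A rough sketch of that pattern is below; the endpoint path, payload, and 5-second backoff are assumptions, while the 20 s timeout and 10-request batch size come from the log.

```python
# Rough sketch of the stress-check pattern visible in the log.
# The endpoint path, payload, and retry backoff are assumptions, not Chaiverse internals.
import time
import numpy as np
import requests

ENDPOINT = "http://guanaco-submitter.guanaco-backend.k2.chaiverse.com/inference"  # path is a placeholder
TIMEOUT_S = 20       # matches the read timeout in the log
NUM_REQUESTS = 10    # matches "10 requests"

def run_batch():
    times, failures = [], 0
    for _ in range(NUM_REQUESTS):
        start = time.time()
        try:
            resp = requests.post(ENDPOINT, json={"text": "ping"}, timeout=TIMEOUT_S)
            healthy = resp.ok
        except requests.exceptions.RequestException:
            healthy = False
        elapsed = time.time() - start
        times.append(elapsed)
        if healthy:
            print(f"Received healthy response to inference request in {elapsed}s")
        else:
            print("Received unhealthy response to inference request!")
            failures += 1

    print(f"{len(times)} requests")
    print(f"{failures} failed requests")
    for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        print(f"{p}th percentile: {np.percentile(times, p)}")
    print(f"mean time: {np.mean(times)}")
    return failures

# The log shows the whole batch being retried after a failure;
# the 5-second backoff here is an assumption.
if run_batch() > 0:
    print("read timed out, retrying in 5 seconds...")
    time.sleep(5)
    run_batch()
```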
Received healthy response to inference request in 2.6085426807403564s
Received healthy response to inference request in 2.3348584175109863s
Received healthy response to inference request in 1.8780462741851807s
Received healthy response to inference request in 3.023810625076294s
Received healthy response to inference request in 2.0324244499206543s
Received healthy response to inference request in 2.0537595748901367s
Received healthy response to inference request in 3.1172738075256348s
Received healthy response to inference request in 1.928715467453003s
Received healthy response to inference request in 2.2201199531555176s
Received healthy response to inference request in 2.67940354347229s
10 requests
0 failed requests
5th percentile: 1.9008474111557008
10th percentile: 1.9236485481262207
20th percentile: 2.011682653427124
30th percentile: 2.047359037399292
40th percentile: 2.153575801849365
50th percentile: 2.277489185333252
60th percentile: 2.4443321228027344
70th percentile: 2.6298009395599364
80th percentile: 2.7482849597930907
90th percentile: 3.033156943321228
95th percentile: 3.0752153754234315
99th percentile: 3.108862121105194
mean time: 2.387695479393005
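The statistics for this second batch can be re-derived directly from the ten logged response times; numpy's default linear-interpolation percentile matches the logged values (up to float printing), as the check below shows.

```python
# Check: the second batch's percentiles and mean follow from the ten logged latencies.
import numpy as np

times = [2.6085426807403564, 2.3348584175109863, 1.8780462741851807,
         3.023810625076294, 2.0324244499206543, 2.0537595748901367,
         3.1172738075256348, 1.928715467453003, 2.2201199531555176,
         2.67940354347229]

for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {np.percentile(times, p)}")
print(f"mean time: {np.mean(times)}")   # ~2.3877, matching the log
```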
Pipeline stage StressChecker completed in 66.85s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.60s
Shutdown handler de-registered
function_maros_2025-12-13 status is now deployed due to DeploymentManager action
function_maros_2025-12-13 status is now inactive due to auto-deactivation of underperforming models