developer_uid: chai_evaluation_service
submission_id: function_falun_2025-12-16
model_name: richard
model_group:
status: torndown
timestamp: 2025-12-19T18:51:21+00:00
num_battles: 10473
num_wins: 5199
celo_rating: 1290.61
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-19
win_ratio: 0.49641936407906045
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
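The formatter templates above can be read as pieces of a single prompt string. The sketch below shows one plausible way they fit together. The assembly order (memory, prompt, conversation turns, response header), the `build_prompt` helper, and the `turns` structure are assumptions made for illustration and are not taken from the evaluation service. Truncation (`truncate_by_message`, `max_input_tokens`) is not modelled.

```python
# Templates copied from the formatter field of this record.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: join the templates in an assumed order.

    `turns` is a list of (speaker, message) pairs, where speaker is
    either "bot" or "user". This structure is an assumption.
    """
    parts = [
        FORMATTER["memory_template"].format(memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                FORMATTER["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                FORMATTER["user_template"].format(user_name=user_name, message=message)
            )
    # The response template ends with "{bot_name}:" so the model continues the bot's line.
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)
```

Because `stopping_words` is `['\n']` in the generation parameters, a prompt ending in `{bot_name}:` would yield a single-line reply under this reading.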
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 7.016653776168823s
Received healthy response to inference request in 2.626676559448242s
Received healthy response to inference request in 10.118724822998047s
Received healthy response to inference request in 4.437152147293091s
Received healthy response to inference request in 9.536266326904297s
Received healthy response to inference request in 4.41582989692688s
Received healthy response to inference request in 2.683917284011841s
Received healthy response to inference request in 3.8454651832580566s
Received healthy response to inference request in 5.908232927322388s
Received healthy response to inference request in 3.5253007411956787s
10 requests
0 failed requests
5th percentile: 2.6524348855018616
10th percentile: 2.678193211555481
20th percentile: 3.357024049758911
30th percentile: 3.7494158506393434
40th percentile: 4.187684011459351
50th percentile: 4.426491022109985
60th percentile: 5.025584459304809
70th percentile: 6.240759181976318
80th percentile: 7.520576286315919
90th percentile: 9.594512176513671
95th percentile: 9.856618499755859
99th percentile: 10.06630355834961
mean time: 5.4114219665527346
Pipeline stage StressChecker completed in 56.34s
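The StressChecker percentiles and mean above appear consistent with linear interpolation between sorted samples, applied to the ten logged latencies. The sketch below recomputes them with the standard library only. The `percentile` helper is a hypothetical name for illustration, and the interpolation method is inferred from the numbers rather than confirmed by the log.

```python
import math

# Latencies in seconds, copied from the ten "Received healthy response" lines.
latencies = [
    7.016653776168823, 2.626676559448242, 10.118724822998047,
    4.437152147293091, 9.536266326904297, 4.41582989692688,
    2.683917284011841, 3.8454651832580566, 5.908232927322388,
    3.5253007411956787,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation.

    Assumed method: the fractional rank (n - 1) * q / 100 is located
    between its two neighbouring sorted samples.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def mean(values):
    """Arithmetic mean of the samples."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        print(f"{q}th percentile: {percentile(latencies, q)}")
    print(f"mean time: {mean(latencies)}")
```

With only ten samples, the upper percentiles are dominated by the two slowest requests, so the 95th and 99th values are interpolations between the same pair of observations.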
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.10s
Shutdown handler de-registered
function_falun_2025-12-16 status is now deployed due to DeploymentManager action
function_falun_2025-12-16 status is now inactive due to auto deactivation: removed underperforming models
function_falun_2025-12-16 status is now torndown due to DeploymentManager action