function_milam_2025-12-16

developer_uid: chai_evaluation_service

submission_id: function_milam_2025-12-16

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-19T08:51:23+00:00

num_battles: 7654

num_wins: 3905

celo_rating: 1300.26

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-19

win_ratio: 0.5101907499346747
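
The win_ratio field is consistent with num_wins divided by num_battles. A minimal check, using the values from the fields above (the variable names are illustrative, and this is not necessarily how the evaluation service computes the field):

```python
# Values copied from the num_battles and num_wins fields above.
num_battles = 7654
num_wins = 3905

# Fraction of battles won.
win_ratio = num_wins / num_battles
print(win_ratio)
```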

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
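
A rough sketch of how the formatter templates might be combined into one model input. The template strings are copied from the formatter field above; the `build_prompt` helper, the ordering of the parts, and the sample persona and messages are assumptions made for illustration, not the service's actual code.

```python
# Template strings copied from the formatter field above.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: memory, then prompt, then chat turns, then the response cue."""
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(formatter["bot_template"].format(bot_name=bot_name, message=message))
        else:
            parts.append(formatter["user_template"].format(user_name=user_name, message=message))
    # The response template ends with "{bot_name}:" so the model continues as the bot.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Sample inputs are invented for illustration only.
example = build_prompt(
    memory="Richard is a friendly guide.",
    prompt="A short chat.",
    turns=[("user", "Hi!"), ("bot", "Hello there.")],
    bot_name="richard",
    user_name="User",
)
print(example)
```

With stopping_words set to ['\n'] in generation_params, generation would presumably stop at the first newline after the response cue, which would keep each reply to a single line.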

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 5.3765480518341064s
Received healthy response to inference request in 3.0210580825805664s
Received healthy response to inference request in 3.0619735717773438s
Received healthy response to inference request in 3.376189708709717s
Received healthy response to inference request in 2.9241719245910645s
Received healthy response to inference request in 3.323610782623291s
Received healthy response to inference request in 2.082059860229492s
Received healthy response to inference request in 2.71095609664917s
Received healthy response to inference request in 2.1311655044555664s
Received healthy response to inference request in 4.7097227573394775s
10 requests
0 failed requests
5th percentile: 2.1041574001312258
10th percentile: 2.126254940032959
20th percentile: 2.5949979782104493
30th percentile: 2.860207176208496
40th percentile: 2.9823036193847656
50th percentile: 3.041515827178955
60th percentile: 3.1666284561157223
70th percentile: 3.3393844604492187
80th percentile: 3.6428963184356693
90th percentile: 4.776405286788941
95th percentile: 5.076476669311523
99th percentile: 5.31653377532959
mean time: 3.2717456340789797
Pipeline stage StressChecker completed in 33.96s
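
The StressChecker percentiles and mean above can be reproduced from the ten logged response times. The sketch below assumes linear interpolation between sorted samples, which is the default behaviour of numpy.percentile; whether the pipeline itself uses NumPy is an assumption, and the `percentile` helper is illustrative.

```python
import math

# Response times in seconds, copied from the log lines above.
latencies = [
    5.3765480518341064, 3.0210580825805664, 3.0619735717773438,
    3.376189708709717, 2.9241719245910645, 3.323610782623291,
    2.082059860229492, 2.71095609664917, 2.1311655044555664,
    4.7097227573394775,
]


def percentile(values, q):
    """Return the q-th percentile using linear interpolation between sorted samples."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


mean_time = sum(latencies) / len(latencies)

for q in (5, 50, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```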
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.58s
Shutdown handler de-registered
function_milam_2025-12-16 status is now deployed due to DeploymentManager action
function_milam_2025-12-16 status is now inactive due to auto deactivation removed underperforming models
function_milam_2025-12-16 status is now torndown due to DeploymentManager action