function_lopum_2025-12-15

developer_uid: chai_evaluation_service

submission_id: function_lopum_2025-12-15

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-18T16:21:16+00:00

num_battles: 8639

num_wins: 4222

celo_rating: 1285.35

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-18

win_ratio: 0.488713971524482

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
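
The formatter templates above can be read as pieces of a single prompt string. The following is a minimal sketch of one plausible assembly order (memory, then prompt, then conversation turns, then the response header). The function `build_prompt`, the `turns` structure, and the ordering are assumptions for illustration only; the record does not show how the service actually combines the templates, and truncation to `max_input_tokens` is not modelled.

```python
# Templates copied from the formatter entry of this record.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: join the templates in an assumed order."""
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    # turns is an assumed structure: a list of (speaker, message) pairs.
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                formatter["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                formatter["user_template"].format(user_name=user_name, message=message)
            )
    # The response header ends with "{bot_name}:" so the model continues the line.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    memory="A friendly bot.",
    prompt="Casual chat.",
    turns=[("user", "Hello"), ("bot", "Hi there")],
    bot_name="richard",
    user_name="Alex",
)
print(example)
```

Because `stopping_words` is `['\n']`, generation would stop at the first newline, which fits a prompt that ends on an open `{bot_name}:` line and expects a single-line reply.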

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 3.409670829772949s
Received healthy response to inference request in 3.615412473678589s
Received healthy response to inference request in 4.0738911628723145s
Received healthy response to inference request in 3.1766746044158936s
Received healthy response to inference request in 2.939577102661133s
Received healthy response to inference request in 2.9373795986175537s
Received healthy response to inference request in 2.955643892288208s
Received healthy response to inference request in 7.555761098861694s
Received healthy response to inference request in 3.674471855163574s
Received healthy response to inference request in 3.5638465881347656s
10 requests
0 failed requests
5th percentile: 2.9383684754371644
10th percentile: 2.939357352256775
20th percentile: 2.952430534362793
30th percentile: 3.110365390777588
40th percentile: 3.3164723396301268
50th percentile: 3.4867587089538574
60th percentile: 3.5844729423522947
70th percentile: 3.6331302881240846
80th percentile: 3.7543557167053225
90th percentile: 4.422078156471251
95th percentile: 5.988919627666469
99th percentile: 7.242392804622651
mean time: 3.7902329206466674
Pipeline stage StressChecker completed in 39.23s
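
The percentile and mean figures reported by StressChecker can be reproduced from the ten response times logged above. The sketch below assumes linear interpolation between order statistics, which is consistent with the logged values; the log itself does not state which method StressChecker uses, and the helper name `percentile` is illustrative.

```python
import math

# Response times in seconds, copied from the StressChecker log lines above.
latencies = [
    3.409670829772949,
    3.615412473678589,
    4.0738911628723145,
    3.1766746044158936,
    2.939577102661133,
    2.9373795986175537,
    2.955643892288208,
    7.555761098861694,
    3.674471855163574,
    3.5638465881347656,
]


def percentile(samples, q):
    """Percentile q (0 to 100) using linear interpolation between sorted samples."""
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


mean_time = sum(latencies) / len(latencies)

for q in (5, 50, 90, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

With only ten samples, the upper percentiles are dominated by the single 7.56 s response, which is why the 95th and 99th percentiles sit well above the median.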
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.83s
Shutdown handler de-registered
function_lopum_2025-12-15 status is now deployed due to DeploymentManager action
function_lopum_2025-12-15 status is now inactive due to auto deactivation removed underperforming models
function_lopum_2025-12-15 status is now torndown due to DeploymentManager action