function_burum_2025-12-16

developer_uid: chai_evaluation_service

submission_id: function_burum_2025-12-16

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-19T14:21:20+00:00

num_battles: 9009

num_wins: 4467

celo_rating: 1290.25

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-19

win_ratio: 0.4958374958374958
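The recorded win_ratio is consistent with num_wins divided by num_battles. The minimal sketch below recomputes it from the two fields above. The helper name `win_ratio` is hypothetical, and it assumes each battle is counted either as a win or as a non-win.

```python
def win_ratio(num_wins: int, num_battles: int) -> float:
    """Fraction of battles won (hypothetical helper; every battle counts as win or non-win)."""
    if num_battles <= 0:
        raise ValueError("num_battles must be positive")
    return num_wins / num_battles


ratio = win_ratio(4467, 9009)
print(f"{ratio:.4f}")  # 0.4958, just under an even 0.5
```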

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
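The record does not show the serving code, so the following is only a sketch of what two of these parameters conventionally mean in decoding: `top_k: 40` with `temperature: 1.0` (sample from the 40 highest-scoring tokens using a temperature-scaled softmax), and `stopping_words: ['\n']` (cut the reply at the first newline). The function names `sample_top_k` and `truncate_at_stop` are hypothetical. The sketch does not model `best_of`, `top_p`, `min_p`, or the penalties.

```python
import math
import random


def sample_top_k(logits, top_k=40, temperature=1.0, rng=None):
    """Sample one token from the top_k highest-scoring candidates.

    logits: dict mapping token -> unnormalized log-probability (hypothetical input shape).
    """
    rng = rng or random.Random()
    # Keep only the top_k candidates, ranked by logit.
    top = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Temperature-scaled softmax weights; subtracting the max keeps exp() stable.
    best = max(score for _, score in top)
    weights = [math.exp((score - best) / temperature) for _, score in top]
    tokens = [tok for tok, _ in top]
    return rng.choices(tokens, weights=weights, k=1)[0]


def truncate_at_stop(text, stopping_words=("\n",)):
    """Cut generated text at the earliest occurrence of any stopping word."""
    cut = len(text)
    for word in stopping_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```

With `stopping_words` set to a single newline, each reply is effectively limited to one line of dialogue, and `max_output_tokens: 64` caps it further.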

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
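As a rough illustration, the formatter templates can be filled with `str.format` and concatenated into an Alpaca-style prompt. The sketch below assumes the order memory, prompt, chat turns, response header. The record does not state that order, and `build_prompt` along with the sample names are hypothetical. It also ignores `max_input_tokens` truncation.

```python
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(fmt, memory, prompt, turns, bot_name, user_name):
    """Assemble a prompt from the templates (assumed ordering, no truncation)."""
    parts = [
        fmt["memory_template"].format(memory=memory),
        fmt["prompt_template"].format(prompt=prompt),
    ]
    # Each turn is a (speaker, message) pair; speaker is "bot" or "user".
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(fmt["bot_template"].format(bot_name=bot_name, message=message))
        else:
            parts.append(fmt["user_template"].format(user_name=user_name, message=message))
    # End on the bot's name so the model continues with the bot's reply.
    parts.append(fmt["response_template"].format(bot_name=bot_name))
    return "".join(parts)


print(build_prompt(formatter, "Be kind.", "A chat.", [("user", "Hi"), ("bot", "Hello")], "Richard", "Ann"))
```

Because the response template ends with `{bot_name}:` and the stopping word is a newline, the model's continuation is a single line spoken by the bot.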


Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 8.852730512619019s
Received healthy response to inference request in 3.541332483291626s
Received healthy response to inference request in 2.522233724594116s
Received healthy response to inference request in 4.401433944702148s
Received healthy response to inference request in 7.34083890914917s
Received healthy response to inference request in 1.824559211730957s
Received healthy response to inference request in 4.020044803619385s
Received healthy response to inference request in 2.972717761993408s
Received healthy response to inference request in 4.249311923980713s
Received healthy response to inference request in 3.240426778793335s
10 requests
0 failed requests
5th percentile: 2.138512742519379
10th percentile: 2.4524662733078
20th percentile: 2.8826209545135497
30th percentile: 3.160114073753357
40th percentile: 3.4209702014923096
50th percentile: 3.7806886434555054
60th percentile: 4.111751651763916
70th percentile: 4.294948530197144
80th percentile: 4.989314937591553
90th percentile: 7.492028069496154
95th percentile: 8.172379291057585
99th percentile: 8.716660268306732
mean time: 4.296563005447387
Pipeline stage StressChecker completed in 44.42s
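The StressChecker summary can be reproduced from the ten logged latencies. The checker's own code is not shown. However, linear-interpolation percentiles (the same rule as NumPy's default `percentile` method) appear to match the logged values, for example the 5th, 50th, and 99th percentiles. The sketch below uses plain Python, and the helper name `percentile` is hypothetical.

```python
# The ten latencies (seconds) from the StressChecker log lines above.
latencies = [
    8.852730512619019, 3.541332483291626, 2.522233724594116, 4.401433944702148,
    7.34083890914917, 1.824559211730957, 4.020044803619385, 2.972717761993408,
    4.249311923980713, 3.240426778793335,
]


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100  # fractional rank into the sorted list
    lo = int(pos)                  # floor, since pos is non-negative
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

With only ten samples, the upper percentiles are interpolated between the two slowest requests (about 7.34 s and 8.85 s), so they say little about true tail latency.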
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
function_burum_2025-12-16 status is now deployed due to DeploymentManager action
function_burum_2025-12-16 status is now inactive due to auto deactivation removed underperforming models
function_burum_2025-12-16 status is now torndown due to DeploymentManager action