function_bepum_2025-12-18

developer_uid: chai_evaluation_service

submission_id: function_bepum_2025-12-18

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-21T21:21:15+00:00

num_battles: 9168

num_wins: 4569

celo_rating: 1292.01

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-21

win_ratio: 0.49836387434554974
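The logged win_ratio matches num_wins divided by num_battles. The sketch below assumes that is the whole definition, with no tie handling or weighting, which the record does not state explicitly.

```python
# Assumption: win_ratio is a plain ratio of wins to battles (no tie/weighting adjustments).
num_battles = 9168
num_wins = 4569

win_ratio = num_wins / num_battles
print(win_ratio)
```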

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
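With stopping_words set to ['\n'], each reply should end at the first newline, so the bot answers with a single line of at most 64 output tokens. The helper below is a hypothetical post-hoc illustration of that rule. A real serving stack normally stops decoding as soon as the stop string appears instead of trimming afterwards.

```python
# Hypothetical helper: cut a completion at the earliest stop word (the stop word is dropped).
def truncate_at_stop(text, stopping_words):
    cut = len(text)
    for word in stopping_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all stop words
    return text[:cut]

generation_params = {"stopping_words": ["\n"], "max_output_tokens": 64}
completion = "Nice to meet you!\nUser: you too"
print(truncate_at_stop(completion, generation_params["stopping_words"]))
```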

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
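The formatter templates suggest how a conversation is rendered into a single prompt. The sketch below assumes the templates are filled with str.format and concatenated in the order memory, prompt, chat history, response. The order, the build_prompt function and the history tuple format are illustrative assumptions, and truncation to max_input_tokens is not modelled.

```python
# Templates copied from the formatter field above.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}

# Hypothetical assembly order: memory, prompt, each chat turn, then the response cue.
def build_prompt(memory, prompt, history, bot_name, user_name):
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in history:  # history: list of ("bot" | "user", text) pairs
        if speaker == "bot":
            parts.append(formatter["bot_template"].format(bot_name=bot_name, message=message))
        else:
            parts.append(formatter["user_template"].format(user_name=user_name, message=message))
    # The response template ends with "richard:" so the model continues as the bot.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)

print(build_prompt("m", "p", [("user", "hi"), ("bot", "hello")], "richard", "user"))
```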

Resubmit model

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.652301788330078s
Received healthy response to inference request in 1.6501848697662354s
Received healthy response to inference request in 3.4793570041656494s
Received healthy response to inference request in 2.787252187728882s
Received healthy response to inference request in 2.10988450050354s
Received healthy response to inference request in 3.009779691696167s
Received healthy response to inference request in 2.8800718784332275s
Received healthy response to inference request in 2.055363416671753s
Received healthy response to inference request in 2.357851266860962s
Received healthy response to inference request in 2.99991774559021s
10 requests
0 failed requests
5th percentile: 1.8325152158737184
10th percentile: 2.0148455619812013
20th percentile: 2.0989802837371827
30th percentile: 2.2834612369537353
40th percentile: 2.5345215797424316
50th percentile: 2.71977698802948
60th percentile: 2.82438006401062
70th percentile: 2.9160256385803223
80th percentile: 3.0018901348114015
90th percentile: 3.0567374229431152
95th percentile: 3.268047213554382
99th percentile: 3.437095046043396
mean time: 2.5981964349746702
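The StressChecker summary can be reproduced from the ten latencies above. The reported percentiles agree with linear interpolation between closest ranks (the same rule as NumPy's default percentile method). The sketch below assumes that rule and uses only the standard library.

```python
import math

# The ten healthy-response latencies (seconds) logged by StressChecker.
latencies = [
    2.652301788330078, 1.6501848697662354, 3.4793570041656494, 2.787252187728882,
    2.10988450050354, 3.009779691696167, 2.8800718784332275, 2.055363416671753,
    2.357851266860962, 2.99991774559021,
]

def percentile(values, p):
    """Linear interpolation between the two closest ranks of the sorted values."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {percentile(latencies, p)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

For example, the 5th percentile falls at rank 0.45, i.e. 45% of the way from the fastest response (1.650 s) to the second fastest (2.055 s).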
Pipeline stage StressChecker completed in 27.97s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
function_bepum_2025-12-18 status is now deployed due to DeploymentManager action
function_bepum_2025-12-18 status is now inactive due to auto deactivation removed underperforming models
function_bepum_2025-12-18 status is now torndown due to DeploymentManager action