function_tufel_2025-12-13

developer_uid: chai_evaluation_service

submission_id: function_tufel_2025-12-13

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-16T10:27:15+00:00

num_battles: 9899

num_wins: 5017

celo_rating: 1256.33

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-13

win_ratio: 0.5068188705929892
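The reported win_ratio is consistent with num_wins / num_battles (5017 / 9899). Here is a minimal sketch of that arithmetic. The helper name win_ratio is hypothetical, and the service's own computation is not shown on this page.

```python
def win_ratio(num_wins: int, num_battles: int) -> float:
    """Hypothetical helper: fraction of battles won (assumed definition)."""
    if num_battles == 0:
        raise ValueError("num_battles must be positive")
    return num_wins / num_battles


# Values copied from the metadata above.
print(win_ratio(5017, 9899))
```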

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
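The sketch below shows how the sampling fields in generation_params are conventionally applied to one decoding step's logits, followed by stop-word truncation. The evaluation service's actual sampler is not shown here. The helper names filter_distribution and apply_stopping are hypothetical, and the filter order (top-k, then top-p, then min-p) is an assumption because implementations differ. best_of, the presence/frequency penalties, max_input_tokens and max_output_tokens are not modeled.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=40, top_p=1.0, min_p=0.0):
    """Hypothetical helper: turn {token: logit} into a renormalised {token: prob}."""
    # Temperature scaling, then a numerically stable softmax.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda kv: kv[1], reverse=True)

    # top-k: keep the k most likely tokens (k <= 0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top-p (nucleus): keep the shortest prefix whose cumulative mass reaches top_p.
    nucleus, cumulative = [], 0.0
    for tok, p in ranked:
        nucleus.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # min-p: drop tokens below min_p times the most likely token's probability.
    threshold = min_p * nucleus[0][1]
    survivors = [(tok, p) for tok, p in nucleus if p >= threshold]

    # Renormalise the survivors so they sum to 1.
    mass = sum(p for _, p in survivors)
    return {tok: p / mass for tok, p in survivors}


def apply_stopping(text, stopping_words):
    """Hypothetical helper: cut generated text at the earliest stop word."""
    cut = len(text)
    for word in stopping_words:
        idx = text.find(word)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# With this submission's values, top_p=1.0 and min_p=0.0 are effectively no-ops;
# only top_k=40 restricts the distribution (here top_k=2 to show the effect).
dist = filter_distribution({"hi": 2.0, "hey": 1.0, "yo": 0.0}, top_k=2)
reply = apply_stopping("Hello there!\nUser: next turn", ["\n"])
print(sorted(dist), reply)
```

With stopping_words set to ['\n'], any text the model produces after the first newline would be discarded.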

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
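The formatter templates can be read as assembling a single plain-text prompt. Below is a minimal sketch using the template strings exactly as listed. The concatenation order (memory, then prompt, then chat turns, then the response cue) and the helper name build_prompt are assumptions, and the max_input_tokens truncation is not modeled.

```python
# Template strings copied from the formatter field above.
MEMORY_TEMPLATE = "### Instruction:\n{memory}\n"
PROMPT_TEMPLATE = "### Input:\n{prompt}\n"
BOT_TEMPLATE = "{bot_name}: {message}\n"
USER_TEMPLATE = "{user_name}: {message}\n"
RESPONSE_TEMPLATE = "### Response:\n{bot_name}:"


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper; turns is a list of (speaker, message), speaker in {"bot", "user"}."""
    parts = [MEMORY_TEMPLATE.format(memory=memory), PROMPT_TEMPLATE.format(prompt=prompt)]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(BOT_TEMPLATE.format(bot_name=bot_name, message=message))
        else:
            parts.append(USER_TEMPLATE.format(user_name=user_name, message=message))
    # The prompt ends on "{bot_name}:" so the model continues as the bot.
    parts.append(RESPONSE_TEMPLATE.format(bot_name=bot_name))
    return "".join(parts)


text = build_prompt("Be kind.", "A friendly chat.",
                    [("user", "Hi"), ("bot", "Hello!")], "richard", "Ann")
print(text)
```

Because the prompt ends on the bot's name and generation stops at '\n', the reply is presumably a single line spoken by the bot.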


Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 4.530766725540161s
Received healthy response to inference request in 2.4251837730407715s
Received healthy response to inference request in 4.488572597503662s
Received healthy response to inference request in 2.795598030090332s
Received healthy response to inference request in 3.1738829612731934s
Received healthy response to inference request in 2.799696445465088s
Received healthy response to inference request in 3.3720288276672363s
Received healthy response to inference request in 3.5529396533966064s
Received healthy response to inference request in 2.3920459747314453s
Received healthy response to inference request in 3.0727202892303467s
10 requests
0 failed requests
5th percentile: 2.406957983970642
10th percentile: 2.421869993209839
20th percentile: 2.72151517868042
30th percentile: 2.798466920852661
40th percentile: 2.963510751724243
50th percentile: 3.12330162525177
60th percentile: 3.2531413078308105
70th percentile: 3.4263020753860474
80th percentile: 3.7400662422180178
90th percentile: 4.492792010307312
95th percentile: 4.511779367923737
99th percentile: 4.526969254016876
mean time: 3.2603435277938844
Pipeline stage StressChecker completed in 34.08s
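The StressChecker summary matches linear-interpolation percentiles over the ten logged latencies. This is the same convention as NumPy's default percentile method. The sketch below recomputes the summary from the logged values using a hypothetical percentile helper. It reproduces, for example, the reported 5th, 50th and 90th percentiles and the mean. How the service itself computes these figures is not shown in the log.

```python
import math
import statistics

# The ten StressChecker latencies from the log above, in seconds.
latencies = [
    4.530766725540161, 2.4251837730407715, 4.488572597503662, 2.795598030090332,
    3.1738829612731934, 2.799696445465088, 3.3720288276672363, 3.5529396533966064,
    2.3920459747314453, 3.0727202892303467,
]


def percentile(values, p):
    """Hypothetical helper: linear-interpolation percentile (NumPy's default 'linear' method)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100  # fractional rank between 0 and n-1
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {percentile(latencies, p)}")
print("mean time:", statistics.mean(latencies))
```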
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.62s
Shutdown handler de-registered
function_tufel_2025-12-13 status is now deployed due to DeploymentManager action
function_tufel_2025-12-13 status is now inactive due to auto deactivation removed underperforming models
function_tufel_2025-12-13 status is now torndown due to DeploymentManager action