function_bekam_2025-12-16

developer_uid: chai_evaluation_service

submission_id: function_bekam_2025-12-16

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-19T23:06:27+00:00

num_battles: 8008

num_wins: 3959

celo_rating: 1289.33

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-19

win_ratio: 0.4943806193806194

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.7277679443359375s
Received healthy response to inference request in 3.0882606506347656s
Received healthy response to inference request in 3.6081089973449707s
Received healthy response to inference request in 4.826552152633667s
Received healthy response to inference request in 9.0401029586792s
Received healthy response to inference request in 7.831790447235107s
Received healthy response to inference request in 12.036410570144653s
Received healthy response to inference request in 3.9197866916656494s
Received healthy response to inference request in 5.4527976512908936s
Received healthy response to inference request in 1.8434937000274658s
10 requests
0 failed requests
5th percentile: 1.7798445343971252
10th percentile: 1.831921124458313
20th percentile: 2.839307260513306
30th percentile: 3.452154493331909
40th percentile: 3.795115613937378
50th percentile: 4.373169422149658
60th percentile: 5.077050352096557
70th percentile: 6.1664954900741575
80th percentile: 8.073452949523926
90th percentile: 9.339733719825743
95th percentile: 10.688072144985195
99th percentile: 11.766742885112762
mean time: 5.337507176399231
Pipeline stage StressChecker completed in 54.71s
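The StressChecker summary above can be reproduced from the ten logged latencies. The sketch below is an illustration, not the pipeline's own code: it assumes the percentiles use linear interpolation between order statistics (which matches the logged values), and the names `latencies` and `percentile` are hypothetical.

```python
# Latencies (seconds) copied from the "Received healthy response" log lines.
latencies = [
    1.7277679443359375, 3.0882606506347656, 3.6081089973449707,
    4.826552152633667, 9.0401029586792, 7.831790447235107,
    12.036410570144653, 3.9197866916656494, 5.4527976512908936,
    1.8434937000274658,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation.

    Assumption: the stress checker interpolates linearly between the two
    nearest order statistics; this is a hypothetical helper.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


mean_time = sum(latencies) / len(latencies)

for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

With only 10 samples, the upper percentiles are driven almost entirely by the two slowest requests (about 9.04 s and 12.04 s), so they are best read as rough indicators.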
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.63s
Shutdown handler de-registered
function_bekam_2025-12-16 status is now deployed due to DeploymentManager action
function_bekam_2025-12-16 status is now inactive due to auto deactivation removed underperforming models
function_bekam_2025-12-16 status is now torndown due to DeploymentManager action