function_sejim_2025-12-18

developer_uid: chai_evaluation_service

submission_id: function_sejim_2025-12-18

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-21T22:01:07+00:00

num_battles: 7520

num_wins: 3695

celo_rating: 1287.09

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-21

win_ratio: 0.4913563829787234
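The win_ratio value is consistent with num_wins divided by num_battles. A minimal check, assuming that is how the field is derived (the record itself does not state the formula):

```python
# Values copied from the record above.
num_battles = 7520
num_wins = 3695

# Assumed derivation: plain ratio of wins to battles.
win_ratio = num_wins / num_battles
print(win_ratio)
```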

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
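As a rough illustration of how these templates might combine into a single model input, the sketch below fills each one with Python's str.format. The assembly order (memory, prompt, chat turns, response header) and the helper name build_prompt are assumptions, not taken from the record, and truncation to max_input_tokens is left out.

```python
# Templates copied from the formatter entry above.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: join the filled templates in an assumed order.

    `turns` is a list of (speaker, message) pairs, where speaker is
    either "bot" or "user".
    """
    parts = [
        FORMATTER["memory_template"].format(memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                FORMATTER["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                FORMATTER["user_template"].format(user_name=user_name, message=message)
            )
    # The response header ends with "{bot_name}:" so the model continues
    # the bot's line; generation stops at the first newline (stopping_words).
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    memory="Richard is a friendly guide.",
    prompt="A short chat.",
    turns=[("user", "Hello"), ("bot", "Hi there")],
    bot_name="richard",
    user_name="User",
)
print(example)
```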

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 4.1804587841033936s
Received healthy response to inference request in 3.343073844909668s
Received healthy response to inference request in 2.5792324542999268s
Received healthy response to inference request in 4.441225528717041s
Received healthy response to inference request in 4.855172872543335s
Received healthy response to inference request in 2.9986562728881836s
Received healthy response to inference request in 2.634082794189453s
Received healthy response to inference request in 1.6804850101470947s
Received healthy response to inference request in 3.386679172515869s
Received healthy response to inference request in 2.065340280532837s
10 requests
0 failed requests
5th percentile: 1.8536698818206787
10th percentile: 2.0268547534942627
20th percentile: 2.476454019546509
30th percentile: 2.6176276922225954
40th percentile: 2.8528268814086912
50th percentile: 3.170865058898926
60th percentile: 3.3605159759521483
70th percentile: 3.624813055992126
80th percentile: 4.232612133026123
90th percentile: 4.48262026309967
95th percentile: 4.668896567821502
99th percentile: 4.817917611598968
mean time: 3.21644070148468
Pipeline stage StressChecker completed in 34.63s
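The StressChecker percentiles and mean above can be reproduced from the ten logged response times using linear interpolation between order statistics. The sketch below is an independent reconstruction, not the pipeline's own code; the function name percentile is illustrative.

```python
import math

# Response times in seconds, copied from the StressChecker log lines above.
latencies = [
    4.1804587841033936, 3.343073844909668, 2.5792324542999268,
    4.441225528717041, 4.855172872543335, 2.9986562728881836,
    2.634082794189453, 1.6804850101470947, 3.386679172515869,
    2.065340280532837,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted sample.
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")

mean_time = sum(latencies) / len(latencies)
print(f"mean time: {mean_time}")
```

With only ten samples, the upper percentiles are interpolated almost entirely between the two slowest requests, so they are best read as rough indicators rather than stable tail-latency estimates.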
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.60s
Shutdown handler de-registered
function_sejim_2025-12-18 status is now deployed due to DeploymentManager action
function_sejim_2025-12-18 status is now inactive due to auto deactivation removed underperforming models
function_sejim_2025-12-18 status is now torndown due to DeploymentManager action