function_lulus_2025-12-17

developer_uid: chai_evaluation_service

submission_id: function_lulus_2025-12-17

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-20T12:31:07+00:00

num_battles: 7876

num_wins: 3962

celo_rating: 1295.38

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-20

win_ratio: 0.5030472320975115

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
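A minimal sketch of how the formatter templates above could be combined into one model input. The assembly order (memory, prompt, chat turns, response header) and the `build_prompt` helper are assumptions for illustration; the record lists only the templates themselves. All sample values are hypothetical.

```python
# Templates copied from the formatter entry in this record.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Assemble a prompt string (hypothetical helper, assumed ordering).

    turns is a list of (speaker, message) pairs, speaker being "bot" or "user".
    """
    parts = [
        FORMATTER["memory_template"].format(memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                FORMATTER["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                FORMATTER["user_template"].format(user_name=user_name, message=message)
            )
    # The response header ends with "{bot_name}:" so the model continues the bot's line.
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Hypothetical sample conversation.
text = build_prompt(
    memory="Richard is a friendly guide.",
    prompt="A chat in a train station.",
    turns=[("user", "Hello"), ("bot", "Hi there")],
    bot_name="richard",
    user_name="User",
)
print(text)
```

Since generation_params sets stopping_words to ['\n'], generation would presumably stop at the first newline, giving a single-line reply after the final "richard:" header.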

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.2531683444976807s
Received healthy response to inference request in 2.1275272369384766s
Received healthy response to inference request in 1.8787386417388916s
Received healthy response to inference request in 1.792285680770874s
Received healthy response to inference request in 2.2297074794769287s
Received healthy response to inference request in 1.8856289386749268s
Received healthy response to inference request in 3.203087091445923s
Received healthy response to inference request in 2.6595113277435303s
Received healthy response to inference request in 2.424257755279541s
Received healthy response to inference request in 2.4872095584869385s
10 requests
0 failed requests
5th percentile: 1.831189513206482
10th percentile: 1.8700933456420898
20th percentile: 1.8842508792877197
30th percentile: 2.0549577474594116
40th percentile: 2.188835382461548
50th percentile: 2.2414379119873047
60th percentile: 2.321604108810425
70th percentile: 2.44314329624176
80th percentile: 2.5216699123382567
90th percentile: 2.7138689041137694
95th percentile: 2.9584779977798457
99th percentile: 3.1541652727127074
mean time: 2.294112205505371
Pipeline stage StressChecker completed in 24.23s
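The StressChecker percentiles above can be reproduced from the ten logged latencies. The sketch below assumes linear interpolation between order statistics (the default behaviour of numpy.percentile); the log does not say which method or library the pipeline uses, and the `percentile` helper is written here purely for illustration.

```python
import math

# Latencies in seconds, copied from the "Received healthy response" lines.
latencies = [
    2.2531683444976807,
    2.1275272369384766,
    1.8787386417388916,
    1.792285680770874,
    2.2297074794769287,
    1.8856289386749268,
    3.203087091445923,
    2.6595113277435303,
    2.424257755279541,
    2.4872095584869385,
]


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation (assumed method)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


for q in (5, 50, 90, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

Under this assumption the results agree with the logged values to within floating-point rounding, for example about 2.2414 s for the 50th percentile and about 2.2941 s for the mean.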
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.62s
Shutdown handler de-registered
function_lulus_2025-12-17 status is now deployed due to DeploymentManager action
function_lulus_2025-12-17 status is now inactive due to auto deactivation removed underperforming models
function_lulus_2025-12-17 status is now torndown due to DeploymentManager action