function_lulor_2025-12-18

developer_uid: chai_evaluation_service

submission_id: function_lulor_2025-12-18

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-21T16:31:11+00:00

num_battles: 9802

num_wins: 4779

celo_rating: 1284.56

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-21

win_ratio: 0.4875535604978576

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
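The formatter templates above can be read as pieces of a single prompt string. The sketch below is only an illustration of how they might be combined: the function name `build_prompt`, the `turns` structure, and the ordering (memory, then prompt, then chat turns, then the response header) are assumptions, not something this record confirms. Truncation to `max_input_tokens` is not modeled.

```python
# Templates copied from the formatter field of this record.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: join the templates into one prompt string.

    `turns` is an assumed structure: a list of (speaker, message) pairs,
    where speaker is either "bot" or "user".
    """
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                formatter["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                formatter["user_template"].format(user_name=user_name, message=message)
            )
    # The response template ends with "{bot_name}:" so the model continues
    # the bot's next line.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    memory="A",
    prompt="B",
    turns=[("user", "hi"), ("bot", "hello")],
    bot_name="richard",
    user_name="Sam",
)
```

Because `stopping_words` is `['\n']`, generation would presumably stop at the first newline, which fits a template where each chat message occupies one line.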

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 3.967862844467163s
Received healthy response to inference request in 3.199974775314331s
Received healthy response to inference request in 3.081909656524658s
Received healthy response to inference request in 2.777773857116699s
Received healthy response to inference request in 4.188955068588257s
Received healthy response to inference request in 2.2121143341064453s
Received healthy response to inference request in 2.2755963802337646s
Received healthy response to inference request in 1.9316494464874268s
Received healthy response to inference request in 3.3385860919952393s
Received healthy response to inference request in 2.0506694316864014s
10 requests
0 failed requests
5th percentile: 1.9852084398269654
10th percentile: 2.038767433166504
20th percentile: 2.1798253536224363
30th percentile: 2.2565517663955688
40th percentile: 2.5769028663635254
50th percentile: 2.9298417568206787
60th percentile: 3.1291357040405274
70th percentile: 3.2415581703186036
80th percentile: 3.464441442489624
90th percentile: 3.989972066879272
95th percentile: 4.089463567733764
99th percentile: 4.1690567684173585
mean time: 2.9025091886520387
Pipeline stage StressChecker completed in 30.60s
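The percentile and mean figures reported by StressChecker can be reproduced from the ten logged response times. The sketch below assumes linear interpolation between ranked samples, which is consistent with the logged values. The helper `percentile` is written here for illustration and is not taken from the pipeline's own code.

```python
import math

# Response times in seconds, copied from the StressChecker log lines.
latencies = [
    3.967862844467163, 3.199974775314331, 3.081909656524658,
    2.777773857116699, 4.188955068588257, 2.2121143341064453,
    2.2755963802337646, 1.9316494464874268, 3.3385860919952393,
    2.0506694316864014,
]


def percentile(values, q):
    """Return the q-th percentile (0 to 100) using linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


mean_time = sum(latencies) / len(latencies)
p5 = percentile(latencies, 5)
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)

print(f"mean={mean_time:.4f}s p5={p5:.4f}s p50={p50:.4f}s p95={p95:.4f}s")
```

With only 10 requests, the upper percentiles are dominated by the one or two slowest responses, so they are best read as rough indicators.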
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.83s
Shutdown handler de-registered
function_lulor_2025-12-18 status is now deployed due to DeploymentManager action
function_lulor_2025-12-18 status is now inactive due to auto deactivation removed underperforming models
function_lulor_2025-12-18 status is now torndown due to DeploymentManager action