function_lugum_2025-12-15

developer_uid: chai_evaluation_service

submission_id: function_lugum_2025-12-15

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-18T02:41:22+00:00

num_battles: 8965

num_wins: 4454

celo_rating: 1291.13

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-17

win_ratio: 0.49682097044060236
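The win_ratio value is consistent with num_wins divided by num_battles. A minimal check, assuming that is how the field is derived (the record itself does not state the formula):

```python
# Values copied from the record above.
num_battles = 8965
num_wins = 4454

# Assumption: win_ratio is the plain fraction of battles won.
win_ratio = num_wins / num_battles

print(f"win_ratio: {win_ratio}")
```

A ratio just under 0.5 means the submission won slightly fewer than half of its battles.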

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
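The formatter templates suggest how a prompt might be assembled. The sketch below is illustrative only: the function name build_prompt, the sample conversation, and the concatenation order (memory, prompt, chat turns, response header) are assumptions rather than the service's confirmed behavior. It ignores truncate_by_message and the max_input_tokens limit.

```python
# Templates copied from the formatter field above.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(fmt, memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: join the filled templates in an assumed order."""
    parts = [
        fmt["memory_template"].format(memory=memory),
        fmt["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(fmt["bot_template"].format(bot_name=bot_name, message=message))
        else:
            parts.append(fmt["user_template"].format(user_name=user_name, message=message))
    # The response header leaves the bot's next line open for generation.
    parts.append(fmt["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Made-up sample data, not taken from the record.
example = build_prompt(
    formatter,
    memory="Richard is a calm librarian.",
    prompt="Richard greets a visitor.",
    turns=[("user", "Hi there"), ("bot", "Welcome in.")],
    bot_name="Richard",
    user_name="Sam",
)
print(example)
```

Because stopping_words is ['\n'] in generation_params, a completion produced after the response header would end at the first newline, which gives a single-line reply.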

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.9816384315490723s
Received healthy response to inference request in 2.533991575241089s
Received healthy response to inference request in 2.42401385307312s
Received healthy response to inference request in 2.7614824771881104s
Received healthy response to inference request in 3.078427791595459s
Received healthy response to inference request in 3.560995101928711s
Received healthy response to inference request in 2.5323896408081055s
Received healthy response to inference request in 2.0974416732788086s
Received healthy response to inference request in 2.325486898422241s
Received healthy response to inference request in 3.9590723514556885s
10 requests
0 failed requests
5th percentile: 2.200062024593353
10th percentile: 2.302682375907898
20th percentile: 2.4043084621429442
30th percentile: 2.4998769044876097
40th percentile: 2.5333508014678956
50th percentile: 2.6477370262145996
60th percentile: 2.849544858932495
70th percentile: 3.010675239562988
80th percentile: 3.1749412536621096
90th percentile: 3.6008028268814085
95th percentile: 3.779937589168548
99th percentile: 3.9232453989982607
mean time: 2.8254939794540403
Pipeline stage StressChecker completed in 29.60s
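The StressChecker percentiles above can be reproduced from the ten logged response times by linear interpolation between order statistics, which is also the default method of numpy.percentile. The sketch below uses only the standard library; the helper name percentile is hypothetical, and the log does not confirm which implementation the pipeline actually uses.

```python
import math

# Response times in seconds, copied from the log lines above.
latencies = [
    2.9816384315490723, 2.533991575241089, 2.42401385307312,
    2.7614824771881104, 3.078427791595459, 3.560995101928711,
    2.5323896408081055, 2.0974416732788086, 2.325486898422241,
    3.9590723514556885,
]


def percentile(samples, q):
    """Return the q-th percentile using linear interpolation."""
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")

mean_time = sum(latencies) / len(latencies)
print(f"mean time: {mean_time}")
```

With only ten samples, the high percentiles are interpolated between the two slowest requests, so the 95th and 99th values describe this short run rather than tail latency in general.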
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.58s
Shutdown handler de-registered
function_lugum_2025-12-15 status is now deployed due to DeploymentManager action
function_lugum_2025-12-15 status is now inactive due to auto deactivation removed underperforming models
function_lugum_2025-12-15 status is now torndown due to DeploymentManager action