function_dugur_2025-12-05

developer_uid: chai_backend_admin

submission_id: function_dugur_2025-12-05

model_name: function_dugur_2025-12-05

model_group:

status: torndown

timestamp: 2025-12-12T18:28:48+00:00

num_battles: 19073

num_wins: 11481

celo_rating: 1365.39

family_friendly_score: 0.5342

family_friendly_standard_error: 0.007054507211705152
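The page does not say how the standard error is computed or how many samples were scored. As a rough sanity check only, the sketch below assumes the score is a mean of independent 0/1 judgements, so the usual binomial formula sqrt(p * (1 - p) / n) applies, and solves for the sample size that the reported value would imply. The function name `binomial_standard_error` and the binomial model itself are assumptions, not something the source confirms.

```python
import math

# Values copied from the metadata above.
family_friendly_score = 0.5342
reported_standard_error = 0.007054507211705152


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent 0/1 samples.

    ASSUMPTION: the family-friendly score behaves like such a proportion.
    """
    return math.sqrt(p * (1.0 - p) / n)


# Invert the same formula to see which sample size the reported value implies.
implied_n = (
    family_friendly_score
    * (1.0 - family_friendly_score)
    / reported_standard_error ** 2
)

print("implied sample size:", round(implied_n))
print("standard error at that size:",
      binomial_standard_error(family_friendly_score, round(implied_n)))
```

Under this assumption the implied sample size comes out at about 5000, which is at least consistent with a fixed-size evaluation set; it should be read as an inference, not a documented fact.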

submission_type: function

display_name: function_dugur_2025-12-05

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-05

win_ratio: 0.6019504010905469
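The win_ratio field appears to be num_wins divided by num_battles. A minimal check using the two counts reported above:

```python
# Values copied from the metadata above.
num_battles = 19073
num_wins = 11481
reported_win_ratio = 0.6019504010905469

# Plain ratio of wins to battles.
win_ratio = num_wins / num_battles

print(win_ratio)
print("matches reported value:", abs(win_ratio - reported_win_ratio) < 1e-9)
```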

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
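The generation_params value is printed as a Python dict literal (single-quoted keys), not as JSON, so a JSON parser would reject it as written. One way to load it, sketched here with the standard library's `ast.literal_eval`; the variable names are illustrative only:

```python
import ast

# The value exactly as it appears on the page. A raw string keeps the
# backslash-n inside 'stopping_words' as two characters until it is parsed.
raw = r"{'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}"

# literal_eval only accepts Python literals, so it does not execute code.
params = ast.literal_eval(raw)

print(params["best_of"], params["max_output_tokens"])
print(repr(params["stopping_words"]))  # the single stop word is a newline
```

How the serving backend interprets each parameter (for example, how it picks among the best_of candidates) is not described on this page.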

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
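The formatter templates suggest how a prompt might be laid out. The sketch below is an assumption about the assembly order (memory, then prompt, then chat turns, then the response header); the page does not confirm it, and `build_prompt` and its arguments are hypothetical names used for illustration. Truncation to max_input_tokens is not modelled.

```python
# Templates copied from the formatter line above.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
    "truncate_by_message": False,
}


def build_prompt(formatter, bot_name, user_name, memory, prompt, turns):
    """Hypothetical assembly: memory, prompt, chat turns, response header.

    `turns` is a list of (speaker, message) pairs, speaker being "bot" or "user".
    ASSUMPTION: this ordering is a guess based on the template names.
    """
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(formatter["bot_template"].format(
                bot_name=bot_name, message=message))
        else:
            parts.append(formatter["user_template"].format(
                user_name=user_name, message=message))
    # The response header ends with "<bot_name>:" so the model continues as the bot.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    formatter, "Bot", "User", "M", "P", [("user", "hi"), ("bot", "hello")]
)
print(example)
```

Because every chat turn ends with a newline, the '\n' stopping word in generation_params would plausibly cut generation off after a single bot message.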

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 3.9691600799560547s
Received healthy response to inference request in 4.1773951053619385s
Received healthy response to inference request in 4.394996881484985s
Received healthy response to inference request in 3.7622077465057373s
Received healthy response to inference request in 3.6477229595184326s
Received healthy response to inference request in 3.9949240684509277s
Received healthy response to inference request in 2.523724317550659s
Received healthy response to inference request in 3.0512235164642334s
Received healthy response to inference request in 5.6857099533081055s
Received healthy response to inference request in 0.5214865207672119s
10 requests
0 failed requests
5th percentile: 1.4224935293197634
10th percentile: 2.3235005378723144
20th percentile: 2.9457236766815185
30th percentile: 3.468773126602173
40th percentile: 3.7164138317108155
50th percentile: 3.865683913230896
60th percentile: 3.979465675354004
70th percentile: 4.049665379524231
80th percentile: 4.220915460586548
90th percentile: 4.524068188667297
95th percentile: 5.1048890709877
99th percentile: 5.569545776844024
mean time: 3.5728551149368286
Pipeline stage StressChecker completed in 37.28s
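The percentile and mean figures above can be reproduced from the ten logged response times using linear interpolation between sorted samples. That the StressChecker uses exactly this method is an assumption, although the numbers agree; the `percentile` helper below is written out by hand so the sketch needs only the standard library.

```python
# Response times in seconds, copied from the StressChecker log lines above.
latencies = [
    3.9691600799560547, 4.1773951053619385, 4.394996881484985,
    3.7622077465057373, 3.6477229595184326, 3.9949240684509277,
    2.523724317550659, 3.0512235164642334, 5.6857099533081055,
    0.5214865207672119,
]


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted samples."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


for q in (5, 50, 90, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print("mean time:", sum(latencies) / len(latencies))
```

With only ten requests, the tail percentiles are dominated by the single slowest response (about 5.69 s), so they say little about real tail latency.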
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.68s
Shutdown handler de-registered
function_dugur_2025-12-05 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 3950.07s
Shutdown handler de-registered
function_dugur_2025-12-05 status is now inactive due to auto deactivation removed underperforming models
function_dugur_2025-12-05 status is now torndown due to DeploymentManager action
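The three "status is now ..." lines in the log trace the submission's lifecycle: deployed, then inactive, then torndown. A small sketch for extracting those transitions from the log text; the regular expression and variable names are illustrative and assume the message wording stays as shown here.

```python
import re

# Status lines copied from the log above.
log_lines = [
    "function_dugur_2025-12-05 status is now deployed due to DeploymentManager action",
    "function_dugur_2025-12-05 status is now inactive due to auto deactivation removed underperforming models",
    "function_dugur_2025-12-05 status is now torndown due to DeploymentManager action",
]

# ASSUMPTION: every transition follows "<model> status is now <status> due to <reason>".
STATUS_RE = re.compile(
    r"^(?P<model>\S+) status is now (?P<status>\w+) due to (?P<reason>.+)$"
)

transitions = []
for line in log_lines:
    match = STATUS_RE.match(line)
    if match:
        transitions.append(match.groupdict())

for t in transitions:
    print(f"{t['status']}: {t['reason']}")
```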