function_tufeb_2025-12-02

developer_uid: chai_backend_admin

submission_id: function_tufeb_2025-12-02

model_name: function_tufeb_2025-12-02

model_group:

status: torndown

timestamp: 2025-12-12T18:36:36+00:00

num_battles: 5307

num_wins: 2784

celo_rating: 1312.03

family_friendly_score: 0.5236000000000001

family_friendly_standard_error: 0.007063186816161669

submission_type: function

display_name: function_tufeb_2025-12-02

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-02

win_ratio: 0.5245901639344263

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
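The formatter fields above can be read as string templates that are filled in and concatenated into a single prompt. The following is a minimal sketch of that idea, not the platform's actual implementation: the function name `build_prompt`, the `(speaker, message)` turn representation, and the concatenation order (memory, prompt, chat turns, response header) are assumptions made for illustration, and truncation to `max_input_tokens` is not modelled.

```python
# Template strings copied from the formatter entry above.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(formatter, memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: fill each template and join the pieces in order.

    `turns` is a list of (speaker, message) pairs where speaker is
    "bot" or "user"; this representation is an assumption.
    """
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                formatter["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                formatter["user_template"].format(user_name=user_name, message=message)
            )
    # The response template ends with "{bot_name}:" so the model continues
    # the bot's next line.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    FORMATTER,
    memory="You are a helpful companion.",
    prompt="A short chat.",
    turns=[("user", "hi"), ("bot", "hello")],
    bot_name="Bot",
    user_name="User",
)
print(example)
```

Because `stopping_words` is `['\n']` in the generation parameters, a prompt ending in `Bot:` would plausibly yield a single-line reply, which is consistent with the one-line-per-message templates.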

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.9932942390441895s
Received healthy response to inference request in 2.0620501041412354s
Received healthy response to inference request in 2.1835427284240723s
Received healthy response to inference request in 1.5431160926818848s
Received healthy response to inference request in 1.9252445697784424s
Received healthy response to inference request in 2.0348548889160156s
Received healthy response to inference request in 1.9171640872955322s
Received healthy response to inference request in 1.5602078437805176s
Received healthy response to inference request in 1.8404464721679688s
Received healthy response to inference request in 2.058833599090576s
10 requests
0 failed requests
5th percentile: 1.5508073806762694
10th percentile: 1.5584986686706543
20th percentile: 1.7843987464904785
30th percentile: 1.8941488027572633
40th percentile: 1.9220123767852784
50th percentile: 1.959269404411316
60th percentile: 2.0099184989929197
70th percentile: 2.0420485019683836
80th percentile: 2.059476900100708
90th percentile: 2.074199366569519
95th percentile: 2.1288710474967956
99th percentile: 2.172608392238617
mean time: 1.9118754625320435
Pipeline stage StressChecker completed in 20.44s
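The percentile and mean figures reported by the StressChecker stage can be reproduced from the ten logged response times. The sketch below assumes linear interpolation between the two nearest ranks; that choice is an assumption that happens to match the logged values, not something the log states. The helper name `percentile` is illustrative.

```python
import math

# Response times in seconds, copied from the StressChecker log lines above.
LATENCIES = [
    1.9932942390441895,
    2.0620501041412354,
    2.1835427284240723,
    1.5431160926818848,
    1.9252445697784424,
    2.0348548889160156,
    1.9171640872955322,
    1.5602078437805176,
    1.8404464721679688,
    2.058833599090576,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted sample.
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


mean_time = sum(LATENCIES) / len(LATENCIES)

for q in (5, 50, 90, 99):
    print(f"{q}th percentile: {percentile(LATENCIES, q)}")
print(f"mean time: {mean_time}")
```

With only ten samples, the 95th and 99th percentiles are interpolated between the two slowest requests, so they say little about tail latency under real load.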
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.62s
Shutdown handler de-registered
function_tufeb_2025-12-02 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 2457.40s
Shutdown handler de-registered
function_tufeb_2025-12-02 status is now inactive due to auto deactivation removed underperforming models
function_tufeb_2025-12-02 status is now torndown due to DeploymentManager action