function_serut_2026-01-02

developer_uid: chai_backend_admin

submission_id: function_serut_2026-01-02

model_name: abtest_blend

model_group:

status: torndown

timestamp: 2026-01-05T08:01:36+00:00

num_battles: 5850

num_wins: 2914

celo_rating: 1306.17

family_friendly_score: 0.6148

family_friendly_standard_error: 0.006882164775708295

submission_type: function

display_name: abtest_blend

is_internal_developer: True

ranking_group: single

us_pacific_date: 2026-01-01

win_ratio: 0.49811965811965814

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
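
The win_ratio field above is consistent with num_wins divided by num_battles. A minimal sketch of that check, assuming the ratio is a plain quotient; the function name is hypothetical and not part of any platform API:

```python
def win_ratio(num_wins: int, num_battles: int) -> float:
    """Fraction of battles won, assuming a plain quotient of the two counts."""
    if num_battles <= 0:
        raise ValueError("num_battles must be positive")
    return num_wins / num_battles


# Counts taken from the metadata fields above.
ratio = win_ratio(2914, 5850)
print(f"win_ratio = {ratio:.6f}")
```

With the counts from this record, the result agrees with the reported win_ratio to within floating-point rounding.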

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.1035277843475342s
Received healthy response to inference request in 1.6220080852508545s
Received healthy response to inference request in 2.266000509262085s
Received healthy response to inference request in 1.2030370235443115s
Received healthy response to inference request in 1.3350579738616943s
Received healthy response to inference request in 1.4822001457214355s
Received healthy response to inference request in 1.2295799255371094s
Received healthy response to inference request in 0.9001064300537109s
Received healthy response to inference request in 1.104015588760376s
Received healthy response to inference request in 1.5210509300231934s
10 requests
0 failed requests
5th percentile: 0.9916460394859314
10th percentile: 1.083185648918152
20th percentile: 1.1039180278778076
30th percentile: 1.173330593109131
40th percentile: 1.2189627647399903
50th percentile: 1.2823189496994019
60th percentile: 1.3939148426055907
70th percentile: 1.4938553810119628
80th percentile: 1.5412423610687256
90th percentile: 1.6864073276519773
95th percentile: 1.9762039184570306
99th percentile: 2.208041191101074
mean time: 1.3766584396362305
Pipeline stage StressChecker completed in 14.99s
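
The log does not say how the StressChecker summary was computed. As a sketch under that caveat, linear interpolation between order statistics (the default method of NumPy's percentile function) reproduces the reported percentiles and mean from the ten response times above. The helper name below is hypothetical:

```python
import math

# The ten response times (seconds) reported by the StressChecker stage.
latencies = [
    1.1035277843475342, 1.6220080852508545, 2.266000509262085,
    1.2030370235443115, 1.3350579738616943, 1.4822001457214355,
    1.2295799255371094, 0.9001064300537109, 1.104015588760376,
    1.5210509300231934,
]


def percentile(values, q):
    """Percentile q (0-100) using linear interpolation between sorted samples."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into ordered
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


for q in (5, 50, 90, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

With only ten samples, the upper percentiles are dominated by the single slowest request (2.27s), so the 95th and 99th values say little about tail latency under real load.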
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.60s
Shutdown handler de-registered
function_serut_2026-01-02 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 1396.68s
Shutdown handler de-registered
function_serut_2026-01-02 status is now inactive due to auto deactivation removed underperforming models
function_serut_2026-01-02 status is now torndown due to DeploymentManager action