function_holit_2025-12-24

developer_uid: chai_backend_admin

submission_id: function_holit_2025-12-24

model_name: abtest_blend

model_group:

status: torndown

timestamp: 2025-12-27T06:31:52+00:00

num_battles: 6064

num_wins: 3351

celo_rating: 1329.99

family_friendly_score: 0.5558000000000001

family_friendly_standard_error: 0.0070268963276826565

submission_type: function

display_name: abtest_blend

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-26

win_ratio: 0.5526055408970977

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
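
The formatter templates above can be read as pieces of a single prompt string. The sketch below is a minimal illustration, assuming the pieces are concatenated in the order memory, prompt, conversation turns, response header; that order, the `build_prompt` helper, and the sample conversation are hypothetical and not taken from this record. Only the template strings are copied from the `formatter` field.

```python
# Templates copied from the `formatter` field of this record.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name):
    """Hypothetical helper: join the templates in an assumed order."""
    parts = [
        FORMATTER["memory_template"].format(memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                FORMATTER["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                FORMATTER["user_template"].format(user_name=user_name, message=message)
            )
    # The response header ends with "{bot_name}:" so the model continues the bot's line.
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Made-up conversation, for illustration only.
example = build_prompt(
    memory="Ava is a cheerful guide.",
    prompt="Ava greets a visitor.",
    turns=[("user", "Hi there!"), ("bot", "Welcome!"), ("user", "What is this place?")],
    bot_name="Ava",
    user_name="Sam",
)
print(example)
```

Because the response header leaves the bot's line open, the `'\n'` entry in `stopping_words` would plausibly end generation after a single line of reply.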


Connection pool is full, discarding connection: %s. Connection pool size: %s
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.6064202785491943s
Received healthy response to inference request in 1.580111026763916s
Received healthy response to inference request in 1.6141080856323242s
Received healthy response to inference request in 1.4710562229156494s
Received healthy response to inference request in 1.7633097171783447s
Received healthy response to inference request in 1.491037130355835s
Received healthy response to inference request in 1.5964744091033936s
Received healthy response to inference request in 1.493424654006958s
Received healthy response to inference request in 2.282546043395996s
Received healthy response to inference request in 1.6936848163604736s
10 requests
0 failed requests
5th percentile: 1.480047631263733
10th percentile: 1.4890390396118165
20th percentile: 1.4929471492767334
30th percentile: 1.5541051149368286
40th percentile: 1.5899290561676025
50th percentile: 1.601447343826294
60th percentile: 1.6094954013824463
70th percentile: 1.637981104850769
80th percentile: 1.7076097965240478
90th percentile: 1.8152333498001096
95th percentile: 2.0488896965980525
99th percentile: 2.2358147740364074
mean time: 1.6592172384262085
Pipeline stage StressChecker completed in 18.82s
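
The StressChecker percentiles and mean above can be reproduced from the ten logged response times. The sketch below assumes linear interpolation between order statistics (the default method in NumPy's `percentile`), which matches the logged values; how the pipeline itself computes them is not shown in this log, and the `percentile` helper is written here for illustration.

```python
import math

# Response times in seconds, copied from the StressChecker log lines.
timings = [
    1.6064202785491943, 1.580111026763916, 1.6141080856323242,
    1.4710562229156494, 1.7633097171783447, 1.491037130355835,
    1.5964744091033936, 1.493424654006958, 2.282546043395996,
    1.6936848163604736,
]


def percentile(values, q):
    """Return the q-th percentile using linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(timings, q)}")

mean_time = sum(timings) / len(timings)
print(f"mean time: {mean_time}")
```

With only ten samples, the upper percentiles are dominated by the single 2.28 s response, so the 95th and 99th values say little about tail latency.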
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.82s
Shutdown handler de-registered
function_holit_2025-12-24 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 2300.53s
Shutdown handler de-registered
function_holit_2025-12-24 status is now inactive due to auto deactivation removed underperforming models
function_holit_2025-12-24 status is now torndown due to DeploymentManager action
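
The summary fields at the top of this record can be cross-checked with a few lines of arithmetic. The sketch below recomputes `win_ratio` from `num_wins` and `num_battles`, and compares `family_friendly_standard_error` with a binomial standard error. The sample size of 5000 is an assumption inferred from the numbers; it is not stated anywhere in this record, and the scorer's actual formula is not shown.

```python
import math

# Values copied from the record's metadata.
num_battles = 6064
num_wins = 3351
family_friendly_score = 0.5558000000000001
reported_standard_error = 0.0070268963276826565

# win_ratio is the fraction of battles won.
win_ratio = num_wins / num_battles
print(f"win_ratio: {win_ratio}")

# ASSUMPTION: the score is a proportion over 5000 samples. This count is
# inferred, not reported; it is the value that makes the binomial formula
# sqrt(p * (1 - p) / n) agree with the reported standard error.
assumed_samples = 5000
binomial_se = math.sqrt(
    family_friendly_score * (1.0 - family_friendly_score) / assumed_samples
)
print(f"binomial standard error: {binomial_se}")
print(f"difference from reported: {abs(binomial_se - reported_standard_error)}")
```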