function_paluf_2026-01-24

developer_uid: chai_backend_admin

submission_id: function_paluf_2026-01-24

model_name: abtest_tai

model_group:

status: torndown

timestamp: 2026-01-27T11:18:19+00:00

num_battles: 11015

num_wins: 5624

celo_rating: 1311.81

family_friendly_score: 0.5651999999999999

family_friendly_standard_error: 0.007010691264062339

submission_type: function

display_name: abtest_tai

is_internal_developer: True

ranking_group: single

us_pacific_date: 2026-01-24

win_ratio: 0.5105764866091693

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': True}
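The derived fields above can be cross-checked from the raw counts. The sketch below recomputes win_ratio from num_wins and num_battles, and tests whether family_friendly_standard_error fits a binomial standard error. The sample size of 5000 is an assumption inferred from the numbers, not a value stated anywhere in this record, and the variable names are illustrative.

```python
import math

# Values copied from the metadata fields above.
num_battles = 11015
num_wins = 5624
family_friendly_score = 0.5651999999999999

# win_ratio is the fraction of battles won.
win_ratio = num_wins / num_battles

# ASSUMPTION: the score is a proportion over 5000 scored samples.
# The record does not state this count; it is inferred because it
# reproduces the reported standard error.
assumed_samples = 5000
standard_error = math.sqrt(
    family_friendly_score * (1.0 - family_friendly_score) / assumed_samples
)

print(f"win_ratio      = {win_ratio}")
print(f"standard_error = {standard_error}")
```

Under that assumption both values agree with the reported fields to within floating-point rounding.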


Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.7624313831329346s
Received healthy response to inference request in 1.2947642803192139s
Received healthy response to inference request in 1.5458645820617676s
Received healthy response to inference request in 1.3054821491241455s
Received healthy response to inference request in 1.4149200916290283s
Received healthy response to inference request in 2.4554951190948486s
Received healthy response to inference request in 1.2651939392089844s
Received healthy response to inference request in 1.2400097846984863s
Received healthy response to inference request in 1.4888639450073242s
Received healthy response to inference request in 1.4080464839935303s
10 requests
0 failed requests
5th percentile: 1.2513426542282104
10th percentile: 1.2626755237579346
20th percentile: 1.2888502120971679
30th percentile: 1.302266788482666
40th percentile: 1.3670207500457763
50th percentile: 1.4114832878112793
60th percentile: 1.4444976329803467
70th percentile: 1.5059641361236573
80th percentile: 1.589177942276001
90th percentile: 1.8317377567291258
95th percentile: 2.1436164379119864
99th percentile: 2.3931193828582766
mean time: 1.5181071758270264
Pipeline stage StressChecker completed in 16.44s
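The StressChecker summary can be reproduced from the ten logged latencies. The sketch below assumes percentiles are computed by linear interpolation between closest ranks (the default method in NumPy, for example). The log does not say which method the pipeline uses, but this one matches the reported figures. The helper name percentile is illustrative.

```python
import math

# Latencies in seconds, copied from the "Received healthy response" lines.
latencies = [
    1.7624313831329346, 1.2947642803192139, 1.5458645820617676,
    1.3054821491241455, 1.4149200916290283, 2.4554951190948486,
    1.2651939392089844, 1.2400097846984863, 1.4888639450073242,
    1.4080464839935303,
]

def percentile(values, q):
    """Return the q-th percentile (0-100) by linear interpolation.

    ASSUMPTION: this interpolation scheme is a guess at what the
    pipeline does; it is not confirmed by the log itself.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

mean_time = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)

print(f"mean time: {mean_time}")
print(f"50th percentile: {p50}")
print(f"95th percentile: {p95}")
```

With only ten requests, the upper percentiles are driven almost entirely by the single 2.46 s response, so they are a rough indication rather than a stable tail-latency estimate.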
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.57s
Shutdown handler de-registered
function_paluf_2026-01-24 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 2346.40s
Shutdown handler de-registered
function_paluf_2026-01-24 status is now inactive due to auto deactivation removed underperforming models
function_paluf_2026-01-24 status is now torndown due to DeploymentManager action