function_nanur_2026-01-09

developer_uid: chai_backend_admin

submission_id: function_nanur_2026-01-09

model_name: abtest_blend

model_group:

status: torndown

timestamp: 2026-01-14T16:59:57+00:00

num_battles: 11219

num_wins: 6148

celo_rating: 1325.2

family_friendly_score: 0.4962

family_friendly_standard_error: 0.007070863596478156

submission_type: function

display_name: abtest_blend

is_internal_developer: True

ranking_group: single

us_pacific_date: 2026-01-09

win_ratio: 0.5479989303859524

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
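The derived fields in the record above can be cross-checked against the raw counts. The sketch below recomputes win_ratio from num_wins and num_battles, and recomputes family_friendly_standard_error as a binomial standard error. The function names are hypothetical, and the sample size of 5000 is an assumption: the record does not state it, and it is used here only because it gives a value close to the logged one.

```python
import math


def win_ratio(num_wins: int, num_battles: int) -> float:
    """Fraction of battles won."""
    return num_wins / num_battles


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n samples."""
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    # Values copied from the record above.
    print(win_ratio(6148, 11219))
    # n=5000 is an assumed sample size, not a value from the record.
    print(binomial_standard_error(0.4962, 5000))
```

If the evaluation used a different sample size or a different estimator, the second figure would not apply.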


Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.120802402496338s
Received healthy response to inference request in 1.4459619522094727s
Received healthy response to inference request in 1.4381811618804932s
Received healthy response to inference request in 2.0904340744018555s
Received healthy response to inference request in 1.9124555587768555s
Received healthy response to inference request in 1.3893945217132568s
Received healthy response to inference request in 1.4334287643432617s
Received healthy response to inference request in 1.4599692821502686s
Received healthy response to inference request in 2.422063112258911s
Received healthy response to inference request in 2.266972780227661s
10 requests
0 failed requests
5th percentile: 1.409209930896759
10th percentile: 1.4290253400802613
20th percentile: 1.437230682373047
30th percentile: 1.4436277151107788
40th percentile: 1.45436635017395
50th percentile: 1.686212420463562
60th percentile: 1.9836469650268553
70th percentile: 2.0995445728302
80th percentile: 2.1500364780426025
90th percentile: 2.282481813430786
95th percentile: 2.3522724628448484
99th percentile: 2.4081049823760985
mean time: 1.7979663610458374
Pipeline stage StressChecker completed in 19.48s
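The percentile and mean figures reported by the StressChecker stage are consistent with linear interpolation between the sorted latencies. The sketch below recomputes them from the ten logged response times. The helper name `percentile` is hypothetical, and linear interpolation is an assumption about how the stage computes its summary, not something the log confirms.

```python
import math

# Response times in seconds, copied from the StressChecker log lines above.
LATENCIES = [
    2.120802402496338, 1.4459619522094727, 1.4381811618804932,
    2.0904340744018555, 1.9124555587768555, 1.3893945217132568,
    1.4334287643432617, 1.4599692821502686, 2.422063112258911,
    2.266972780227661,
]


def percentile(values: list[float], q: float) -> float:
    """q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    # Interpolate between the two nearest order statistics.
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def mean(values: list[float]) -> float:
    """Arithmetic mean of the samples."""
    return sum(values) / len(values)


if __name__ == "__main__":
    for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        print(f"{q}th percentile: {percentile(LATENCIES, q)}")
    print(f"mean time: {mean(LATENCIES)}")
```

With only ten samples, the upper percentiles are interpolated between the two slowest requests, so they are a rough guide to tail latency at best.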
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.64s
Shutdown handler de-registered
function_nanur_2026-01-09 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 1952.91s
Shutdown handler de-registered
function_nanur_2026-01-09 status is now inactive due to auto deactivation removed underperforming models
function_nanur_2026-01-09 status is now torndown due to DeploymentManager action