developer_uid: chai_backend_admin
submission_id: function_tiraf_2025-12-23
model_name: abtest_blend
model_group:
status: protected
timestamp: 2025-12-23T22:08:13+00:00
num_battles: 1729
num_wins: 978
celo_rating: 1335.57
family_friendly_score: 0.5518000000000001
family_friendly_standard_error: 0.007033018697543751
submission_type: function
display_name: abtest_blend
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-23
win_ratio: 0.5656448814343551
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
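The derived fields in the record above can be checked against the raw counts. The sketch below recomputes win_ratio from num_wins and num_battles. It also shows that family_friendly_standard_error is consistent with the binomial formula sqrt(p * (1 - p) / n) at n = 5000. That sample size is an inference from the numbers, not a value stated in the record, and the name n_assumed is hypothetical.

```python
import math

# Values copied from the submission record.
num_battles = 1729
num_wins = 978
family_friendly_score = 0.5518

# win_ratio is the fraction of battles won.
win_ratio = num_wins / num_battles

# ASSUMPTION: the scorer evaluated 5000 samples. The record does not state
# this; 5000 is simply the n that reproduces the reported standard error
# under the binomial formula.
n_assumed = 5000
standard_error = math.sqrt(
    family_friendly_score * (1 - family_friendly_score) / n_assumed
)

print(f"win_ratio      = {win_ratio}")
print(f"standard_error = {standard_error}")
```

Both results agree with the recorded win_ratio and family_friendly_standard_error to within floating-point rounding.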
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.6897125244140625s
Received healthy response to inference request in 1.7644262313842773s
Received healthy response to inference request in 1.8646812438964844s
Received healthy response to inference request in 1.759943962097168s
Received healthy response to inference request in 1.859372615814209s
Received healthy response to inference request in 1.7271583080291748s
Received healthy response to inference request in 1.8147096633911133s
Received healthy response to inference request in 1.6865119934082031s
Received healthy response to inference request in 1.6487352848052979s
Received healthy response to inference request in 1.6441123485565186s
10 requests
0 failed requests
5th percentile: 1.6461926698684692
10th percentile: 1.64827299118042
20th percentile: 1.678956651687622
30th percentile: 1.6887523651123046
40th percentile: 1.71217999458313
50th percentile: 1.7435511350631714
60th percentile: 1.7617368698120117
70th percentile: 1.7795112609863282
80th percentile: 1.8236422538757324
90th percentile: 1.8599034786224364
95th percentile: 1.8622923612594604
99th percentile: 1.8642034673690795
mean time: 1.7459364175796508
Pipeline stage StressChecker completed in 18.78s
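The StressChecker summary can be reproduced from the ten response times logged above. The sketch below assumes the percentiles were computed by linear interpolation between order statistics (the default behaviour of numpy.percentile). The log does not say which tool produced them, and the helper name percentile is hypothetical.

```python
import math

# Response times in seconds, copied from the StressChecker log lines.
latencies = [
    1.6897125244140625, 1.7644262313842773, 1.8646812438964844,
    1.759943962097168, 1.859372615814209, 1.7271583080291748,
    1.8147096633911133, 1.6865119934082031, 1.6487352848052979,
    1.6441123485565186,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


mean_time = sum(latencies) / len(latencies)

for q in (5, 50, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

Under that assumption the computed values match the logged percentiles and mean to within floating-point rounding.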
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.58s
Shutdown handler de-registered
function_tiraf_2025-12-23 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 2871.32s
Shutdown handler de-registered