developer_uid: chai_backend_admin
submission_id: function_rigen_2026-01-24
model_name: abtest_tai
model_group:
status: inactive
timestamp: 2026-01-24T17:11:15+00:00
num_battles: 8986
num_wins: 4658
celo_rating: 1316.63
family_friendly_score: 0.5582
family_friendly_standard_error: 0.007023001637476671
submission_type: function
display_name: abtest_tai
is_internal_developer: True
ranking_group: single
us_pacific_date: 2026-01-24
win_ratio: 0.518361896283107
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': True}
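
The derived fields in the metadata above can be cross-checked against the raw counts. The following is a minimal sketch, not the leaderboard's actual code: `win_ratio` is assumed to be `num_wins / num_battles`, and the standard error is assumed to be a binomial standard error. The sample size of 5000 (`ASSUMED_SAMPLE_SIZE`) is not stated anywhere in this record; it is a hypothetical value chosen because it happens to reproduce the reported figure.

```python
import math

# Values copied from the metadata fields above.
num_battles = 8986
num_wins = 4658
family_friendly_score = 0.5582
reported_win_ratio = 0.518361896283107
reported_standard_error = 0.007023001637476671

# Assumption: the record does not state how many samples were scored.
ASSUMED_SAMPLE_SIZE = 5000


def win_ratio(wins: int, battles: int) -> float:
    """Fraction of battles won."""
    return wins / battles


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    print("win_ratio:", win_ratio(num_wins, num_battles))
    print(
        "standard error (assumed n=%d):" % ASSUMED_SAMPLE_SIZE,
        binomial_standard_error(family_friendly_score, ASSUMED_SAMPLE_SIZE),
    )
```

If the assumed sample size is wrong, only the standard-error check is affected; the win-ratio check depends solely on the two counts in the record.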
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.3332972526550293s
Received healthy response to inference request in 2.605494260787964s
Received healthy response to inference request in 1.3466765880584717s
Received healthy response to inference request in 1.4142796993255615s
Received healthy response to inference request in 1.4817440509796143s
Received healthy response to inference request in 1.1632757186889648s
Received healthy response to inference request in 1.4464318752288818s
Received healthy response to inference request in 1.2524774074554443s
Received healthy response to inference request in 1.6754913330078125s
Received healthy response to inference request in 1.453263521194458s
10 requests
0 failed requests
5th percentile: 1.2034164786338806
10th percentile: 1.2435572385787963
20th percentile: 1.3171332836151124
30th percentile: 1.342662787437439
40th percentile: 1.3872384548187255
50th percentile: 1.4303557872772217
60th percentile: 1.4491645336151122
70th percentile: 1.4618076801300048
80th percentile: 1.520493507385254
90th percentile: 1.7684916257858272
95th percentile: 2.186992943286895
99th percentile: 2.5217939972877503
mean time: 1.5172431707382201
Pipeline stage StressChecker completed in 16.61s
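
The StressChecker summary above can be reproduced from the ten logged response times. The sketch below assumes percentiles are computed with linear interpolation between order statistics (the default rule in `numpy.percentile`); the log does not say which method the pipeline uses, and the function names `percentile` and `summarize` are illustrative, not taken from the pipeline.

```python
# Response times in seconds, copied from the
# "Received healthy response to inference request" lines above.
latencies = [
    1.3332972526550293,
    2.605494260787964,
    1.3466765880584717,
    1.4142796993255615,
    1.4817440509796143,
    1.1632757186889648,
    1.4464318752288818,
    1.2524774074554443,
    1.6754913330078125,
    1.453263521194458,
]


def percentile(values, q):
    """Return the q-th percentile using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0   # fractional index into the sorted list
    lower = int(rank)                        # index of the value just below the rank
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower                      # how far to move toward the upper value
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def summarize(values, qs=(5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99)):
    """Build a summary with the same labels the log prints."""
    stats = {f"{q}th percentile": percentile(values, q) for q in qs}
    stats["mean time"] = sum(values) / len(values)
    return stats


if __name__ == "__main__":
    for label, value in summarize(latencies).items():
        print(f"{label}: {value}")
```

With only ten samples, the upper percentiles are dominated by the single 2.6 s response, so the 95th and 99th percentile figures are interpolations toward that one outlier rather than independent measurements.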
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.58s
Shutdown handler de-registered
function_rigen_2026-01-24 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 2430.94s
Shutdown handler de-registered
function_rigen_2026-01-24 status is now inactive due to admin request