function_lotab_2026-02-03

developer_uid: chai_backend_admin

submission_id: function_lotab_2026-02-03

model_name: function_lotab_2026-02-03

model_group:

status: torndown

timestamp: 2026-02-07T01:21:43+00:00

num_battles: 11238

num_wins: 5508

celo_rating: 1297.92

family_friendly_score: 0.573

family_friendly_standard_error: 0.006995298421082549

submission_type: function

display_name: function_lotab_2026-02-03

is_internal_developer: True

ranking_group: single

us_pacific_date: 2026-02-03

win_ratio: 0.49012279765082756
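
The win_ratio field appears to be num_wins divided by num_battles. A minimal sketch that checks this against the values listed above (the variable names simply mirror the metadata fields; how the platform actually computes the field is an assumption):

```python
# Values copied from the submission metadata above.
num_battles = 11238
num_wins = 5508

# Assumption: win_ratio is the plain fraction of battles won.
win_ratio = num_wins / num_battles

reported_win_ratio = 0.49012279765082756
print(f"computed win_ratio: {win_ratio}")
print(f"matches reported value: {abs(win_ratio - reported_win_ratio) < 1e-12}")
```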

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': 'CUSTOM', 'prompt_template': 'CUSTOM', 'bot_template': 'CUSTOM', 'user_template': 'CUSTOM', 'response_template': 'CUSTOM', 'truncate_by_message': False}

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.281043291091919s
Received healthy response to inference request in 1.6790063381195068s
Received healthy response to inference request in 2.080828905105591s
Received healthy response to inference request in 1.5114080905914307s
Received healthy response to inference request in 1.3034372329711914s
Received healthy response to inference request in 1.4833643436431885s
Received healthy response to inference request in 1.5377795696258545s
Received healthy response to inference request in 1.4253404140472412s
Received healthy response to inference request in 1.3033983707427979s
Received healthy response to inference request in 1.3237977027893066s
10 requests
0 failed requests
5th percentile: 1.2911030769348144
10th percentile: 1.3011628627777099
20th percentile: 1.3034294605255128
30th percentile: 1.317689561843872
40th percentile: 1.3847233295440673
50th percentile: 1.4543523788452148
60th percentile: 1.4945818424224853
70th percentile: 1.5193195343017578
80th percentile: 1.566024923324585
90th percentile: 1.7191885948181151
95th percentile: 1.9000087499618525
99th percentile: 2.0446648740768434
mean time: 1.4929404258728027
Pipeline stage StressChecker completed in 16.27s
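
The StressChecker summary above is consistent with linearly interpolated percentiles over the ten logged response times. A minimal sketch that recomputes them, assuming linear interpolation between closest ranks (the `percentile` helper is hypothetical and is not taken from the pipeline's code):

```python
from statistics import fmean

# Response times in seconds, copied from the "Received healthy response" log lines.
latencies = [
    1.281043291091919, 1.6790063381195068, 2.080828905105591,
    1.5114080905914307, 1.3034372329711914, 1.4833643436431885,
    1.5377795696258545, 1.4253404140472412, 1.3033983707427979,
    1.3237977027893066,
]

def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0   # fractional index into the sorted list
    lower = int(rank)                       # index of the sample at or below the rank
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower                     # distance between the two neighbours
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {fmean(latencies)}")
```

With only ten samples, the upper percentiles are dominated by the single slowest request (about 2.08 s), so the 95th and 99th percentile figures are interpolations rather than observed tail latencies.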
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.58s
Shutdown handler de-registered
function_lotab_2026-02-03 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 1732.44s
Shutdown handler de-registered
function_lotab_2026-02-03 status is now inactive due to auto deactivation removed underperforming models
function_lotab_2026-02-03 status is now torndown due to DeploymentManager action