developer_uid: chai_backend_admin
submission_id: function_roluf_2025-12-04
model_name: function_roluf_2025-12-04
model_group:
status: inactive
timestamp: 2025-12-04T23:46:53+00:00
num_battles: 5940
num_wins: 2326
celo_rating: 1218.8
family_friendly_score: 0.6716
family_friendly_standard_error: 0.006641587761973789
submission_type: function
display_name: function_roluf_2025-12-04
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-04
win_ratio: 0.3915824915824916
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
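The win_ratio field above appears to be num_wins divided by num_battles. A minimal check, assuming that definition (the record itself does not state how the field is computed):

```python
# Values copied from the metadata fields above.
num_battles = 5940  # num_battles
num_wins = 2326     # num_wins

# Assumed definition: fraction of battles won.
win_ratio = num_wins / num_battles
print(f"win_ratio = {win_ratio:.6f}")
```

Under this assumption the result agrees with the reported win_ratio to the precision shown.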
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.136404275894165s
Received healthy response to inference request in 3.351062297821045s
Received healthy response to inference request in 1.8796541690826416s
Received healthy response to inference request in 2.3779897689819336s
Received healthy response to inference request in 1.9091808795928955s
Received healthy response to inference request in 1.1309540271759033s
Received healthy response to inference request in 1.7560224533081055s
Received healthy response to inference request in 2.074251174926758s
Received healthy response to inference request in 1.0643606185913086s
10 requests
1 failed requests
5th percentile: 1.0943276524543761
10th percentile: 1.124294686317444
20th percentile: 1.631008768081665
30th percentile: 1.8425646543502807
40th percentile: 1.897370195388794
50th percentile: 1.9917160272598267
60th percentile: 2.195746612548828
70th percentile: 2.6055141210556028
80th percentile: 3.179335880279541
90th percentile: 5.027038550376886
95th percentile: 12.568931686878187
99th percentile: 18.602446196079256
mean time: 3.879070448875427
%s, retrying in %s seconds...
Received healthy response to inference request in 5.378652811050415s
Received healthy response to inference request in 4.547958850860596s
Received healthy response to inference request in 0.7845613956451416s
Received healthy response to inference request in 0.8611359596252441s
Received healthy response to inference request in 0.764784574508667s
Received healthy response to inference request in 2.1822547912597656s
Received healthy response to inference request in 0.8150632381439209s
Received healthy response to inference request in 1.797572374343872s
Received healthy response to inference request in 1.0970144271850586s
Received healthy response to inference request in 1.9438109397888184s
10 requests
0 failed requests
5th percentile: 0.7736841440200806
10th percentile: 0.7825837135314941
20th percentile: 0.8089628696441651
30th percentile: 0.8473141431808472
40th percentile: 1.0026630401611327
50th percentile: 1.4472934007644653
60th percentile: 1.8560678005218505
70th percentile: 2.0153440952301027
80th percentile: 2.655395603179932
90th percentile: 4.631028246879577
95th percentile: 5.004840528964995
99th percentile: 5.303890354633332
mean time: 2.01728093624115
Pipeline stage StressChecker completed in 61.97s
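The percentile and mean figures reported for the second StressChecker run are consistent with linear interpolation between sorted samples, which is also the default rule of numpy.percentile. The sketch below recomputes them from the ten logged latencies using only the standard library. The helper name `percentile` is hypothetical, and the interpolation rule is an assumption inferred from the numbers, not something the log states.

```python
import math

def percentile(values, q):
    """Percentile with linear interpolation between the two nearest ranks.

    Hypothetical helper; assumed to mirror the rule behind the logged figures.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0       # fractional index into the sorted list
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)    # clamp at the last element
    frac = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac

# Response times in seconds, copied from the second run's log lines.
latencies = [
    5.378652811050415, 4.547958850860596, 0.7845613956451416,
    0.8611359596252441, 0.764784574508667, 2.1822547912597656,
    0.8150632381439209, 1.797572374343872, 1.0970144271850586,
    1.9438109397888184,
]

mean_time = sum(latencies) / len(latencies)

for q in (5, 50, 90, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

The same helper should apply to the first run if the failed request is counted at its full duration, although that duration is not logged directly.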
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
function_roluf_2025-12-04 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 1514.76s
Shutdown handler de-registered
function_roluf_2025-12-04 status is now inactive due to auto deactivation removed underperforming models