developer_uid: chai_backend_admin
submission_id: function_dilul_2025-12-19
model_name: function_dilul_2025-12-19
model_group:
status: torndown
timestamp: 2025-12-22T22:41:24+00:00
num_battles: 5405
num_wins: 2799
celo_rating: 1310.83
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: function_dilul_2025-12-19
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-19
win_ratio: 0.5178538390379278
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
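The metadata above is internally consistent: win_ratio = num_wins / num_battles = 2799 / 5405 ≈ 0.5179. The generation_params and formatter entries describe how each inference request is configured and how the prompt is assembled from templates. Below is a minimal sketch, not the platform's own code, of how these templates could be applied with str.format; the build_prompt helper and the (speaker, message, is_bot) turn structure are assumptions for illustration, and truncation to max_input_tokens is omitted.

# Hypothetical illustration of applying the formatter templates above.
# The helper name and the conversation-turn structure are assumptions.
GENERATION_PARAMS = {
    "temperature": 1.0, "top_p": 1.0, "min_p": 0.0, "top_k": 40,
    "presence_penalty": 0.0, "frequency_penalty": 0.0,
    "stopping_words": ["\n"], "max_input_tokens": 1024,
    "best_of": 8, "max_output_tokens": 64,
}

FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}

def build_prompt(memory, prompt, turns, bot_name):
    """Assemble a prompt string from memory, prompt, and chat turns.

    `turns` is assumed to be a list of (speaker_name, message, is_bot) tuples.
    """
    parts = [
        FORMATTER["memory_template"].format(memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for name, message, is_bot in turns:
        template = FORMATTER["bot_template"] if is_bot else FORMATTER["user_template"]
        key = "bot_name" if is_bot else "user_name"
        parts.append(template.format(**{key: name, "message": message}))
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)

# Example:
# build_prompt("Bot persona.", "Scene setup.", [("User", "hi", False)], "Bot")
# -> "### Instruction:\nBot persona.\n### Input:\nScene setup.\nUser: hi\n### Response:\nBot:"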
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 10.841151475906372s
Received healthy response to inference request in 3.0560946464538574s
Received healthy response to inference request in 7.944679498672485s
Received healthy response to inference request in 17.451149940490723s
Received healthy response to inference request in 10.508006811141968s
Received healthy response to inference request in 19.083795309066772s
Received healthy response to inference request in 5.82293176651001s
Received healthy response to inference request in 6.825781583786011s
Received healthy response to inference request in 2.2356526851654053s
Received healthy response to inference request in 7.9159839153289795s
10 requests
0 failed requests
5th percentile: 2.6048515677452087
10th percentile: 2.974050450325012
20th percentile: 5.2695643424987795
30th percentile: 6.52492663860321
40th percentile: 7.479902982711792
50th percentile: 7.930331707000732
60th percentile: 8.970010423660277
70th percentile: 10.607950210571289
80th percentile: 12.163151168823244
90th percentile: 17.614414477348326
95th percentile: 18.34910489320755
99th percentile: 18.93685722589493
mean time: 9.168522763252259
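Each round's summary reports the failed-request count, latency percentiles, and the mean over the 10 request timings. The figures above are consistent with linear-interpolation percentiles such as numpy.percentile's default; the snippet below is a hypothetical reproduction of this round's summary, not the pipeline's own reporting code.

# Hypothetical reproduction of the latency summary for the first round.
import numpy as np

latencies = [
    10.841151475906372, 3.0560946464538574, 7.944679498672485,
    17.451149940490723, 10.508006811141968, 19.083795309066772,
    5.82293176651001, 6.825781583786011, 2.2356526851654053,
    7.9159839153289795,
]

for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {np.percentile(latencies, p)}")
print(f"mean time: {np.mean(latencies)}")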
%s, retrying in %s seconds...
Received healthy response to inference request in 5.671297550201416s
Received healthy response to inference request in 5.449711084365845s
Received healthy response to inference request in 2.8949079513549805s
Received healthy response to inference request in 11.234382629394531s
Received healthy response to inference request in 8.664521932601929s
Received healthy response to inference request in 18.26403307914734s
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.8413524627685547s
Received healthy response to inference request in 1.6181299686431885s
Received healthy response to inference request in 2.1064507961273193s
10 requests
1 failed request
5th percentile: 1.7185800909996032
10th percentile: 1.819030213356018
20th percentile: 2.0534311294555665
30th percentile: 2.658370804786682
40th percentile: 4.427789831161499
50th percentile: 5.56050431728363
60th percentile: 6.86858730316162
70th percentile: 9.43548014163971
80th percentile: 12.640312719345093
90th percentile: 18.45422649383545
95th percentile: 19.310096859931942
99th percentile: 19.994793152809144
mean time: 7.7910754680633545
%s, retrying in %s seconds...
Received healthy response to inference request in 3.0148940086364746s
Received healthy response to inference request in 5.489083290100098s
Received healthy response to inference request in 6.671991348266602s
Received healthy response to inference request in 3.99652361869812s
Received healthy response to inference request in 5.538815498352051s
Received healthy response to inference request in 3.8411240577697754s
Received healthy response to inference request in 1.2951629161834717s
Received healthy response to inference request in 2.5353920459747314s
Received healthy response to inference request in 1.8101410865783691s
Received healthy response to inference request in 2.0293736457824707s
10 requests
0 failed requests
5th percentile: 1.5269030928611755
10th percentile: 1.7586432695388794
20th percentile: 1.9855271339416505
30th percentile: 2.3835865259170532
40th percentile: 2.823093223571777
50th percentile: 3.428009033203125
60th percentile: 3.903283882141113
70th percentile: 4.444291520118713
80th percentile: 5.499029731750488
90th percentile: 5.6521330833435055
95th percentile: 6.162062215805053
99th percentile: 6.570005521774292
mean time: 3.6222501516342165
Pipeline stage StressChecker completed in 211.00s
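The StressChecker stage times repeated inference requests against the submitter endpoint with a 20-second read timeout (see the HTTPConnectionPool error above) and counts unhealthy responses per round of 10 requests; the generic "%s, retrying in %s seconds..." lines separate the three rounds, though the exact retry condition is not visible in this log. A minimal sketch of one such round, assuming a requests-based client and a hypothetical endpoint path and payload:

# Sketch of the stress-check pattern implied by the log: time each request,
# treat a read timeout or error as a failed request, and report the counts.
# The endpoint path and payload shape are assumptions, not taken from the log.
import time
import requests

ENDPOINT = "http://guanaco-submitter.guanaco-backend.k2.chaiverse.com/inference"  # hypothetical path
READ_TIMEOUT_S = 20
NUM_REQUESTS = 10

def run_stress_round(payload):
    timings, failed = [], 0
    for _ in range(NUM_REQUESTS):
        start = time.time()
        try:
            response = requests.post(ENDPOINT, json=payload, timeout=READ_TIMEOUT_S)
            response.raise_for_status()
            elapsed = time.time() - start
            timings.append(elapsed)
            print(f"Received healthy response to inference request in {elapsed}s")
        except requests.exceptions.RequestException as error:
            failed += 1
            print(error)
            print("Received unhealthy response to inference request!")
    print(f"{NUM_REQUESTS} requests")
    print(f"{failed} failed requests")
    return timings, failed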
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.74s
Shutdown handler de-registered
function_dilul_2025-12-19 status is now deployed due to DeploymentManager action
function_dilul_2025-12-19 status is now inactive due to auto-deactivation of underperforming models
function_dilul_2025-12-19 status is now torndown due to DeploymentManager action