function_tujob_2025-12-15

developer_uid: chai_evaluation_service

submission_id: function_tujob_2025-12-15

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-18T07:31:10+00:00

num_battles: 7961

num_wins: 4011

celo_rating: 1295.9

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-17

win_ratio: 0.5038311769878157

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}

Resubmit model

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.638031244277954s
Received healthy response to inference request in 2.0727274417877197s
Received healthy response to inference request in 2.564481735229492s
Received healthy response to inference request in 3.159625768661499s
Received healthy response to inference request in 2.8210837841033936s
Received healthy response to inference request in 2.0707669258117676s
Received healthy response to inference request in 2.2187256813049316s
Received healthy response to inference request in 2.9323296546936035s
Received healthy response to inference request in 2.823883295059204s
10 requests
1 failed requests
5th percentile: 1.8327623009681702
10th percentile: 2.0274933576583862
20th percentile: 2.0723353385925294
30th percentile: 2.174926209449768
40th percentile: 2.426179313659668
50th percentile: 2.692782759666443
60th percentile: 2.822203588485718
70th percentile: 2.856417202949524
80th percentile: 2.977788877487183
90th percentile: 4.854546809196466
95th percentile: 12.481691491603833
99th percentile: 18.583407237529755
mean time: 4.24104917049408
%s, retrying in %s seconds...
Received healthy response to inference request in 1.6904304027557373s
Received healthy response to inference request in 2.2331666946411133s
Received healthy response to inference request in 2.4374499320983887s
Received healthy response to inference request in 1.9139461517333984s
Received healthy response to inference request in 1.8745124340057373s
Received healthy response to inference request in 3.0638315677642822s
Received healthy response to inference request in 2.616234064102173s
Received healthy response to inference request in 2.986330509185791s
Received healthy response to inference request in 1.68178391456604s
Received healthy response to inference request in 1.8495736122131348s
10 requests
0 failed requests
5th percentile: 1.6856748342514039
10th percentile: 1.6895657539367677
20th percentile: 1.8177449703216553
30th percentile: 1.8670307874679566
40th percentile: 1.898172664642334
50th percentile: 2.073556423187256
60th percentile: 2.3148799896240235
70th percentile: 2.491085171699524
80th percentile: 2.6902533531188966
90th percentile: 2.9940806150436403
95th percentile: 3.0289560914039613
99th percentile: 3.056856472492218
mean time: 2.2347259283065797
Pipeline stage StressChecker completed in 67.90s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.64s
Shutdown handler de-registered
function_tujob_2025-12-15 status is now deployed due to DeploymentManager action
function_tujob_2025-12-15 status is now inactive due to auto deactivation removed underperforming models
function_tujob_2025-12-15 status is now torndown due to DeploymentManager action