developer_uid: chai_evaluation_service
submission_id: function_hodul_2025-12-14
model_name: richard
model_group:
status: torndown
timestamp: 2025-12-17T21:06:41+00:00
num_battles: 9669
num_wins: 4781
celo_rating: 1256.41
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-14
win_ratio: 0.49446685282862757
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
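The formatter and generation_params entries above describe how a conversation is rendered into a single prompt and how sampling is configured (temperature 1.0, top_k 40, best_of 8, stop on newline, at most 64 output tokens). Below is a minimal sketch, not the production code, of how those templates could be assembled into the final prompt string; the build_prompt helper and the conversation content are hypothetical.

# Sketch only: formatter values copied from the submission record above;
# the helper and the example conversation are illustrative assumptions.
formatter = {
    'memory_template': '### Instruction:\n{memory}\n',
    'prompt_template': '### Input:\n{prompt}\n',
    'bot_template': '{bot_name}: {message}\n',
    'user_template': '{user_name}: {message}\n',
    'response_template': '### Response:\n{bot_name}:',
}

def build_prompt(memory, prompt, turns, bot_name):
    # turns is a list of (speaker, message) pairs, oldest first
    parts = [
        formatter['memory_template'].format(memory=memory),
        formatter['prompt_template'].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == bot_name:
            parts.append(formatter['bot_template'].format(bot_name=speaker, message=message))
        else:
            parts.append(formatter['user_template'].format(user_name=speaker, message=message))
    parts.append(formatter['response_template'].format(bot_name=bot_name))
    return ''.join(parts)

print(build_prompt(
    memory='Richard is a terse assistant.',
    prompt='A chat between richard and a user.',
    turns=[('User', 'Hello!'), ('richard', 'Hi.')],
    bot_name='richard',
))

The resulting string would be truncated to max_input_tokens (1024) and sampled with the generation_params above; best_of: 8 presumably means several candidate completions are generated and one is kept, though the selection criterion is not shown in this record.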
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 7.249832630157471s
Received healthy response to inference request in 6.243911266326904s
Received healthy response to inference request in 5.2451255321502686s
Received healthy response to inference request in 2.9831039905548096s
Received healthy response to inference request in 11.06296968460083s
Received healthy response to inference request in 7.037874698638916s
Received healthy response to inference request in 5.957582235336304s
Received healthy response to inference request in 4.492192268371582s
Received healthy response to inference request in 4.018158197402954s
10 requests
1 failed request
5th percentile: 3.4488783836364747
10th percentile: 3.91465277671814
20th percentile: 4.397385454177856
30th percentile: 5.019245553016662
40th percentile: 5.67259955406189
50th percentile: 6.100746750831604
60th percentile: 6.5614966392517085
70th percentile: 7.101462078094483
80th percentile: 8.012460041046143
90th percentile: 11.984148478507992
95th percentile: 16.12945305109023
99th percentile: 19.44569670915604
mean time: 7.456550812721252
%s, retrying in %s seconds...
Received healthy response to inference request in 5.268614053726196s
Received healthy response to inference request in 3.9623794555664062s
Received healthy response to inference request in 7.586337566375732s
Received healthy response to inference request in 4.634059906005859s
Received healthy response to inference request in 7.887232780456543s
Received healthy response to inference request in 5.299781560897827s
Received healthy response to inference request in 2.8755087852478027s
Received healthy response to inference request in 3.601757764816284s
Received healthy response to inference request in 2.8707714080810547s
Received healthy response to inference request in 5.792543888092041s
10 requests
0 failed requests
5th percentile: 2.8729032278060913
10th percentile: 2.875035047531128
20th percentile: 3.4565079689025877
30th percentile: 3.8541929483413697
40th percentile: 4.365387725830078
50th percentile: 4.951336979866028
60th percentile: 5.281081056594848
70th percentile: 5.447610259056091
80th percentile: 6.15130262374878
90th percentile: 7.616427087783813
95th percentile: 7.7518299341201775
99th percentile: 7.86015221118927
mean time: 4.977898716926575
Pipeline stage StressChecker completed in 127.88s
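The StressChecker summaries above are consistent with linear-interpolation percentiles taken over all ten request times per run, with the single timed-out request in the first run included at its measured elapsed time (about 20.27 s, back-calculated from the reported mean; i.e. the 20 s read timeout plus overhead). A sketch that reproduces the first run's figures under those assumptions:

# Sketch: reproduce the first run's percentiles and mean, assuming
# linear-interpolation percentiles and that the failed request counts
# at its elapsed time (~20.27 s is inferred, not taken from the log).
import numpy as np

times = [
    7.249832630157471, 6.243911266326904, 5.2451255321502686,
    2.9831039905548096, 11.06296968460083, 7.037874698638916,
    5.957582235336304, 4.492192268371582, 4.018158197402954,
    20.27475762367248,  # assumed elapsed time of the timed-out request
]

for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {np.percentile(times, p)}")
print(f"mean time: {np.mean(times)}")

The same calculation over the ten healthy times of the second run yields its reported percentiles and mean of roughly 4.98 s.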
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
function_hodul_2025-12-14 status is now deployed due to DeploymentManager action
function_hodul_2025-12-14 status is now inactive due to auto deactivation removed underperforming models
function_hodul_2025-12-14 status is now torndown due to DeploymentManager action
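The last three lines trace the submission lifecycle: deployed by the DeploymentManager, auto-deactivated as underperforming, then torn down. A hypothetical sketch of that state machine follows; the state names come from the log, while the initial 'queued' state and the transition map are assumptions.

# Hypothetical lifecycle sketch; only the three logged states are confirmed.
from enum import Enum

class SubmissionStatus(Enum):
    QUEUED = "queued"       # assumed initial state, not shown in this log
    DEPLOYED = "deployed"   # DeploymentManager brings the model online
    INACTIVE = "inactive"   # auto deactivation removes underperforming models
    TORNDOWN = "torndown"   # DeploymentManager reclaims the deployment

ALLOWED_TRANSITIONS = {
    SubmissionStatus.QUEUED: {SubmissionStatus.DEPLOYED},
    SubmissionStatus.DEPLOYED: {SubmissionStatus.INACTIVE, SubmissionStatus.TORNDOWN},
    SubmissionStatus.INACTIVE: {SubmissionStatus.DEPLOYED, SubmissionStatus.TORNDOWN},
    SubmissionStatus.TORNDOWN: set(),
}

def transition(current: SubmissionStatus, new: SubmissionStatus) -> SubmissionStatus:
    # Reject transitions the assumed state machine does not allow.
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new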