developer_uid: chai_evaluation_service
submission_id: function_lomen_2025-12-14
model_name: richard
model_group:
status: torndown
timestamp: 2025-12-18T00:01:19+00:00
num_battles: 8190
num_wins: 4041
celo_rating: 1256.41
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-14
win_ratio: 0.49340659340659343
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
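The formatter above fully determines how a conversation is flattened into the prompt sent for inference. As a minimal sketch (the memory, prompt, and conversation turns below are invented placeholder values, not taken from this submission), the templates can be applied with plain `str.format`:

```python
# Sketch: assemble an inference prompt from the formatter templates above.
# The memory, prompt, and turn values are invented placeholders.
formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}

def build_prompt(memory, prompt, turns, bot_name):
    """turns: list of (speaker_name, is_bot, message) tuples."""
    parts = [formatter["memory_template"].format(memory=memory),
             formatter["prompt_template"].format(prompt=prompt)]
    for name, is_bot, message in turns:
        if is_bot:
            parts.append(formatter["bot_template"].format(bot_name=name, message=message))
        else:
            parts.append(formatter["user_template"].format(user_name=name, message=message))
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)

text = build_prompt("Richard is terse.", "A chat with Richard.",
                    [("User", False, "Hi!"), ("Richard", True, "Hello.")],
                    "Richard")
print(text)
```

With the generation_params above using stopping_words `['\n']`, decoding halts at the first newline, so each completion is a single reply line continuing after `### Response:\nRichard:`.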
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.573837995529175s
Received healthy response to inference request in 3.0887022018432617s
Received healthy response to inference request in 3.5431718826293945s
Received healthy response to inference request in 1.9068670272827148s
Received healthy response to inference request in 1.7090764045715332s
Received healthy response to inference request in 4.078458786010742s
Received healthy response to inference request in 2.134056806564331s
Received healthy response to inference request in 2.042370557785034s
Received healthy response to inference request in 2.377821207046509s
10 requests
1 failed request
5th percentile: 1.798082184791565
10th percentile: 1.8870879650115966
20th percentile: 2.0152698516845704
30th percentile: 2.106550931930542
40th percentile: 2.2803154468536375
50th percentile: 2.475829601287842
60th percentile: 2.7797836780548093
70th percentile: 3.2250431060791014
80th percentile: 3.6502292633056643
90th percentile: 5.680330944061273
95th percentile: 12.88875565528868
99th percentile: 18.655495424270633
mean time: 4.355154323577881
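The percentile figures for this first batch are consistent with linear interpolation between closest ranks over all ten request times, with the timed-out request counted at its elapsed wall time. That elapsed time (~20.1 s, i.e. the 20 s read timeout plus overhead) is not logged directly; the sketch below infers it from the reported mean, which is an assumption:

```python
# Sketch: reproduce the first StressChecker batch's statistics.
# The nine healthy latencies are copied from the log; the failed request's
# elapsed time is backed out of the reported mean (4.3551...), not logged.
healthy = [2.573837995529175, 3.0887022018432617, 3.5431718826293945,
           1.9068670272827148, 1.7090764045715332, 4.078458786010742,
           2.134056806564331, 2.042370557785034, 2.377821207046509]
failed = 10 * 4.355154323577881 - sum(healthy)  # ~20.097 s for the timed-out request
samples = sorted(healthy + [failed])

def percentile(xs, p):
    """Linear interpolation between closest ranks (numpy.percentile's default)."""
    k = (len(xs) - 1) * p / 100
    f = int(k)
    c = min(f + 1, len(xs) - 1)
    return xs[f] + (k - f) * (xs[c] - xs[f])

print(round(failed, 3))
print(percentile(samples, 5))   # matches the logged 5th percentile
print(percentile(samples, 95))  # matches the logged 95th percentile
```

Under this assumption every reported percentile and the mean are reproduced to floating-point precision, which explains why the 95th and 99th percentiles jump to 12.9 s and 18.7 s despite only one slow request.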
%s, retrying in %s seconds...
Received healthy response to inference request in 2.8283228874206543s
Received healthy response to inference request in 2.632314920425415s
Received healthy response to inference request in 2.915008783340454s
Received healthy response to inference request in 2.4806251525878906s
Received healthy response to inference request in 2.910675525665283s
Received healthy response to inference request in 2.1382153034210205s
Received healthy response to inference request in 1.94801664352417s
Received healthy response to inference request in 2.111055612564087s
Received healthy response to inference request in 3.274538278579712s
Received healthy response to inference request in 2.3615190982818604s
10 requests
0 failed requests
5th percentile: 2.0213841795921326
10th percentile: 2.094751715660095
20th percentile: 2.132783365249634
30th percentile: 2.294527959823608
40th percentile: 2.4329827308654783
50th percentile: 2.556470036506653
60th percentile: 2.7107181072235105
70th percentile: 2.853028678894043
80th percentile: 2.9115421772003174
90th percentile: 2.95096173286438
95th percentile: 3.1127500057220456
99th percentile: 3.2421806240081787
mean time: 2.560029220581055
Pipeline stage StressChecker completed in 71.81s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
function_lomen_2025-12-14 status is now deployed due to DeploymentManager action
function_lomen_2025-12-14 status is now inactive due to auto deactivation (underperforming models removed)
function_lomen_2025-12-14 status is now torndown due to DeploymentManager action