developer_uid: chai_evaluation_service
submission_id: function_jitam_2025-12-14
model_name: richard
model_group:
status: inactive
timestamp: 2025-12-14T10:56:29+00:00
num_battles: 8149
num_wins: 3981
celo_rating: 1256.38
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: function
display_name: richard
is_internal_developer: True
ranking_group: single
us_pacific_date: 2025-12-14
win_ratio: 0.4885261995336851
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
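The formatter entry above defines how the bot memory, scenario prompt, chat turns, and response header are stitched into the final prompt string, while generation_params holds the sampling settings applied at inference time. Below is a minimal sketch of how such templates could be rendered with Python str.format; the example conversation, names, and the render_prompt helper are illustrative assumptions, only the template strings come from the record.

# Hypothetical sketch: rendering a prompt from the formatter templates above.
# Only the template strings are taken from the submission record; the helper,
# names, and example turns are assumptions for illustration.

formatter = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def render_prompt(memory, prompt, turns, bot_name, user_name):
    """Assemble the full prompt string from the template pieces."""
    parts = [
        formatter["memory_template"].format(memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(formatter["bot_template"].format(bot_name=bot_name, message=message))
        else:
            parts.append(formatter["user_template"].format(user_name=user_name, message=message))
    # The prompt ends with "### Response:\n{bot_name}:" so the model continues
    # with a single reply line.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


if __name__ == "__main__":
    text = render_prompt(
        memory="Richard is a terse, dry-witted assistant.",
        prompt="A quiet evening chat.",
        turns=[("user", "Hey, how was your day?"), ("bot", "Uneventful. Yours?")],
        bot_name="Richard",
        user_name="Anon",
    )
    print(text)

Note that stopping_words is ['\n'] in generation_params, which presumably pairs with the response template ending in "{bot_name}:" so that generation halts at the end of the single response line.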
Shutdown handler not registered because Python interpreter is not running in the main thread
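The "Shutdown handler not registered" line reflects a CPython constraint: signal handlers can only be installed from the main thread of the main interpreter, so pipeline code running in a worker thread has to skip registration. A minimal sketch of that guard is shown below; register_shutdown_handler and its log wording are illustrative, not the actual pipeline code.

import logging
import signal
import threading

logger = logging.getLogger(__name__)


def register_shutdown_handler(cleanup):
    """Install a SIGTERM handler, but only when running in the main thread.

    signal.signal raises ValueError when called outside the main thread,
    which is why a pipeline would log a skip message instead of crashing.
    """
    if threading.current_thread() is not threading.main_thread():
        logger.info("Shutdown handler not registered because Python interpreter "
                    "is not running in the main thread")
        return
    signal.signal(signal.SIGTERM, lambda signum, frame: cleanup())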
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.6383142471313477s
Received healthy response to inference request in 2.6113836765289307s
Received healthy response to inference request in 2.5862998962402344s
Received healthy response to inference request in 2.270752191543579s
Received healthy response to inference request in 2.296358823776245s
Received healthy response to inference request in 2.594675302505493s
Received healthy response to inference request in 2.4442026615142822s
Received healthy response to inference request in 2.485297441482544s
Received healthy response to inference request in 2.99949312210083s
10 requests
1 failed request
5th percentile: 2.282275176048279
10th percentile: 2.2937981605529787
20th percentile: 2.4146338939666747
30th percentile: 2.4729690074920656
40th percentile: 2.545898914337158
50th percentile: 2.5904875993728638
60th percentile: 2.601358652114868
70th percentile: 2.6194628477096558
80th percentile: 2.7105500221252443
90th percentile: 4.710119724273675
95th percentile: 12.407939434051496
99th percentile: 18.566195201873782
mean time: 4.303253650665283
%s, retrying in %s seconds...
Received healthy response to inference request in 2.0118417739868164s
Received healthy response to inference request in 2.60685396194458s
Received healthy response to inference request in 2.7870137691497803s
Received healthy response to inference request in 1.962970495223999s
Received healthy response to inference request in 2.428102731704712s
Received healthy response to inference request in 2.6535274982452393s
Received healthy response to inference request in 1.9712207317352295s
Received healthy response to inference request in 2.222538948059082s
Received healthy response to inference request in 3.258479595184326s
Received healthy response to inference request in 2.0911448001861572s
10 requests
0 failed requests
5th percentile: 1.9666831016540527
10th percentile: 1.9703957080841064
20th percentile: 2.003717565536499
30th percentile: 2.067353892326355
40th percentile: 2.169981288909912
50th percentile: 2.325320839881897
60th percentile: 2.499603223800659
70th percentile: 2.620856022834778
80th percentile: 2.6802247524261475
90th percentile: 2.8341603517532348
95th percentile: 3.04631997346878
99th percentile: 3.216047670841217
mean time: 2.3993694305419924
Pipeline stage StressChecker completed in 70.22s
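The StressChecker output above is latency samples summarized as percentiles and a mean; the single timed-out request (20 s read timeout) is what drags the first run's 90th-99th percentiles and mean up to ~4.3 s while the healthy samples sit around 2-3 s. A rough sketch of how such a summary could be produced is below; the endpoint URL, request payload, and numpy-based linear percentile interpolation are assumptions, not the actual StressChecker implementation.

import time

import numpy as np
import requests

# Hypothetical endpoint and payload; the real stress check targets the
# submission's inference service, whose request schema is not shown in the log.
URL = "http://guanaco-submitter.guanaco-backend.k2.chaiverse.com/inference"


def stress_check(num_requests=10, read_timeout=20):
    latencies, failures = [], 0
    for _ in range(num_requests):
        start = time.time()
        try:
            resp = requests.post(URL, json={"prompt": "ping"}, timeout=read_timeout)
            resp.raise_for_status()
            elapsed = time.time() - start
            latencies.append(elapsed)
            print(f"Received healthy response to inference request in {elapsed}s")
        except requests.RequestException:
            failures += 1
            # Assumption: a failed request still contributes its elapsed time
            # (roughly the 20 s timeout), which would explain the inflated
            # mean and tail percentiles in the first run.
            latencies.append(time.time() - start)
            print("Received unhealthy response to inference request!")

    print(f"{num_requests} requests")
    print(f"{failures} failed requests")
    for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        print(f"{p}th percentile: {np.percentile(latencies, p)}")
    print(f"mean time: {np.mean(latencies)}")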
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
function_jitam_2025-12-14 status is now deployed due to DeploymentManager action
function_jitam_2025-12-14 status is now inactive due to auto deactivation of underperforming models