function_gurum_2025-12-16

developer_uid: chai_evaluation_service

submission_id: function_gurum_2025-12-16

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-19T19:21:19+00:00

num_battles: 10651

num_wins: 5261

celo_rating: 1288.94

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-19

win_ratio: 0.49394423058867715
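The reported win_ratio equals num_wins divided by num_battles. The short check below reproduces it from the two counts in this record. The assumption is that win_ratio is a plain fraction of battles won, with no draws or weighting involved.

```python
# Counts copied from the submission record above.
num_battles = 10651
num_wins = 5261

# Assumption: win_ratio is simply the fraction of battles won.
win_ratio = num_wins / num_battles
print(win_ratio)  # approximately 0.4939, matching the recorded value
```

Note that celo_rating is a separate rating and cannot be derived from these two counts alone.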

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
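The sketch below shows one common way the sampling parameters in generation_params (temperature, top_k, top_p, min_p) can be applied to a next-token distribution. It is hypothetical. The function name `filter_probs` and the order in which the filters run (temperature, then top-k, then top-p, then min-p) are illustrative assumptions, not the evaluation service's code. presence_penalty and frequency_penalty are left out because both are 0.0 here and would have no effect.

```python
import math


def filter_probs(logits, temperature=1.0, top_k=40, top_p=1.0, min_p=0.0):
    """Return {token: probability} after common sampling filters.

    Hypothetical sketch: the filter order is an assumption.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}

    # Top-k: keep only the k most likely tokens (top_k <= 0 disables it).
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Min-p: drop tokens below min_p times the most likely token's probability.
    p_max = kept[0][1]
    kept = [(tok, p) for tok, p in kept if p >= min_p * p_max]

    # Renormalise the surviving tokens so they sum to 1.
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}


# Toy vocabulary of 50 tokens with steadily decreasing logits.
toy_logits = {f"tok{i}": -0.1 * i for i in range(50)}
probs = filter_probs(toy_logits, temperature=1.0, top_k=40, top_p=1.0, min_p=0.0)
print(len(probs))  # 40: with these settings only top_k narrows the candidates
```

With top_p at 1.0 and min_p at 0.0, as in this record, top_k = 40 is the only filter that actually removes candidates.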

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
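The formatter fields are ordinary Python format strings. The sketch below joins them into a single prompt. It is hypothetical. The assembly order (memory, prompt, chat turns, response cue), the helper name `build_prompt`, and the sample names and messages are illustrative assumptions. Truncation to max_input_tokens is not shown.

```python
# Templates copied from the formatter field above.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name, user_name, fmt=FORMATTER):
    """Hypothetical assembly: memory, prompt, chat turns, then the response cue."""
    parts = [
        fmt["memory_template"].format(memory=memory),
        fmt["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(fmt["bot_template"].format(bot_name=bot_name, message=message))
        else:
            parts.append(fmt["user_template"].format(user_name=user_name, message=message))
    # The response template ends with "{bot_name}:" so the model continues the bot's line.
    parts.append(fmt["response_template"].format(bot_name=bot_name))
    return "".join(parts)


text = build_prompt(
    memory="Richard is a friendly assistant.",
    prompt="A casual chat.",
    turns=[("user", "Hi!"), ("bot", "Hello there."), ("user", "How are you?")],
    bot_name="Richard",
    user_name="User",
)
print(text)
```

Every chat message template ends in a newline, and generation_params lists '\n' as its only stopping word. That combination appears to limit each generated reply to a single line, though the record itself does not state this.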


Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 7.736253023147583s
Received healthy response to inference request in 7.0542237758636475s
Received healthy response to inference request in 9.68016242980957s
Received healthy response to inference request in 4.115215063095093s
Received healthy response to inference request in 7.064007520675659s
Received healthy response to inference request in 2.6536097526550293s
Received healthy response to inference request in 5.125759840011597s
Received healthy response to inference request in 4.653863906860352s
Received healthy response to inference request in 6.777930974960327s
Received healthy response to inference request in 3.729628086090088s
10 requests
0 failed requests
5th percentile: 3.1378180027008056
10th percentile: 3.622026252746582
20th percentile: 4.038097667694092
30th percentile: 4.492269253730774
40th percentile: 4.937001466751099
50th percentile: 5.951845407485962
60th percentile: 6.888448095321655
70th percentile: 7.057158899307251
80th percentile: 7.1984566211700445
90th percentile: 7.930643963813781
95th percentile: 8.805403196811675
99th percentile: 9.505210583209992
mean time: 5.859065437316895
Pipeline stage StressChecker completed in 62.64s
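The StressChecker summary can be recomputed from the ten latencies logged above. The reported percentiles match linear interpolation between closest ranks, which is NumPy's default percentile method. Whether StressChecker actually uses NumPy is an assumption. The helper below is a hypothetical pure-Python equivalent.

```python
# Latencies in seconds, copied from the ten healthy responses above.
latencies = [
    7.736253023147583, 7.0542237758636475, 9.68016242980957,
    4.115215063095093, 7.064007520675659, 2.6536097526550293,
    5.125759840011597, 4.653863906860352, 6.777930974960327,
    3.729628086090088,
]


def percentile(values, q):
    """Linear interpolation between closest ranks (hypothetical helper)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100  # fractional rank within the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


mean_time = sum(latencies) / len(latencies)
for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

With only ten samples, the 95th and 99th percentiles are interpolated between the two slowest requests (7.74 s and 9.68 s) and are not independent observations.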
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.68s
Shutdown handler de-registered
function_gurum_2025-12-16 status is now deployed due to DeploymentManager action
function_gurum_2025-12-16 status is now inactive due to auto deactivation removed underperforming models
function_gurum_2025-12-16 status is now torndown due to DeploymentManager action