function_tusif_2025-12-17

developer_uid: chai_evaluation_service

submission_id: function_tusif_2025-12-17

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-20T02:41:14+00:00

num_battles: 7897

num_wins: 3928

celo_rating: 1291.35

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-19

win_ratio: 0.497404077497784
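The win_ratio figure appears to be num_wins divided by num_battles. A minimal check using only the two counts recorded above (the variable names simply mirror the field names and are not taken from any service code):

```python
# Counts copied from the num_battles and num_wins fields of this record.
num_battles = 7897
num_wins = 3928

# Fraction of battles won; this should agree with the recorded win_ratio.
win_ratio = num_wins / num_battles
print(round(win_ratio, 6))  # 0.497404
```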

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
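As a rough sketch of how these templates might combine into a single model input: the helper below, `build_prompt`, is hypothetical, and the assembly order (memory, prompt, chat turns, response header) is an assumption rather than something this record confirms. It does not model `truncate_by_message` or the `max_input_tokens` limit.

```python
# Templates copied from the formatter field above.
FORMATTER = {
    "memory_template": "### Instruction:\n{memory}\n",
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(memory, prompt, turns, bot_name):
    """Hypothetical helper: join the templates in an assumed order.

    `turns` is a list of (role, speaker_name, message) tuples,
    where role is either "bot" or "user".
    """
    parts = [
        FORMATTER["memory_template"].format(memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for role, name, message in turns:
        if role == "bot":
            parts.append(FORMATTER["bot_template"].format(bot_name=name, message=message))
        else:
            parts.append(FORMATTER["user_template"].format(user_name=name, message=message))
    # The response header ends with "{bot_name}:" so the model continues the bot's line.
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    memory="M",
    prompt="P",
    turns=[("user", "Ann", "hi"), ("bot", "richard", "hello")],
    bot_name="richard",
)
```

With the `'\n'` stopping word in generation_params, generation would presumably end at the first newline, so each response is a single line following the trailing `{bot_name}:`.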


Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 3.1344103813171387s
Received healthy response to inference request in 2.065239906311035s
Received healthy response to inference request in 2.419978618621826s
Received healthy response to inference request in 3.590183734893799s
Received healthy response to inference request in 2.7879436016082764s
Received healthy response to inference request in 3.788698196411133s
Received healthy response to inference request in 1.841648817062378s
Received healthy response to inference request in 4.086672782897949s
Received healthy response to inference request in 2.245466709136963s
Received healthy response to inference request in 1.9030461311340332s
10 requests
0 failed requests
5th percentile: 1.8692776083946228
10th percentile: 1.8969063997268676
20th percentile: 2.0328011512756348
30th percentile: 2.1913986682891844
40th percentile: 2.3501738548278808
50th percentile: 2.6039611101150513
60th percentile: 2.926530313491821
70th percentile: 3.2711423873901366
80th percentile: 3.6298866271972656
90th percentile: 3.8184956550598144
95th percentile: 3.9525842189788816
99th percentile: 4.059855070114136
mean time: 2.786328887939453
Pipeline stage StressChecker completed in 29.24s
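The percentile and mean figures reported by StressChecker are consistent with linear interpolation between order statistics over the ten response times logged above. The sketch below reproduces them in plain Python; the `percentile` function is illustrative and is not the checker's own code.

```python
# Response times (seconds) copied from the ten "healthy response" log lines.
latencies = [
    3.1344103813171387, 2.065239906311035, 2.419978618621826,
    3.590183734893799, 2.7879436016082764, 3.788698196411133,
    1.841648817062378, 4.086672782897949, 2.245466709136963,
    1.9030461311340332,
]


def percentile(values, q):
    """Illustrative q-th percentile using linear interpolation between ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0   # fractional index into the sorted list
    lower = int(rank)                        # index at or just below the rank
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


p5 = percentile(latencies, 5)
p50 = percentile(latencies, 50)
p90 = percentile(latencies, 90)
mean_time = sum(latencies) / len(latencies)
```

Under this interpolation rule, `p5`, `p50`, `p90`, and `mean_time` agree with the logged 5th percentile, 50th percentile, 90th percentile, and mean time to within floating-point rounding.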
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.60s
Shutdown handler de-registered
function_tusif_2025-12-17 status is now deployed due to DeploymentManager action
function_tusif_2025-12-17 status is now inactive due to auto deactivation removed underperforming models
function_tusif_2025-12-17 status is now torndown due to DeploymentManager action