function_lupit_2025-12-13

developer_uid: chai_evaluation_service

submission_id: function_lupit_2025-12-13

model_name: richard

model_group:

status: torndown

timestamp: 2025-12-16T19:21:13+00:00

num_battles: 6102

num_wins: 3046

celo_rating: 1311.8

family_friendly_score: 0.0

family_friendly_standard_error: 0.0

submission_type: function

display_name: richard

is_internal_developer: True

ranking_group: single

us_pacific_date: 2025-12-13

win_ratio: 0.49918059652572927
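The win_ratio value appears to be num_wins divided by num_battles. The record does not state this, so the snippet below only checks that assumption against the numbers listed above.

```python
# Assumption: win_ratio = num_wins / num_battles (inferred from the values above,
# not stated in the record).
num_battles = 6102
num_wins = 3046

win_ratio = num_wins / num_battles
print(f"win_ratio: {win_ratio}")
```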

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
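With temperature 1.0, top_p 1.0 and min_p 0.0, the only active filter in these generation_params is top_k 40. The sketch below is a generic illustration of how such sampling filters are commonly applied to one step of logits. It is not the serving stack's actual code, and the function name sample_token is hypothetical. It also omits presence/frequency penalties, best_of, and stopping words.

```python
import math
import random

def sample_token(logits, temperature=1.0, top_k=40, top_p=1.0, min_p=0.0, rng=random):
    """Hypothetical sampler: temperature, then top_k, top_p and min_p filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # top_k: keep the k most likely tokens (k <= 0 disables this filter here),
    # then renormalise so the remaining probabilities sum to 1.
    if top_k > 0:
        probs = probs[:top_k]
        s = sum(p for p, _ in probs)
        probs = [(p / s, i) for p, i in probs]

    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break

    # min_p: drop tokens below min_p times the most likely token's probability.
    threshold = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= threshold]

    # Draw one token id in proportion to the surviving probabilities.
    ids = [i for _, i in kept]
    weights = [p for p, _ in kept]
    return rng.choices(ids, weights=weights, k=1)[0]
```

For example, sample_token([0.1, 2.0, 0.5], top_k=1) always returns 1, the index of the largest logit, because only one candidate survives the top_k filter.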

formatter: {'memory_template': '### Instruction:\n{memory}\n', 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
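The formatter templates look like they assemble an instruction-style prompt that ends with the bot's name, so the model continues the bot's turn. Presumably the stopping word '\n' in generation_params then ends generation at the end of that single line. The sketch below makes two assumptions that the record does not confirm. First, it assumes the assembly order is memory, prompt, chat history, response. Second, it assumes a hypothetical history shape of (speaker, message) pairs. It also ignores truncation to max_input_tokens.

```python
# Templates copied from the formatter above.
FORMATTER = {
    'memory_template': '### Instruction:\n{memory}\n',
    'prompt_template': '### Input:\n{prompt}\n',
    'bot_template': '{bot_name}: {message}\n',
    'user_template': '{user_name}: {message}\n',
    'response_template': '### Response:\n{bot_name}:',
}

def build_prompt(memory, prompt, history, bot_name, user_name, fmt=FORMATTER):
    """Hypothetical assembly: memory, prompt, each chat turn, then the response header."""
    parts = [
        fmt['memory_template'].format(memory=memory),
        fmt['prompt_template'].format(prompt=prompt),
    ]
    # history is a list of (speaker, message) pairs; speaker is "bot" or "user".
    for speaker, message in history:
        if speaker == 'bot':
            parts.append(fmt['bot_template'].format(bot_name=bot_name, message=message))
        else:
            parts.append(fmt['user_template'].format(user_name=user_name, message=message))
    # The prompt ends with "{bot_name}:" so the model writes the bot's next line.
    parts.append(fmt['response_template'].format(bot_name=bot_name))
    return ''.join(parts)

text = build_prompt("Be kind.", "A chat.", [("user", "Hi"), ("bot", "Hello")], "richard", "Ann")
print(text)
```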


Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.004894733428955s
Received healthy response to inference request in 3.971158981323242s
Received healthy response to inference request in 2.6851730346679688s
Received healthy response to inference request in 4.147960662841797s
Received healthy response to inference request in 2.91261625289917s
Received healthy response to inference request in 1.927807331085205s
Received healthy response to inference request in 2.026456117630005s
Received healthy response to inference request in 2.0833256244659424s
Received healthy response to inference request in 3.8711390495300293s
Received healthy response to inference request in 3.3466415405273438s
10 requests
0 failed requests
5th percentile: 1.9624966621398925
10th percentile: 1.9971859931945801
20th percentile: 2.022143840789795
30th percentile: 2.066264772415161
40th percentile: 2.444434070587158
50th percentile: 2.7988946437835693
60th percentile: 3.0862263679504394
70th percentile: 3.5039907932281493
80th percentile: 3.891143035888672
90th percentile: 3.9888391494750977
95th percentile: 4.068399906158447
99th percentile: 4.132048511505127
mean time: 2.897717332839966
Pipeline stage StressChecker completed in 30.64s
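The StressChecker summary is consistent with linear interpolation between closest ranks over the ten latencies logged above. This is the same as NumPy's default percentile method. The checker's own implementation is not shown here, so the snippet below is only a sketch that reproduces the reported numbers under that assumption. The percentile helper is hypothetical.

```python
# The ten healthy-response latencies (seconds) from the StressChecker log above.
latencies = [
    2.004894733428955, 3.971158981323242, 2.6851730346679688, 4.147960662841797,
    2.91261625289917, 1.927807331085205, 2.026456117630005, 2.0833256244659424,
    3.8711390495300293, 3.3466415405273438,
]

def percentile(values, q):
    """Hypothetical helper: linear interpolation between closest ranks, 0 <= q <= 100."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

mean_time = sum(latencies) / len(latencies)
for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

With n = 10, the 50th percentile falls halfway between the 5th and 6th sorted values (2.685... and 2.912...). This gives the logged 2.7988946437835693.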
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.57s
Shutdown handler de-registered
function_lupit_2025-12-13 status is now deployed due to DeploymentManager action
function_lupit_2025-12-13 status is now inactive due to auto deactivation removed underperforming models
function_lupit_2025-12-13 status is now torndown due to DeploymentManager action