developer_uid: chai_backend_admin
submission_id: qwen-qwen3-235b-a22b-in_47730_v8
model_name: qwen-qwen3-235b-a22b-in_47730_v8
model_group: Qwen/Qwen3-235B-A22B-Ins
status: torndown
timestamp: 2025-12-09T05:12:16+00:00
num_battles: 8292
num_wins: 4208
celo_rating: 1304.17
family_friendly_score: 0.597
family_friendly_standard_error: 0.006936728335461898
submission_type: basic
model_repo: Qwen/Qwen3-235B-A22B-Instruct-2507
model_architecture: Qwen3MoeForCausalLM
model_num_parameters: 18790207488.0
best_of: 8
max_input_tokens: 3000
max_output_tokens: 70
reward_model: default
display_name: qwen-qwen3-235b-a22b-in_47730_v8
ineligible_reason: max_output_tokens!=64
is_internal_developer: True
language_model: Qwen/Qwen3-235B-A22B-Instruct-2507
model_size: 19B
ranking_group: single
us_pacific_date: 2025-12-08
win_ratio: 0.5074770863482875
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['</s>', '<|user|>', '<|assistant|>', '<|im_end|>', '####', '</think>'], 'max_input_tokens': 3000, 'best_of': 8, 'max_output_tokens': 70}
formatter: {'memory_template': "<|im_start|>system\nYou are {bot_name}. Engaged in a highly engaging roleplay with user. NSFW allowed. Be descriptive and creative.\n{bot_name}'s persona: {memory}<|im_end|>\n", 'prompt_template': '', 'bot_template': '<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n', 'user_template': '<|im_start|>user\n{message}<|im_end|>\n', 'response_template': '<|im_start|>assistant\n{bot_name}:', 'truncate_by_message': True}
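The `formatter` field above defines how conversations are flattened into a ChatML-style prompt (note `prompt_template` is empty, so only the system/memory, user, bot, and response templates contribute). A minimal sketch of that assembly, using the template strings copied from the log; the helper name, persona, and example messages are illustrative, not part of the actual serving code:

```python
# Sketch: assemble a ChatML prompt from the logged formatter templates.
# Template strings below are taken verbatim from the `formatter` field;
# build_prompt() and the sample conversation are hypothetical.

MEMORY_TEMPLATE = (
    "<|im_start|>system\n"
    "You are {bot_name}. Engaged in a highly engaging roleplay with user. "
    "NSFW allowed. Be descriptive and creative.\n"
    "{bot_name}'s persona: {memory}<|im_end|>\n"
)
USER_TEMPLATE = "<|im_start|>user\n{message}<|im_end|>\n"
BOT_TEMPLATE = "<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n"
RESPONSE_TEMPLATE = "<|im_start|>assistant\n{bot_name}:"


def build_prompt(bot_name, memory, turns):
    """turns: list of ("user" | "bot", message) pairs, oldest first."""
    parts = [MEMORY_TEMPLATE.format(bot_name=bot_name, memory=memory)]
    for role, message in turns:
        if role == "user":
            parts.append(USER_TEMPLATE.format(message=message))
        else:
            parts.append(BOT_TEMPLATE.format(bot_name=bot_name, message=message))
    # The prompt ends mid-turn so the model continues as the bot.
    parts.append(RESPONSE_TEMPLATE.format(bot_name=bot_name))
    return "".join(parts)


prompt = build_prompt("Aria", "a friendly guide",
                      [("user", "Hi!"), ("bot", "Hello there.")])
print(prompt)
```

With `truncate_by_message: True`, the serving side would drop whole oldest messages (rather than cutting mid-message) to stay under `max_input_tokens: 3000`; that truncation step is not shown here.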
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.32s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service qwen-qwen3-235b-a22b-in-47730-v8
Waiting for inference service qwen-qwen3-235b-a22b-in-47730-v8 to be ready
Inference service qwen-qwen3-235b-a22b-in-47730-v8 ready after 426.158992767334s
Pipeline stage VLLMDeployer completed in 427.31s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.393209934234619s
Received healthy response to inference request in 2.090310573577881s
Received healthy response to inference request in 1.67695951461792s
Received healthy response to inference request in 1.6587774753570557s
Received healthy response to inference request in 2.5557947158813477s
Received healthy response to inference request in 2.4874062538146973s
Received healthy response to inference request in 2.401486396789551s
Received healthy response to inference request in 1.8976788520812988s
Received healthy response to inference request in 2.2247374057769775s
Received healthy response to inference request in 2.2716031074523926s
Received healthy response to inference request in 2.263580799102783s
Received healthy response to inference request in 2.61722993850708s
Received healthy response to inference request in 2.532566547393799s
Received healthy response to inference request in 2.4408535957336426s
Received healthy response to inference request in 2.408405303955078s
Received healthy response to inference request in 2.8332934379577637s
Received healthy response to inference request in 1.7677550315856934s
Received healthy response to inference request in 1.8663170337677002s
Received healthy response to inference request in 1.656731367111206s
Received healthy response to inference request in 1.7935094833374023s
Received healthy response to inference request in 1.7121210098266602s
Received healthy response to inference request in 1.6267850399017334s
Received healthy response to inference request in 1.8763024806976318s
Received healthy response to inference request in 1.967339277267456s
Received healthy response to inference request in 1.6885168552398682s
Received healthy response to inference request in 1.8615541458129883s
Received healthy response to inference request in 2.3794567584991455s
Received healthy response to inference request in 2.3412818908691406s
Received healthy response to inference request in 1.6854526996612549s
Received healthy response to inference request in 1.891040563583374s
30 requests
0 failed requests
5th percentile: 1.6576521158218385
10th percentile: 1.6751413106918336
20th percentile: 1.7074001789093018
30th percentile: 1.8411407470703125
40th percentile: 1.885145330429077
50th percentile: 2.0288249254226685
60th percentile: 2.2667897224426268
70th percentile: 2.3835827112197876
80th percentile: 2.414894962310791
90th percentile: 2.5348893642425536
95th percentile: 2.5895840883255
99th percentile: 2.7706350231170656
mean time: 2.0956019163131714
Pipeline stage StressChecker completed in 81.37s
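The StressChecker summary above (count, percentiles, mean over the 30 request latencies, in seconds) is consistent with percentiles computed by linear interpolation between sorted ranks, the common default (e.g. in `numpy.percentile`). A minimal sketch of that summary; the function names are illustrative and the short latency list is a made-up example, not the logged data:

```python
# Sketch: latency summary like the stress-check report.
# Percentiles use linear interpolation between sorted ranks
# (the default method of numpy.percentile). Names are hypothetical.

def percentile(samples, p):
    """p-th percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(samples)
    if len(xs) == 1:
        return xs[0]
    rank = (p / 100) * (len(xs) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    frac = rank - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


def summarize(latencies):
    """Count, mean, and selected percentiles for a list of latencies (s)."""
    return {
        "count": len(latencies),
        "mean": sum(latencies) / len(latencies),
        **{f"p{p}": percentile(latencies, p) for p in (5, 50, 95, 99)},
    }


stats = summarize([1.66, 1.71, 1.90, 2.09, 2.39, 2.55, 2.83])
print(stats)
```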
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.53s
Shutdown handler de-registered
qwen-qwen3-235b-a22b-in_47730_v8 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Generating Leaderboard row for %s
Generated Leaderboard row for %s
Pipeline stage OfflineFamilyFriendlyScorer completed in 2648.98s
Shutdown handler de-registered
qwen-qwen3-235b-a22b-in_47730_v8 status is now torndown due to DeploymentManager action