developer_uid: chai_evaluation_service
submission_id: qwen-qwen2-5-3b-instruct_v6
model_name: qwen-qwen2-5-3b-instruct_v6
model_group: Qwen/Qwen2.5-3B-Instruct
status: inactive
timestamp: 2026-02-07T23:18:57+00:00
num_battles: 12252
num_wins: 4402
celo_rating: 1206.8
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: basic
model_repo: Qwen/Qwen2.5-3B-Instruct
model_architecture: Qwen2ForCausalLM
model_num_parameters: 3397011456.0
best_of: 8
max_input_tokens: 2048
max_output_tokens: 64
reward_model: default
display_name: qwen-qwen2-5-3b-instruct_v6
is_internal_developer: True
language_model: Qwen/Qwen2.5-3B-Instruct
model_size: 3B
ranking_group: single
us_pacific_date: 2026-02-07
win_ratio: 0.3592882794645772
generation_params: {'temperature': 0.85, 'top_p': 0.9, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.2, 'frequency_penalty': 0.3, 'stopping_words': ['\n'], 'max_input_tokens': 2048, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '<|im_start|>system\n{memory}<|im_end|>\n', 'prompt_template': '<|im_start|>user\n{prompt}<|im_end|>\n', 'bot_template': '<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n', 'user_template': '<|im_start|>user\n{user_name}: {message}<|im_end|>\n', 'response_template': '<|im_start|>assistant\n{bot_name}:', 'truncate_by_message': True}
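For reference, a minimal sketch of how the formatter templates and generation_params listed above could be applied with vLLM; the conversation content and variable names are assumptions, only the template strings and sampling values come from the listing. (win_ratio above is simply num_wins / num_battles: 4402 / 12252 ≈ 0.3593.)

# Minimal sketch: assemble a ChatML prompt from the formatter templates above
# and sample with vLLM using the listed generation_params. The memory/chat
# contents below are hypothetical placeholders.
from vllm import LLM, SamplingParams

FORMATTER = {
    "memory_template": "<|im_start|>system\n{memory}<|im_end|>\n",
    "prompt_template": "<|im_start|>user\n{prompt}<|im_end|>\n",
    "bot_template": "<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n",
    "user_template": "<|im_start|>user\n{user_name}: {message}<|im_end|>\n",
    "response_template": "<|im_start|>assistant\n{bot_name}:",
}

def build_prompt(memory, turns, bot_name):
    """Concatenate templated segments into the final ChatML prompt."""
    parts = [FORMATTER["memory_template"].format(memory=memory)]
    for speaker, name, message in turns:
        template = FORMATTER["bot_template"] if speaker == "bot" else FORMATTER["user_template"]
        key = "bot_name" if speaker == "bot" else "user_name"
        parts.append(template.format(**{key: name, "message": message}))
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)

# Sampling values copied from generation_params. best_of=8 asks vLLM to generate
# eight candidates and keep the most likely one (in vLLM versions that still
# accept best_of); a separate reward model may rerank candidates externally.
sampling = SamplingParams(
    temperature=0.85, top_p=0.9, min_p=0.0, top_k=40,
    presence_penalty=0.2, frequency_penalty=0.3,
    stop=["\n"], max_tokens=64, best_of=8, n=1,
)

llm = LLM(model="Qwen/Qwen2.5-3B-Instruct", max_model_len=2048)
prompt = build_prompt("You are Bot, a friendly companion.",
                      [("user", "User", "Hi there!")], bot_name="Bot")
print(llm.generate([prompt], sampling)[0].outputs[0].text)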
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name qwen-qwen2-5-3b-instruct-v6-uploader
Waiting for job on qwen-qwen2-5-3b-instruct-v6-uploader to finish
qwen-qwen2-5-3b-instruct-v6-uploader: Using quantization_mode: none
qwen-qwen2-5-3b-instruct-v6-uploader: Downloading snapshot of Qwen/Qwen2.5-3B-Instruct...
qwen-qwen2-5-3b-instruct-v6-uploader: Fetching 12 files: 100%|██████████| 12/12 [00:02<00:00, 4.15it/s]
qwen-qwen2-5-3b-instruct-v6-uploader: Downloaded in 3.041s
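The download step above fetches an unquantized snapshot of the base repo (quantization_mode: none); a minimal sketch of the equivalent call, assuming huggingface_hub and the /dev/shm staging path seen in the cp lines below:

# Minimal sketch of the snapshot download step, assuming huggingface_hub.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Qwen/Qwen2.5-3B-Instruct",
    local_dir="/dev/shm/model_output",   # assumed staging path, matching the cp source below
)
print(f"Downloaded to {local_dir}")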
qwen-qwen2-5-3b-instruct-v6-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v6/model-00002-of-00002.safetensors
qwen-qwen2-5-3b-instruct-v6-uploader: cp /dev/shm/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v6/model-00001-of-00002.safetensors
Job qwen-qwen2-5-3b-instruct-v6-uploader completed after 74.52s with status: succeeded
Stopping job with name qwen-qwen2-5-3b-instruct-v6-uploader
Pipeline stage VLLMUploader completed in 78.91s
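The two cp lines above stage the safetensors shards into the model bucket; a minimal sketch of the same copy with boto3 (bucket and key prefix are taken from the log, credentials are assumed to come from the environment):

# Minimal sketch of the shard upload, assuming boto3 and ambient AWS credentials.
import os
import boto3

s3 = boto3.client("s3")
bucket = "guanaco-vllm-models"
prefix = "qwen-qwen2-5-3b-instruct-v6"
src_dir = "/dev/shm/model_output"

for name in sorted(os.listdir(src_dir)):
    if name.endswith(".safetensors"):
        s3.upload_file(os.path.join(src_dir, name), bucket, f"{prefix}/{name}")
        print(f"cp {src_dir}/{name} s3://{bucket}/{prefix}/{name}")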
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.19s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service qwen-qwen2-5-3b-instruct-v6
Waiting for inference service qwen-qwen2-5-3b-instruct-v6 to be ready
Inference service qwen-qwen2-5-3b-instruct-v6 ready after 171.26632022857666s
Pipeline stage VLLMDeployer completed in 173.27s
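Waiting for the inference service is a readiness poll; a minimal sketch, assuming a plain HTTP health endpoint (the URL, timeout, and poll interval are hypothetical; the real pipeline queries its serving controller):

# Minimal sketch of waiting for an inference service to come up.
import time
import requests

def wait_until_ready(url, timeout_s=600, poll_s=5):
    start = time.time()
    while time.time() - start < timeout_s:
        try:
            if requests.get(url, timeout=5).ok:
                return time.time() - start
        except requests.RequestException:
            pass          # not up yet; keep polling
        time.sleep(poll_s)
    raise TimeoutError(f"{url} not ready after {timeout_s}s")

elapsed = wait_until_ready("http://qwen-qwen2-5-3b-instruct-v6/health")  # hypothetical URL
print(f"Inference service ready after {elapsed:.2f}s")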
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.5104289054870605s
Received healthy response to inference request in 1.1055827140808105s
Received healthy response to inference request in 1.6326649188995361s
Received healthy response to inference request in 2.0266480445861816s
Received healthy response to inference request in 1.3543086051940918s
Received healthy response to inference request in 0.6423749923706055s
Received healthy response to inference request in 1.005537748336792s
Received healthy response to inference request in 1.1697313785552979s
Received healthy response to inference request in 0.8995182514190674s
Received healthy response to inference request in 0.7338018417358398s
Received healthy response to inference request in 0.7906866073608398s
Received healthy response to inference request in 0.42670416831970215s
Received healthy response to inference request in 0.7089221477508545s
Received healthy response to inference request in 0.8531079292297363s
Received healthy response to inference request in 1.4749705791473389s
Received healthy response to inference request in 0.8783597946166992s
Received healthy response to inference request in 0.5654046535491943s
Received healthy response to inference request in 0.7639880180358887s
Received healthy response to inference request in 1.7811477184295654s
Received healthy response to inference request in 1.2335498332977295s
Received healthy response to inference request in 1.2852437496185303s
Received healthy response to inference request in 2.6807048320770264s
Received healthy response to inference request in 1.5462782382965088s
Received healthy response to inference request in 1.1446542739868164s
Received healthy response to inference request in 1.0771644115447998s
Received healthy response to inference request in 1.3094158172607422s
Received healthy response to inference request in 0.9871010780334473s
Received healthy response to inference request in 1.0543088912963867s
Received healthy response to inference request in 1.115515947341919s
Received healthy response to inference request in 1.523411512374878s
30 requests
0 failed requests
5th percentile: 0.6000413060188293
10th percentile: 0.7022674322128296
20th percentile: 0.7853468894958496
30th percentile: 0.8931707143783569
40th percentile: 1.034800434112549
50th percentile: 1.1105493307113647
60th percentile: 1.1952587604522704
70th percentile: 1.322883653640747
80th percentile: 1.5130254268646242
90th percentile: 1.6475131988525393
95th percentile: 1.9161728978157035
99th percentile: 2.491028363704682
mean time: 1.1760412534077962
Pipeline stage StressChecker completed in 62.93s
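The StressChecker figures above are straightforward latency bookkeeping; a minimal sketch that produces the same summary from timed requests, assuming numpy and an HTTP inference endpoint (URL and payload are hypothetical):

# Minimal sketch of the stress check: time N inference requests and report
# failure count, percentiles, and mean latency.
import time
import numpy as np
import requests

def stress_check(url, payload, n=30):
    latencies, failures = [], 0
    for _ in range(n):
        start = time.time()
        try:
            resp = requests.post(url, json=payload, timeout=30)
            resp.raise_for_status()
            latencies.append(time.time() - start)
            print(f"Received healthy response to inference request in {latencies[-1]}s")
        except requests.RequestException:
            failures += 1
    print(f"{n} requests\n{failures} failed requests")
    for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        print(f"{p}th percentile: {np.percentile(latencies, p)}")
    print(f"mean time: {np.mean(latencies)}")

# stress_check("http://qwen-qwen2-5-3b-instruct-v6/generate", {"prompt": "Hi"})  # hypothetical endpoint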
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 3.58s
Shutdown handler de-registered
qwen-qwen2-5-3b-instruct_v6 status is now deployed due to DeploymentManager action
qwen-qwen2-5-3b-instruct_v6 status is now inactive due to auto deactivation of underperforming models