developer_uid: rirv938
submission_id: qwen-qwen2-5-32b_v11
model_name: qwen-qwen2-5-32b_v11
model_group: Qwen/Qwen2.5-32B
status: torndown
timestamp: 2025-01-03T04:40:15+00:00
num_battles: 15
num_wins: 5
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: basic
model_repo: Qwen/Qwen2.5-32B
model_architecture: Qwen2ForCausalLM
model_num_parameters: 32763417600.0
best_of: 4
max_input_tokens: 768
max_output_tokens: 64
latencies: [{'batch_size': 1, 'throughput': 0.2791906533009261, 'latency_mean': 3.5817212617397307, 'latency_p50': 3.5636913776397705, 'latency_p90': 3.9850356340408326}, {'batch_size': 2, 'throughput': 0.45234344582362546, 'latency_mean': 4.420391418933868, 'latency_p50': 4.450167655944824, 'latency_p90': 4.862027168273926}, {'batch_size': 3, 'throughput': 0.6040122991147273, 'latency_mean': 4.938081824779511, 'latency_p50': 4.943245768547058, 'latency_p90': 5.406014490127563}, {'batch_size': 4, 'throughput': 0.7352758325981334, 'latency_mean': 5.418999512195587, 'latency_p50': 5.3902692794799805, 'latency_p90': 5.981081557273865}, {'batch_size': 5, 'throughput': 0.8280083577614781, 'latency_mean': 6.014862887859344, 'latency_p50': 6.0092631578445435, 'latency_p90': 6.6853755235672}]
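The latency table above is roughly self-consistent: each logged throughput is close to batch_size / latency_mean, and the throughput_3p7s figure further down is close to a linear interpolation of throughput at a 3.7 s mean latency. A minimal sketch under those assumptions (the exact benchmark formula is not shown in this log, so tolerances are deliberately loose):

```python
# Sanity-check the latency table: throughput is roughly batch_size / latency_mean.
# The exact benchmark formula is an assumption; tolerances are deliberately loose.
rows = [
    # (batch_size, throughput, latency_mean) copied from the log
    (1, 0.2791906533009261, 3.5817212617397307),
    (2, 0.45234344582362546, 4.420391418933868),
    (3, 0.6040122991147273, 4.938081824779511),
    (4, 0.7352758325981334, 5.418999512195587),
    (5, 0.8280083577614781, 6.014862887859344),
]
for batch_size, throughput, latency_mean in rows:
    derived = batch_size / latency_mean
    assert abs(derived - throughput) / throughput < 0.01, (batch_size, derived)

# Linear interpolation of throughput at a 3.7 s mean latency lands near the
# logged throughput_3p7s of 0.31 (the definition of that metric is assumed).
b1, b2 = rows[0], rows[1]
t_3p7 = b1[1] + (3.7 - b1[2]) / (b2[2] - b1[2]) * (b2[1] - b1[1])
assert abs(t_3p7 - 0.31) < 0.02
```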
gpu_counts: {'NVIDIA RTX A6000': 1}
display_name: qwen-qwen2-5-32b_v11
ineligible_reason: num_battles<5000
is_internal_developer: True
language_model: Qwen/Qwen2.5-32B
model_size: 33B
ranking_group: single
throughput_3p7s: 0.31
us_pacific_date: 2025-01-02
win_ratio: 0.3333333333333333
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n', '<|eot_id|>', '<|end_of_text|>', 'You:'], 'max_input_tokens': 768, 'best_of': 4, 'max_output_tokens': 64}
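The generation_params above define four stop sequences. A hedged sketch of how such stopping_words might truncate a raw completion (`truncate_at_stop` is an illustrative helper, not the actual serving code):

```python
# Illustrative only: cut the completion at the earliest occurrence of any stop
# sequence from generation_params. The real server-side logic is not shown here.
stopping_words = ["\n", "<|eot_id|>", "<|end_of_text|>", "You:"]

def truncate_at_stop(text: str, stops: list[str]) -> str:
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = "I'm doing well, thanks!\nYou: and you?"
assert truncate_at_stop(raw, stopping_words) == "I'm doing well, thanks!"
```

The "\n" and "You:" stops pair naturally with a single-line, name-prefixed chat format.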
formatter: {'memory_template': '', 'prompt_template': '', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '{bot_name}:', 'truncate_by_message': False}
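The formatter templates above can be sketched as a small renderer. The Bot/User names and messages below are made up, and the `render` helper's assembly order (memory, prompt, alternating turns, then the response template as the generation prefix) is an assumption about how the templates are combined:

```python
# Assumed assembly order: memory, prompt, alternating turns, then the response
# template as the generation prefix. Names and messages are illustrative.
formatter = {
    "memory_template": "",
    "prompt_template": "",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "{bot_name}:",
}

def render(history, bot_name="Bot", user_name="User"):
    parts = [formatter["memory_template"], formatter["prompt_template"]]
    for role, message in history:
        template = formatter["bot_template"] if role == "bot" else formatter["user_template"]
        parts.append(template.format(bot_name=bot_name, user_name=user_name, message=message))
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)

prompt = render([("user", "Hi"), ("bot", "Hello!"), ("user", "How are you?")])
assert prompt == "User: Hi\nBot: Hello!\nUser: How are you?\nBot:"
```

Note that each rendered turn ends in "\n", which is also one of the stopping words in generation_params, so generation halts at the end of the bot's single-line reply.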

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name qwen-qwen2-5-32b-v11-mkmlizer
Waiting for job on qwen-qwen2-5-32b-v11-mkmlizer to finish
Failed to get response for submission qwen-qwen2-5-32b_v9: ('http://qwen-qwen2-5-32b-v9-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
qwen-qwen2-5-32b-v11-mkmlizer: Downloaded to shared memory in 91.588s
qwen-qwen2-5-32b-v11-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpz7mlx7i6, device:0
qwen-qwen2-5-32b-v11-mkmlizer: Saving flywheel model at /dev/shm/model_cache
qwen-qwen2-5-32b-v11-mkmlizer: quantized model in 63.054s
qwen-qwen2-5-32b-v11-mkmlizer: Processed model Qwen/Qwen2.5-32B in 154.643s
qwen-qwen2-5-32b-v11-mkmlizer: creating bucket guanaco-mkml-models
qwen-qwen2-5-32b-v11-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
qwen-qwen2-5-32b-v11-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/config.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/added_tokens.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/added_tokens.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/special_tokens_map.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/tokenizer_config.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/merges.txt s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/merges.txt
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/vocab.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/vocab.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/tokenizer.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/flywheel_model.2.safetensors s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/flywheel_model.2.safetensors
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/flywheel_model.0.safetensors
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/flywheel_model.1.safetensors s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/flywheel_model.1.safetensors
qwen-qwen2-5-32b-v11-mkmlizer: Loading 0: 0%| | 0/771 [00:00<?, ?it/s] ... 99%|█████████▊| 760/771 [00:47<00:01, 6.06it/s]
Job qwen-qwen2-5-32b-v11-mkmlizer completed after 196.07s with status: succeeded
Stopping job with name qwen-qwen2-5-32b-v11-mkmlizer
Pipeline stage MKMLizer completed in 196.53s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.16s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service qwen-qwen2-5-32b-v11
Waiting for inference service qwen-qwen2-5-32b-v11 to be ready
Inference service qwen-qwen2-5-32b-v11 ready after 331.60169792175293s
Pipeline stage MKMLDeployer completed in 332.13s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.5819168090820312s
Received healthy response to inference request in 2.72748064994812s
Received healthy response to inference request in 3.1838302612304688s
Received healthy response to inference request in 1.8934087753295898s
Received healthy response to inference request in 3.8232977390289307s
5 requests
0 failed requests
5th percentile: 2.031110382080078
10th percentile: 2.1688119888305666
20th percentile: 2.444215202331543
30th percentile: 2.611029577255249
40th percentile: 2.6692551136016847
50th percentile: 2.72748064994812
60th percentile: 2.9100204944610595
70th percentile: 3.092560338973999
80th percentile: 3.3117237567901614
90th percentile: 3.567510747909546
95th percentile: 3.695404243469238
99th percentile: 3.7977190399169922
mean time: 2.8419868469238283
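The StressChecker percentiles above are consistent with linear-interpolation percentiles over the five logged response times (the same convention as numpy.percentile's default). A sketch reproducing them:

```python
# Reproduce the stress-check statistics from the five response times above.
times = [
    2.5819168090820312,
    2.72748064994812,
    3.1838302612304688,
    1.8934087753295898,
    3.8232977390289307,
]

def percentile(xs, p):
    """Linear-interpolation percentile, matching numpy.percentile's default."""
    xs = sorted(xs)
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (k - lo) * (xs[hi] - xs[lo])

assert abs(percentile(times, 5) - 2.031110382080078) < 1e-9
assert abs(percentile(times, 50) - 2.72748064994812) < 1e-9
assert abs(percentile(times, 90) - 3.567510747909546) < 1e-9
assert abs(sum(times) / len(times) - 2.8419868469238283) < 1e-9
```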
Pipeline stage StressChecker completed in 15.59s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.76s
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 0.65s
Shutdown handler de-registered
qwen-qwen2-5-32b_v11 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
%s, retrying in %s seconds...
Evaluating %s Family Friendly Score with %s threads
%s, retrying in %s seconds...
Evaluating %s Family Friendly Score with %s threads
clean up pipeline due to error=DeploymentChecksError("('http://qwen-qwen2-5-32b-v11-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')")
Shutdown handler de-registered
qwen-qwen2-5-32b_v11 status is now torndown due to DeploymentManager action