developer_uid: rirv938
submission_id: qwen-qwen2-5-32b_v11
model_name: qwen-qwen2-5-32b_v11
model_group: Qwen/Qwen2.5-32B
status: torndown
timestamp: 2025-01-03T04:40:15+00:00
num_battles: 15
num_wins: 5
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: basic
model_repo: Qwen/Qwen2.5-32B
model_architecture: Qwen2ForCausalLM
model_num_parameters: 32763417600.0
best_of: 4
max_input_tokens: 768
max_output_tokens: 64
latencies: [{'batch_size': 1, 'throughput': 0.2791906533009261, 'latency_mean': 3.5817212617397307, 'latency_p50': 3.5636913776397705, 'latency_p90': 3.9850356340408326}, {'batch_size': 2, 'throughput': 0.45234344582362546, 'latency_mean': 4.420391418933868, 'latency_p50': 4.450167655944824, 'latency_p90': 4.862027168273926}, {'batch_size': 3, 'throughput': 0.6040122991147273, 'latency_mean': 4.938081824779511, 'latency_p50': 4.943245768547058, 'latency_p90': 5.406014490127563}, {'batch_size': 4, 'throughput': 0.7352758325981334, 'latency_mean': 5.418999512195587, 'latency_p50': 5.3902692794799805, 'latency_p90': 5.981081557273865}, {'batch_size': 5, 'throughput': 0.8280083577614781, 'latency_mean': 6.014862887859344, 'latency_p50': 6.0092631578445435, 'latency_p90': 6.6853755235672}]
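The latency table above is roughly self-consistent: each logged throughput is close to batch_size / latency_mean, and the throughput_3p7s figure further down is close to a linear interpolation of throughput at a 3.7 s mean latency. A minimal sketch under those assumptions (the exact benchmark formula is not shown in this log, so tolerances are deliberately loose):

```python
# Sanity-check the latency table: throughput is roughly batch_size / latency_mean.
# The exact benchmark formula is an assumption; tolerances are deliberately loose.
rows = [
    # (batch_size, throughput, latency_mean) copied from the log
    (1, 0.2791906533009261, 3.5817212617397307),
    (2, 0.45234344582362546, 4.420391418933868),
    (3, 0.6040122991147273, 4.938081824779511),
    (4, 0.7352758325981334, 5.418999512195587),
    (5, 0.8280083577614781, 6.014862887859344),
]
for batch_size, throughput, latency_mean in rows:
    derived = batch_size / latency_mean
    assert abs(derived - throughput) / throughput < 0.01, (batch_size, derived)

# Linear interpolation of throughput at a 3.7 s mean latency lands near the
# logged throughput_3p7s of 0.31 (the definition of that metric is assumed).
b1, b2 = rows[0], rows[1]
t_3p7 = b1[1] + (3.7 - b1[2]) / (b2[2] - b1[2]) * (b2[1] - b1[1])
assert abs(t_3p7 - 0.31) < 0.02
```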
gpu_counts: {'NVIDIA RTX A6000': 1}
display_name: qwen-qwen2-5-32b_v11
ineligible_reason: num_battles<5000
is_internal_developer: True
language_model: Qwen/Qwen2.5-32B
model_size: 33B
ranking_group: single
throughput_3p7s: 0.31
us_pacific_date: 2025-01-02
win_ratio: 0.3333333333333333
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n', '<|eot_id|>', '<|end_of_text|>', 'You:'], 'max_input_tokens': 768, 'best_of': 4, 'max_output_tokens': 64}
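The generation_params above define four stop sequences. A hedged sketch of how such stopping_words might truncate a raw completion (`truncate_at_stop` is an illustrative helper, not the actual serving code):

```python
# Illustrative only: cut the completion at the earliest occurrence of any stop
# sequence from generation_params. The real server-side logic is not shown here.
stopping_words = ["\n", "<|eot_id|>", "<|end_of_text|>", "You:"]

def truncate_at_stop(text: str, stops: list[str]) -> str:
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = "I'm doing well, thanks!\nYou: and you?"
assert truncate_at_stop(raw, stopping_words) == "I'm doing well, thanks!"
```

The "\n" and "You:" stops pair naturally with a single-line, name-prefixed chat format.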
formatter: {'memory_template': '', 'prompt_template': '', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '{bot_name}:', 'truncate_by_message': False}
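The formatter templates above can be sketched as a small renderer. The Bot/User names and messages below are made up, and the `render` helper's assembly order (memory, prompt, alternating turns, then the response template as the generation prefix) is an assumption about how the templates are combined:

```python
# Assumed assembly order: memory, prompt, alternating turns, then the response
# template as the generation prefix. Names and messages are illustrative.
formatter = {
    "memory_template": "",
    "prompt_template": "",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "{bot_name}:",
}

def render(history, bot_name="Bot", user_name="User"):
    parts = [formatter["memory_template"], formatter["prompt_template"]]
    for role, message in history:
        template = formatter["bot_template"] if role == "bot" else formatter["user_template"]
        parts.append(template.format(bot_name=bot_name, user_name=user_name, message=message))
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)

prompt = render([("user", "Hi"), ("bot", "Hello!"), ("user", "How are you?")])
assert prompt == "User: Hi\nBot: Hello!\nUser: How are you?\nBot:"
```

Note that each rendered turn ends in "\n", which is also one of the stopping words in generation_params, so generation halts at the end of the bot's single-line reply.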

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name qwen-qwen2-5-32b-v11-mkmlizer
Waiting for job on qwen-qwen2-5-32b-v11-mkmlizer to finish
Failed to get response for submission qwen-qwen2-5-32b_v9: ('http://qwen-qwen2-5-32b-v9-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
qwen-qwen2-5-32b-v11-mkmlizer: Downloaded to shared memory in 91.588s
qwen-qwen2-5-32b-v11-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpz7mlx7i6, device:0
qwen-qwen2-5-32b-v11-mkmlizer: Saving flywheel model at /dev/shm/model_cache
qwen-qwen2-5-32b-v11-mkmlizer: quantized model in 63.054s
qwen-qwen2-5-32b-v11-mkmlizer: Processed model Qwen/Qwen2.5-32B in 154.643s
qwen-qwen2-5-32b-v11-mkmlizer: creating bucket guanaco-mkml-models
qwen-qwen2-5-32b-v11-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
qwen-qwen2-5-32b-v11-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/config.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/added_tokens.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/added_tokens.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/special_tokens_map.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/tokenizer_config.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/merges.txt s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/merges.txt
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/vocab.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/vocab.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/tokenizer.json
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/flywheel_model.2.safetensors s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/flywheel_model.2.safetensors
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/flywheel_model.0.safetensors
qwen-qwen2-5-32b-v11-mkmlizer: cp /dev/shm/model_cache/flywheel_model.1.safetensors s3://guanaco-mkml-models/qwen-qwen2-5-32b-v11/flywheel_model.1.safetensors
qwen-qwen2-5-32b-v11-mkmlizer: Loading 0: 0%| | 0/771 [00:00<?, ?it/s] ... 99%|█████████▊| 760/771 [00:47<00:01, 6.06it/s]
Job qwen-qwen2-5-32b-v11-mkmlizer completed after 196.07s with status: succeeded
Stopping job with name qwen-qwen2-5-32b-v11-mkmlizer
Pipeline stage MKMLizer completed in 196.53s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.16s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service qwen-qwen2-5-32b-v11
Waiting for inference service qwen-qwen2-5-32b-v11 to be ready
Inference service qwen-qwen2-5-32b-v11 ready after 331.60169792175293s
Pipeline stage MKMLDeployer completed in 332.13s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.5819168090820312s
Received healthy response to inference request in 2.72748064994812s
Received healthy response to inference request in 3.1838302612304688s
Received healthy response to inference request in 1.8934087753295898s
Received healthy response to inference request in 3.8232977390289307s
5 requests
0 failed requests
5th percentile: 2.031110382080078
10th percentile: 2.1688119888305666
20th percentile: 2.444215202331543
30th percentile: 2.611029577255249
40th percentile: 2.6692551136016847
50th percentile: 2.72748064994812
60th percentile: 2.9100204944610595
70th percentile: 3.092560338973999
80th percentile: 3.3117237567901614
90th percentile: 3.567510747909546
95th percentile: 3.695404243469238
99th percentile: 3.7977190399169922
mean time: 2.8419868469238283
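The StressChecker percentiles above are consistent with linear-interpolation percentiles over the five logged response times (the same convention as numpy.percentile's default). A sketch reproducing them:

```python
# Reproduce the stress-check statistics from the five response times above.
times = [
    2.5819168090820312,
    2.72748064994812,
    3.1838302612304688,
    1.8934087753295898,
    3.8232977390289307,
]

def percentile(xs, p):
    """Linear-interpolation percentile, matching numpy.percentile's default."""
    xs = sorted(xs)
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (k - lo) * (xs[hi] - xs[lo])

assert abs(percentile(times, 5) - 2.031110382080078) < 1e-9
assert abs(percentile(times, 50) - 2.72748064994812) < 1e-9
assert abs(percentile(times, 90) - 3.567510747909546) < 1e-9
assert abs(sum(times) / len(times) - 2.8419868469238283) < 1e-9
```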
Pipeline stage StressChecker completed in 15.59s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.76s
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 0.65s
Shutdown handler de-registered
qwen-qwen2-5-32b_v11 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
%s, retrying in %s seconds...
Evaluating %s Family Friendly Score with %s threads
%s, retrying in %s seconds...
Evaluating %s Family Friendly Score with %s threads
clean up pipeline due to error=DeploymentChecksError("('http://qwen-qwen2-5-32b-v11-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')")
Shutdown handler de-registered
qwen-qwen2-5-32b_v11 status is now torndown due to DeploymentManager action