Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-muster-v0c-lr1e5e-3996-v6-uploader
Waiting for job on chaiml-muster-v0c-lr1e5e-3996-v6-uploader to finish
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: Using quantization_mode: w4a16
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: Checking if ChaiML/muster-v0c-lr1e5ep2r64g4b01-W4A16 already exists in ChaiML
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: Downloading snapshot of ChaiML/muster-v0c-lr1e5ep2r64g4b01-W4A16...
chaiml-muster-v0c-lr1e5e-3996-v6-uploader:
Fetching 39 files: 0%| | 0/39 [00:00<?, ?it/s]
Fetching 39 files: 3%|▎ | 1/39 [00:00<00:12, 3.00it/s]
Fetching 39 files: 18%|█▊ | 7/39 [00:14<01:09, 2.18s/it]
Fetching 39 files: 31%|███ | 12/39 [00:15<00:30, 1.12s/it]
Fetching 39 files: 38%|███▊ | 15/39 [00:22<00:37, 1.54s/it]
Fetching 39 files: 41%|████ | 16/39 [00:23<00:34, 1.49s/it]
Fetching 39 files: 44%|████▎ | 17/39 [00:24<00:30, 1.39s/it]
Fetching 39 files: 46%|████▌ | 18/39 [00:26<00:31, 1.48s/it]
Fetching 39 files: 49%|████▊ | 19/39 [00:27<00:24, 1.25s/it]
Fetching 39 files: 51%|█████▏ | 20/39 [00:27<00:20, 1.10s/it]
Fetching 39 files: 54%|█████▍ | 21/39 [00:27<00:16, 1.12it/s]
Fetching 39 files: 56%|█████▋ | 22/39 [00:28<00:12, 1.36it/s]
Fetching 39 files: 59%|█████▉ | 23/39 [00:35<00:40, 2.53s/it]
Fetching 39 files: 62%|██████▏ | 24/39 [00:36<00:30, 2.05s/it]
Fetching 39 files: 64%|██████▍ | 25/39 [00:37<00:26, 1.89s/it]
Fetching 39 files: 67%|██████▋ | 26/39 [00:39<00:21, 1.69s/it]
Fetching 39 files: 69%|██████▉ | 27/39 [00:40<00:18, 1.57s/it]
Fetching 39 files: 72%|███████▏ | 28/39 [00:42<00:19, 1.74s/it]
Fetching 39 files: 79%|███████▉ | 31/39 [00:43<00:07, 1.03it/s]
Fetching 39 files: 82%|████████▏ | 32/39 [00:44<00:06, 1.12it/s]
Fetching 39 files: 100%|██████████| 39/39 [00:44<00:00, 1.13s/it]
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: Downloaded in 44.213s
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: Processed model ChaiML/muster-v0c-lr1e5ep2r64g4b01 in 44.751s
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: creating bucket guanaco-vllm-models
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
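Note on the SyntaxWarning lines above: each one points at a regular-expression pattern written as an ordinary string literal, where sequences such as `\.`, `\s`, `\w` or `\*` are not valid Python string escapes. Python 3.12 and later report these as SyntaxWarning when the module is compiled; the patterns still behave as before, and the upload proceeds. The usual remedy is a raw string literal. The following is a minimal sketch of that remedy, not the actual code of the S3 package; the names `INVALID_BUCKET_CHAR`, `DOUBLE_DOT`, `find_invalid_bucket_char` and `has_double_dot` are hypothetical and only mirror two of the patterns quoted in the warnings.

```python
import re

# Raw string literals (r"...") hand backslashes to the regex engine unchanged,
# so the Python compiler finds no string escape sequences to complain about.
INVALID_BUCKET_CHAR = re.compile(r"([^a-z0-9\.-])", re.UNICODE)
DOUBLE_DOT = re.compile(r"\.\.", re.UNICODE)


def find_invalid_bucket_char(bucket):
    """Return the first character outside a-z, 0-9, '.', '-', or None."""
    match = INVALID_BUCKET_CHAR.search(bucket)
    return match.group(1) if match else None


def has_double_dot(bucket):
    """Return True if the name contains two consecutive dots."""
    return DOUBLE_DOT.search(bucket) is not None
```

With these definitions, `find_invalid_bucket_char("guanaco-vllm-models")` finds no offending character, since the bucket name in this log uses only lowercase letters and hyphens.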
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/.gitattributes
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/chat_template.jinja
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/tokenizer_config.json
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/generation_config.json
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/special_tokens_map.json
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/quantization_config.json s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/quantization_config.json
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/added_tokens.json s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/added_tokens.json
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/vocab.json s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/vocab.json
HTTP Request: %s %s "%s %d %s"
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00027-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00027-of-00027.safetensors
HTTP Request: %s %s "%s %d %s"
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00003-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00003-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00021-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00021-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00007-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00007-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00019-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00019-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00015-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00015-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00023-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00023-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00014-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00014-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00018-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00018-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00017-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00017-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00010-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00010-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00012-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00012-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00005-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00005-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00026-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00026-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00004-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00004-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00024-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00024-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00002-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00002-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00022-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00022-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00013-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00013-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00006-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00006-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00009-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00009-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00020-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00020-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00016-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00016-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00001-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00001-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00008-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00008-of-00027.safetensors
chaiml-muster-v0c-lr1e5e-3996-v6-uploader: cp /dev/shm/model_output/model-00025-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0c-lr1e5e-3996-v6/model-00025-of-00027.safetensors
Job chaiml-muster-v0c-lr1e5e-3996-v6-uploader completed after 948.22s with status: succeeded
Stopping job with name chaiml-muster-v0c-lr1e5e-3996-v6-uploader
Pipeline stage VLLMUploader completed in 948.88s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-muster-v0c-lr1e5e-3996-v6
Waiting for inference service chaiml-muster-v0c-lr1e5e-3996-v6 to be ready
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Inference service chaiml-muster-v0c-lr1e5e-3996-v6 ready after 1051.1376514434814s
Pipeline stage VLLMDeployer completed in 1051.59s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.0673089027404785s
Received healthy response to inference request in 2.246335029602051s
Received healthy response to inference request in 2.5836472511291504s
Received healthy response to inference request in 2.3004024028778076s
Received healthy response to inference request in 2.0943961143493652s
Received healthy response to inference request in 2.068831443786621s
Received healthy response to inference request in 2.0304484367370605s
Received healthy response to inference request in 2.169429063796997s
Received healthy response to inference request in 2.1484122276306152s
Received healthy response to inference request in 2.042524814605713s
Received healthy response to inference request in 2.1967036724090576s
Received healthy response to inference request in 2.428711175918579s
Received healthy response to inference request in 2.0268032550811768s
Received healthy response to inference request in 1.9047877788543701s
Received healthy response to inference request in 2.436326503753662s
Received healthy response to inference request in 2.2945218086242676s
Received healthy response to inference request in 1.965397596359253s
Received healthy response to inference request in 2.238597869873047s
Received healthy response to inference request in 1.9539880752563477s
Received healthy response to inference request in 2.2617697715759277s
Received healthy response to inference request in 1.8845748901367188s
Received healthy response to inference request in 2.3796069622039795s
Received healthy response to inference request in 2.029110908508301s
Received healthy response to inference request in 2.05137300491333s
Received healthy response to inference request in 2.0638363361358643s
Received healthy response to inference request in 2.1238033771514893s
Received healthy response to inference request in 2.039766788482666s
Received healthy response to inference request in 2.4788267612457275s
Received healthy response to inference request in 2.1132402420043945s
30 requests
1 failed requests
5th percentile: 1.92692791223526
10th percentile: 1.9642566442489624
20th percentile: 2.0301809310913086
30th percentile: 2.048718547821045
40th percentile: 2.068222427368164
50th percentile: 2.118521809577942
60th percentile: 2.180338907241821
70th percentile: 2.250965452194214
80th percentile: 2.3162433147430423
90th percentile: 2.4405765295028687
95th percentile: 2.53647803068161
99th percentile: 15.313329436779037
mean time: 2.771208651860555
%s, retrying in %s seconds...
Received healthy response to inference request in 2.581190824508667s
Received healthy response to inference request in 2.3610522747039795s
Received healthy response to inference request in 2.6633219718933105s
Received healthy response to inference request in 1.9693529605865479s
Received healthy response to inference request in 1.991046667098999s
Received healthy response to inference request in 1.9934921264648438s
Received healthy response to inference request in 2.498136043548584s
Received healthy response to inference request in 2.1391336917877197s
Received healthy response to inference request in 1.9331443309783936s
Received healthy response to inference request in 2.1655209064483643s
Received healthy response to inference request in 2.333491802215576s
Received healthy response to inference request in 2.860529899597168s
Received healthy response to inference request in 2.787214756011963s
Received healthy response to inference request in 1.9219415187835693s
Received healthy response to inference request in 1.873058795928955s
Received healthy response to inference request in 1.9572701454162598s
Received healthy response to inference request in 1.9968278408050537s
Received healthy response to inference request in 2.1369593143463135s
Received healthy response to inference request in 1.940338134765625s
Received healthy response to inference request in 1.8969027996063232s
Received healthy response to inference request in 2.0188207626342773s
HTTP Request: %s %s "%s %d %s"
Received healthy response to inference request in 2.1154565811157227s
Received healthy response to inference request in 2.0134899616241455s
Received healthy response to inference request in 2.356955051422119s
Received healthy response to inference request in 2.3641397953033447s
Received healthy response to inference request in 1.9429106712341309s
Received healthy response to inference request in 2.1776010990142822s
Received healthy response to inference request in 2.4395315647125244s
Received healthy response to inference request in 2.0870141983032227s
Received healthy response to inference request in 2.552502393722534s
30 requests
0 failed requests
5th percentile: 1.908170223236084
10th percentile: 1.9320240497589112
20th percentile: 1.954398250579834
30th percentile: 1.9927584886550904
40th percentile: 2.0166884422302247
50th percentile: 2.126207947731018
60th percentile: 2.1703529834747313
70th percentile: 2.3581842184066772
80th percentile: 2.4512524604797363
90th percentile: 2.5894039392471315
95th percentile: 2.731463003158569
99th percentile: 2.8392685079574584
mean time: 2.2022782961527505
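Note on the summary above: the reported percentiles are consistent with linear interpolation between the two nearest sorted samples (the default behaviour of `numpy.percentile`). The log does not show how StressChecker computes them, so the following is only a sketch under that assumption; `percentile` and `summarize` are hypothetical names, written in plain Python so the arithmetic is visible.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Fractional rank into the sorted list: 0 maps to the minimum,
    # 100 maps to the maximum.
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    # Interpolate between the two neighbouring samples.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def summarize(latencies, failed=0):
    """Build report lines in the same shape as the summary above."""
    lines = [f"{len(latencies)} requests", f"{failed} failed requests"]
    for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        lines.append(f"{q}th percentile: {percentile(latencies, q)}")
    lines.append(f"mean time: {sum(latencies) / len(latencies)}")
    return lines
```

For the 30 latencies of the second run, the 5th percentile falls at rank 29 × 0.05 = 1.45, that is, 45% of the way from the second-smallest sample (1.8969...) to the third-smallest (1.9219...), which gives the 1.9081... reported above. The same interpolation explains why the 99th percentile of the first run (15.31...) is so far above the 95th: it sits between the slowest healthy response and the single request that hit the 20-second read timeout.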
Pipeline stage StressChecker completed in 158.11s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.76s
Shutdown handler de-registered
chaiml-muster-v0c-lr1e5e_3996_v6 status is now deployed due to DeploymentManager action
chaiml-muster-v0c-lr1e5e_3996_v6 status is now inactive due to auto deactivation removed underperforming models