Shutdown handler not registered because Python interpreter is not running in the main thread
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-muster-v0-q235b-52842-v14-uploader
Waiting for job on chaiml-muster-v0-q235b-52842-v14-uploader to finish
chaiml-muster-v0-q235b-52842-v14-uploader: Using quantization_mode: w4a16
chaiml-muster-v0-q235b-52842-v14-uploader: Checking if ChaiML/muster-v0-q235b-lr1e4ep2r64g4-W4A16 already exists in ChaiML
chaiml-muster-v0-q235b-52842-v14-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-muster-v0-q235b-52842-v14-uploader: Downloading snapshot of ChaiML/muster-v0-q235b-lr1e4ep2r64g4-W4A16...
chaiml-muster-v0-q235b-52842-v14-uploader:
Fetching 39 files: 100%|██████████| 39/39 [00:57<00:00, 1.47s/it]
chaiml-muster-v0-q235b-52842-v14-uploader: Downloaded in 57.381s
chaiml-muster-v0-q235b-52842-v14-uploader: Processed model ChaiML/muster-v0-q235b-lr1e4ep2r64g4 in 57.907s
chaiml-muster-v0-q235b-52842-v14-uploader: creating bucket guanaco-vllm-models
chaiml-muster-v0-q235b-52842-v14-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v14-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-muster-v0-q235b-52842-v14-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-muster-v0-q235b-52842-v14-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-muster-v0-q235b-52842-v14-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v14-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-muster-v0-q235b-52842-v14-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v14-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-muster-v0-q235b-52842-v14-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v14-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-muster-v0-q235b-52842-v14-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v14-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-muster-v0-q235b-52842-v14-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-muster-v0-q235b-52842-v14-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-muster-v0-q235b-52842-v14-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-muster-v0-q235b-52842-v14-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
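Note on the SyntaxWarning lines above: they point at regex patterns written as ordinary string literals, where sequences such as `\.` and `\w` are not recognized Python string escapes. Recent Python versions (3.12 and later) report these as SyntaxWarning at compile time. The patterns still behave the same, so the warnings look harmless for this job. A minimal sketch of the usual fix, using raw string literals, is shown below. The constant and function names are hypothetical, chosen for illustration; only the two patterns are taken from the warnings, and this is not the s3cmd project's actual patch.

```python
import re

# Raw string literals (r"...") pass backslashes through to the regex engine
# unchanged, so Python no longer sees "\." as an invalid string escape.
# Names below are hypothetical; the patterns are copied from the warnings.
BUCKET_INVALID_CHARS = re.compile(r"([^a-z0-9\.-])", re.UNICODE)
DOUBLE_DOT = re.compile(r"\.\.", re.UNICODE)


def has_invalid_bucket_chars(bucket: str) -> bool:
    """Return True if the name contains a character outside a-z, 0-9, '.', '-'."""
    return BUCKET_INVALID_CHARS.search(bucket) is not None


def has_double_dot(bucket: str) -> bool:
    """Return True if the name contains two consecutive dots."""
    return DOUBLE_DOT.search(bucket) is not None
```

For example, `has_invalid_bucket_chars("guanaco-vllm-models")` is False, because every character in that name is a lowercase letter or a hyphen.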
chaiml-muster-v0-q235b-52842-v14-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-muster-v0-q235b-52842-v14-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/added_tokens.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/added_tokens.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/chat_template.jinja
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/generation_config.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/.gitattributes
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/config.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/special_tokens_map.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/tokenizer_config.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/quantization_config.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/quantization_config.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/vocab.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/vocab.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/merges.txt s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/merges.txt
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model.safetensors.index.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/tokenizer.json
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00027-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00027-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00002-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00002-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00001-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00001-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00023-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00023-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00013-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00013-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00009-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00009-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00017-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00017-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00016-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00016-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00019-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00019-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00003-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00003-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00011-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00011-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00018-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00018-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00005-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00005-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00007-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00007-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00025-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00025-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00026-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00026-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00014-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00014-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00021-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00021-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00008-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00008-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00020-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00020-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00024-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00024-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00006-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00006-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00015-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00015-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00012-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00012-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00022-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00022-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00010-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00010-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v14-uploader: cp /dev/shm/model_output/model-00004-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v14/model-00004-of-00027.safetensors
Job chaiml-muster-v0-q235b-52842-v14-uploader completed after 860.6s with status: succeeded
Stopping job with name chaiml-muster-v0-q235b-52842-v14-uploader
Pipeline stage VLLMUploader completed in 862.42s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.14s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-muster-v0-q235b-52842-v14
Waiting for inference service chaiml-muster-v0-q235b-52842-v14 to be ready
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Inference service chaiml-muster-v0-q235b-52842-v14 ready after 934.1413283348083s
Pipeline stage VLLMDeployer completed in 934.59s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.037886619567871s
Received healthy response to inference request in 2.602051258087158s
Received healthy response to inference request in 3.312246084213257s
Received healthy response to inference request in 2.0164263248443604s
Received healthy response to inference request in 1.9403448104858398s
Received healthy response to inference request in 2.0071773529052734s
Received healthy response to inference request in 2.1494431495666504s
Received healthy response to inference request in 1.9344923496246338s
Received healthy response to inference request in 1.8800585269927979s
Received healthy response to inference request in 2.162250518798828s
Received healthy response to inference request in 1.9713315963745117s
Received healthy response to inference request in 2.118438959121704s
Received healthy response to inference request in 2.1538472175598145s
Received healthy response to inference request in 1.92976713180542s
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Received healthy response to inference request in 2.26832914352417s
Received healthy response to inference request in 2.0356714725494385s
Received healthy response to inference request in 1.9356651306152344s
Received healthy response to inference request in 2.0861542224884033s
Received healthy response to inference request in 2.111276149749756s
Received healthy response to inference request in 1.9665396213531494s
Received healthy response to inference request in 2.5290465354919434s
Received healthy response to inference request in 2.1622867584228516s
Received healthy response to inference request in 1.9154388904571533s
Received healthy response to inference request in 2.328692674636841s
Received healthy response to inference request in 3.4411211013793945s
Received healthy response to inference request in 2.092991590499878s
Received healthy response to inference request in 2.0650899410247803s
Received healthy response to inference request in 2.0017435550689697s
Received healthy response to inference request in 2.2152187824249268s
Received healthy response to inference request in 2.1936089992523193s
30 requests
0 failed requests
5th percentile: 1.9218865990638734
10th percentile: 1.9340198278427123
20th percentile: 1.9613006591796875
30th percentile: 2.005547213554382
40th percentile: 2.037000560760498
50th percentile: 2.0895729064941406
60th percentile: 2.1308406352996827
70th percentile: 2.162261390686035
80th percentile: 2.2258408546447757
90th percentile: 2.536347007751465
95th percentile: 2.9926584124565103
99th percentile: 3.403747346401215
mean time: 2.1854878822962442
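Note on the percentile summary above: the reported 50th percentile (2.0895729064941406) equals the midpoint of the 15th and 16th sorted latencies (2.0861542224884033 and 2.092991590499878), which is consistent with percentiles computed by linear interpolation between closest ranks, the default behavior of `numpy.percentile`. The log does not show which implementation StressChecker uses, so the following is only a dependency-free sketch of that method. The function names `percentile` and `summarize` are hypothetical.

```python
import math


def percentile(samples, q):
    """Return the q-th percentile (0 to 100) of samples.

    Uses linear interpolation between the two closest ranks.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(samples)
    # Fractional position of the requested percentile in the sorted list.
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(latencies, levels=(5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99)):
    """Build a summary dict shaped like the log's percentile report."""
    summary = {f"{q}th percentile": percentile(latencies, q) for q in levels}
    summary["mean time"] = sum(latencies) / len(latencies)
    return summary
```

Passing the 30 response times logged by StressChecker to `summarize` would be one way to cross-check the figures in this report.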
Pipeline stage StressChecker completed in 69.03s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.98s
Shutdown handler de-registered
chaiml-muster-v0-q235b_52842_v14 status is now deployed due to DeploymentManager action
chaiml-muster-v0-q235b_52842_v14 status is now inactive due to auto deactivation removed underperforming models
chaiml-muster-v0-q235b_52842_v14 status is now torndown due to DeploymentManager action