Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-2a6f-69d4-linear-82448-v3-uploader
Waiting for job on chaiml-2a6f-69d4-linear-82448-v3-uploader to finish
chaiml-2a6f-69d4-linear-82448-v3-uploader: Using quantization_mode: none
chaiml-2a6f-69d4-linear-82448-v3-uploader: Downloading snapshot of ChaiML/2a6f-69d4-linear-w01-W4A16-G128-AutoRound...
chaiml-2a6f-69d4-linear-82448-v3-uploader:
Fetching 13 files: 100%|██████████| 13/13 [00:08<00:00, 1.60it/s]
chaiml-2a6f-69d4-linear-82448-v3-uploader: Downloaded in 8.252s
chaiml-2a6f-69d4-linear-82448-v3-uploader: Processed model ChaiML/2a6f-69d4-linear-w01-W4A16-G128-AutoRound in 13.558s
chaiml-2a6f-69d4-linear-82448-v3-uploader: creating bucket guanaco-vllm-models
chaiml-2a6f-69d4-linear-82448-v3-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-2a6f-69d4-linear-82448-v3-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-2a6f-69d4-linear-82448-v3-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-2a6f-69d4-linear-82448-v3-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-2a6f-69d4-linear-82448-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-2a6f-69d4-linear-82448-v3-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-2a6f-69d4-linear-82448-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-2a6f-69d4-linear-82448-v3-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-2a6f-69d4-linear-82448-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-2a6f-69d4-linear-82448-v3-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-2a6f-69d4-linear-82448-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-2a6f-69d4-linear-82448-v3-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-2a6f-69d4-linear-82448-v3-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-2a6f-69d4-linear-82448-v3-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-2a6f-69d4-linear-82448-v3-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-2a6f-69d4-linear-82448-v3-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-2a6f-69d4-linear-82448-v3-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-2a6f-69d4-linear-82448-v3-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/.gitattributes
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/README.md
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/generation_config.json
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/config.json
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/recipe.yaml
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/special_tokens_map.json
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/model.safetensors.index.json
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/tokenizer_config.json
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/tokenizer.json
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/chat_template.jinja
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/model-00003-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/model-00003-of-00003.safetensors
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/model-00002-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/model-00002-of-00003.safetensors
chaiml-2a6f-69d4-linear-82448-v3-uploader: cp /dev/shm/model_output/model-00001-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2a6f-69d4-linear-82448-v3/model-00001-of-00003.safetensors
Job chaiml-2a6f-69d4-linear-82448-v3-uploader completed after 267.28s with status: succeeded
Stopping job with name chaiml-2a6f-69d4-linear-82448-v3-uploader
Pipeline stage VLLMUploader completed in 267.74s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-2a6f-69d4-linear-82448-v3
Waiting for inference service chaiml-2a6f-69d4-linear-82448-v3 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Inference service chaiml-2a6f-69d4-linear-82448-v3 ready after 423.38505959510803s
Pipeline stage VLLMDeployer completed in 423.95s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.2749533653259277s
Received healthy response to inference request in 1.1493909358978271s
Received healthy response to inference request in 1.033973217010498s
Received healthy response to inference request in 1.5183522701263428s
Received healthy response to inference request in 1.1972520351409912s
Received healthy response to inference request in 1.1321821212768555s
Received healthy response to inference request in 1.1798601150512695s
Received healthy response to inference request in 1.0334513187408447s
Received healthy response to inference request in 1.4028174877166748s
Received healthy response to inference request in 1.051722526550293s
Received healthy response to inference request in 1.2951583862304688s
Received healthy response to inference request in 1.2537429332733154s
Received healthy response to inference request in 1.0909676551818848s
Received healthy response to inference request in 1.0921249389648438s
Received healthy response to inference request in 1.063627004623413s
Received healthy response to inference request in 1.9337990283966064s
Received healthy response to inference request in 1.133744716644287s
Received healthy response to inference request in 1.832822322845459s
Received healthy response to inference request in 1.4654757976531982s
Received healthy response to inference request in 1.1802713871002197s
Received healthy response to inference request in 1.5797548294067383s
Received healthy response to inference request in 1.4145283699035645s
Received healthy response to inference request in 1.0047948360443115s
Received healthy response to inference request in 1.013197660446167s
Received healthy response to inference request in 1.2269291877746582s
Received healthy response to inference request in 1.5514705181121826s
Received healthy response to inference request in 0.9886255264282227s
Received healthy response to inference request in 0.9581146240234375s
Received healthy response to inference request in 1.4315767288208008s
Received healthy response to inference request in 1.203944206237793s
30 requests
0 failed requests
5th percentile: 0.9959017157554626
10th percentile: 1.0123573780059814
20th percentile: 1.048172664642334
30th percentile: 1.091777753829956
40th percentile: 1.1431324481964111
50th percentile: 1.1887617111206055
60th percentile: 1.237654685974121
70th percentile: 1.3274561166763303
80th percentile: 1.4383565425872804
90th percentile: 1.5542989492416381
95th percentile: 1.718941950798034
99th percentile: 1.9045157837867737
mean time: 1.2562875350316365
Pipeline stage StressChecker completed in 41.18s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.08s
Shutdown handler de-registered
chaiml-2a6f-69d4-linear_82448_v3 status is now deployed due to DeploymentManager action