Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-2fe5-c13f-linear-w01-v40-uploader
Waiting for job on chaiml-2fe5-c13f-linear-w01-v40-uploader to finish
chaiml-2fe5-c13f-linear-w01-v40-uploader: Using quantization_mode: none
chaiml-2fe5-c13f-linear-w01-v40-uploader: Downloading snapshot of ChaiML/2fe5-c13f-linear-w01...
chaiml-2fe5-c13f-linear-w01-v40-uploader:
Fetching 14 files: 100%|██████████| 14/14 [00:12<00:00, 1.11it/s]
chaiml-2fe5-c13f-linear-w01-v40-uploader: Downloaded in 12.738s
chaiml-2fe5-c13f-linear-w01-v40-uploader: Processed model ChaiML/2fe5-c13f-linear-w01 in 21.736s
chaiml-2fe5-c13f-linear-w01-v40-uploader: creating bucket guanaco-vllm-models
chaiml-2fe5-c13f-linear-w01-v40-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-w01-v40-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-2fe5-c13f-linear-w01-v40-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-2fe5-c13f-linear-w01-v40-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-2fe5-c13f-linear-w01-v40-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-w01-v40-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-2fe5-c13f-linear-w01-v40-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-w01-v40-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-2fe5-c13f-linear-w01-v40-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-w01-v40-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-2fe5-c13f-linear-w01-v40-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-w01-v40-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-2fe5-c13f-linear-w01-v40-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-2fe5-c13f-linear-w01-v40-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-2fe5-c13f-linear-w01-v40-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-2fe5-c13f-linear-w01-v40-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-2fe5-c13f-linear-w01-v40-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-2fe5-c13f-linear-w01-v40-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/.gitattributes
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/config.json
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/mergekit_config.yml s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/mergekit_config.yml
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/special_tokens_map.json
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/README.md
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/model.safetensors.index.json
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/tokenizer_config.json
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/mergekit_config.yaml s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/mergekit_config.yaml
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/tokenizer.json
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/model-00001-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/model-00001-of-00005.safetensors
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/model-00002-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/model-00002-of-00005.safetensors
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/model-00003-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/model-00003-of-00005.safetensors
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/model-00004-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/model-00004-of-00005.safetensors
chaiml-2fe5-c13f-linear-w01-v40-uploader: cp /dev/shm/model_output/model-00005-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v40/model-00005-of-00005.safetensors
Job chaiml-2fe5-c13f-linear-w01-v40-uploader completed after 206.29s with status: succeeded
Stopping job with name chaiml-2fe5-c13f-linear-w01-v40-uploader
Pipeline stage VLLMUploader completed in 206.79s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.14s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-2fe5-c13f-linear-w01-v40
Waiting for inference service chaiml-2fe5-c13f-linear-w01-v40 to be ready
Inference service chaiml-2fe5-c13f-linear-w01-v40 ready after 614.6786639690399s
Pipeline stage VLLMDeployer completed in 615.21s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.3375236988067627s
Received healthy response to inference request in 1.5701510906219482s
Received healthy response to inference request in 1.3530337810516357s
Received healthy response to inference request in 1.4370942115783691s
Received healthy response to inference request in 1.358964443206787s
Received healthy response to inference request in 1.3339803218841553s
Received healthy response to inference request in 1.8003146648406982s
Received healthy response to inference request in 1.818830966949463s
Received healthy response to inference request in 1.486856460571289s
Received healthy response to inference request in 1.2895424365997314s
Received healthy response to inference request in 1.5243358612060547s
Received healthy response to inference request in 1.3312020301818848s
Received healthy response to inference request in 1.6046931743621826s
Received healthy response to inference request in 1.6767661571502686s
Received healthy response to inference request in 1.328183889389038s
Received healthy response to inference request in 1.564528465270996s
Received healthy response to inference request in 1.4310760498046875s
Received healthy response to inference request in 1.4273226261138916s
Received healthy response to inference request in 1.4187369346618652s
Received healthy response to inference request in 1.7529652118682861s
Received healthy response to inference request in 1.3011794090270996s
Received healthy response to inference request in 1.3509316444396973s
Received healthy response to inference request in 1.609907627105713s
Received healthy response to inference request in 1.5957789421081543s
Received healthy response to inference request in 1.3595857620239258s
Received healthy response to inference request in 1.6597325801849365s
Received healthy response to inference request in 1.9288761615753174s
Received healthy response to inference request in 1.3424410820007324s
Received healthy response to inference request in 1.7738137245178223s
Received healthy response to inference request in 1.4403076171875s
30 requests
0 failed requests
5th percentile: 1.313331425189972
10th percentile: 1.3309002161026
20th percentile: 1.3414576053619385
30th percentile: 1.3571852445602417
40th percentile: 1.423888349533081
50th percentile: 1.4387009143829346
60th percentile: 1.5404129028320312
70th percentile: 1.5984532117843628
80th percentile: 1.6631392955780029
90th percentile: 1.7764638185501098
95th percentile: 1.8104986310005187
99th percentile: 1.8969630551338197
mean time: 1.5069552342096963
Pipeline stage StressChecker completed in 47.92s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.59s
Shutdown handler de-registered
chaiml-2fe5-c13f-linear-w01_v40 status is now deployed due to DeploymentManager action