Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name qwen-qwen2-5-3b-instruct-v3-uploader
Waiting for job on qwen-qwen2-5-3b-instruct-v3-uploader to finish
qwen-qwen2-5-3b-instruct-v3-uploader: Using quantization_mode: none
qwen-qwen2-5-3b-instruct-v3-uploader: Downloading snapshot of Qwen/Qwen2.5-3B-Instruct...
qwen-qwen2-5-3b-instruct-v3-uploader: Fetching 12 files: 100%|██████████| 12/12 [00:03<00:00, 3.92it/s]
qwen-qwen2-5-3b-instruct-v3-uploader: Downloaded in 3.176s
qwen-qwen2-5-3b-instruct-v3-uploader: Processed model Qwen/Qwen2.5-3B-Instruct in 5.854s
qwen-qwen2-5-3b-instruct-v3-uploader: creating bucket guanaco-vllm-models
qwen-qwen2-5-3b-instruct-v3-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v3-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
qwen-qwen2-5-3b-instruct-v3-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
qwen-qwen2-5-3b-instruct-v3-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
qwen-qwen2-5-3b-instruct-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v3-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
qwen-qwen2-5-3b-instruct-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v3-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
qwen-qwen2-5-3b-instruct-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v3-uploader: if re.search("-\.", bucket, re.UNICODE):
qwen-qwen2-5-3b-instruct-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v3-uploader: if re.search("\.\.", bucket, re.UNICODE):
qwen-qwen2-5-3b-instruct-v3-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
qwen-qwen2-5-3b-instruct-v3-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
qwen-qwen2-5-3b-instruct-v3-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
qwen-qwen2-5-3b-instruct-v3-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
qwen-qwen2-5-3b-instruct-v3-uploader: Bucket 's3://guanaco-vllm-models/' created
qwen-qwen2-5-3b-instruct-v3-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/config.json
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/LICENSE s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/LICENSE
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/tokenizer_config.json
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/.gitattributes
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/merges.txt s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/merges.txt
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/generation_config.json
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/README.md
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/model.safetensors.index.json
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/vocab.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/vocab.json
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/tokenizer.json
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/model-00002-of-00002.safetensors
qwen-qwen2-5-3b-instruct-v3-uploader: cp /dev/shm/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v3/model-00001-of-00002.safetensors
Job qwen-qwen2-5-3b-instruct-v3-uploader completed after 267.63s with status: succeeded
Stopping job with name qwen-qwen2-5-3b-instruct-v3-uploader
Pipeline stage VLLMUploader completed in 268.14s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.90s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service qwen-qwen2-5-3b-instruct-v3
Waiting for inference service qwen-qwen2-5-3b-instruct-v3 to be ready
Inference service qwen-qwen2-5-3b-instruct-v3 ready after 151.06214928627014s
Pipeline stage VLLMDeployer completed in 151.70s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 0.8867659568786621s
Received healthy response to inference request in 0.9425926208496094s
Received healthy response to inference request in 0.6343634128570557s
Received healthy response to inference request in 0.45087289810180664s
Received healthy response to inference request in 0.9227991104125977s
Received healthy response to inference request in 0.5952398777008057s
Received healthy response to inference request in 0.8012290000915527s
Received healthy response to inference request in 1.2955231666564941s
Received healthy response to inference request in 1.527888298034668s
Received healthy response to inference request in 0.6113348007202148s
Received healthy response to inference request in 0.7278189659118652s
Received healthy response to inference request in 0.7459719181060791s
Received healthy response to inference request in 0.5071051120758057s
Received healthy response to inference request in 0.6123363971710205s
Received healthy response to inference request in 0.8116581439971924s
Received healthy response to inference request in 0.6505465507507324s
Received healthy response to inference request in 0.6832623481750488s
Received healthy response to inference request in 0.7823810577392578s
Received healthy response to inference request in 0.7398958206176758s
Received healthy response to inference request in 1.1853129863739014s
Received healthy response to inference request in 1.2728769779205322s
Received healthy response to inference request in 0.763045072555542s
Received healthy response to inference request in 0.9992251396179199s
Received healthy response to inference request in 0.5567982196807861s
Received healthy response to inference request in 0.7129273414611816s
Received healthy response to inference request in 0.8823306560516357s
Received healthy response to inference request in 0.7151169776916504s
Received healthy response to inference request in 0.7574679851531982s
Received healthy response to inference request in 0.6232771873474121s
Received healthy response to inference request in 1.0187952518463135s
30 requests
0 failed requests
5th percentile: 0.5294670104980469
10th percentile: 0.5913957118988037
20th percentile: 0.6210890293121338
30th percentile: 0.6734476089477539
40th percentile: 0.7227381706237793
50th percentile: 0.7517199516296387
60th percentile: 0.7899202346801758
70th percentile: 0.8836612462997436
80th percentile: 0.9539191246032717
90th percentile: 1.1940693855285647
95th percentile: 1.2853323817253113
99th percentile: 1.4605024099349977
mean time: 0.8138919750849406
Pipeline stage StressChecker completed in 27.96s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.62s
Shutdown handler de-registered
qwen-qwen2-5-3b-instruct_v3 status is now deployed due to DeploymentManager action
qwen-qwen2-5-3b-instruct_v3 status is now inactive due to auto deactivation removed underperforming models