Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name qwen-qwen2-5-3b-instruct-v4-uploader
Waiting for job on qwen-qwen2-5-3b-instruct-v4-uploader to finish
qwen-qwen2-5-3b-instruct-v4-uploader: Using quantization_mode: none
qwen-qwen2-5-3b-instruct-v4-uploader: Downloading snapshot of Qwen/Qwen2.5-3B-Instruct...
qwen-qwen2-5-3b-instruct-v4-uploader: Fetching 12 files: 100%|██████████| 12/12 [00:03<00:00, 3.66it/s]
qwen-qwen2-5-3b-instruct-v4-uploader: Downloaded in 3.400s
qwen-qwen2-5-3b-instruct-v4-uploader: Processed model Qwen/Qwen2.5-3B-Instruct in 5.968s
qwen-qwen2-5-3b-instruct-v4-uploader: creating bucket guanaco-vllm-models
qwen-qwen2-5-3b-instruct-v4-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v4-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
qwen-qwen2-5-3b-instruct-v4-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
qwen-qwen2-5-3b-instruct-v4-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
qwen-qwen2-5-3b-instruct-v4-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v4-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
qwen-qwen2-5-3b-instruct-v4-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v4-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
qwen-qwen2-5-3b-instruct-v4-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v4-uploader: if re.search("-\.", bucket, re.UNICODE):
qwen-qwen2-5-3b-instruct-v4-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v4-uploader: if re.search("\.\.", bucket, re.UNICODE):
qwen-qwen2-5-3b-instruct-v4-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
qwen-qwen2-5-3b-instruct-v4-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
qwen-qwen2-5-3b-instruct-v4-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
qwen-qwen2-5-3b-instruct-v4-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
qwen-qwen2-5-3b-instruct-v4-uploader: Bucket 's3://guanaco-vllm-models/' created
qwen-qwen2-5-3b-instruct-v4-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/.gitattributes
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/generation_config.json
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/config.json
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/model.safetensors.index.json
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/tokenizer_config.json
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/LICENSE s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/LICENSE
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/README.md
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/vocab.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/vocab.json
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/merges.txt s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/merges.txt
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/model-00002-of-00002.safetensors
qwen-qwen2-5-3b-instruct-v4-uploader: cp /dev/shm/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v4/model-00001-of-00002.safetensors
Job qwen-qwen2-5-3b-instruct-v4-uploader completed after 125.15s with status: succeeded
Stopping job with name qwen-qwen2-5-3b-instruct-v4-uploader
Pipeline stage VLLMUploader completed in 125.89s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service qwen-qwen2-5-3b-instruct-v4
Waiting for inference service qwen-qwen2-5-3b-instruct-v4 to be ready
Inference service qwen-qwen2-5-3b-instruct-v4 ready after 160.8049018383026s
Pipeline stage VLLMDeployer completed in 161.33s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 0.8825576305389404s
Received healthy response to inference request in 0.6800124645233154s
Received healthy response to inference request in 0.7184875011444092s
Received healthy response to inference request in 0.7290647029876709s
Received healthy response to inference request in 0.795490026473999s
Received healthy response to inference request in 0.7518174648284912s
Received healthy response to inference request in 0.5064907073974609s
Received healthy response to inference request in 0.699568510055542s
Received healthy response to inference request in 0.8061316013336182s
Received healthy response to inference request in 0.5670251846313477s
Received healthy response to inference request in 0.7307107448577881s
Received healthy response to inference request in 0.5678598880767822s
Received healthy response to inference request in 0.6567621231079102s
Received healthy response to inference request in 0.4756162166595459s
Received healthy response to inference request in 0.6340441703796387s
Received healthy response to inference request in 0.5871074199676514s
Received healthy response to inference request in 0.5965759754180908s
Received healthy response to inference request in 0.5640859603881836s
Received healthy response to inference request in 0.4199681282043457s
Received healthy response to inference request in 0.6712338924407959s
Received healthy response to inference request in 0.6665024757385254s
Received healthy response to inference request in 0.6228010654449463s
Received healthy response to inference request in 0.6930742263793945s
Received healthy response to inference request in 0.48756909370422363s
Received healthy response to inference request in 0.76163649559021s
Received healthy response to inference request in 0.4830024242401123s
Received healthy response to inference request in 0.40110063552856445s
Received healthy response to inference request in 0.7680323123931885s
Received healthy response to inference request in 0.43730807304382324s
Received healthy response to inference request in 0.6091654300689697s
30 requests
0 failed requests
5th percentile: 0.4277711033821106
10th percentile: 0.47178540229797367
20th percentile: 0.5027063846588135
30th percentile: 0.5676094770431519
40th percentile: 0.6041296482086181
50th percentile: 0.6454031467437744
60th percentile: 0.6747453212738037
70th percentile: 0.7052442073822021
80th percentile: 0.7349320888519287
90th percentile: 0.7707780838012696
95th percentile: 0.8013428926467895
99th percentile: 0.860394082069397
mean time: 0.6323600848515828
Pipeline stage StressChecker completed in 21.77s
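The StressChecker summary above can be recomputed from the 30 per-request latencies. This is a minimal sketch, assuming the checker uses linear interpolation between sorted samples (the same convention as NumPy's default `linear` percentile method). That convention is inferred from the logged numbers, not stated in the log, and `percentile` is a hypothetical helper.

```python
import statistics

# Per-request latencies in seconds, copied from the StressChecker log lines.
LATENCIES = [
    0.8825576305389404, 0.6800124645233154, 0.7184875011444092,
    0.7290647029876709, 0.795490026473999, 0.7518174648284912,
    0.5064907073974609, 0.699568510055542, 0.8061316013336182,
    0.5670251846313477, 0.7307107448577881, 0.5678598880767822,
    0.6567621231079102, 0.4756162166595459, 0.6340441703796387,
    0.5871074199676514, 0.5965759754180908, 0.5640859603881836,
    0.4199681282043457, 0.6712338924407959, 0.6665024757385254,
    0.6228010654449463, 0.6930742263793945, 0.48756909370422363,
    0.76163649559021, 0.4830024242401123, 0.40110063552856445,
    0.7680323123931885, 0.43730807304382324, 0.6091654300689697,
]


def percentile(values, p):
    """Hypothetical helper: p-th percentile using linear interpolation
    between the two nearest sorted samples (assumed checker convention)."""
    s = sorted(values)
    rank = p / 100 * (len(s) - 1)      # fractional 0-based position
    lo = int(rank)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (rank - lo) * (s[hi] - s[lo])


if __name__ == "__main__":
    print(f"{len(LATENCIES)} requests")
    for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        print(f"{p}th percentile: {percentile(LATENCIES, p)}")
    print(f"mean time: {statistics.mean(LATENCIES)}")
```

Under this convention the 50th percentile is the midpoint of the 15th and 16th fastest requests, and the computed values agree with the logged ones up to floating-point rounding.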
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.60s
Shutdown handler de-registered
qwen-qwen2-5-3b-instruct_v4 status is now deployed due to DeploymentManager action
qwen-qwen2-5-3b-instruct_v4 status is now inactive due to auto deactivation removed underperforming models