Connection pool is full, discarding connection: %s. Connection pool size: %s
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name qwen-qwen2-5-3b-instruct-v5-uploader
Waiting for job on qwen-qwen2-5-3b-instruct-v5-uploader to finish
qwen-qwen2-5-3b-instruct-v5-uploader: Using quantization_mode: none
qwen-qwen2-5-3b-instruct-v5-uploader: Downloading snapshot of Qwen/Qwen2.5-3B-Instruct...
Fetching 12 files: 100%|██████████| 12/12 [00:03<00:00, 3.67it/s]
qwen-qwen2-5-3b-instruct-v5-uploader: Downloaded in 3.436s
qwen-qwen2-5-3b-instruct-v5-uploader: Processed model Qwen/Qwen2.5-3B-Instruct in 5.857s
qwen-qwen2-5-3b-instruct-v5-uploader: creating bucket guanaco-vllm-models
qwen-qwen2-5-3b-instruct-v5-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v5-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
qwen-qwen2-5-3b-instruct-v5-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
qwen-qwen2-5-3b-instruct-v5-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
qwen-qwen2-5-3b-instruct-v5-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v5-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
qwen-qwen2-5-3b-instruct-v5-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v5-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
qwen-qwen2-5-3b-instruct-v5-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v5-uploader: if re.search("-\.", bucket, re.UNICODE):
qwen-qwen2-5-3b-instruct-v5-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen2-5-3b-instruct-v5-uploader: if re.search("\.\.", bucket, re.UNICODE):
qwen-qwen2-5-3b-instruct-v5-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
qwen-qwen2-5-3b-instruct-v5-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
qwen-qwen2-5-3b-instruct-v5-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
qwen-qwen2-5-3b-instruct-v5-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
qwen-qwen2-5-3b-instruct-v5-uploader: Bucket 's3://guanaco-vllm-models/' created
qwen-qwen2-5-3b-instruct-v5-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/LICENSE s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/LICENSE
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/README.md
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/config.json
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/.gitattributes
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/generation_config.json
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/tokenizer_config.json
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/model.safetensors.index.json
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/merges.txt s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/merges.txt
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/vocab.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/vocab.json
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/tokenizer.json
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/model-00002-of-00002.safetensors
qwen-qwen2-5-3b-instruct-v5-uploader: cp /dev/shm/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/qwen-qwen2-5-3b-instruct-v5/model-00001-of-00002.safetensors
Job qwen-qwen2-5-3b-instruct-v5-uploader completed after 42.95s with status: succeeded
Stopping job with name qwen-qwen2-5-3b-instruct-v5-uploader
Pipeline stage VLLMUploader completed in 43.65s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.14s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service qwen-qwen2-5-3b-instruct-v5
Waiting for inference service qwen-qwen2-5-3b-instruct-v5 to be ready
Inference service qwen-qwen2-5-3b-instruct-v5 ready after 161.02453136444092s
Pipeline stage VLLMDeployer completed in 161.54s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 0.9730982780456543s
Received healthy response to inference request in 0.7589471340179443s
Received healthy response to inference request in 0.7723636627197266s
Received healthy response to inference request in 0.776757001876831s
Received healthy response to inference request in 0.6087236404418945s
Received healthy response to inference request in 0.627561092376709s
Received healthy response to inference request in 0.6977646350860596s
Received healthy response to inference request in 0.6871800422668457s
Received healthy response to inference request in 1.3826673030853271s
Received healthy response to inference request in 0.7669377326965332s
Received healthy response to inference request in 0.9277091026306152s
Received healthy response to inference request in 0.5443603992462158s
Received healthy response to inference request in 0.6996660232543945s
Received healthy response to inference request in 1.0310709476470947s
Received healthy response to inference request in 0.8001267910003662s
Received healthy response to inference request in 0.5683419704437256s
Received healthy response to inference request in 0.48024559020996094s
Received healthy response to inference request in 0.6489489078521729s
Received healthy response to inference request in 0.5674071311950684s
Received healthy response to inference request in 0.48870086669921875s
Received healthy response to inference request in 0.806973934173584s
Received healthy response to inference request in 0.5917279720306396s
Received healthy response to inference request in 0.49126529693603516s
Received healthy response to inference request in 0.7379708290100098s
Received healthy response to inference request in 0.510662317276001s
Received healthy response to inference request in 0.7862298488616943s
Received healthy response to inference request in 0.5370578765869141s
Received healthy response to inference request in 0.8258533477783203s
Received healthy response to inference request in 0.5591464042663574s
Received healthy response to inference request in 0.7992358207702637s
30 requests
0 failed requests
5th percentile: 0.4898548603057861
10th percentile: 0.5087226152420044
20th percentile: 0.5561892032623291
30th percentile: 0.5847121715545655
40th percentile: 0.6403937816619873
50th percentile: 0.698715329170227
60th percentile: 0.7621433734893799
70th percentile: 0.7795988559722901
80th percentile: 0.8014962196350098
90th percentile: 0.9322480201721192
95th percentile: 1.0049832463264463
99th percentile: 1.28070436000824
mean time: 0.7151567300160726
Pipeline stage StressChecker completed in 24.01s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.48s
Shutdown handler de-registered
qwen-qwen2-5-3b-instruct_v5 status is now deployed due to DeploymentManager action
qwen-qwen2-5-3b-instruct_v5 status is now inactive due to auto deactivation removed underperforming models