Shutdown handler not registered because Python interpreter is not running in the main thread
Running pipeline stage VLLMUploader
Starting job with name google-gemma-4-31b-it-v43-uploader
Waiting for job on google-gemma-4-31b-it-v43-uploader to finish
google-gemma-4-31b-it-v43-uploader: Using quantization_mode: none
google-gemma-4-31b-it-v43-uploader: Downloading snapshot of google/gemma-4-31B-it...
google-gemma-4-31b-it-v43-uploader: Downloaded in 32.154s
google-gemma-4-31b-it-v43-uploader: Processed model google/gemma-4-31B-it in 32.249s
google-gemma-4-31b-it-v43-uploader: creating bucket guanaco-vllm-models
google-gemma-4-31b-it-v43-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v43-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
google-gemma-4-31b-it-v43-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
google-gemma-4-31b-it-v43-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
google-gemma-4-31b-it-v43-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v43-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v43-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v43-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v43-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v43-uploader: if re.search("-\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v43-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v43-uploader: if re.search("\.\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v43-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
google-gemma-4-31b-it-v43-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
google-gemma-4-31b-it-v43-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
google-gemma-4-31b-it-v43-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
google-gemma-4-31b-it-v43-uploader: Bucket 's3://guanaco-vllm-models/' created
google-gemma-4-31b-it-v43-uploader: uploading /tmp/model_output to s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/tokenizer_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/tokenizer_config.json
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/config.json
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/generation_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/generation_config.json
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/processor_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/processor_config.json
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/.gitattributes s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/.gitattributes
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/README.md s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/README.md
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/model.safetensors.index.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/model.safetensors.index.json
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/chat_template.jinja s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/chat_template.jinja
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/tokenizer.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/tokenizer.json
2026-04-13T11:51:04.806328+00:00 monitor updated for google-gemma-4-31b-it_v43
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/model-00002-of-00002.safetensors
2026-04-13T11:52:04.896132+00:00 monitor updated for google-gemma-4-31b-it_v43
google-gemma-4-31b-it-v43-uploader: cp /tmp/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v43/default/model-00001-of-00002.safetensors
Job google-gemma-4-31b-it-v43-uploader completed after 133.11s with status: succeeded
Stopping job with name google-gemma-4-31b-it-v43-uploader
Pipeline stage VLLMUploader completed in 133.61s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.09s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 2.31s
Running pipeline stage VLLMDeployer
Creating inference service google-gemma-4-31b-it-v43
Waiting for inference service google-gemma-4-31b-it-v43 to be ready
2026-04-13T11:53:04.991068+00:00 monitor updated for google-gemma-4-31b-it_v43
2026-04-13T11:54:05.130912+00:00 monitor updated for google-gemma-4-31b-it_v43
2026-04-13T11:55:05.220576+00:00 monitor updated for google-gemma-4-31b-it_v43
Inference service google-gemma-4-31b-it-v43 ready after 210.44487690925598s
Pipeline stage VLLMDeployer completed in 211.31s
Running pipeline stage StressChecker
Received healthy response to inference request in 9.058895349502563s
2026-04-13T11:56:05.310623+00:00 monitor updated for google-gemma-4-31b-it_v43
Received healthy response to inference request in 11.138793230056763s
Received healthy response to inference request in 2.7169082164764404s
Received healthy response to inference request in 3.640136241912842s
Received healthy response to inference request in 2.234327554702759s
Received healthy response to inference request in 2.453479051589966s
Received healthy response to inference request in 2.4675791263580322s
Received healthy response to inference request in 2.409109592437744s
Received healthy response to inference request in 2.585310459136963s
Received healthy response to inference request in 2.5110926628112793s
Received healthy response to inference request in 11.042649507522583s
Received healthy response to inference request in 2.2913382053375244s
Received healthy response to inference request in 2.3686532974243164s
Received healthy response to inference request in 2.2752695083618164s
Received healthy response to inference request in 2.898106813430786s
Received healthy response to inference request in 2.401371479034424s
Received healthy response to inference request in 2.40042781829834s
Received healthy response to inference request in 2.362602710723877s
Received healthy response to inference request in 2.2709178924560547s
2026-04-13T11:57:05.403995+00:00 monitor updated for google-gemma-4-31b-it_v43
Received healthy response to inference request in 2.2880468368530273s
Received healthy response to inference request in 2.4045350551605225s
Received healthy response to inference request in 2.2987420558929443s
Received healthy response to inference request in 10.63412857055664s
Received healthy response to inference request in 2.276355504989624s
Received healthy response to inference request in 2.444066286087036s
Received healthy response to inference request in 2.407687187194824s
Received healthy response to inference request in 2.2831473350524902s
Received healthy response to inference request in 2.3273093700408936s
Received healthy response to inference request in 2.3006696701049805s
Received healthy response to inference request in 2.486456871032715s
30 requests
0 failed requests
5th percentile: 2.2728761196136475
10th percentile: 2.2762469053268433
20th percentile: 2.290679931640625
30th percentile: 2.3193174600601196
40th percentile: 2.3877180099487303
50th percentile: 2.4061111211776733
60th percentile: 2.447831392288208
70th percentile: 2.493847608566284
80th percentile: 2.75314793586731
90th percentile: 9.216418671607974
95th percentile: 10.858815085887908
99th percentile: 11.11091155052185
mean time: 3.5226037820180256
Pipeline stage StressChecker completed in 108.10s
Shutdown handler de-registered
google-gemma-4-31b-it_v43 status is now deployed due to DeploymentManager action
google-gemma-4-31b-it_v43 status is now inactive due to auto deactivation removed underperforming models