Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name google-gemma-4-31b-it-v19-uploader
Waiting for job on google-gemma-4-31b-it-v19-uploader to finish
google-gemma-4-31b-it-v19-uploader: Using quantization_mode: none
google-gemma-4-31b-it-v19-uploader: Downloading snapshot of google/gemma-4-31B-it...
google-gemma-4-31b-it-v19-uploader: Downloaded in 28.359s
2026-04-07T19:43:04.978690+00:00 monitor updated for google-gemma-4-31b-it_v19
google-gemma-4-31b-it-v19-uploader: Processed model google/gemma-4-31B-it in 51.041s
google-gemma-4-31b-it-v19-uploader: creating bucket guanaco-vllm-models
google-gemma-4-31b-it-v19-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v19-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
google-gemma-4-31b-it-v19-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
google-gemma-4-31b-it-v19-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
google-gemma-4-31b-it-v19-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v19-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v19-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v19-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v19-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v19-uploader: if re.search("-\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v19-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v19-uploader: if re.search("\.\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v19-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
google-gemma-4-31b-it-v19-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
google-gemma-4-31b-it-v19-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
google-gemma-4-31b-it-v19-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
google-gemma-4-31b-it-v19-uploader: Bucket 's3://guanaco-vllm-models/' created
google-gemma-4-31b-it-v19-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/.gitattributes
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/README.md
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/config.json
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/tokenizer_config.json
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/processor_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/processor_config.json
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/model.safetensors.index.json
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/chat_template.jinja
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/generation_config.json
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/tokenizer.json
google-gemma-4-31b-it-v19-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v19/default/model-00002-of-00002.safetensors
2026-04-07T19:44:05.159045+00:00 monitor updated for google-gemma-4-31b-it_v19
Job google-gemma-4-31b-it-v19-uploader completed after 170.8s with status: succeeded
Stopping job with name google-gemma-4-31b-it-v19-uploader
Pipeline stage VLLMUploader completed in 172.02s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.19s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.31s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service google-gemma-4-31b-it-v19
Waiting for inference service google-gemma-4-31b-it-v19 to be ready
2026-04-07T19:45:05.368307+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:46:05.566498+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:47:05.770016+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:48:05.977616+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:49:06.155879+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:50:06.443930+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:51:06.678062+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:52:06.981291+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:53:07.244091+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:54:07.456517+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:55:07.672867+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:56:07.899815+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:57:08.137682+00:00 monitor updated for google-gemma-4-31b-it_v19
2026-04-07T19:58:08.763548+00:00 monitor updated for google-gemma-4-31b-it_v19
Inference service google-gemma-4-31b-it-v19 ready after 790.8186123371124s
Pipeline stage VLLMDeployer completed in 792.26s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 12.299941539764404s
Received healthy response to inference request in 11.56465482711792s
Received healthy response to inference request in 4.281947374343872s
Received healthy response to inference request in 11.682807922363281s
Received healthy response to inference request in 4.180165529251099s
Received healthy response to inference request in 11.778138399124146s
2026-04-07T19:59:08.977308+00:00 monitor updated for google-gemma-4-31b-it_v19
Received healthy response to inference request in 12.15792441368103s
Received healthy response to inference request in 4.190220355987549s
Received healthy response to inference request in 4.221025228500366s
Received healthy response to inference request in 4.131706714630127s
Received healthy response to inference request in 5.043622016906738s
Received healthy response to inference request in 4.1976213455200195s
Received healthy response to inference request in 4.389627456665039s
Received healthy response to inference request in 4.008623123168945s
Received healthy response to inference request in 4.511573314666748s
Received healthy response to inference request in 4.153040409088135s
Received healthy response to inference request in 4.3712990283966064s
Received healthy response to inference request in 4.163147449493408s
2026-04-07T20:00:09.209077+00:00 monitor updated for google-gemma-4-31b-it_v19
Received healthy response to inference request in 4.286266088485718s
Received healthy response to inference request in 4.916428089141846s
Received healthy response to inference request in 4.135718107223511s
Received healthy response to inference request in 4.173004150390625s
Received healthy response to inference request in 4.101456880569458s
Received healthy response to inference request in 4.1981236934661865s
Received healthy response to inference request in 4.54474139213562s
Received healthy response to inference request in 4.241558790206909s
Received healthy response to inference request in 4.272902965545654s
Received healthy response to inference request in 4.07327675819397s
Received healthy response to inference request in 4.189446687698364s
Received healthy response to inference request in 4.714776277542114s
30 requests
0 failed requests
5th percentile: 4.085957813262939
10th percentile: 4.12868173122406
20th percentile: 4.161126041412354
30th percentile: 4.186662340164185
40th percentile: 4.1979227542877195
50th percentile: 4.257230877876282
60th percentile: 4.320279264450073
70th percentile: 4.521523737907409
80th percentile: 4.941866874694824
90th percentile: 11.692340970039368
95th percentile: 11.987020707130432
99th percentile: 12.258756573200227
mean time: 5.572492877642314
Pipeline stage StressChecker completed in 173.20s
Shutdown handler de-registered
google-gemma-4-31b-it_v19 status is now deployed due to DeploymentManager action
google-gemma-4-31b-it_v19 status is now inactive due to auto deactivation removed underperforming models