Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name google-gemma-4-31b-it-v20-uploader
Waiting for job on google-gemma-4-31b-it-v20-uploader to finish
google-gemma-4-31b-it-v20-uploader: Using quantization_mode: none
google-gemma-4-31b-it-v20-uploader: Downloading snapshot of google/gemma-4-31B-it...
google-gemma-4-31b-it-v20-uploader: Downloaded in 31.776s
2026-04-07T20:17:51.281418+00:00 monitor updated for google-gemma-4-31b-it_v20
google-gemma-4-31b-it-v20-uploader: Processed model google/gemma-4-31B-it in 53.934s
google-gemma-4-31b-it-v20-uploader: creating bucket guanaco-vllm-models
google-gemma-4-31b-it-v20-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v20-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
google-gemma-4-31b-it-v20-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
google-gemma-4-31b-it-v20-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
google-gemma-4-31b-it-v20-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v20-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v20-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v20-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v20-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v20-uploader: if re.search("-\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v20-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v20-uploader: if re.search("\.\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v20-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
google-gemma-4-31b-it-v20-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
google-gemma-4-31b-it-v20-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
google-gemma-4-31b-it-v20-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
google-gemma-4-31b-it-v20-uploader: Bucket 's3://guanaco-vllm-models/' created
google-gemma-4-31b-it-v20-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/.gitattributes
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/tokenizer_config.json
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/README.md
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/model.safetensors.index.json
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/generation_config.json
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/config.json
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/chat_template.jinja
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/processor_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/processor_config.json
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/tokenizer.json
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/model-00002-of-00002.safetensors
2026-04-07T20:18:51.476803+00:00 monitor updated for google-gemma-4-31b-it_v20
google-gemma-4-31b-it-v20-uploader: cp /dev/shm/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v20/default/model-00001-of-00002.safetensors
Job google-gemma-4-31b-it-v20-uploader completed after 149.56s with status: succeeded
Stopping job with name google-gemma-4-31b-it-v20-uploader
Pipeline stage VLLMUploader completed in 150.81s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.18s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.27s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service google-gemma-4-31b-it-v20
Waiting for inference service google-gemma-4-31b-it-v20 to be ready
2026-04-07T20:19:51.692260+00:00 monitor updated for google-gemma-4-31b-it_v20
2026-04-07T20:20:51.892127+00:00 monitor updated for google-gemma-4-31b-it_v20
2026-04-07T20:21:52.082712+00:00 monitor updated for google-gemma-4-31b-it_v20
2026-04-07T20:22:52.270129+00:00 monitor updated for google-gemma-4-31b-it_v20
2026-04-07T20:23:52.456443+00:00 monitor updated for google-gemma-4-31b-it_v20
Inference service google-gemma-4-31b-it-v20 ready after 283.3676791191101s
Pipeline stage VLLMDeployer completed in 284.73s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 12.228101015090942s
Received healthy response to inference request in 12.013630867004395s
Received healthy response to inference request in 4.311209440231323s
Received healthy response to inference request in 11.566702365875244s
2026-04-07T20:24:52.626230+00:00 monitor updated for google-gemma-4-31b-it_v20
Received healthy response to inference request in 4.165797233581543s
Received healthy response to inference request in 11.564317226409912s
Received healthy response to inference request in 4.119154453277588s
Received healthy response to inference request in 4.153909206390381s
Received healthy response to inference request in 4.388208866119385s
Received healthy response to inference request in 4.0700554847717285s
Received healthy response to inference request in 4.095192909240723s
Received healthy response to inference request in 4.331085443496704s
Received healthy response to inference request in 4.306710720062256s
Received healthy response to inference request in 4.099220275878906s
Received healthy response to inference request in 4.162569522857666s
2026-04-07T20:25:52.818707+00:00 monitor updated for google-gemma-4-31b-it_v20
Received healthy response to inference request in 4.0508081912994385s
Received healthy response to inference request in 4.530407190322876s
Received healthy response to inference request in 4.584288597106934s
Received healthy response to inference request in 4.160653591156006s
Received healthy response to inference request in 4.146569013595581s
Received healthy response to inference request in 4.3208324909210205s
Received healthy response to inference request in 11.703619956970215s
Received healthy response to inference request in 4.121368885040283s
Received healthy response to inference request in 4.2624781131744385s
Received healthy response to inference request in 4.270867347717285s
Received healthy response to inference request in 4.203716516494751s
2026-04-07T20:26:53.072203+00:00 monitor updated for google-gemma-4-31b-it_v20
Received healthy response to inference request in 4.472652196884155s
Received healthy response to inference request in 4.29718017578125s
Received healthy response to inference request in 4.292773962020874s
Received healthy response to inference request in 4.171154260635376s
30 requests
0 failed requests
5th percentile: 4.081367325782776
10th percentile: 4.098817539215088
20th percentile: 4.141528987884522
30th percentile: 4.161994743347168
40th percentile: 4.190691614151001
50th percentile: 4.28182065486908
60th percentile: 4.308510208129883
70th percentile: 4.348222470283508
80th percentile: 4.5411834716796875
90th percentile: 11.58039412498474
95th percentile: 11.874125957489014
99th percentile: 12.165904672145844
mean time: 5.505507850646973
Pipeline stage StressChecker completed in 180.45s
Shutdown handler de-registered
google-gemma-4-31b-it_v20 status is now deployed due to DeploymentManager action
google-gemma-4-31b-it_v20 status is now inactive due to auto deactivation removed underperforming models
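
Note on the StressChecker summary: the log does not say how the percentiles are computed. As a minimal sketch, assuming linear interpolation between closest ranks (the default behaviour of `numpy.percentile`), the logged summary can be recomputed from the 30 latencies reported above. The names `latencies` and `percentile` are hypothetical helpers for illustration, not part of the pipeline.

```python
import math

# Latencies in seconds, copied from the "Received healthy response" lines
# of the StressChecker stage, in the order they were logged.
latencies = [
    12.228101015090942, 12.013630867004395, 4.311209440231323,
    11.566702365875244, 4.165797233581543, 11.564317226409912,
    4.119154453277588, 4.153909206390381, 4.388208866119385,
    4.0700554847717285, 4.095192909240723, 4.331085443496704,
    4.306710720062256, 4.099220275878906, 4.162569522857666,
    4.0508081912994385, 4.530407190322876, 4.584288597106934,
    4.160653591156006, 4.146569013595581, 4.3208324909210205,
    11.703619956970215, 4.121368885040283, 4.2624781131744385,
    4.270867347717285, 4.203716516494751, 4.472652196884155,
    4.29718017578125, 4.292773962020874, 4.171154260635376,
]


def percentile(values, p):
    """Return the p-th percentile using linear interpolation (assumed method)."""
    ordered = sorted(values)
    # Fractional position of the percentile in the sorted list (0-based).
    rank = (p / 100) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    # Interpolate between the two neighbouring samples.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


print(f"{len(latencies)} requests")
for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {percentile(latencies, p)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

Under this assumption the results should agree with the logged values up to floating-point rounding. The gap between the 80th and 90th percentiles comes from the five requests that took roughly 11.5 to 12.2 seconds, while the other 25 finished in about 4.05 to 4.6 seconds.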