Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name google-gemma-4-31b-it-v26-uploader
Waiting for job on google-gemma-4-31b-it-v26-uploader to finish
google-gemma-4-31b-it-v26-uploader: Using quantization_mode: none
google-gemma-4-31b-it-v26-uploader: Downloading snapshot of google/gemma-4-31B-it...
google-gemma-4-31b-it-v26-uploader: Downloaded in 33.808s
2026-04-08T02:18:34.625167+00:00 monitor updated for google-gemma-4-31b-it_v26
google-gemma-4-31b-it-v26-uploader: Processed model google/gemma-4-31B-it in 56.009s
google-gemma-4-31b-it-v26-uploader: creating bucket guanaco-vllm-models
google-gemma-4-31b-it-v26-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v26-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
google-gemma-4-31b-it-v26-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
google-gemma-4-31b-it-v26-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
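The two SyntaxWarning lines come from s3cmd's S3/BaseUtils.py. Its regular-expression literals are ordinary (non-raw) strings that contain sequences such as `\.` and `\s`. Python 3.12 and later warn about these unrecognized escapes but still keep the backslash, so the patterns behave as intended and the upload continues. If one wanted to silence the warning, a minimal sketch is to write the same pattern as a raw string. The variable names below are illustrative only and are not taken from s3cmd. The example checks that the raw literal produces the identical pattern text for RE_S3_DATESTRING.

```python
import re

# Pattern text as it appears in the warning. Here the backslashes are written
# as explicit "\\" escapes so this file does not trigger the same warning.
ORIGINAL_TEXT = "\\.[0-9]*(?:[Z\\-\\+]*?)"

# Warning-free equivalent: a raw string keeps every backslash literally.
RAW_TEXT = r"\.[0-9]*(?:[Z\-\+]*?)"

RE_S3_DATESTRING = re.compile(RAW_TEXT)

if __name__ == "__main__":
    print(ORIGINAL_TEXT == RAW_TEXT)  # True: both spell the same regex
    # Find the fractional-seconds part of a timestamp taken from this log.
    print(RE_S3_DATESTRING.search("2026-04-08T02:18:34.625167+00:00").group())
```

The warnings are cosmetic in this run. The uploader job still reports `succeeded`.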
google-gemma-4-31b-it-v26-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v26/default/model-00002-of-00002.safetensors
2026-04-08T02:19:34.812070+00:00 monitor updated for google-gemma-4-31b-it_v26
google-gemma-4-31b-it-v26-uploader: cp /dev/shm/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v26/default/model-00001-of-00002.safetensors
Job google-gemma-4-31b-it-v26-uploader completed after 147.35s with status: succeeded
Stopping job with name google-gemma-4-31b-it-v26-uploader
Pipeline stage VLLMUploader completed in 148.59s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.19s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.41s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service google-gemma-4-31b-it-v26
Waiting for inference service google-gemma-4-31b-it-v26 to be ready
2026-04-08T02:20:34.994113+00:00 monitor updated for google-gemma-4-31b-it_v26
2026-04-08T02:21:35.188734+00:00 monitor updated for google-gemma-4-31b-it_v26
2026-04-08T02:22:35.393547+00:00 monitor updated for google-gemma-4-31b-it_v26
2026-04-08T02:23:35.591080+00:00 monitor updated for google-gemma-4-31b-it_v26
Inference service google-gemma-4-31b-it-v26 ready after 222.93941187858582s
Pipeline stage VLLMDeployer completed in 223.99s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 11.330532789230347s
Received healthy response to inference request in 11.92457389831543s
Received healthy response to inference request in 10.8860342502594s
Received healthy response to inference request in 5.651732921600342s
2026-04-08T02:24:35.786298+00:00 monitor updated for google-gemma-4-31b-it_v26
Received healthy response to inference request in 5.543116331100464s
Received healthy response to inference request in 4.321292400360107s
Received healthy response to inference request in 4.323812484741211s
Received healthy response to inference request in 11.891998291015625s
Received healthy response to inference request in 4.580567836761475s
Received healthy response to inference request in 4.35443902015686s
Received healthy response to inference request in 14.987995624542236s
Received healthy response to inference request in 4.418408393859863s
Received healthy response to inference request in 4.285848617553711s
Received healthy response to inference request in 4.421539068222046s
2026-04-08T02:25:36.017825+00:00 monitor updated for google-gemma-4-31b-it_v26
Received healthy response to inference request in 4.165823936462402s
Received healthy response to inference request in 4.4477198123931885s
Received healthy response to inference request in 4.31400465965271s
Received healthy response to inference request in 4.491267204284668s
Received healthy response to inference request in 4.287995338439941s
Received healthy response to inference request in 4.179605960845947s
Received healthy response to inference request in 4.19357967376709s
Received healthy response to inference request in 4.306352853775024s
Received healthy response to inference request in 4.2554237842559814s
Received healthy response to inference request in 4.407923698425293s
Received healthy response to inference request in 4.211745500564575s
2026-04-08T02:26:36.211814+00:00 monitor updated for google-gemma-4-31b-it_v26
Received healthy response to inference request in 4.321052312850952s
Received healthy response to inference request in 4.3205342292785645s
Received healthy response to inference request in 4.2985289096832275s
Received healthy response to inference request in 4.366399765014648s
Received healthy response to inference request in 4.299811601638794s
30 requests
0 failed requests
5th percentile: 4.185894131660461
10th percentile: 4.2099289178848265
20th percentile: 4.287565994262695
30th percentile: 4.304390478134155
40th percentile: 4.320845079421997
50th percentile: 4.339125752449036
60th percentile: 4.412117576599121
70th percentile: 4.4607840299606325
80th percentile: 5.56483964920044
90th percentile: 11.386679339408875
95th percentile: 11.909914875030518
99th percentile: 14.099603323936465
mean time: 5.726322038968404
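The StressChecker summary can be recomputed from the 30 logged latencies. The reported percentiles are consistent with linear interpolation between closest ranks, which is the default method of numpy.percentile. The sketch below assumes that method and reimplements it in plain Python. The helper `percentile_linear` is hypothetical and is not part of the pipeline.

```python
import statistics

# Latencies in seconds, copied in log order from the 30
# "Received healthy response" lines above.
LATENCIES = [
    11.330532789230347, 11.92457389831543, 10.8860342502594, 5.651732921600342,
    5.543116331100464, 4.321292400360107, 4.323812484741211, 11.891998291015625,
    4.580567836761475, 4.35443902015686, 14.987995624542236, 4.418408393859863,
    4.285848617553711, 4.421539068222046, 4.165823936462402, 4.4477198123931885,
    4.31400465965271, 4.491267204284668, 4.287995338439941, 4.179605960845947,
    4.19357967376709, 4.306352853775024, 4.2554237842559814, 4.407923698425293,
    4.211745500564575, 4.321052312850952, 4.3205342292785645, 4.2985289096832275,
    4.366399765014648, 4.299811601638794,
]


def percentile_linear(values, p):
    """Hypothetical helper: p-th percentile (0-100) using linear interpolation
    between closest ranks (assumed to match the StressChecker's method)."""
    data = sorted(values)
    rank = (p / 100) * (len(data) - 1)  # fractional 0-based position
    lo = int(rank)                      # lower neighbour index (rank >= 0)
    hi = min(lo + 1, len(data) - 1)     # upper neighbour, clamped at the end
    frac = rank - lo
    return data[lo] + frac * (data[hi] - data[lo])


if __name__ == "__main__":
    print(f"{len(LATENCIES)} requests")
    for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        print(f"{p}th percentile: {percentile_linear(LATENCIES, p)}")
    print(f"mean time: {statistics.fmean(LATENCIES)}")
```

Running it should print the same values as the log, up to floating-point rounding in the last digits. Sorting on every call is wasteful for large samples, but for 30 requests it keeps the helper simple.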
Pipeline stage StressChecker completed in 187.88s
Shutdown handler de-registered
google-gemma-4-31b-it_v26 status is now deployed due to DeploymentManager action