Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name google-gemma-4-31b-it-v42-uploader
Waiting for job on google-gemma-4-31b-it-v42-uploader to finish
google-gemma-4-31b-it-v42-uploader: Using quantization_mode: none
google-gemma-4-31b-it-v42-uploader: Downloading snapshot of google/gemma-4-31B-it...
google-gemma-4-31b-it-v42-uploader: Downloaded in 32.005s
google-gemma-4-31b-it-v42-uploader: Processed model google/gemma-4-31B-it in 32.108s
google-gemma-4-31b-it-v42-uploader: creating bucket guanaco-vllm-models
google-gemma-4-31b-it-v42-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v42-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
google-gemma-4-31b-it-v42-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
google-gemma-4-31b-it-v42-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
google-gemma-4-31b-it-v42-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v42-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v42-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v42-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v42-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v42-uploader: if re.search("-\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v42-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v42-uploader: if re.search("\.\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v42-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
google-gemma-4-31b-it-v42-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
google-gemma-4-31b-it-v42-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
google-gemma-4-31b-it-v42-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
google-gemma-4-31b-it-v42-uploader: Bucket 's3://guanaco-vllm-models/' created
google-gemma-4-31b-it-v42-uploader: uploading /tmp/model_output to s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/.gitattributes s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/.gitattributes
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/tokenizer_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/tokenizer_config.json
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/config.json
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/README.md s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/README.md
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/chat_template.jinja s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/chat_template.jinja
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/processor_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/processor_config.json
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/model.safetensors.index.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/model.safetensors.index.json
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/generation_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/generation_config.json
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/tokenizer.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/tokenizer.json
2026-04-13T11:50:53.842995+00:00 monitor updated for google-gemma-4-31b-it_v42
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/model-00002-of-00002.safetensors
2026-04-13T11:51:53.929712+00:00 monitor updated for google-gemma-4-31b-it_v42
google-gemma-4-31b-it-v42-uploader: cp /tmp/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v42/default/model-00001-of-00002.safetensors
Job google-gemma-4-31b-it-v42-uploader completed after 124.49s with status: succeeded
Stopping job with name google-gemma-4-31b-it-v42-uploader
Pipeline stage VLLMUploader completed in 124.96s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.09s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 2.93s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service google-gemma-4-31b-it-v42
Waiting for inference service google-gemma-4-31b-it-v42 to be ready
2026-04-13T11:52:54.467751+00:00 monitor updated for google-gemma-4-31b-it_v42
Failed to get request counts for guanaco-submitter. Falling back to default
2026-04-13T11:53:54.559621+00:00 monitor updated for google-gemma-4-31b-it_v42
2026-04-13T11:54:54.652762+00:00 monitor updated for google-gemma-4-31b-it_v42
Inference service google-gemma-4-31b-it-v42 ready after 230.59421467781067s
Pipeline stage VLLMDeployer completed in 231.01s
run pipeline stage %s
Running pipeline stage StressChecker
2026-04-13T11:55:54.736310+00:00 monitor updated for google-gemma-4-31b-it_v42
Received healthy response to inference request in 10.186779499053955s
Received healthy response to inference request in 10.909903764724731s
Received healthy response to inference request in 11.060816287994385s
Received healthy response to inference request in 2.4771623611450195s
Received healthy response to inference request in 2.2316315174102783s
Received healthy response to inference request in 2.5763607025146484s
Received healthy response to inference request in 2.2431845664978027s
Received healthy response to inference request in 2.4840283393859863s
Received healthy response to inference request in 2.4180901050567627s
Received healthy response to inference request in 2.2616190910339355s
Received healthy response to inference request in 2.257910966873169s
Received healthy response to inference request in 2.3658268451690674s
Received healthy response to inference request in 2.929633140563965s
Received healthy response to inference request in 2.629276752471924s
2026-04-13T11:56:54.870399+00:00 monitor updated for google-gemma-4-31b-it_v42
Received healthy response to inference request in 10.349008798599243s
Received healthy response to inference request in 2.717820405960083s
Received healthy response to inference request in 2.3232524394989014s
Received healthy response to inference request in 2.2954981327056885s
Received healthy response to inference request in 2.340841054916382s
Received healthy response to inference request in 2.288106679916382s
Received healthy response to inference request in 2.3328020572662354s
Received healthy response to inference request in 2.651493787765503s
Received healthy response to inference request in 2.2564585208892822s
Received healthy response to inference request in 2.3901002407073975s
Received healthy response to inference request in 2.2723779678344727s
Received healthy response to inference request in 2.3439576625823975s
Received healthy response to inference request in 2.9841620922088623s
Received healthy response to inference request in 2.6128170490264893s
Received healthy response to inference request in 2.7682342529296875s
Received healthy response to inference request in 2.2856712341308594s
30 requests
0 failed requests
5th percentile: 2.2491578459739685
10th percentile: 2.2577657222747805
20th percentile: 2.283012580871582
30th percentile: 2.3149261474609375
40th percentile: 2.342711019515991
50th percentile: 2.40409517288208
60th percentile: 2.520961284637451
70th percentile: 2.6359418630599976
80th percentile: 2.8005140304565432
90th percentile: 10.203002429008484
95th percentile: 10.65750102996826
99th percentile: 11.017051656246185
mean time: 3.5414942105611167
Pipeline stage StressChecker completed in 109.29s
Shutdown handler de-registered
google-gemma-4-31b-it_v42 status is now deployed due to DeploymentManager action
google-gemma-4-31b-it_v42 status is now inactive due to auto deactivation (removed underperforming models)