Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name google-gemma-4-31b-it-v17-uploader
Waiting for job on google-gemma-4-31b-it-v17-uploader to finish
google-gemma-4-31b-it-v17-uploader: Using quantization_mode: none
google-gemma-4-31b-it-v17-uploader: Downloading snapshot of google/gemma-4-31B-it...
google-gemma-4-31b-it-v17-uploader: Downloaded in 33.368s
2026-04-07T19:13:08.846014+00:00 monitor updated for google-gemma-4-31b-it_v17
google-gemma-4-31b-it-v17-uploader: Processed model google/gemma-4-31B-it in 55.463s
google-gemma-4-31b-it-v17-uploader: creating bucket guanaco-vllm-models
google-gemma-4-31b-it-v17-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v17-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
google-gemma-4-31b-it-v17-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
google-gemma-4-31b-it-v17-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
google-gemma-4-31b-it-v17-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v17-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v17-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v17-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v17-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v17-uploader: if re.search("-\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v17-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v17-uploader: if re.search("\.\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v17-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
google-gemma-4-31b-it-v17-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
google-gemma-4-31b-it-v17-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
google-gemma-4-31b-it-v17-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
google-gemma-4-31b-it-v17-uploader: Bucket 's3://guanaco-vllm-models/' created
google-gemma-4-31b-it-v17-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/.gitattributes
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/tokenizer_config.json
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/chat_template.jinja
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/generation_config.json
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/model.safetensors.index.json
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/config.json
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/processor_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/processor_config.json
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/README.md
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/tokenizer.json
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/model-00002-of-00002.safetensors
2026-04-07T19:14:09.040998+00:00 monitor updated for google-gemma-4-31b-it_v17
2026-04-07T19:15:09.258800+00:00 monitor updated for google-gemma-4-31b-it_v17
2026-04-07T19:16:09.485797+00:00 monitor updated for google-gemma-4-31b-it_v17
2026-04-07T19:17:09.700964+00:00 monitor updated for google-gemma-4-31b-it_v17
google-gemma-4-31b-it-v17-uploader: cp /dev/shm/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v17/default/model-00001-of-00002.safetensors
Job google-gemma-4-31b-it-v17-uploader completed after 345.71s with status: succeeded
Stopping job with name google-gemma-4-31b-it-v17-uploader
Pipeline stage VLLMUploader completed in 346.87s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.22s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.40s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service google-gemma-4-31b-it-v17
Waiting for inference service google-gemma-4-31b-it-v17 to be ready
2026-04-07T19:18:09.911447+00:00 monitor updated for google-gemma-4-31b-it_v17
2026-04-07T19:19:10.137737+00:00 monitor updated for google-gemma-4-31b-it_v17
2026-04-07T19:20:10.345118+00:00 monitor updated for google-gemma-4-31b-it_v17
2026-04-07T19:21:10.569919+00:00 monitor updated for google-gemma-4-31b-it_v17
Inference service google-gemma-4-31b-it-v17 ready after 242.63806176185608s
Pipeline stage VLLMDeployer completed in 244.10s
run pipeline stage %s
Running pipeline stage StressChecker
2026-04-07T19:22:10.784751+00:00 monitor updated for google-gemma-4-31b-it_v17
Received healthy response to inference request in 10.994353771209717s
Received healthy response to inference request in 5.363650321960449s
Received healthy response to inference request in 4.103855609893799s
Received healthy response to inference request in 4.734524488449097s
Received healthy response to inference request in 4.173731088638306s
Received healthy response to inference request in 4.1946868896484375s
Received healthy response to inference request in 4.071425437927246s
Received healthy response to inference request in 4.514293432235718s
Received healthy response to inference request in 4.5227210521698s
Received healthy response to inference request in 4.289250373840332s
Received healthy response to inference request in 4.228727579116821s
Received healthy response to inference request in 4.150444507598877s
Received healthy response to inference request in 4.184152364730835s
2026-04-07T19:23:11.077476+00:00 monitor updated for google-gemma-4-31b-it_v17
Received healthy response to inference request in 4.452723264694214s
Received healthy response to inference request in 5.222696542739868s
Received healthy response to inference request in 4.137506484985352s
Received healthy response to inference request in 4.532751798629761s
Received healthy response to inference request in 4.803702116012573s
Received healthy response to inference request in 4.257944583892822s
Received healthy response to inference request in 11.720235824584961s
Received healthy response to inference request in 11.951350927352905s
Received healthy response to inference request in 4.592493295669556s
Received healthy response to inference request in 4.291664123535156s
2026-04-07T19:24:11.285531+00:00 monitor updated for google-gemma-4-31b-it_v17
Received healthy response to inference request in 4.1503965854644775s
Received healthy response to inference request in 4.838659286499023s
Received healthy response to inference request in 4.2152018547058105s
Received healthy response to inference request in 4.441236734390259s
Received healthy response to inference request in 4.119574785232544s
Received healthy response to inference request in 10.75917911529541s
Received healthy response to inference request in 4.387028932571411s
30 requests
0 failed requests
5th percentile: 4.110929238796234
10th percentile: 4.1357133150100704
20th percentile: 4.16907377243042
30th percentile: 4.2090473651885985
40th percentile: 4.2767280578613285
50th percentile: 4.414132833480835
60th percentile: 4.51766448020935
70th percentile: 4.635102653503417
80th percentile: 4.9154667377471934
90th percentile: 10.78269658088684
95th percentile: 11.3935889005661
99th percentile: 11.884327547550201
mean time: 5.346672105789184
Pipeline stage StressChecker completed in 166.83s
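The StressChecker summary above can be re-derived from the 30 logged response times. The sketch below is a minimal reconstruction, not the checker's actual code: the `latencies` list is copied from the "Received healthy response" lines, while the `percentile` helper and its linear-interpolation rule are assumptions chosen because they are consistent with the logged figures.

```python
import math

# Response times in seconds, copied from the "Received healthy response" log lines.
latencies = [
    10.994353771209717, 5.363650321960449, 4.103855609893799,
    4.734524488449097, 4.173731088638306, 4.1946868896484375,
    4.071425437927246, 4.514293432235718, 4.5227210521698,
    4.289250373840332, 4.228727579116821, 4.150444507598877,
    4.184152364730835, 4.452723264694214, 5.222696542739868,
    4.137506484985352, 4.532751798629761, 4.803702116012573,
    4.257944583892822, 11.720235824584961, 11.951350927352905,
    4.592493295669556, 4.291664123535156, 4.1503965854644775,
    4.838659286499023, 4.2152018547058105, 4.441236734390259,
    4.119574785232544, 10.75917911529541, 4.387028932571411,
]


def percentile(values, q):
    """Hypothetical helper: q-th percentile (0..100) with linear interpolation
    between the two nearest ranks of the sorted sample."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


print(f"{len(latencies)} requests")
for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

Under that assumption the 5th, 50th, 90th, and 99th percentiles and the mean agree with the logged summary to within floating-point rounding. The jump between the 80th and 90th percentiles comes from the four responses that took longer than 10 s.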
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.21s
Shutdown handler de-registered
google-gemma-4-31b-it_v17 status is now deployed due to DeploymentManager action
google-gemma-4-31b-it_v17 status is now inactive due to admin request