Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name google-gemma-4-31b-it-v16-uploader
Waiting for job on google-gemma-4-31b-it-v16-uploader to finish
google-gemma-4-31b-it-v16-uploader: Using quantization_mode: none
google-gemma-4-31b-it-v16-uploader: Downloading snapshot of google/gemma-4-31B-it...
google-gemma-4-31b-it-v16-uploader: Downloaded in 30.824s
2026-04-07T18:06:46.602445+00:00 monitor updated for google-gemma-4-31b-it_v16
google-gemma-4-31b-it-v16-uploader: Processed model google/gemma-4-31B-it in 53.371s
google-gemma-4-31b-it-v16-uploader: creating bucket guanaco-vllm-models
google-gemma-4-31b-it-v16-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v16-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
google-gemma-4-31b-it-v16-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
google-gemma-4-31b-it-v16-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
google-gemma-4-31b-it-v16-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v16-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v16-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v16-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
google-gemma-4-31b-it-v16-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v16-uploader: if re.search("-\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v16-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
google-gemma-4-31b-it-v16-uploader: if re.search("\.\.", bucket, re.UNICODE):
google-gemma-4-31b-it-v16-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
google-gemma-4-31b-it-v16-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
google-gemma-4-31b-it-v16-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
google-gemma-4-31b-it-v16-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
google-gemma-4-31b-it-v16-uploader: Bucket 's3://guanaco-vllm-models/' created
google-gemma-4-31b-it-v16-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/.gitattributes
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/processor_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/processor_config.json
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/tokenizer_config.json
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/chat_template.jinja
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/generation_config.json
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/model.safetensors.index.json
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/config.json
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/README.md
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/tokenizer.json
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/model-00002-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/model-00002-of-00002.safetensors
2026-04-07T18:07:46.778755+00:00 monitor updated for google-gemma-4-31b-it_v16
google-gemma-4-31b-it-v16-uploader: cp /dev/shm/model_output/model-00001-of-00002.safetensors s3://guanaco-vllm-models/google-gemma-4-31b-it-v16/default/model-00001-of-00002.safetensors
Job google-gemma-4-31b-it-v16-uploader completed after 148.1s with status: succeeded
Stopping job with name google-gemma-4-31b-it-v16-uploader
Pipeline stage VLLMUploader completed in 149.23s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.19s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.16s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service google-gemma-4-31b-it-v16
Waiting for inference service google-gemma-4-31b-it-v16 to be ready
2026-04-07T18:08:46.963204+00:00 monitor updated for google-gemma-4-31b-it_v16
2026-04-07T18:09:47.135564+00:00 monitor updated for google-gemma-4-31b-it_v16
2026-04-07T18:10:47.382647+00:00 monitor updated for google-gemma-4-31b-it_v16
2026-04-07T18:11:47.591092+00:00 monitor updated for google-gemma-4-31b-it_v16
2026-04-07T18:12:47.765343+00:00 monitor updated for google-gemma-4-31b-it_v16
2026-04-07T18:13:48.227427+00:00 monitor updated for google-gemma-4-31b-it_v16
2026-04-07T18:14:48.401821+00:00 monitor updated for google-gemma-4-31b-it_v16
2026-04-07T18:15:48.580547+00:00 monitor updated for google-gemma-4-31b-it_v16
Inference service google-gemma-4-31b-it-v16 ready after 465.31568789482117s
Pipeline stage VLLMDeployer completed in 466.38s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 10.897209882736206s
Received healthy response to inference request in 2.6444926261901855s
Received healthy response to inference request in 2.7724530696868896s
Received healthy response to inference request in 2.6417722702026367s
Received healthy response to inference request in 2.7443509101867676s
Received healthy response to inference request in 2.673314332962036s
Received healthy response to inference request in 2.5974645614624023s
Received healthy response to inference request in 2.622103452682495s
Received healthy response to inference request in 2.601086378097534s
Received healthy response to inference request in 2.8991611003875732s
Received healthy response to inference request in 2.5712575912475586s
Received healthy response to inference request in 2.7465932369232178s
2026-04-07T18:16:48.753373+00:00 monitor updated for google-gemma-4-31b-it_v16
Received healthy response to inference request in 2.6615521907806396s
Received healthy response to inference request in 2.691676616668701s
Received healthy response to inference request in 2.604135036468506s
Received healthy response to inference request in 2.791003942489624s
Received healthy response to inference request in 2.75242018699646s
Received healthy response to inference request in 2.6411499977111816s
Received healthy response to inference request in 2.9892423152923584s
Received healthy response to inference request in 2.691742420196533s
Received healthy response to inference request in 3.966712236404419s
Received healthy response to inference request in 2.633887529373169s
Received healthy response to inference request in 2.716907024383545s
Received healthy response to inference request in 2.5970940589904785s
Received healthy response to inference request in 2.5855205059051514s
Received healthy response to inference request in 2.5979247093200684s
Received healthy response to inference request in 2.6102559566497803s
Received healthy response to inference request in 3.0428433418273926s
Received healthy response to inference request in 2.7606170177459717s
Received healthy response to inference request in 2.7460360527038574s
30 requests
0 failed requests
5th percentile: 2.5907286047935485
10th percentile: 2.59742751121521
20th percentile: 2.6035253047943114
30th percentile: 2.6303523063659666
40th percentile: 2.643404483795166
50th percentile: 2.6824954748153687
60th percentile: 2.727884578704834
70th percentile: 2.7483413219451904
80th percentile: 2.7761632442474364
90th percentile: 2.994602417945862
95th percentile: 3.5509712338447543
99th percentile: 8.887365565299994
mean time: 3.016399351755778
Pipeline stage StressChecker completed in 95.71s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.20s
Shutdown handler de-registered
google-gemma-4-31b-it_v16 status is now deployed due to DeploymentManager action
google-gemma-4-31b-it_v16 status is now inactive due to auto deactivation removed underperforming models