Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-llama-8b-202503-16869-v45-uploader
Waiting for job on chaiml-llama-8b-202503-16869-v45-uploader to finish
chaiml-llama-8b-202503-16869-v45-uploader: Using quantization_mode: none
chaiml-llama-8b-202503-16869-v45-uploader: Downloading snapshot of ChaiML/llama_8b_202503_1m_nemo_safety...
chaiml-llama-8b-202503-16869-v45-uploader: Downloaded in 7.494s
chaiml-llama-8b-202503-16869-v45-uploader: Processed model ChaiML/llama_8b_202503_1m_nemo_safety in 13.341s
chaiml-llama-8b-202503-16869-v45-uploader: creating bucket guanaco-vllm-models
chaiml-llama-8b-202503-16869-v45-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-llama-8b-202503-16869-v45-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-llama-8b-202503-16869-v45-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-llama-8b-202503-16869-v45-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-llama-8b-202503-16869-v45-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-llama-8b-202503-16869-v45-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-llama-8b-202503-16869-v45-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-llama-8b-202503-16869-v45-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-llama-8b-202503-16869-v45-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-llama-8b-202503-16869-v45-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-llama-8b-202503-16869-v45-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-llama-8b-202503-16869-v45-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-llama-8b-202503-16869-v45-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-llama-8b-202503-16869-v45-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-llama-8b-202503-16869-v45-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-llama-8b-202503-16869-v45-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-llama-8b-202503-16869-v45-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-llama-8b-202503-16869-v45-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/README.md
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/special_tokens_map.json
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/.gitattributes
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/config.json
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/model.safetensors.index.json
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/tokenizer_config.json
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/tokenizer.json
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/model-00004-of-00004.safetensors s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/model-00004-of-00004.safetensors
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/model-00002-of-00004.safetensors s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/model-00002-of-00004.safetensors
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/model-00001-of-00004.safetensors s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/model-00001-of-00004.safetensors
chaiml-llama-8b-202503-16869-v45-uploader: cp /dev/shm/model_output/model-00003-of-00004.safetensors s3://guanaco-vllm-models/chaiml-llama-8b-202503-16869-v45/default/model-00003-of-00004.safetensors
Job chaiml-llama-8b-202503-16869-v45-uploader completed after 41.89s with status: succeeded
Stopping job with name chaiml-llama-8b-202503-16869-v45-uploader
Pipeline stage VLLMUploader completed in 42.34s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.86s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-llama-8b-202503-16869-v45
Waiting for inference service chaiml-llama-8b-202503-16869-v45 to be ready
2026-03-22T18:42:51.673350+00:00 monitor updated for chaiml-llama-8b-202503_16869_v45
2026-03-22T18:43:51.758168+00:00 monitor updated for chaiml-llama-8b-202503_16869_v45
2026-03-22T18:44:51.841739+00:00 monitor updated for chaiml-llama-8b-202503_16869_v45
Inference service chaiml-llama-8b-202503-16869-v45 ready after 160.3955066204071s
Pipeline stage VLLMDeployer completed in 160.91s
Running pipeline stage StressChecker
Received healthy response to inference request in 6.710764646530151s
Received healthy response to inference request in 3.981919527053833s
Received healthy response to inference request in 2.65944504737854s
Received healthy response to inference request in 8.2935950756073s
Received healthy response to inference request in 2.5807559490203857s
5 requests
0 failed requests
5th percentile: 2.5964937686920164
10th percentile: 2.6122315883636475
20th percentile: 2.6437072277069094
30th percentile: 2.9239399433135986
40th percentile: 3.452929735183716
50th percentile: 3.981919527053833
60th percentile: 5.07345757484436
70th percentile: 6.164995622634887
80th percentile: 7.027330732345582
90th percentile: 7.66046290397644
95th percentile: 7.97702898979187
99th percentile: 8.230281858444213
mean time: 4.845296049118042
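
Note on the statistics above: the reported percentiles are consistent with linear interpolation between the sorted latency samples, which is also the default behaviour of `numpy.percentile`. Whether StressChecker actually uses NumPy is an assumption; the log does not say. The sketch below is a hypothetical, standard-library-only reconstruction (the `percentile` helper and the `latencies` name are illustrative, not taken from the pipeline) that recomputes the figures from the five response times logged for this run.

```python
import math


def percentile(samples, q):
    """Return the q-th percentile (0-100) of samples using linear interpolation.

    Hypothetical helper: mirrors the 'linear' method, where the target rank is
    (n - 1) * q / 100 and the result is interpolated between its two neighbours.
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("samples must not be empty")
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Response times (seconds) copied from the five healthy responses logged above.
latencies = [
    6.710764646530151,
    3.981919527053833,
    2.65944504737854,
    8.2935950756073,
    2.5807559490203857,
]

for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

With only five samples, every percentile above the 75th is interpolated between the two slowest responses, so the tail values mostly reflect the single slowest request rather than a stable latency distribution.
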
%s, retrying in %s seconds...
Received healthy response to inference request in 2.5761196613311768s
Received healthy response to inference request in 3.0366196632385254s
Received healthy response to inference request in 4.487387418746948s
2026-03-22T18:45:51.936432+00:00 monitor updated for chaiml-llama-8b-202503_16869_v45
Received healthy response to inference request in 3.9130053520202637s
Received healthy response to inference request in 4.2028632164001465s
5 requests
0 failed requests
5th percentile: 2.6682196617126466
10th percentile: 2.7603196620941164
20th percentile: 2.9445196628570556
30th percentile: 3.211896800994873
40th percentile: 3.5624510765075685
50th percentile: 3.9130053520202637
60th percentile: 4.028948497772217
70th percentile: 4.14489164352417
80th percentile: 4.259768056869507
90th percentile: 4.373577737808228
95th percentile: 4.430482578277588
99th percentile: 4.476006450653077
mean time: 3.643199062347412
Pipeline stage StressChecker completed in 44.46s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.63s
Shutdown handler de-registered
chaiml-llama-8b-202503_16869_v45 status is now deployed due to DeploymentManager action
chaiml-llama-8b-202503_16869_v45 status is now inactive due to auto deactivation: removed underperforming models
chaiml-llama-8b-202503_16869_v45 status is now torndown due to DeploymentManager action