Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-d3a-mv1-plc-89556-v4-uploader
Waiting for job on chaiml-pony-d3a-mv1-plc-89556-v4-uploader to finish
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: Using quantization_mode: fp8
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: Checking if ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: Downloading snapshot of ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8-FP8...
2026-03-28T06:30:26.024326+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: Downloaded in 27.193s
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: Processed model ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8 in 29.772s
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: creating bucket guanaco-vllm-models
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default/.gitattributes
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default/generation_config.json
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default/chat_template.jinja
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default/tokenizer_config.json
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default/recipe.yaml
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default/config.json
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default/tokenizer.json
2026-03-28T06:31:26.119492+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
chaiml-pony-d3a-mv1-plc-89556-v4-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v4/default/model.safetensors
Job chaiml-pony-d3a-mv1-plc-89556-v4-uploader completed after 143.53s with status: succeeded
Stopping job with name chaiml-pony-d3a-mv1-plc-89556-v4-uploader
Pipeline stage VLLMUploader completed in 144.16s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 8.19s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-d3a-mv1-plc-89556-v4
Waiting for inference service chaiml-pony-d3a-mv1-plc-89556-v4 to be ready
2026-03-28T06:32:26.234019+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
2026-03-28T06:33:26.353397+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
Failed to get request counts for guanaco-submitter. Falling back to default
2026-03-28T06:34:26.487563+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
Inference service chaiml-pony-d3a-mv1-plc-89556-v4 ready after 200.48051404953003s
Pipeline stage VLLMDeployer completed in 201.25s
Running pipeline stage StressChecker
2026-03-28T06:35:26.822501+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T06:36:26.937362+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.4187798500061035s
Received healthy response to inference request in 7.808798313140869s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.051210641860962s
2026-03-28T06:37:27.085074+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
{"detail":"('http://chaiml-pony-d3a-mv1-plc-89556-v4-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'upstream connect error or disconnect/reset before headers. reset reason: connection termination')"}
Received unhealthy response to inference request!
Received healthy response to inference request in 1.6955502033233643s
Received healthy response to inference request in 2.2289915084838867s
Received healthy response to inference request in 1.6292996406555176s
Received healthy response to inference request in 2.25685977935791s
Received healthy response to inference request in 1.7087931632995605s
Received healthy response to inference request in 1.6505193710327148s
Received healthy response to inference request in 4.573709011077881s
Received healthy response to inference request in 1.6511156558990479s
Received healthy response to inference request in 1.8474304676055908s
Received healthy response to inference request in 2.1386067867279053s
Received healthy response to inference request in 1.77632474899292s
Received healthy response to inference request in 1.8068468570709229s
Received healthy response to inference request in 2.1207001209259033s
Received healthy response to inference request in 1.645270586013794s
Received healthy response to inference request in 2.2391297817230225s
Received healthy response to inference request in 1.961775302886963s
Received healthy response to inference request in 2.1171157360076904s
Received healthy response to inference request in 1.664703130722046s
Received healthy response to inference request in 1.9855809211730957s
Received healthy response to inference request in 1.6414494514465332s
Received healthy response to inference request in 1.667372226715088s
30 requests
6 failed requests
5th percentile: 1.6431689620018006
10th percentile: 1.6499944925308228
20th percentile: 1.6668384075164795
30th percentile: 1.756065273284912
40th percentile: 1.9160373687744143
50th percentile: 2.118907928466797
60th percentile: 2.233046817779541
70th percentile: 3.608509087562559
80th percentile: 10.11721549034122
90th percentile: 20.13483154773712
95th percentile: 20.140474605560303
99th percentile: 20.461311492919922
mean time: 5.925326236089071
%s, retrying in %s seconds...
Received healthy response to inference request in 1.66573166847229s
Received healthy response to inference request in 1.6034445762634277s
Received healthy response to inference request in 1.704448938369751s
2026-03-28T06:38:27.175333+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v4
Received healthy response to inference request in 1.5909655094146729s
Received healthy response to inference request in 1.605543613433838s
Received healthy response to inference request in 1.7448465824127197s
Received healthy response to inference request in 1.7878963947296143s
Received healthy response to inference request in 1.8975777626037598s
Received healthy response to inference request in 1.7126164436340332s
Received healthy response to inference request in 1.7201728820800781s
Received healthy response to inference request in 2.1299421787261963s
Received healthy response to inference request in 1.6993780136108398s
Received healthy response to inference request in 1.659578561782837s
Received healthy response to inference request in 2.0008397102355957s
Received healthy response to inference request in 1.655245065689087s
Received healthy response to inference request in 1.7442567348480225s
Received healthy response to inference request in 1.6412270069122314s
Received healthy response to inference request in 1.9185888767242432s
Received healthy response to inference request in 1.7612581253051758s
Received healthy response to inference request in 1.6412353515625s
Received healthy response to inference request in 1.7373626232147217s
Received healthy response to inference request in 1.7697453498840332s
Received healthy response to inference request in 2.0066635608673096s
Received healthy response to inference request in 1.6546261310577393s
Received healthy response to inference request in 1.6618947982788086s
Received healthy response to inference request in 1.984579086303711s
Received healthy response to inference request in 1.6463775634765625s
Received healthy response to inference request in 1.6415259838104248s
Received healthy response to inference request in 1.701155185699463s
Received healthy response to inference request in 2.080763101577759s
30 requests
0 failed requests
5th percentile: 1.6043891429901123
10th percentile: 1.637658667564392
20th percentile: 1.645407247543335
30th percentile: 1.658278512954712
40th percentile: 1.6859194755554199
50th percentile: 1.708532691001892
60th percentile: 1.740120267868042
70th percentile: 1.7638042926788329
80th percentile: 1.9017799854278565
90th percentile: 2.001422095298767
95th percentile: 2.0474183082580564
99th percentile: 2.1156802463531497
mean time: 1.7589829126993815
Pipeline stage StressChecker completed in 236.10s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.57s
Shutdown handler de-registered
chaiml-pony-d3a-mv1-plc_89556_v4 status is now deployed due to DeploymentManager action
chaiml-pony-d3a-mv1-plc_89556_v4 status is now inactive due to admin request