Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-d3a-mv1-son-59529-v2-uploader
Waiting for job on chaiml-pony-d3a-mv1-son-59529-v2-uploader to finish
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Using quantization_mode: fp8
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Checking if ChaiML/pony-d3a-mv1-sonnetwintop2-q27b-lr1e5ep1g4-FP8 already exists in ChaiML
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Downloading snapshot of ChaiML/pony-d3a-mv1-sonnetwintop2-q27b-lr1e5ep1g4...
2026-03-28T14:35:11.877483+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Downloaded in 20.976s
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Loading /tmp/model_input...
chaiml-pony-d3a-mv1-son-59529-v2-uploader: The fast path is not available because one of the required libraries is not installed. Falling back to torch implementation. To install, follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Applying quantization...
chaiml-pony-d3a-mv1-son-59529-v2-uploader: 2026-03-28T14:35:15.769701+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-d3a-mv1-son-59529-v2-uploader: 2026-03-28T14:35:18.391508+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-d3a-mv1-son-59529-v2-uploader: 2026-03-28T14:35:18.400004+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-d3a-mv1-son-59529-v2-uploader: 2026-03-28T14:35:18.470455+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-d3a-mv1-son-59529-v2-uploader: 2026-03-28T14:35:18.470812+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-d3a-mv1-son-59529-v2-uploader: 2026-03-28T14:35:18.485943+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-pony-d3a-mv1-son-59529-v2-uploader: 2026-03-28T14:35:26.164834+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-d3a-mv1-son-59529-v2-uploader: 2026-03-28T14:35:26.165007+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide `output_dir` as an input arg, e.g. `oneshot(..., output_dir=...)`
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Saving to /dev/shm/model_output...
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-d3a-mv1-son-59529-v2-uploader: warnings.warn(
2026-03-28T14:36:11.970391+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Pushing to ChaiML/pony-d3a-mv1-sonnetwintop2-q27b-lr1e5ep1g4-FP8
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Checking if ChaiML/pony-d3a-mv1-sonnetwintop2-q27b-lr1e5ep1g4-FP8 already exists in ChaiML
chaiml-pony-d3a-mv1-son-59529-v2-uploader: ChaiML/pony-d3a-mv1-sonnetwintop2-q27b-lr1e5ep1g4-FP8 already exists in ChaiML
chaiml-pony-d3a-mv1-son-59529-v2-uploader: Processed model ChaiML/pony-d3a-mv1-sonnetwintop2-q27b-lr1e5ep1g4 in 81.691s
chaiml-pony-d3a-mv1-son-59529-v2-uploader: creating bucket guanaco-vllm-models
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-59529-v2-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-d3a-mv1-son-59529-v2-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-59529-v2-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-59529-v2-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-59529-v2-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-59529-v2-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-d3a-mv1-son-59529-v2-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-d3a-mv1-son-59529-v2-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-d3a-mv1-son-59529-v2-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
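The SyntaxWarnings above come from the s3cmd `S3` package: since Python 3.12, an unrecognized backslash escape such as `\.` or `\w` in an ordinary (non-raw) string literal emits a `SyntaxWarning` when the module is compiled (Python 3.6 to 3.11 emitted a `DeprecationWarning`, hidden by default). They do not stop the upload: the `cp` to S3 follows and the job completes with status succeeded. A minimal sketch of the usual fix, assuming one were patching those modules locally, is to write the same patterns as raw strings, which hand the regex engine identical pattern text with nothing for the compiler to warn about. Only `RE_S3_DATESTRING` is a name from the log; the other constant names and the sample inputs are hypothetical.

```python
import re

# Raw-string equivalents of the literals flagged above. In a raw string the
# backslash is kept as-is, so the regex engine receives exactly the same
# pattern text, but the compiler has no escape sequence to warn about.
RE_S3_DATESTRING = re.compile(r'\.[0-9]*(?:[Z\-\+]*?)')   # BaseUtils.py:56
INVALID_DNS_CHAR = r"([^a-z0-9\.-])"                      # Utils.py:240 (hypothetical name)
INVALID_BUCKET_CHAR = r"([^A-Za-z0-9\._-])"               # Utils.py:244 (hypothetical name)
DASH_DOT = r"-\."                                         # Utils.py:255 (hypothetical name)
DOUBLE_DOT = r"\.\."                                      # Utils.py:257 (hypothetical name)
URI_SCHEME = re.compile(r"^(\w+://)?(.*)", re.UNICODE)    # S3Uri.py:155 (hypothetical name)
WILDCARD = r"\*|\?"                                       # FileLists.py:480 (hypothetical name)

# The bucket name from this log contains no disallowed characters.
print(re.search(INVALID_DNS_CHAR, "guanaco-vllm-models", re.UNICODE))   # None
# A doubled dot is still detected.
print(re.search(DOUBLE_DOT, "bad..bucket", re.UNICODE) is not None)     # True
# The optional scheme group is captured as before.
print(URI_SCHEME.match("s3://guanaco-vllm-models/x").group(1))          # s3://
# Wildcard splitting stops at the first '*' or '?'.
print(re.split(WILDCARD, "s3://b/*.safetensors", maxsplit=1))           # ['s3://b/', '.safetensors']
```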
2026-03-28T14:37:12.778923+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
chaiml-pony-d3a-mv1-son-59529-v2-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-59529-v2/default/model.safetensors
Job chaiml-pony-d3a-mv1-son-59529-v2-uploader completed after 193.94s with status: succeeded
Stopping job with name chaiml-pony-d3a-mv1-son-59529-v2-uploader
Pipeline stage VLLMUploader completed in 194.42s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.13s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.30s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-d3a-mv1-son-59529-v2
Waiting for inference service chaiml-pony-d3a-mv1-son-59529-v2 to be ready
2026-03-28T14:38:12.886341+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
2026-03-28T14:39:12.983697+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
2026-03-28T14:40:13.147083+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
Inference service chaiml-pony-d3a-mv1-son-59529-v2 ready after 190.2583191394806s
Pipeline stage VLLMDeployer completed in 190.66s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T14:41:13.249327+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 11.825740575790405s
2026-03-28T14:42:13.343341+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.16800332069397s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.9940004348754883s
2026-03-28T14:43:13.439445+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.9004640579223633s
Received healthy response to inference request in 1.8884458541870117s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.225031852722168s
Received healthy response to inference request in 2.0075197219848633s
Received healthy response to inference request in 1.9262144565582275s
Received healthy response to inference request in 2.311215400695801s
Received healthy response to inference request in 4.2379467487335205s
Received healthy response to inference request in 2.4232752323150635s
Received healthy response to inference request in 12.664094686508179s
Received healthy response to inference request in 2.173100709915161s
2026-03-28T14:44:13.530691+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
Received healthy response to inference request in 2.4608747959136963s
Received healthy response to inference request in 1.9942295551300049s
Received healthy response to inference request in 1.982551097869873s
Received healthy response to inference request in 2.283492088317871s
Received healthy response to inference request in 2.002183675765991s
Received healthy response to inference request in 2.066560983657837s
Received healthy response to inference request in 2.092395305633545s
Received healthy response to inference request in 2.011995553970337s
Received healthy response to inference request in 2.123694658279419s
30 requests
8 failed requests
5th percentile: 1.9120517373085022
10th percentile: 1.9769174337387085
20th percentile: 2.000592851638794
30th percentile: 2.050191354751587
40th percentile: 2.1533382892608643
50th percentile: 2.297353744506836
60th percentile: 3.143726205825803
70th percentile: 12.077246809005734
80th percentile: 20.114438009262084
90th percentile: 20.127545952796936
95th percentile: 20.141544568538666
99th percentile: 20.151771552562714
mean time: 7.72639570236206
%s, retrying in %s seconds...
Received healthy response to inference request in 1.8022537231445312s
Received healthy response to inference request in 1.777672290802002s
Received healthy response to inference request in 2.0727765560150146s
Received healthy response to inference request in 2.0058634281158447s
Received healthy response to inference request in 1.8098526000976562s
Received healthy response to inference request in 2.178321361541748s
Received healthy response to inference request in 2.2293803691864014s
Received healthy response to inference request in 3.363637685775757s
Received healthy response to inference request in 1.8686678409576416s
Received healthy response to inference request in 2.019277572631836s
Received healthy response to inference request in 1.962951898574829s
Received healthy response to inference request in 1.9092609882354736s
Received healthy response to inference request in 2.14849591255188s
Received healthy response to inference request in 1.825676441192627s
Received healthy response to inference request in 2.1114439964294434s
Received healthy response to inference request in 1.91969895362854s
Received healthy response to inference request in 2.0367038249969482s
Received healthy response to inference request in 2.691946506500244s
2026-03-28T14:45:13.629046+00:00 monitor updated for chaiml-pony-d3a-mv1-son_59529_v2
Received healthy response to inference request in 2.031607151031494s
Received healthy response to inference request in 2.03377103805542s
Received healthy response to inference request in 1.9419364929199219s
Received healthy response to inference request in 2.0984976291656494s
Received healthy response to inference request in 2.0171148777008057s
Received healthy response to inference request in 2.0756657123565674s
Received healthy response to inference request in 1.9834473133087158s
Received healthy response to inference request in 2.0039963722229004s
Received healthy response to inference request in 2.115147352218628s
Received healthy response to inference request in 2.002525806427002s
Received healthy response to inference request in 1.9763476848602295s
Received healthy response to inference request in 2.1156747341156006s
30 requests
0 failed requests
5th percentile: 1.8056732177734376
10th percentile: 1.8240940570831299
20th percentile: 1.9176113605499268
30th percentile: 1.9723289489746094
40th percentile: 2.003408145904541
50th percentile: 2.018196225166321
60th percentile: 2.0349441528320313
70th percentile: 2.082515287399292
80th percentile: 2.1152528285980225
90th percentile: 2.1834272623062136
95th percentile: 2.4837917447090136
99th percentile: 3.1688472437858586
mean time: 2.0709871371587116
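The summary for this second run can be reproduced from the 30 latencies logged just above it: each percentile agrees with linear interpolation between closest ranks over the sorted samples (position p/100 × (n − 1), the same scheme as NumPy's default `percentile`), and "mean time" is their arithmetic mean. That is an inference from the logged numbers, not StressChecker's confirmed implementation. A stdlib-only sketch; the helper name `percentile` is hypothetical, and printed values may differ from the log in the last digits due to floating-point rounding.

```python
import math
from statistics import mean

# Latencies (seconds) of the 30 healthy responses in the second StressChecker run.
latencies = [
    1.8022537231445312, 1.777672290802002, 2.0727765560150146, 2.0058634281158447,
    1.8098526000976562, 2.178321361541748, 2.2293803691864014, 3.363637685775757,
    1.8686678409576416, 2.019277572631836, 1.962951898574829, 1.9092609882354736,
    2.14849591255188, 1.825676441192627, 2.1114439964294434, 1.91969895362854,
    2.0367038249969482, 2.691946506500244, 2.031607151031494, 2.03377103805542,
    1.9419364929199219, 2.0984976291656494, 2.0171148777008057, 2.0756657123565674,
    1.9834473133087158, 2.0039963722229004, 2.115147352218628, 2.002525806427002,
    1.9763476848602295, 2.1156747341156006,
]


def percentile(samples, p):
    """Hypothetical helper: linear interpolation at index p/100 * (n - 1)."""
    ordered = sorted(samples)
    pos = p / 100 * (len(ordered) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)  # clamp so p=100 stays in range
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


print(f"{len(latencies)} requests")
for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {percentile(latencies, p)}")
print(f"mean time: {mean(latencies)}")
```

The same calculation applied to the first run would also fold in the 8 timed-out requests; their latencies are not logged individually, which is why the 80th to 99th percentiles there sit just above the 20 s read timeout.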
Pipeline stage StressChecker completed in 300.00s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.84s
Shutdown handler de-registered
chaiml-pony-d3a-mv1-son_59529_v2 status is now deployed due to DeploymentManager action
chaiml-pony-d3a-mv1-son_59529_v2 status is now inactive due to admin request
chaiml-pony-d3a-mv1-son_59529_v2 status is now torndown due to DeploymentManager action