Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-d3a-mv1-plc-89556-v1-uploader
Waiting for job on chaiml-pony-d3a-mv1-plc-89556-v1-uploader to finish
2026-03-27T06:49:18.403949+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Using quantization_mode: fp8
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Checking if ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Downloading snapshot of ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8...
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Downloaded in 29.669s
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Loading /tmp/model_input...
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Applying quantization...
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:49:54.612742+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:49:56.989822+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:49:56.993106+0000 | moe_calibration_context | INFO - Found 40 MoE modules to replace
2026-03-27T06:50:18.525085+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:50:11.846855+0000 | moe_calibration_context | INFO - Replaced 40 MoE modules for calibration
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:50:11.847085+0000 | moe_calibration_context | INFO - 40/40 modules will remain in calibration form (permanent)
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:50:11.847171+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:50:15.120469+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:50:15.121011+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:50:15.439736+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:50:43.811408+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: 2026-03-27T06:50:43.811657+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Saving to /dev/shm/model_output...
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: warnings.warn(
2026-03-27T06:51:18.609412+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Pushing to ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8-FP8
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Checking if ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Creating repo ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8-FP8 and uploading /dev/shm/model_output to it
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: - model.safetensors: 37.7GB
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Large files may slow down loading and processing.
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: ---------- 2026-03-27 06:51:37 (0:00:00) ----------
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Files: hashed 5/7 (32.5K/37.7G) | pre-uploaded: 0/0 (0.0/37.7G) (+7 unsure) | committed: 0/7 (0.0/37.7G) | ignored: 0
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: ---------------------------------------------------
2026-03-27T06:52:18.696072+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: ---------- 2026-03-27 06:52:37 (0:01:00) ----------
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Files: hashed 7/7 (37.7G/37.7G) | pre-uploaded: 1/2 (20.0M/37.7G) | committed: 0/7 (0.0/37.7G) | ignored: 0
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: ---------------------------------------------------
2026-03-27T06:53:18.794096+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Processed model ChaiML/pony-d3a-mv1-plc-q35b-lr5e6ep2g8 in 268.293s
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: creating bucket guanaco-vllm-models
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v1/default
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v1/default/generation_config.json
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v1/default/chat_template.jinja
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v1/default/config.json
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v1/default/tokenizer_config.json
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v1/default/recipe.yaml
2026-03-27T06:54:18.888626+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
chaiml-pony-d3a-mv1-plc-89556-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-plc-89556-v1/default/model.safetensors
Job chaiml-pony-d3a-mv1-plc-89556-v1-uploader completed after 397.47s with status: succeeded
Stopping job with name chaiml-pony-d3a-mv1-plc-89556-v1-uploader
Pipeline stage VLLMUploader completed in 397.91s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 2.65s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-d3a-mv1-plc-89556-v1
Waiting for inference service chaiml-pony-d3a-mv1-plc-89556-v1 to be ready
2026-03-27T06:55:19.015917+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
2026-03-27T06:56:19.111164+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
2026-03-27T06:57:19.238833+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
Inference service chaiml-pony-d3a-mv1-plc-89556-v1 ready after 172.70419144630432s
Pipeline stage VLLMDeployer completed in 173.13s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-27T06:58:19.384740+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-27T06:59:19.482288+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 5.794384002685547s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.5728363990783691s
2026-03-27T07:00:19.575247+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 5.652704477310181s
Received healthy response to inference request in 1.2710506916046143s
Received healthy response to inference request in 1.2744255065917969s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.89831018447876s
Received healthy response to inference request in 1.2899231910705566s
Received healthy response to inference request in 1.3749194145202637s
Received healthy response to inference request in 1.3406288623809814s
2026-03-27T07:01:19.732565+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
Received healthy response to inference request in 5.737311363220215s
Received healthy response to inference request in 1.7344167232513428s
Received healthy response to inference request in 1.322338581085205s
Received healthy response to inference request in 1.2929191589355469s
Received healthy response to inference request in 1.3019287586212158s
Received healthy response to inference request in 1.6337270736694336s
Received healthy response to inference request in 1.4072341918945312s
Received healthy response to inference request in 1.2905514240264893s
Received healthy response to inference request in 1.4362707138061523s
Received healthy response to inference request in 1.35903000831604s
Received healthy response to inference request in 1.3331406116485596s
Received healthy response to inference request in 1.3148159980773926s
30 requests
9 failed requests
5th percentile: 1.2813994646072389
10th percentile: 1.290488600730896
20th percentile: 1.3122385501861573
30th percentile: 1.3383823871612548
40th percentile: 1.3943082809448242
50th percentile: 1.6032817363739014
60th percentile: 5.200067901611327
70th percentile: 10.094850182533223
80th percentile: 20.13875894546509
90th percentile: 20.165928101539613
95th percentile: 20.17970416545868
99th percentile: 20.448112258911134
mean time: 7.5805514574050905
%s, retrying in %s seconds...
Received healthy response to inference request in 1.200998306274414s
Received healthy response to inference request in 1.3912100791931152s
Received healthy response to inference request in 1.223726511001587s
Received healthy response to inference request in 1.2104666233062744s
Received healthy response to inference request in 1.4456346035003662s
Received healthy response to inference request in 1.2760207653045654s
Received healthy response to inference request in 1.2572431564331055s
Received healthy response to inference request in 1.3174099922180176s
Received healthy response to inference request in 1.32657790184021s
Received healthy response to inference request in 1.300443172454834s
Received healthy response to inference request in 1.4471967220306396s
Received healthy response to inference request in 1.5220730304718018s
Received healthy response to inference request in 1.3523635864257812s
Received healthy response to inference request in 1.3406074047088623s
Received healthy response to inference request in 1.7843527793884277s
Received healthy response to inference request in 1.255157470703125s
Received healthy response to inference request in 1.2845842838287354s
Received healthy response to inference request in 1.3394355773925781s
Received healthy response to inference request in 1.2792513370513916s
Received healthy response to inference request in 1.5114283561706543s
Received healthy response to inference request in 1.356476068496704s
Received healthy response to inference request in 1.424607753753662s
Received healthy response to inference request in 1.362083911895752s
Received healthy response to inference request in 1.7735075950622559s
Received healthy response to inference request in 1.6694262027740479s
2026-03-27T07:02:20.541658+00:00 monitor updated for chaiml-pony-d3a-mv1-plc_89556_v1
Received healthy response to inference request in 1.3478820323944092s
Received healthy response to inference request in 1.2988789081573486s
Received healthy response to inference request in 1.2824110984802246s
Received healthy response to inference request in 1.3370695114135742s
Received healthy response to inference request in 1.9625532627105713s
30 requests
0 failed requests
5th percentile: 1.216433572769165
10th percentile: 1.2520143747329713
20th percentile: 1.2786052227020264
30th percentile: 1.2945905208587647
40th percentile: 1.322910737991333
50th percentile: 1.3400214910507202
60th percentile: 1.3540085792541503
70th percentile: 1.4012293815612793
80th percentile: 1.4600430488586427
90th percentile: 1.6798343420028687
95th percentile: 1.7794724464416503
99th percentile: 1.9108751225471499
mean time: 1.3960359334945678
Pipeline stage StressChecker completed in 276.62s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.66s
Shutdown handler de-registered
chaiml-pony-d3a-mv1-plc_89556_v1 status is now deployed due to DeploymentManager action
chaiml-pony-d3a-mv1-plc_89556_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-pony-d3a-mv1-plc_89556_v1 status is now torndown due to DeploymentManager action