Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-d3a-mv1-son-75599-v1-uploader
Waiting for job on chaiml-pony-d3a-mv1-son-75599-v1-uploader to finish
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Using quantization_mode: fp8
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Checking if ChaiML/pony-d3a-mv1-sonnetwintop2-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Downloading snapshot of ChaiML/pony-d3a-mv1-sonnetwintop2-q35b-lr5e6ep2g8...
2026-03-27T06:49:29.470510+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Downloaded in 24.418s
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Loading /tmp/model_input...
chaiml-pony-d3a-mv1-son-75599-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Applying quantization...
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:49:47.653162+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:49:49.913281+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:49:49.915335+0000 | moe_calibration_context | INFO - Found 40 MoE modules to replace
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:50:04.788104+0000 | moe_calibration_context | INFO - Replaced 40 MoE modules for calibration
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:50:04.788303+0000 | moe_calibration_context | INFO - 40/40 modules will remain in calibration form (permanent)
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:50:04.788398+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:50:07.979684+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:50:07.980247+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:50:08.275716+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
2026-03-27T06:50:29.559848+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:50:36.951867+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-d3a-mv1-son-75599-v1-uploader: 2026-03-27T06:50:36.952134+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Saving to /dev/shm/model_output...
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-d3a-mv1-son-75599-v1-uploader: warnings.warn(
2026-03-27T06:51:29.676286+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Pushing to ChaiML/pony-d3a-mv1-sonnetwintop2-q35b-lr5e6ep2g8-FP8
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Checking if ChaiML/pony-d3a-mv1-sonnetwintop2-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Creating repo ChaiML/pony-d3a-mv1-sonnetwintop2-q35b-lr5e6ep2g8-FP8 and uploading /dev/shm/model_output to it
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-pony-d3a-mv1-son-75599-v1-uploader: - model.safetensors: 37.7GB
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Large files may slow down loading and processing.
chaiml-pony-d3a-mv1-son-75599-v1-uploader: ---------- 2026-03-27 06:51:31 (0:00:00) ----------
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Files: hashed 5/7 (32.5K/37.7G) | pre-uploaded: 0/0 (0.0/37.7G) (+7 unsure) | committed: 0/7 (0.0/37.7G) | ignored: 0
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-pony-d3a-mv1-son-75599-v1-uploader: ---------------------------------------------------
2026-03-27T06:52:29.762478+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
chaiml-pony-d3a-mv1-son-75599-v1-uploader: ---------- 2026-03-27 06:52:31 (0:01:00) ----------
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Files: hashed 7/7 (37.7G/37.7G) | pre-uploaded: 1/2 (20.0M/37.7G) | committed: 0/7 (0.0/37.7G) | ignored: 0
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-pony-d3a-mv1-son-75599-v1-uploader: ---------------------------------------------------
2026-03-27T06:53:29.864093+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Processed model ChaiML/pony-d3a-mv1-sonnetwintop2-q35b-lr5e6ep2g8 in 264.100s
chaiml-pony-d3a-mv1-son-75599-v1-uploader: creating bucket guanaco-vllm-models
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-75599-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-d3a-mv1-son-75599-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-75599-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-75599-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-75599-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3a-mv1-son-75599-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-d3a-mv1-son-75599-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-d3a-mv1-son-75599-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-d3a-mv1-son-75599-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-d3a-mv1-son-75599-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-d3a-mv1-son-75599-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-75599-v1/default
chaiml-pony-d3a-mv1-son-75599-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-75599-v1/default/chat_template.jinja
chaiml-pony-d3a-mv1-son-75599-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-75599-v1/default/generation_config.json
chaiml-pony-d3a-mv1-son-75599-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-75599-v1/default/tokenizer_config.json
chaiml-pony-d3a-mv1-son-75599-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-75599-v1/default/recipe.yaml
chaiml-pony-d3a-mv1-son-75599-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-75599-v1/default/config.json
chaiml-pony-d3a-mv1-son-75599-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-75599-v1/default/tokenizer.json
2026-03-27T06:54:29.966251+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
chaiml-pony-d3a-mv1-son-75599-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-d3a-mv1-son-75599-v1/default/model.safetensors
Job chaiml-pony-d3a-mv1-son-75599-v1-uploader completed after 376.69s with status: succeeded
Stopping job with name chaiml-pony-d3a-mv1-son-75599-v1-uploader
Pipeline stage VLLMUploader completed in 377.14s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.75s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-d3a-mv1-son-75599-v1
Waiting for inference service chaiml-pony-d3a-mv1-son-75599-v1 to be ready
2026-03-27T06:55:30.058721+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
2026-03-27T06:56:37.867613+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
2026-03-27T06:57:37.961931+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
2026-03-27T06:58:38.054744+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
Inference service chaiml-pony-d3a-mv1-son-75599-v1 ready after 250.48547649383545s
Pipeline stage VLLMDeployer completed in 250.93s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-27T06:59:38.151433+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 12.6013023853302s
2026-03-27T07:00:38.244732+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 8.946395635604858s
Received healthy response to inference request in 3.8990254402160645s
Received healthy response to inference request in 5.8823323249816895s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.05070424079895s
2026-03-27T07:01:38.365851+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.789217472076416s
Received healthy response to inference request in 1.2479972839355469s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.6082053184509277s
Received healthy response to inference request in 3.823354959487915s
Received healthy response to inference request in 1.5588035583496094s
Received healthy response to inference request in 1.2974302768707275s
Received healthy response to inference request in 1.3660006523132324s
Received healthy response to inference request in 1.270186185836792s
Received healthy response to inference request in 1.928468942642212s
Received healthy response to inference request in 1.5574791431427002s
Received healthy response to inference request in 1.2678728103637695s
2026-03-27T07:02:38.491519+00:00 monitor updated for chaiml-pony-d3a-mv1-son_75599_v1
Received healthy response to inference request in 1.336366891860962s
Received healthy response to inference request in 1.3740968704223633s
Received healthy response to inference request in 1.5295295715332031s
Received healthy response to inference request in 1.424804449081421s
Received healthy response to inference request in 1.744269847869873s
Received healthy response to inference request in 1.3989379405975342s
30 requests
8 failed requests
5th percentile: 1.2689138293266295
10th percentile: 1.294705867767334
20th percentile: 1.3724776268005372
30th percentile: 1.4981120347976684
40th percentile: 1.6700833320617678
50th percentile: 2.76833713054657
60th percentile: 3.9596969604492185
70th percentile: 10.04286766052245
80th percentile: 20.124676084518434
90th percentile: 20.134336352348328
95th percentile: 20.136957955360412
99th percentile: 20.137259850502016
mean time: 7.531528623898824
%s, retrying in %s seconds...
Received healthy response to inference request in 1.1809191703796387s
Received healthy response to inference request in 1.1841681003570557s
Received healthy response to inference request in 1.2521436214447021s
Received healthy response to inference request in 1.1924967765808105s
Received healthy response to inference request in 1.1895217895507812s
Received healthy response to inference request in 1.6712207794189453s
Received healthy response to inference request in 1.2621595859527588s
Received healthy response to inference request in 1.280564785003662s
Received healthy response to inference request in 1.5251960754394531s
Received healthy response to inference request in 1.5553300380706787s
Received healthy response to inference request in 1.2476353645324707s
Received healthy response to inference request in 1.5508382320404053s
Received healthy response to inference request in 1.353708028793335s
Received healthy response to inference request in 1.3148794174194336s
Received healthy response to inference request in 1.2610023021697998s
Received healthy response to inference request in 1.355210304260254s
Received healthy response to inference request in 1.2634556293487549s
Received healthy response to inference request in 1.2683660984039307s
Received healthy response to inference request in 1.4344563484191895s
Received healthy response to inference request in 1.2659595012664795s
Received healthy response to inference request in 1.3210337162017822s
Received healthy response to inference request in 1.373896598815918s
Received healthy response to inference request in 1.4437448978424072s
Received healthy response to inference request in 1.4020640850067139s
Received healthy response to inference request in 1.3346507549285889s
Received healthy response to inference request in 1.4601593017578125s
Received healthy response to inference request in 1.338343620300293s
Received healthy response to inference request in 1.4358046054840088s
Received healthy response to inference request in 1.9038090705871582s
Received healthy response to inference request in 1.4098942279815674s
30 requests
0 failed requests
5th percentile: 1.1865772604942322
10th percentile: 1.1921992778778077
20th percentile: 1.2592305660247802
30th percentile: 1.2652083396911622
40th percentile: 1.3011535644531251
50th percentile: 1.336497187614441
60th percentile: 1.3626848220825194
70th percentile: 1.417262864112854
80th percentile: 1.4470277786254884
90th percentile: 1.5512874126434326
95th percentile: 1.619069945812225
99th percentile: 1.8363584661483767
mean time: 1.3677544275919595
Pipeline stage StressChecker completed in 273.44s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.83s
Shutdown handler de-registered
chaiml-pony-d3a-mv1-son_75599_v1 status is now deployed due to DeploymentManager action
chaiml-pony-d3a-mv1-son_75599_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-pony-d3a-mv1-son_75599_v1 status is now torndown due to DeploymentManager action