Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-d3b-mv1-win-84391-v1-uploader
Waiting for job on chaiml-pony-d3b-mv1-win-84391-v1-uploader to finish
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Using quantization_mode: fp8
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Checking if ChaiML/pony-d3b-mv1-winall-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Downloading snapshot of ChaiML/pony-d3b-mv1-winall-q35b-lr5e6ep2g8...
2026-03-27T06:49:40.475388+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Downloaded in 26.417s
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Loading /tmp/model_input...
chaiml-pony-d3b-mv1-win-84391-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Applying quantization...
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:06.920976+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:09.288672+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:09.291302+0000 | moe_calibration_context | INFO - Found 40 MoE modules to replace
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:23.617840+0000 | moe_calibration_context | INFO - Replaced 40 MoE modules for calibration
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:23.618148+0000 | moe_calibration_context | INFO - 40/40 modules will remain in calibration form (permanent)
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:23.618276+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:26.897312+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:26.897832+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:27.183036+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
2026-03-27T06:50:40.563126+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:58.551038+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-d3b-mv1-win-84391-v1-uploader: 2026-03-27T06:50:58.551272+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Saving to /dev/shm/model_output...
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-d3b-mv1-win-84391-v1-uploader: warnings.warn(
2026-03-27T06:51:40.659749+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Pushing to ChaiML/pony-d3b-mv1-winall-q35b-lr5e6ep2g8-FP8
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Checking if ChaiML/pony-d3b-mv1-winall-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Creating repo ChaiML/pony-d3b-mv1-winall-q35b-lr5e6ep2g8-FP8 and uploading /dev/shm/model_output to it
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-pony-d3b-mv1-win-84391-v1-uploader: - model.safetensors: 37.7GB
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Large files may slow down loading and processing.
chaiml-pony-d3b-mv1-win-84391-v1-uploader: ---------- 2026-03-27 06:51:55 (0:00:00) ----------
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Files: hashed 5/7 (32.5K/37.7G) | pre-uploaded: 0/0 (0.0/37.7G) (+7 unsure) | committed: 0/7 (0.0/37.7G) | ignored: 0
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-pony-d3b-mv1-win-84391-v1-uploader: ---------------------------------------------------
2026-03-27T06:52:41.072647+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
chaiml-pony-d3b-mv1-win-84391-v1-uploader: ---------- 2026-03-27 06:52:55 (0:01:00) ----------
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Files: hashed 7/7 (37.7G/37.7G) | pre-uploaded: 1/2 (20.0M/37.7G) | committed: 0/7 (0.0/37.7G) | ignored: 0
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-pony-d3b-mv1-win-84391-v1-uploader: ---------------------------------------------------
2026-03-27T06:53:41.490291+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Processed model ChaiML/pony-d3b-mv1-winall-q35b-lr5e6ep2g8 in 270.919s
chaiml-pony-d3b-mv1-win-84391-v1-uploader: creating bucket guanaco-vllm-models
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-win-84391-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-d3b-mv1-win-84391-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-win-84391-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-win-84391-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-win-84391-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-win-84391-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-d3b-mv1-win-84391-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-d3b-mv1-win-84391-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-d3b-mv1-win-84391-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-d3b-mv1-win-84391-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-d3b-mv1-win-84391-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-win-84391-v1/default
chaiml-pony-d3b-mv1-win-84391-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-win-84391-v1/default/generation_config.json
chaiml-pony-d3b-mv1-win-84391-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-win-84391-v1/default/recipe.yaml
chaiml-pony-d3b-mv1-win-84391-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-win-84391-v1/default/chat_template.jinja
chaiml-pony-d3b-mv1-win-84391-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-win-84391-v1/default/config.json
chaiml-pony-d3b-mv1-win-84391-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-win-84391-v1/default/tokenizer_config.json
chaiml-pony-d3b-mv1-win-84391-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-win-84391-v1/default/tokenizer.json
2026-03-27T06:54:41.583521+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
chaiml-pony-d3b-mv1-win-84391-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-win-84391-v1/default/model.safetensors
Job chaiml-pony-d3b-mv1-win-84391-v1-uploader completed after 388.87s with status: succeeded
Stopping job with name chaiml-pony-d3b-mv1-win-84391-v1-uploader
Pipeline stage VLLMUploader completed in 389.32s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.09s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 2.31s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-d3b-mv1-win-84391-v1
Waiting for inference service chaiml-pony-d3b-mv1-win-84391-v1 to be ready
2026-03-27T06:55:41.672198+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
2026-03-27T06:56:41.760709+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
2026-03-27T06:57:41.892139+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
Inference service chaiml-pony-d3b-mv1-win-84391-v1 ready after 200.73524284362793s
Pipeline stage VLLMDeployer completed in 201.20s
run pipeline stage %s
Running pipeline stage StressChecker
2026-03-27T06:58:41.981244+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-27T06:59:42.079601+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
Received healthy response to inference request in 10.54435396194458s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 5.939502716064453s
Received healthy response to inference request in 3.3223307132720947s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.4143590927124023s
2026-03-27T07:00:42.173944+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.3299510478973389s
Received healthy response to inference request in 9.774691581726074s
2026-03-27T07:01:42.268196+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.3122172355651855s
Received healthy response to inference request in 5.789796590805054s
Received healthy response to inference request in 10.181659936904907s
Received healthy response to inference request in 1.4043872356414795s
Received healthy response to inference request in 1.8901491165161133s
Received healthy response to inference request in 1.2671687602996826s
Received healthy response to inference request in 1.9222886562347412s
Received healthy response to inference request in 1.5835134983062744s
Received healthy response to inference request in 1.2919704914093018s
Received healthy response to inference request in 1.434847116470337s
Received healthy response to inference request in 1.7542493343353271s
Received healthy response to inference request in 1.33221435546875s
Received healthy response to inference request in 1.2962043285369873s
Received healthy response to inference request in 1.42854905128479s
Received healthy response to inference request in 1.3492684364318848s
Received healthy response to inference request in 1.5450210571289062s
30 requests
8 failed requests
5th percentile: 1.2938757181167602
10th percentile: 1.3106159448623658
20th percentile: 1.3458576202392578
30th percentile: 1.4242920637130738
40th percentile: 1.5681165218353272
50th percentile: 1.9062188863754272
60th percentile: 5.849679040908813
70th percentile: 10.290468144416808
80th percentile: 20.122184562683106
90th percentile: 20.139022994041444
95th percentile: 20.165276610851286
99th percentile: 20.198790588378905
mean time: 7.675886122385661
%s, retrying in %s seconds...
Received healthy response to inference request in 1.2773618698120117s
Received healthy response to inference request in 1.1943755149841309s
Received healthy response to inference request in 1.2168669700622559s
Received healthy response to inference request in 1.3410284519195557s
Received healthy response to inference request in 1.2054150104522705s
Received healthy response to inference request in 1.2514126300811768s
Received healthy response to inference request in 1.297511339187622s
Received healthy response to inference request in 1.327378273010254s
Received healthy response to inference request in 1.2581024169921875s
Received healthy response to inference request in 1.2790534496307373s
Received healthy response to inference request in 1.2276265621185303s
2026-03-27T07:02:42.379137+00:00 monitor updated for chaiml-pony-d3b-mv1-win_84391_v1
Received healthy response to inference request in 1.227222204208374s
Received healthy response to inference request in 1.2587378025054932s
Received healthy response to inference request in 1.3758561611175537s
Received healthy response to inference request in 1.2870211601257324s
Received healthy response to inference request in 1.3902337551116943s
Received healthy response to inference request in 1.3377060890197754s
Received healthy response to inference request in 1.9367148876190186s
Received healthy response to inference request in 1.3942441940307617s
Received healthy response to inference request in 1.3073382377624512s
Received healthy response to inference request in 1.363539218902588s
Received healthy response to inference request in 1.2709472179412842s
Received healthy response to inference request in 1.2920632362365723s
Received healthy response to inference request in 1.3957324028015137s
Received healthy response to inference request in 1.4496362209320068s
Received healthy response to inference request in 1.3073937892913818s
Received healthy response to inference request in 1.545609712600708s
Received healthy response to inference request in 1.4148190021514893s
Received healthy response to inference request in 1.4640531539916992s
Received healthy response to inference request in 1.4078586101531982s
30 requests
0 failed requests
5th percentile: 1.2105683922767638
10th percentile: 1.2261866807937623
20th percentile: 1.2567644596099854
30th percentile: 1.2754374742507935
40th percentile: 1.2900464057922363
50th percentile: 1.3073660135269165
60th percentile: 1.3390350341796875
70th percentile: 1.380169439315796
80th percentile: 1.3981576442718506
90th percentile: 1.4510779142379762
95th percentile: 1.508909261226654
99th percentile: 1.8232943868637088
mean time: 1.343428651491801
Pipeline stage StressChecker completed in 276.08s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.81s
Shutdown handler de-registered
chaiml-pony-d3b-mv1-win_84391_v1 status is now deployed due to DeploymentManager action
chaiml-pony-d3b-mv1-win_84391_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-pony-d3b-mv1-win_84391_v1 status is now torndown due to DeploymentManager action