Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-mega-v1-sonnetwi-11582-v1-uploader
Waiting for job on chaiml-mega-v1-sonnetwi-11582-v1-uploader to finish
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Using quantization_mode: fp8
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Checking if ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Downloading snapshot of ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8...
2026-03-28T07:06:50.352630+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Downloaded in 60.393s
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Loading /tmp/model_input...
chaiml-mega-v1-sonnetwi-11582-v1-uploader: The fast path is not available because one of the required libraries is not installed. Falling back to torch implementation. To install, follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Applying quantization...
chaiml-mega-v1-sonnetwi-11582-v1-uploader: 2026-03-28T07:07:33.565955+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-mega-v1-sonnetwi-11582-v1-uploader: 2026-03-28T07:07:35.570253+0000 | reset | INFO - Compression lifecycle reset
chaiml-mega-v1-sonnetwi-11582-v1-uploader: 2026-03-28T07:07:35.572452+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-mega-v1-sonnetwi-11582-v1-uploader: 2026-03-28T07:07:35.620006+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-mega-v1-sonnetwi-11582-v1-uploader: 2026-03-28T07:07:35.620283+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-mega-v1-sonnetwi-11582-v1-uploader: 2026-03-28T07:07:35.633230+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
2026-03-28T07:07:50.445256+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
chaiml-mega-v1-sonnetwi-11582-v1-uploader: 2026-03-28T07:07:42.841768+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-mega-v1-sonnetwi-11582-v1-uploader: 2026-03-28T07:07:42.841990+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide `output_dir` as input arg. Ex. `oneshot(..., output_dir=...)`
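The `recipe.yaml` later copied alongside the weights typically encodes the single `QuantizationModifier` the lifecycle messages above refer to. A plausible FP8 recipe in llmcompressor's format (illustrative only; the actual file contents are not shown in this log):

```yaml
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: [Linear]      # quantize all Linear layers
      scheme: FP8_DYNAMIC    # matches quantization_mode: fp8 above
      ignore: [lm_head]      # output head commonly left in higher precision
```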
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Saving to /dev/shm/model_output...
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-mega-v1-sonnetwi-11582-v1-uploader: warnings.warn(
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Pushing to ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8-FP8
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Checking if ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Creating repo ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8-FP8 and uploading /dev/shm/model_output to it
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Found 1 file larger than 20GB (recommended limit):
chaiml-mega-v1-sonnetwi-11582-v1-uploader: - model.safetensors: 35.9GB
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Large files may slow down loading and processing.
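The large-file check above (flagging `model.safetensors` at 35.9GB against the 20GB recommendation) amounts to a directory walk over the output folder. A minimal stdlib sketch; `find_large_files` is a hypothetical helper, not the uploader's actual code:

```python
import os

# 20 GB recommended per-file limit, as reported in the log above
LIMIT_BYTES = 20 * 1024**3

def find_large_files(root, limit=LIMIT_BYTES):
    """Walk `root` and return (relative_path, size_bytes) for files over `limit`,
    largest first."""
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit:
                large.append((os.path.relpath(path, root), size))
    return sorted(large, key=lambda item: -item[1])
```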
chaiml-mega-v1-sonnetwi-11582-v1-uploader: ---------- 2026-03-28 07:08:34 (0:00:00) ----------
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Files: hashed 5/7 (34.1K/35.9G) | pre-uploaded: 0/0 (0.0/35.9G) (+7 unsure) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-mega-v1-sonnetwi-11582-v1-uploader: ---------------------------------------------------
2026-03-28T07:08:50.738213+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
chaiml-mega-v1-sonnetwi-11582-v1-uploader: ---------- 2026-03-28 07:09:34 (0:01:00) ----------
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Files: hashed 7/7 (35.9G/35.9G) | pre-uploaded: 1/2 (20.0M/35.9G) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-mega-v1-sonnetwi-11582-v1-uploader: ---------------------------------------------------
2026-03-28T07:09:50.886521+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Processed model ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8 in 226.036s
chaiml-mega-v1-sonnetwi-11582-v1-uploader: creating bucket guanaco-vllm-models
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-mega-v1-sonnetwi-11582-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-mega-v1-sonnetwi-11582-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-mega-v1-sonnetwi-11582-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-mega-v1-sonnetwi-11582-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-mega-v1-sonnetwi-11582-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-mega-v1-sonnetwi-11582-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v1/default
chaiml-mega-v1-sonnetwi-11582-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v1/default/chat_template.jinja
chaiml-mega-v1-sonnetwi-11582-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v1/default/generation_config.json
chaiml-mega-v1-sonnetwi-11582-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v1/default/config.json
chaiml-mega-v1-sonnetwi-11582-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v1/default/tokenizer_config.json
chaiml-mega-v1-sonnetwi-11582-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v1/default/recipe.yaml
chaiml-mega-v1-sonnetwi-11582-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v1/default/tokenizer.json
2026-03-28T07:10:50.990936+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
Retrying (%r) after connection broken by '%r': %s
chaiml-mega-v1-sonnetwi-11582-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v1/default/model.safetensors
Job chaiml-mega-v1-sonnetwi-11582-v1-uploader completed after 330.4s with status: succeeded
Stopping job with name chaiml-mega-v1-sonnetwi-11582-v1-uploader
Pipeline stage VLLMUploader completed in 330.87s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.09s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.17s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-mega-v1-sonnetwi-11582-v1
Waiting for inference service chaiml-mega-v1-sonnetwi-11582-v1 to be ready
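Waiting for readiness is typically a poll loop against the service's status endpoint. A minimal stdlib sketch; the `check` callable is a hypothetical stand-in for the actual inference-service status query:

```python
import time

def wait_until_ready(check, timeout=600.0, interval=5.0):
    """Poll `check()` until it returns True; return elapsed seconds.

    Raises TimeoutError if the service is not ready within `timeout`.
    `check` stands in for querying the deployed service's ready condition.
    """
    start = time.monotonic()
    while True:
        if check():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise TimeoutError("inference service did not become ready")
        time.sleep(interval)
```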
2026-03-28T07:11:51.080170+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
2026-03-28T07:12:51.199464+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
2026-03-28T07:13:51.351143+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
Inference service chaiml-mega-v1-sonnetwi-11582-v1 ready after 180.33s
Pipeline stage VLLMDeployer completed in 180.75s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:14:51.444430+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 12.438383340835571s
2026-03-28T07:15:51.538195+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:16:51.673026+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
{"detail":"('http://chaiml-mega-v1-sonnetwi-11582-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'upstream connect error or disconnect/reset before headers. reset reason: connection termination')"}
Received unhealthy response to inference request!
Received healthy response to inference request in 4.244635581970215s
Received healthy response to inference request in 4.390058279037476s
Received healthy response to inference request in 1.9012620449066162s
Received healthy response to inference request in 2.027315139770508s
Received healthy response to inference request in 1.8911640644073486s
Received healthy response to inference request in 2.4136228561401367s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.129103422164917s
Received healthy response to inference request in 1.9312667846679688s
Received healthy response to inference request in 1.7871448993682861s
Received healthy response to inference request in 1.8082013130187988s
2026-03-28T07:17:51.765583+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.9986348152160645s
Received healthy response to inference request in 2.2808966636657715s
Received healthy response to inference request in 9.81415319442749s
Received healthy response to inference request in 2.112879514694214s
Received healthy response to inference request in 2.2843105792999268s
Received healthy response to inference request in 1.9797470569610596s
Received healthy response to inference request in 1.8160064220428467s
Received healthy response to inference request in 1.9368786811828613s
Received healthy response to inference request in 1.9528369903564453s
30 requests
10 failed requests
5th percentile: 1.8117136120796205
10th percentile: 1.8836483001708986
20th percentile: 1.935756301879883
30th percentile: 1.992968487739563
40th percentile: 2.122613859176636
50th percentile: 2.3489667177200317
60th percentile: 6.559696245193473
70th percentile: 20.119062399864198
80th percentile: 20.134252405166627
90th percentile: 20.138962316513062
95th percentile: 20.160769402980804
99th percentile: 20.199246180057525
mean time: 8.819353890419006
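The ~20s values from the 70th percentile upward line up with the 10 timed-out requests being scored at the 20-second read timeout. A stdlib sketch of how such a summary could be computed (hypothetical `summarize` helper; percentiles use linear interpolation, numpy.percentile's default):

```python
def percentile(samples, p):
    """Linear-interpolation percentile over a list of floats."""
    xs = sorted(samples)
    rank = (p / 100) * (len(xs) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)

def summarize(latencies, n_failed, timeout=20.0):
    """Report latency percentiles, counting each failed request at `timeout`."""
    samples = list(latencies) + [timeout] * n_failed
    lines = [f"{len(samples)} requests", f"{n_failed} failed requests"]
    for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        lines.append(f"{p}th percentile: {percentile(samples, p)}")
    lines.append(f"mean time: {sum(samples) / len(samples)}")
    return "\n".join(lines)
```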
%s, retrying in %s seconds...
2026-03-28T07:18:51.864859+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
Received healthy response to inference request in 1.7544291019439697s
Received healthy response to inference request in 2.459390163421631s
Received healthy response to inference request in 1.7548868656158447s
Received healthy response to inference request in 1.9232349395751953s
Received healthy response to inference request in 1.7698190212249756s
Received healthy response to inference request in 1.8715522289276123s
Received healthy response to inference request in 1.9542546272277832s
Received healthy response to inference request in 2.2594590187072754s
Received healthy response to inference request in 1.743873119354248s
Received healthy response to inference request in 1.8458037376403809s
Received healthy response to inference request in 1.7806789875030518s
Received healthy response to inference request in 1.860137701034546s
Received healthy response to inference request in 1.9053924083709717s
Received healthy response to inference request in 1.742962121963501s
Received healthy response to inference request in 1.90580153465271s
Received healthy response to inference request in 1.8471486568450928s
Received healthy response to inference request in 2.031238555908203s
Received healthy response to inference request in 2.3053882122039795s
Received healthy response to inference request in 1.8476665019989014s
Received healthy response to inference request in 1.898108959197998s
Received healthy response to inference request in 1.8824002742767334s
Received healthy response to inference request in 2.0002012252807617s
Received healthy response to inference request in 1.950807809829712s
Received healthy response to inference request in 1.9017069339752197s
Received healthy response to inference request in 2.364725351333618s
Received healthy response to inference request in 1.8163049221038818s
Received healthy response to inference request in 1.9044380187988281s
Received healthy response to inference request in 2.0108132362365723s
Received healthy response to inference request in 2.043496608734131s
Received healthy response to inference request in 2.2079739570617676s
30 requests
0 failed requests
5th percentile: 1.7486233115196228
10th percentile: 1.7548410892486572
20th percentile: 1.8091797351837158
2026-03-28T07:19:51.968454+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v1
30th percentile: 1.8475111484527589
40th percentile: 1.878061056137085
50th percentile: 1.903072476387024
60th percentile: 1.912774896621704
70th percentile: 1.9680386066436766
80th percentile: 2.033690166473389
90th percentile: 2.2640519380569457
95th percentile: 2.3380236387252804
99th percentile: 2.4319373679161074
mean time: 1.9514698266983033
Pipeline stage StressChecker completed in 329.06s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.73s
Shutdown handler de-registered
chaiml-mega-v1-sonnetwi_11582_v1 status is now deployed due to DeploymentManager action
chaiml-mega-v1-sonnetwi_11582_v1 status is now inactive due to auto-deactivation of underperforming models