Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-mega-v1-winall-q-28145-v1-uploader
Waiting for job on chaiml-mega-v1-winall-q-28145-v1-uploader to finish
Failed to get response for submission chaiml-gspo-glm47-combi_10268_v1: ('http://chaiml-gspo-glm47-combi-10268-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
chaiml-mega-v1-winall-q-28145-v1-uploader: Using quantization_mode: fp8
chaiml-mega-v1-winall-q-28145-v1-uploader: Checking if ChaiML/mega-v1-winall-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-winall-q-28145-v1-uploader: Downloading snapshot of ChaiML/mega-v1-winall-q27b-lr5e6ep2g8...
2026-03-28T13:08:16.566348+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
chaiml-mega-v1-winall-q-28145-v1-uploader: Downloaded in 50.439s
chaiml-mega-v1-winall-q-28145-v1-uploader: Loading /tmp/model_input...
chaiml-mega-v1-winall-q-28145-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-mega-v1-winall-q-28145-v1-uploader: Applying quantization...
chaiml-mega-v1-winall-q-28145-v1-uploader: 2026-03-28T13:08:32.573934+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-mega-v1-winall-q-28145-v1-uploader: 2026-03-28T13:08:34.626205+0000 | reset | INFO - Compression lifecycle reset
chaiml-mega-v1-winall-q-28145-v1-uploader: 2026-03-28T13:08:34.628504+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-mega-v1-winall-q-28145-v1-uploader: 2026-03-28T13:08:34.675991+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-mega-v1-winall-q-28145-v1-uploader: 2026-03-28T13:08:34.676252+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-mega-v1-winall-q-28145-v1-uploader: 2026-03-28T13:08:34.691866+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-mega-v1-winall-q-28145-v1-uploader: 2026-03-28T13:08:41.563815+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-mega-v1-winall-q-28145-v1-uploader: 2026-03-28T13:08:41.564020+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-mega-v1-winall-q-28145-v1-uploader: Saving to /dev/shm/model_output...
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-mega-v1-winall-q-28145-v1-uploader: warnings.warn(
2026-03-28T13:09:16.665585+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
chaiml-mega-v1-winall-q-28145-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-mega-v1-winall-q-28145-v1-uploader: Pushing to ChaiML/mega-v1-winall-q27b-lr5e6ep2g8-FP8
chaiml-mega-v1-winall-q-28145-v1-uploader: Checking if ChaiML/mega-v1-winall-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-winall-q-28145-v1-uploader: Creating repo ChaiML/mega-v1-winall-q27b-lr5e6ep2g8-FP8 and uploading /dev/shm/model_output to it
chaiml-mega-v1-winall-q-28145-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-mega-v1-winall-q-28145-v1-uploader: - model.safetensors: 35.9GB
chaiml-mega-v1-winall-q-28145-v1-uploader: Large files may slow down loading and processing.
chaiml-mega-v1-winall-q-28145-v1-uploader: ---------- 2026-03-28 13:09:32 (0:00:00) ----------
chaiml-mega-v1-winall-q-28145-v1-uploader: Files: hashed 5/7 (34.1K/35.9G) | pre-uploaded: 0/0 (0.0/35.9G) (+7 unsure) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-mega-v1-winall-q-28145-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-mega-v1-winall-q-28145-v1-uploader: ---------------------------------------------------
2026-03-28T13:10:16.811798+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
chaiml-mega-v1-winall-q-28145-v1-uploader: ---------- 2026-03-28 13:10:32 (0:01:00) ----------
chaiml-mega-v1-winall-q-28145-v1-uploader: Files: hashed 7/7 (35.9G/35.9G) | pre-uploaded: 1/2 (20.0M/35.9G) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-mega-v1-winall-q-28145-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-mega-v1-winall-q-28145-v1-uploader: ---------------------------------------------------
2026-03-28T13:11:16.981129+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
Failed to get request counts for guanaco-submitter. Falling back to default
chaiml-mega-v1-winall-q-28145-v1-uploader: Processed model ChaiML/mega-v1-winall-q27b-lr5e6ep2g8 in 224.883s
chaiml-mega-v1-winall-q-28145-v1-uploader: creating bucket guanaco-vllm-models
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-winall-q-28145-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-mega-v1-winall-q-28145-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-winall-q-28145-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-winall-q-28145-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-winall-q-28145-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-winall-q-28145-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-mega-v1-winall-q-28145-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-mega-v1-winall-q-28145-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-mega-v1-winall-q-28145-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-mega-v1-winall-q-28145-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-mega-v1-winall-q-28145-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-mega-v1-winall-q-28145-v1/default
chaiml-mega-v1-winall-q-28145-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-mega-v1-winall-q-28145-v1/default/chat_template.jinja
chaiml-mega-v1-winall-q-28145-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-mega-v1-winall-q-28145-v1/default/generation_config.json
chaiml-mega-v1-winall-q-28145-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-mega-v1-winall-q-28145-v1/default/recipe.yaml
chaiml-mega-v1-winall-q-28145-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-mega-v1-winall-q-28145-v1/default/tokenizer_config.json
chaiml-mega-v1-winall-q-28145-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-mega-v1-winall-q-28145-v1/default/config.json
chaiml-mega-v1-winall-q-28145-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-mega-v1-winall-q-28145-v1/default/tokenizer.json
2026-03-28T13:12:17.155720+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
chaiml-mega-v1-winall-q-28145-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-mega-v1-winall-q-28145-v1/default/model.safetensors
Job chaiml-mega-v1-winall-q-28145-v1-uploader completed after 318.84s with status: succeeded
Stopping job with name chaiml-mega-v1-winall-q-28145-v1-uploader
Pipeline stage VLLMUploader completed in 319.26s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.76s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-mega-v1-winall-q-28145-v1
Waiting for inference service chaiml-mega-v1-winall-q-28145-v1 to be ready
2026-03-28T13:13:17.327239+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
2026-03-28T13:14:17.423175+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
2026-03-28T13:15:17.523966+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
Inference service chaiml-mega-v1-winall-q-28145-v1 ready after 160.19949865341187s
Pipeline stage VLLMDeployer completed in 160.63s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T13:16:17.661200+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 11.9788076877594s
Received healthy response to inference request in 4.533012866973877s
2026-03-28T13:17:17.760007+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
Received healthy response to inference request in 1.8110222816467285s
Received healthy response to inference request in 1.7922465801239014s
Received healthy response to inference request in 4.402658462524414s
Received healthy response to inference request in 1.9921495914459229s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.952615737915039s
Received healthy response to inference request in 2.458251476287842s
2026-03-28T13:18:17.866991+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
{"detail":"('http://chaiml-mega-v1-winall-q-28145-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'upstream connect error or disconnect/reset before headers. reset reason: connection termination')"}
Received unhealthy response to inference request!
Received healthy response to inference request in 1.949167013168335s
Received healthy response to inference request in 8.710394382476807s
Received healthy response to inference request in 1.8780369758605957s
Received healthy response to inference request in 2.023808717727661s
Received healthy response to inference request in 2.4382057189941406s
Received healthy response to inference request in 1.938450813293457s
Received healthy response to inference request in 2.2209651470184326s
Received healthy response to inference request in 1.929567575454712s
Received healthy response to inference request in 2.259737968444824s
Received healthy response to inference request in 2.0545248985290527s
Received healthy response to inference request in 2.102703809738159s
Received healthy response to inference request in 1.9474608898162842s
Received healthy response to inference request in 1.9459047317504883s
Received healthy response to inference request in 2.4114956855773926s
30 requests
8 failed requests
5th percentile: 1.8411788940429688
10th percentile: 1.9244145154953003
20th percentile: 1.947149658203125
30th percentile: 1.9802894353866576
40th percentile: 2.083432245254517
50th percentile: 2.3356168270111084
60th percentile: 3.236014270782468
70th percentile: 9.690918374061575
80th percentile: 20.123420429229736
90th percentile: 20.129610657691956
95th percentile: 20.14643965959549
99th percentile: 20.164080681800844
mean time: 7.480960075060526
%s, retrying in %s seconds...
Received healthy response to inference request in 1.7422864437103271s
Received healthy response to inference request in 2.5594704151153564s
Received healthy response to inference request in 1.7576110363006592s
Received healthy response to inference request in 1.7893569469451904s
2026-03-28T13:19:17.973909+00:00 monitor updated for chaiml-mega-v1-winall-q_28145_v1
Received healthy response to inference request in 1.8637945652008057s
Received healthy response to inference request in 1.9303159713745117s
Received healthy response to inference request in 1.9073302745819092s
Received healthy response to inference request in 1.9825758934020996s
Received healthy response to inference request in 2.0973002910614014s
Received healthy response to inference request in 1.9392378330230713s
Received healthy response to inference request in 1.7484099864959717s
Received healthy response to inference request in 1.8910174369812012s
Received healthy response to inference request in 1.9384796619415283s
Received healthy response to inference request in 1.9959673881530762s
Received healthy response to inference request in 1.7660973072052002s
Received healthy response to inference request in 2.23401141166687s
Received healthy response to inference request in 2.109989643096924s
Received healthy response to inference request in 2.240163564682007s
Received healthy response to inference request in 1.9565858840942383s
Received healthy response to inference request in 1.9632782936096191s
Received healthy response to inference request in 2.060033082962036s
Received healthy response to inference request in 1.961266279220581s
Received healthy response to inference request in 2.261977195739746s
Received healthy response to inference request in 1.9471795558929443s
Received healthy response to inference request in 1.9861066341400146s
Received healthy response to inference request in 1.921311378479004s
Received healthy response to inference request in 2.0177905559539795s
Received healthy response to inference request in 2.986452341079712s
Received healthy response to inference request in 2.0451412200927734s
Received healthy response to inference request in 2.1920089721679688s
30 requests
0 failed requests
5th percentile: 1.7525504589080811
10th percentile: 1.765248680114746
20th percentile: 1.885572862625122
30th percentile: 1.9276145935058593
40th percentile: 1.9440028667449951
50th percentile: 1.9622722864151
60th percentile: 1.9900509357452392
70th percentile: 2.0496087789535524
80th percentile: 2.126393508911133
90th percentile: 2.242344927787781
95th percentile: 2.425598466396331
99th percentile: 2.8626275825500493
mean time: 2.0264182488123574
Pipeline stage StressChecker completed in 294.77s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.88s
Shutdown handler de-registered
chaiml-mega-v1-winall-q_28145_v1 status is now deployed due to DeploymentManager action
chaiml-mega-v1-winall-q_28145_v1 status is now inactive due to admin request
chaiml-mega-v1-winall-q_28145_v1 status is now torndown due to DeploymentManager action