Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-mega-v1-sonnetwi-11582-v2-uploader
Waiting for job on chaiml-mega-v1-sonnetwi-11582-v2-uploader to finish
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Using quantization_mode: fp8
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Checking if ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Downloading snapshot of ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8...
2026-03-28T07:07:23.401407+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Downloaded in 48.386s
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Loading /tmp/model_input...
chaiml-mega-v1-sonnetwi-11582-v2-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Applying quantization...
chaiml-mega-v1-sonnetwi-11582-v2-uploader: 2026-03-28T07:07:54.738161+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-mega-v1-sonnetwi-11582-v2-uploader: 2026-03-28T07:07:56.856778+0000 | reset | INFO - Compression lifecycle reset
chaiml-mega-v1-sonnetwi-11582-v2-uploader: 2026-03-28T07:07:56.859122+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-mega-v1-sonnetwi-11582-v2-uploader: 2026-03-28T07:07:56.905039+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-mega-v1-sonnetwi-11582-v2-uploader: 2026-03-28T07:07:56.905280+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-mega-v1-sonnetwi-11582-v2-uploader: 2026-03-28T07:07:56.917925+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-mega-v1-sonnetwi-11582-v2-uploader: 2026-03-28T07:08:03.478128+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-mega-v1-sonnetwi-11582-v2-uploader: 2026-03-28T07:08:03.478349+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Saving to /dev/shm/model_output...
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-mega-v1-sonnetwi-11582-v2-uploader: warnings.warn(
2026-03-28T07:08:23.496411+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Pushing to ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8-FP8
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Checking if ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-sonnetwi-11582-v2-uploader: ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Processed model ChaiML/mega-v1-sonnetwintop2-q27b-lr5e6ep2g8 in 109.982s
chaiml-mega-v1-sonnetwi-11582-v2-uploader: creating bucket guanaco-vllm-models
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v2-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-mega-v1-sonnetwi-11582-v2-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v2-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v2-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v2-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-sonnetwi-11582-v2-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-mega-v1-sonnetwi-11582-v2-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-mega-v1-sonnetwi-11582-v2-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-mega-v1-sonnetwi-11582-v2-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-mega-v1-sonnetwi-11582-v2-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-mega-v1-sonnetwi-11582-v2-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v2/default
chaiml-mega-v1-sonnetwi-11582-v2-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v2/default/chat_template.jinja
chaiml-mega-v1-sonnetwi-11582-v2-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v2/default/tokenizer_config.json
chaiml-mega-v1-sonnetwi-11582-v2-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v2/default/recipe.yaml
chaiml-mega-v1-sonnetwi-11582-v2-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v2/default/generation_config.json
chaiml-mega-v1-sonnetwi-11582-v2-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v2/default/config.json
chaiml-mega-v1-sonnetwi-11582-v2-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v2/default/tokenizer.json
2026-03-28T07:09:23.587519+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
chaiml-mega-v1-sonnetwi-11582-v2-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-mega-v1-sonnetwi-11582-v2/default/model.safetensors
Job chaiml-mega-v1-sonnetwi-11582-v2-uploader completed after 215.1s with status: succeeded
Stopping job with name chaiml-mega-v1-sonnetwi-11582-v2-uploader
Pipeline stage VLLMUploader completed in 215.56s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 2.32s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-mega-v1-sonnetwi-11582-v2
Waiting for inference service chaiml-mega-v1-sonnetwi-11582-v2 to be ready
2026-03-28T07:10:23.687580+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
2026-03-28T07:11:29.632863+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
2026-03-28T07:12:29.750538+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
Inference service chaiml-mega-v1-sonnetwi-11582-v2 ready after 180.23049187660217s
Pipeline stage VLLMDeployer completed in 180.68s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:13:29.846162+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 6.4003050327301025s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:14:29.971677+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
Received healthy response to inference request in 7.719114065170288s
Received healthy response to inference request in 5.3374879360198975s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.799242258071899s
Received healthy response to inference request in 5.2865517139434814s
Received healthy response to inference request in 4.907512903213501s
2026-03-28T07:15:30.103083+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
Received healthy response to inference request in 19.474870204925537s
Received healthy response to inference request in 5.268590688705444s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 5.031743049621582s
Received healthy response to inference request in 4.335277557373047s
Received healthy response to inference request in 4.834999322891235s
Received healthy response to inference request in 4.376963138580322s
2026-03-28T07:16:30.211222+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
Received healthy response to inference request in 15.798967599868774s
Received healthy response to inference request in 4.049773693084717s
Received healthy response to inference request in 4.813002347946167s
Received healthy response to inference request in 4.6873204708099365s
Received healthy response to inference request in 4.777029037475586s
Received healthy response to inference request in 4.395367383956909s
Received healthy response to inference request in 4.869901418685913s
Received healthy response to inference request in 4.835489273071289s
Received healthy response to inference request in 4.714156150817871s
Received healthy response to inference request in 5.094900846481323s
Received healthy response to inference request in 4.7992167472839355s
2026-03-28T07:17:30.309535+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
Received healthy response to inference request in 4.783429145812988s
30 requests
6 failed requests
5th percentile: 4.3540360689163204
10th percentile: 4.39352695941925
20th percentile: 4.764454460144043
30th percentile: 4.79923460483551
40th percentile: 4.835293292999268
50th percentile: 4.9696279764175415
60th percentile: 5.2757750988006595
70th percentile: 6.795947742462155
80th percentile: 19.60399594306946
90th percentile: 20.125273561477663
95th percentile: 20.134244978427887
99th percentile: 20.149622089862824
mean time: 8.872740618387859
%s, retrying in %s seconds...
Received healthy response to inference request in 4.907282829284668s
Received healthy response to inference request in 4.762383460998535s
Received healthy response to inference request in 4.908521413803101s
Received healthy response to inference request in 5.214538812637329s
Received healthy response to inference request in 4.2570130825042725s
Received healthy response to inference request in 4.1567542552948s
Received healthy response to inference request in 5.259005069732666s
Received healthy response to inference request in 5.046595335006714s
Received healthy response to inference request in 4.657857418060303s
Received healthy response to inference request in 4.788200855255127s
Received healthy response to inference request in 4.727200031280518s
Received healthy response to inference request in 4.772112131118774s
2026-03-28T07:18:30.414364+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
Received healthy response to inference request in 4.908267259597778s
Received healthy response to inference request in 4.83371639251709s
Received healthy response to inference request in 4.883122682571411s
Received healthy response to inference request in 4.662230491638184s
Received healthy response to inference request in 5.044516086578369s
Received healthy response to inference request in 4.686923980712891s
Received healthy response to inference request in 4.7777628898620605s
Received healthy response to inference request in 4.717977523803711s
Received healthy response to inference request in 5.1349711418151855s
Received healthy response to inference request in 4.3190016746521s
Received healthy response to inference request in 4.738907337188721s
Received healthy response to inference request in 4.423011541366577s
2026-03-28T07:19:30.844154+00:00 monitor updated for chaiml-mega-v1-sonnetwi_11582_v2
Received healthy response to inference request in 5.0094263553619385s
Received healthy response to inference request in 4.4603869915008545s
Received healthy response to inference request in 4.735269784927368s
Received healthy response to inference request in 4.756206512451172s
Received healthy response to inference request in 4.638639211654663s
Received healthy response to inference request in 4.770282030105591s
30 requests
0 failed requests
5th percentile: 4.284907948970795
10th percentile: 4.41261055469513
20th percentile: 4.654013776779175
30th percentile: 4.708661460876465
40th percentile: 4.7374523162841795
50th percentile: 4.766332745552063
60th percentile: 4.781938076019287
70th percentile: 4.890370726585388
80th percentile: 4.928702402114869
90th percentile: 5.055432915687561
95th percentile: 5.178733360767365
99th percentile: 5.2461098551750185
mean time: 4.765269486109416
Pipeline stage StressChecker completed in 415.44s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.58s
Shutdown handler de-registered
chaiml-mega-v1-sonnetwi_11582_v2 status is now deployed due to DeploymentManager action
chaiml-mega-v1-sonnetwi_11582_v2 status is now inactive due to auto deactivation removed underperforming models