Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-mega-v1-plc-q27b-57593-v2-uploader
Waiting for job on chaiml-mega-v1-plc-q27b-57593-v2-uploader to finish
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Using quantization_mode: fp8
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Downloading snapshot of ChaiML/mega-v1-plc-q27b-lr5e6ep2g8...
2026-03-28T07:07:34.298742+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Downloaded in 46.404s
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Loading /tmp/model_input...
chaiml-mega-v1-plc-q27b-57593-v2-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Applying quantization...
chaiml-mega-v1-plc-q27b-57593-v2-uploader: 2026-03-28T07:08:03.599879+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-mega-v1-plc-q27b-57593-v2-uploader: 2026-03-28T07:08:05.650256+0000 | reset | INFO - Compression lifecycle reset
chaiml-mega-v1-plc-q27b-57593-v2-uploader: 2026-03-28T07:08:05.652645+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-mega-v1-plc-q27b-57593-v2-uploader: 2026-03-28T07:08:05.700064+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-mega-v1-plc-q27b-57593-v2-uploader: 2026-03-28T07:08:05.700331+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-mega-v1-plc-q27b-57593-v2-uploader: 2026-03-28T07:08:05.713130+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-mega-v1-plc-q27b-57593-v2-uploader: 2026-03-28T07:08:12.366814+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-mega-v1-plc-q27b-57593-v2-uploader: 2026-03-28T07:08:12.367042+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Saving to /dev/shm/model_output...
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-mega-v1-plc-q27b-57593-v2-uploader: warnings.warn(
2026-03-28T07:08:34.383344+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Pushing to ChaiML/mega-v1-plc-q27b-lr5e6ep2g8-FP8
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Checking if ChaiML/mega-v1-plc-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-plc-q27b-57593-v2-uploader: ChaiML/mega-v1-plc-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Processed model ChaiML/mega-v1-plc-q27b-lr5e6ep2g8 in 108.731s
chaiml-mega-v1-plc-q27b-57593-v2-uploader: creating bucket guanaco-vllm-models
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-plc-q27b-57593-v2-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-mega-v1-plc-q27b-57593-v2-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-plc-q27b-57593-v2-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-plc-q27b-57593-v2-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-plc-q27b-57593-v2-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-mega-v1-plc-q27b-57593-v2-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-mega-v1-plc-q27b-57593-v2-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-mega-v1-plc-q27b-57593-v2-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-mega-v1-plc-q27b-57593-v2-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-mega-v1-plc-q27b-57593-v2-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-mega-v1-plc-q27b-57593-v2-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-mega-v1-plc-q27b-57593-v2/default
chaiml-mega-v1-plc-q27b-57593-v2-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-mega-v1-plc-q27b-57593-v2/default/chat_template.jinja
chaiml-mega-v1-plc-q27b-57593-v2-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-mega-v1-plc-q27b-57593-v2/default/config.json
chaiml-mega-v1-plc-q27b-57593-v2-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-mega-v1-plc-q27b-57593-v2/default/recipe.yaml
chaiml-mega-v1-plc-q27b-57593-v2-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-mega-v1-plc-q27b-57593-v2/default/tokenizer_config.json
chaiml-mega-v1-plc-q27b-57593-v2-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-mega-v1-plc-q27b-57593-v2/default/generation_config.json
chaiml-mega-v1-plc-q27b-57593-v2-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-mega-v1-plc-q27b-57593-v2/default/tokenizer.json
2026-03-28T07:09:34.486928+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
chaiml-mega-v1-plc-q27b-57593-v2-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-mega-v1-plc-q27b-57593-v2/default/model.safetensors
Job chaiml-mega-v1-plc-q27b-57593-v2-uploader completed after 217.22s with status: succeeded
Stopping job with name chaiml-mega-v1-plc-q27b-57593-v2-uploader
Pipeline stage VLLMUploader completed in 217.70s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.14s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 16.69s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-mega-v1-plc-q27b-57593-v2
Waiting for inference service chaiml-mega-v1-plc-q27b-57593-v2 to be ready
2026-03-28T07:10:34.583738+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
2026-03-28T07:11:34.682998+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
2026-03-28T07:12:34.779129+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
Inference service chaiml-mega-v1-plc-q27b-57593-v2 ready after 171.9263050556183s
Pipeline stage VLLMDeployer completed in 172.55s
run pipeline stage %s
Running pipeline stage StressChecker
2026-03-28T07:13:35.374351+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:14:36.428325+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 6.462479591369629s
2026-03-28T07:15:36.692324+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 6.863498687744141s
Received healthy response to inference request in 5.170611619949341s
Received healthy response to inference request in 4.314397573471069s
Received healthy response to inference request in 4.375643730163574s
Received healthy response to inference request in 6.373688459396362s
Received healthy response to inference request in 4.05800724029541s
Received healthy response to inference request in 4.489595174789429s
Received healthy response to inference request in 4.129952669143677s
2026-03-28T07:16:37.036785+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
Received healthy response to inference request in 4.852182626724243s
Received healthy response to inference request in 4.675105810165405s
Received healthy response to inference request in 5.327117204666138s
Received healthy response to inference request in 4.7790846824646s
Received healthy response to inference request in 4.828561305999756s
Received healthy response to inference request in 5.270130157470703s
Received healthy response to inference request in 4.609651327133179s
Received healthy response to inference request in 5.1173036098480225s
Retrying (%r) after connection broken by '%r': %s
Received healthy response to inference request in 4.267755746841431s
Received healthy response to inference request in 4.677187919616699s
Received healthy response to inference request in 4.858171224594116s
Received healthy response to inference request in 4.713094234466553s
Received healthy response to inference request in 3.667769193649292s
2026-03-28T07:17:37.207593+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
Received healthy response to inference request in 4.389366388320923s
30 requests
7 failed requests
5th percentile: 4.09038268327713
10th percentile: 4.253975439071655
20th percentile: 4.3866218566894535
30th percentile: 4.655469465255737
40th percentile: 4.752688503265381
50th percentile: 4.85517692565918
60th percentile: 5.210419034957885
70th percentile: 6.400325798988342
80th percentile: 20.140539026260377
90th percentile: 20.209747195243835
95th percentile: 20.294165015220642
99th percentile: 20.48450377225876
mean time: 8.468699185053508
%s, retrying in %s seconds...
Received healthy response to inference request in 5.392541885375977s
Received healthy response to inference request in 4.871896982192993s
Received healthy response to inference request in 4.823660850524902s
Received healthy response to inference request in 4.817582607269287s
Received healthy response to inference request in 5.0081634521484375s
Received healthy response to inference request in 4.607986927032471s
Received healthy response to inference request in 5.0180370807647705s
Received healthy response to inference request in 4.603373289108276s
Received healthy response to inference request in 4.826814651489258s
Received healthy response to inference request in 4.749234437942505s
Received healthy response to inference request in 4.635124206542969s
2026-03-28T07:18:37.345976+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
Received healthy response to inference request in 4.86442232131958s
Received healthy response to inference request in 4.70210337638855s
Received healthy response to inference request in 4.771770715713501s
Received healthy response to inference request in 4.763468503952026s
Received healthy response to inference request in 4.850884199142456s
Received healthy response to inference request in 4.737501382827759s
Received healthy response to inference request in 4.761744976043701s
Received healthy response to inference request in 3.918719530105591s
Received healthy response to inference request in 5.05108642578125s
Received healthy response to inference request in 4.5582075119018555s
Received healthy response to inference request in 4.772989273071289s
Received healthy response to inference request in 4.796916723251343s
2026-03-28T07:19:37.448281+00:00 monitor updated for chaiml-mega-v1-plc-q27b_57593_v2
Received healthy response to inference request in 4.807870149612427s
Received healthy response to inference request in 4.603210926055908s
Received healthy response to inference request in 4.761974811553955s
Received healthy response to inference request in 4.94874382019043s
Received healthy response to inference request in 5.322396755218506s
Received healthy response to inference request in 4.787364482879639s
Received healthy response to inference request in 4.727262020111084s
30 requests
0 failed requests
5th percentile: 4.5784590482711796
10th percentile: 4.603357052803039
20th percentile: 4.688707542419434
30th percentile: 4.745714521408081
40th percentile: 4.762871026992798
50th percentile: 4.780176877975464
60th percentile: 4.811755132675171
70th percentile: 4.834035515785217
80th percentile: 4.887266349792481
90th percentile: 5.021342015266418
95th percentile: 5.20030710697174
99th percentile: 5.37219979763031
mean time: 4.79543514251709
Pipeline stage StressChecker completed in 405.23s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.91s
Shutdown handler de-registered
chaiml-mega-v1-plc-q27b_57593_v2 status is now deployed due to DeploymentManager action
chaiml-mega-v1-plc-q27b_57593_v2 status is now inactive due to auto deactivation removed underperforming models