Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name qwen-qwen3-5-35b-a3b-v47-uploader
Waiting for job on qwen-qwen3-5-35b-a3b-v47-uploader to finish
2026-03-24T18:57:55.723311+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
2026-03-24T18:58:56.057414+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
qwen-qwen3-5-35b-a3b-v47-uploader: Using quantization_mode: fp8
qwen-qwen3-5-35b-a3b-v47-uploader: Checking if ChaiML/Qwen3.5-35B-A3B-FP8 already exists in ChaiML
qwen-qwen3-5-35b-a3b-v47-uploader: Downloading snapshot of Qwen/Qwen3.5-35B-A3B...
qwen-qwen3-5-35b-a3b-v47-uploader: Downloaded in 20.359s
qwen-qwen3-5-35b-a3b-v47-uploader: Loading /tmp/model_input...
qwen-qwen3-5-35b-a3b-v47-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
qwen-qwen3-5-35b-a3b-v47-uploader: Applying quantization...
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:36.149615+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:37.995205+0000 | reset | INFO - Compression lifecycle reset
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:37.996947+0000 | moe_calibration_context | INFO - Found 40 MoE modules to replace
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:45.893174+0000 | moe_calibration_context | INFO - Replaced 40 MoE modules for calibration
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:45.893373+0000 | moe_calibration_context | INFO - 40/40 modules will remain in calibration form (permanent)
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:45.893451+0000 | from_modifiers | INFO - Creating recipe from modifiers
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:49.090584+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:49.091042+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T18:59:49.351361+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
2026-03-24T18:59:56.252941+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T19:00:14.045819+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
qwen-qwen3-5-35b-a3b-v47-uploader: 2026-03-24T19:00:14.046067+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
qwen-qwen3-5-35b-a3b-v47-uploader: Saving to /dev/shm/model_output...
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
qwen-qwen3-5-35b-a3b-v47-uploader: warnings.warn(
2026-03-24T19:00:57.129406+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
qwen-qwen3-5-35b-a3b-v47-uploader: Cleaning quantization config in /dev/shm/model_output
qwen-qwen3-5-35b-a3b-v47-uploader: Pushing to ChaiML/Qwen3.5-35B-A3B-FP8
qwen-qwen3-5-35b-a3b-v47-uploader: Checking if ChaiML/Qwen3.5-35B-A3B-FP8 already exists in ChaiML
qwen-qwen3-5-35b-a3b-v47-uploader: Creating repo ChaiML/Qwen3.5-35B-A3B-FP8 and uploading /dev/shm/model_output to it
qwen-qwen3-5-35b-a3b-v47-uploader: Found 1 files larger than 20GB (recommended limit):
qwen-qwen3-5-35b-a3b-v47-uploader: - model.safetensors: 37.7GB
qwen-qwen3-5-35b-a3b-v47-uploader: Large files may slow down loading and processing.
qwen-qwen3-5-35b-a3b-v47-uploader: ---------- 2026-03-24 19:01:08 (0:00:00) ----------
qwen-qwen3-5-35b-a3b-v47-uploader: Files: hashed 5/7 (32.4K/37.7G) | pre-uploaded: 0/0 (0.0/37.7G) (+7 unsure) | committed: 0/7 (0.0/37.7G) | ignored: 0
qwen-qwen3-5-35b-a3b-v47-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
qwen-qwen3-5-35b-a3b-v47-uploader: ---------------------------------------------------
2026-03-24T19:01:57.308055+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
qwen-qwen3-5-35b-a3b-v47-uploader: ---------- 2026-03-24 19:02:08 (0:01:00) ----------
qwen-qwen3-5-35b-a3b-v47-uploader: Files: hashed 7/7 (37.7G/37.7G) | pre-uploaded: 2/2 (37.7G/37.7G) | committed: 0/7 (0.0/37.7G) | ignored: 0
qwen-qwen3-5-35b-a3b-v47-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 0 | committing: 1 | waiting: 63
qwen-qwen3-5-35b-a3b-v47-uploader: ---------------------------------------------------
qwen-qwen3-5-35b-a3b-v47-uploader: Processed model Qwen/Qwen3.5-35B-A3B in 196.847s
qwen-qwen3-5-35b-a3b-v47-uploader: creating bucket guanaco-vllm-models
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-5-35b-a3b-v47-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
qwen-qwen3-5-35b-a3b-v47-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-5-35b-a3b-v47-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-5-35b-a3b-v47-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-5-35b-a3b-v47-uploader: if re.search("-\.", bucket, re.UNICODE):
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-5-35b-a3b-v47-uploader: if re.search("\.\.", bucket, re.UNICODE):
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
qwen-qwen3-5-35b-a3b-v47-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
qwen-qwen3-5-35b-a3b-v47-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
qwen-qwen3-5-35b-a3b-v47-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
qwen-qwen3-5-35b-a3b-v47-uploader: Bucket 's3://guanaco-vllm-models/' created
qwen-qwen3-5-35b-a3b-v47-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/qwen-qwen3-5-35b-a3b-v47/default
qwen-qwen3-5-35b-a3b-v47-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/qwen-qwen3-5-35b-a3b-v47/default/chat_template.jinja
qwen-qwen3-5-35b-a3b-v47-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/qwen-qwen3-5-35b-a3b-v47/default/recipe.yaml
qwen-qwen3-5-35b-a3b-v47-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/qwen-qwen3-5-35b-a3b-v47/default/config.json
qwen-qwen3-5-35b-a3b-v47-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/qwen-qwen3-5-35b-a3b-v47/default/generation_config.json
qwen-qwen3-5-35b-a3b-v47-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/qwen-qwen3-5-35b-a3b-v47/default/tokenizer_config.json
qwen-qwen3-5-35b-a3b-v47-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/qwen-qwen3-5-35b-a3b-v47/default/tokenizer.json
2026-03-24T19:02:57.920833+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
qwen-qwen3-5-35b-a3b-v47-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/qwen-qwen3-5-35b-a3b-v47/default/model.safetensors
Job qwen-qwen3-5-35b-a3b-v47-uploader completed after 401.72s with status: succeeded
Stopping job with name qwen-qwen3-5-35b-a3b-v47-uploader
Pipeline stage VLLMUploader completed in 402.82s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.10s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service qwen-qwen3-5-35b-a3b-v47
Waiting for inference service qwen-qwen3-5-35b-a3b-v47 to be ready
2026-03-24T19:03:58.128416+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
2026-03-24T19:04:58.307107+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
2026-03-24T19:05:58.489043+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
Inference service qwen-qwen3-5-35b-a3b-v47 ready after 172.02250933647156s
Pipeline stage VLLMDeployer completed in 173.84s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='qwen-qwen3-5-35b-a3b-v47-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
2026-03-24T19:06:58.678115+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
HTTPConnectionPool(host='qwen-qwen3-5-35b-a3b-v47-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
HTTPConnectionPool(host='qwen-qwen3-5-35b-a3b-v47-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.839680910110474s
2026-03-24T19:07:58.865135+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
HTTPConnectionPool(host='qwen-qwen3-5-35b-a3b-v47-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.9419314861297607s
HTTPConnectionPool(host='qwen-qwen3-5-35b-a3b-v47-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.7312490940093994s
HTTPConnectionPool(host='qwen-qwen3-5-35b-a3b-v47-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.3488945960998535s
2026-03-24T19:08:59.112669+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
HTTPConnectionPool(host='qwen-qwen3-5-35b-a3b-v47-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.5115556716918945s
Received healthy response to inference request in 2.0425801277160645s
HTTPConnectionPool(host='qwen-qwen3-5-35b-a3b-v47-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.5185229778289795s
Received healthy response to inference request in 1.478628158569336s
Received healthy response to inference request in 1.8521511554718018s
Received healthy response to inference request in 1.8169245719909668s
Received healthy response to inference request in 1.8934812545776367s
2026-03-24T19:09:59.357252+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
Received healthy response to inference request in 17.80845618247986s
Received healthy response to inference request in 1.861513376235962s
Received healthy response to inference request in 4.287402153015137s
Received healthy response to inference request in 1.9218733310699463s
Received healthy response to inference request in 1.8935463428497314s
Received healthy response to inference request in 1.8093438148498535s
Received healthy response to inference request in 1.8353807926177979s
Received healthy response to inference request in 3.0664288997650146s
Received healthy response to inference request in 2.3484368324279785s
Received healthy response to inference request in 3.081392765045166s
Received healthy response to inference request in 2.1172332763671875s
30 requests
8 failed requests
5th percentile: 1.7663917183876037
10th percentile: 1.8161664962768556
20th percentile: 1.85964093208313
30th percentile: 1.913375234603882
40th percentile: 2.0873720169067385
50th percentile: 2.707661747932434
60th percentile: 4.37706356048584
70th percentile: 8.730313491821253
80th percentile: 20.443319272994994
90th percentile: 20.461698174476624
95th percentile: 20.609268379211425
99th percentile: 20.76297621965408
mean time: 7.840740203857422
%s, retrying in %s seconds...
Received healthy response to inference request in 1.989750623703003s
Received healthy response to inference request in 2.0970444679260254s
Received healthy response to inference request in 2.159616231918335s
Received healthy response to inference request in 1.963822364807129s
Received healthy response to inference request in 1.7582430839538574s
Received healthy response to inference request in 1.9483041763305664s
Received healthy response to inference request in 1.8942546844482422s
Received healthy response to inference request in 1.7914149761199951s
2026-03-24T19:10:59.563828+00:00 monitor updated for qwen-qwen3-5-35b-a3b_v47
Received healthy response to inference request in 2.104942560195923s
Received healthy response to inference request in 1.2206451892852783s
Received healthy response to inference request in 1.673980951309204s
Received healthy response to inference request in 2.492459535598755s
Received healthy response to inference request in 1.9525706768035889s
Received healthy response to inference request in 1.6321959495544434s
Received healthy response to inference request in 1.6808249950408936s
Received healthy response to inference request in 1.2825767993927002s
Received healthy response to inference request in 1.8160967826843262s
Received healthy response to inference request in 2.0367271900177s
Received healthy response to inference request in 1.2855339050292969s
Received healthy response to inference request in 1.3758375644683838s
Received healthy response to inference request in 1.9052097797393799s
Received healthy response to inference request in 2.075429916381836s
Received healthy response to inference request in 1.876784324645996s
Received healthy response to inference request in 1.6906354427337646s
Received healthy response to inference request in 1.6174728870391846s
Received healthy response to inference request in 2.0831663608551025s
Received healthy response to inference request in 1.7135136127471924s
Received healthy response to inference request in 1.5322458744049072s
Received healthy response to inference request in 1.9706699848175049s
Received healthy response to inference request in 2.059683084487915s
30 requests
0 failed requests
5th percentile: 1.2839074969291686
10th percentile: 1.3668071985244752
20th percentile: 1.6292513370513917
30th percentile: 1.6876923084259032
40th percentile: 1.77814621925354
50th percentile: 1.8855195045471191
60th percentile: 1.9500107765197754
70th percentile: 1.9763941764831543
80th percentile: 2.0628324508666993
90th percentile: 2.097834277153015
95th percentile: 2.135013079643249
99th percentile: 2.3959349775314336
mean time: 1.822721799214681
Pipeline stage StressChecker completed in 308.00s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.28s
Shutdown handler de-registered
qwen-qwen3-5-35b-a3b_v47 status is now deployed due to DeploymentManager action
qwen-qwen3-5-35b-a3b_v47 status is now torndown due to DeploymentManager action