Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-henry-cavill2603-87433-v1-uploader
Waiting for job on chaiml-henry-cavill2603-87433-v1-uploader to finish
chaiml-henry-cavill2603-87433-v1-uploader: Using quantization_mode: fp8
chaiml-henry-cavill2603-87433-v1-uploader: Checking if ChaiML/Henry-Cavill260304000002_sft-FP8 already exists in ChaiML
chaiml-henry-cavill2603-87433-v1-uploader: Downloading snapshot of ChaiML/Henry-Cavill260304000002_sft...
chaiml-henry-cavill2603-87433-v1-uploader: Downloaded in 83.694s
chaiml-henry-cavill2603-87433-v1-uploader: Loading /tmp/model_input...
chaiml-henry-cavill2603-87433-v1-uploader: The tokenizer you are loading from '/tmp/model_input' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
chaiml-henry-cavill2603-87433-v1-uploader: `torch_dtype` is deprecated! Use `dtype` instead!
chaiml-henry-cavill2603-87433-v1-uploader: Some parameters are on the meta device because they were offloaded to the cpu.
chaiml-henry-cavill2603-87433-v1-uploader: Applying quantization...
chaiml-henry-cavill2603-87433-v1-uploader: The tokenizer you are loading from '/tmp/model_input' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
chaiml-henry-cavill2603-87433-v1-uploader: 2026-03-03T16:14:42.404401-0800 | reset | INFO - Compression lifecycle reset
chaiml-henry-cavill2603-87433-v1-uploader: 2026-03-03T16:14:42.405353-0800 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-henry-cavill2603-87433-v1-uploader: 2026-03-03T16:14:42.472947-0800 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-henry-cavill2603-87433-v1-uploader: 2026-03-03T16:14:42.473199-0800 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-henry-cavill2603-87433-v1-uploader: Some parameters are on the meta device because they were offloaded to the cpu.
chaiml-henry-cavill2603-87433-v1-uploader: 2026-03-03T16:15:12.280998-0800 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-henry-cavill2603-87433-v1-uploader: 2026-03-03T16:15:14.387496-0800 | post_process | WARNING - Optimized model is not saved. To save, please provide `output_dir` as input arg. Ex. `oneshot(..., output_dir=...)`
chaiml-henry-cavill2603-87433-v1-uploader: Saving to /dev/shm/model_output...
chaiml-henry-cavill2603-87433-v1-uploader: 2026-03-03T16:15:14.414704-0800 | get_model_compressor | INFO - skip_sparsity_compression_stats set to True. Skipping sparsity compression statistic calculations. No sparsity compressor will be applied.
chaiml-henry-cavill2603-87433-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-henry-cavill2603-87433-v1-uploader: Pushing to ChaiML/Henry-Cavill260304000002_sft-FP8
chaiml-henry-cavill2603-87433-v1-uploader: Checking if ChaiML/Henry-Cavill260304000002_sft-FP8 already exists in ChaiML
chaiml-henry-cavill2603-87433-v1-uploader: Creating repo ChaiML/Henry-Cavill260304000002_sft-FP8 and uploading /dev/shm/model_output to it
chaiml-henry-cavill2603-87433-v1-uploader: ---------- 2026-03-03 16:16:03 (0:00:00) ----------
chaiml-henry-cavill2603-87433-v1-uploader: Files: hashed 6/13 (276.1K/24.9G) | pre-uploaded: 0/0 (0.0/24.9G) (+13 unsure) | committed: 0/13 (0.0/24.9G) | ignored: 0
chaiml-henry-cavill2603-87433-v1-uploader: Workers: hashing: 7 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 114
chaiml-henry-cavill2603-87433-v1-uploader: ---------------------------------------------------
chaiml-henry-cavill2603-87433-v1-uploader: ---------- 2026-03-03 16:17:03 (0:01:00) ----------
chaiml-henry-cavill2603-87433-v1-uploader: Files: hashed 13/13 (24.9G/24.9G) | pre-uploaded: 7/7 (24.9G/24.9G) | committed: 0/13 (0.0/24.9G) | ignored: 0
chaiml-henry-cavill2603-87433-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 0 | committing: 1 | waiting: 125
chaiml-henry-cavill2603-87433-v1-uploader: ---------------------------------------------------
chaiml-henry-cavill2603-87433-v1-uploader: Processed model ChaiML/Henry-Cavill260304000002_sft in 245.703s
chaiml-henry-cavill2603-87433-v1-uploader: creating bucket guanaco-vllm-models
chaiml-henry-cavill2603-87433-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-henry-cavill2603-87433-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-henry-cavill2603-87433-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-henry-cavill2603-87433-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-henry-cavill2603-87433-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-henry-cavill2603-87433-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-henry-cavill2603-87433-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-henry-cavill2603-87433-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-henry-cavill2603-87433-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-henry-cavill2603-87433-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-henry-cavill2603-87433-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-henry-cavill2603-87433-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-henry-cavill2603-87433-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-henry-cavill2603-87433-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-henry-cavill2603-87433-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-henry-cavill2603-87433-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-henry-cavill2603-87433-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-henry-cavill2603-87433-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/config.json
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/special_tokens_map.json
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/model.safetensors.index.json
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/recipe.yaml
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/tokenizer_config.json
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/generation_config.json
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/tokenizer.json
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/model-00006-of-00006.safetensors s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/model-00006-of-00006.safetensors
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/model-00005-of-00006.safetensors s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/model-00005-of-00006.safetensors
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/model-00002-of-00006.safetensors s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/model-00002-of-00006.safetensors
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/model-00003-of-00006.safetensors s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/model-00003-of-00006.safetensors
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/model-00004-of-00006.safetensors s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/model-00004-of-00006.safetensors
chaiml-henry-cavill2603-87433-v1-uploader: cp /dev/shm/model_output/model-00001-of-00006.safetensors s3://guanaco-vllm-models/chaiml-henry-cavill2603-87433-v1/default/model-00001-of-00006.safetensors
Job chaiml-henry-cavill2603-87433-v1-uploader completed after 275.69s with status: succeeded
Stopping job with name chaiml-henry-cavill2603-87433-v1-uploader
Pipeline stage VLLMUploader completed in 276.15s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.17s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-henry-cavill2603-87433-v1
Waiting for inference service chaiml-henry-cavill2603-87433-v1 to be ready
Inference service chaiml-henry-cavill2603-87433-v1 ready after 160.59821605682373s
Pipeline stage VLLMDeployer completed in 166.86s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.8183400630950928s
Received healthy response to inference request in 2.747407913208008s
Received healthy response to inference request in 2.71358060836792s
Received healthy response to inference request in 2.8535242080688477s
Received healthy response to inference request in 2.8117690086364746s
Received healthy response to inference request in 2.8439886569976807s
Received healthy response to inference request in 2.658498764038086s
Received healthy response to inference request in 2.7314765453338623s
Received healthy response to inference request in 2.678673028945923s
Received healthy response to inference request in 2.947359323501587s
Received healthy response to inference request in 2.679084062576294s
Received healthy response to inference request in 2.9106175899505615s
Received healthy response to inference request in 2.6808207035064697s
Received healthy response to inference request in 2.6569814682006836s
Received healthy response to inference request in 2.9126083850860596s
Received healthy response to inference request in 2.664288282394409s
Received healthy response to inference request in 2.992619037628174s
Received healthy response to inference request in 2.6732218265533447s
Received healthy response to inference request in 2.7786335945129395s
Received healthy response to inference request in 2.7511141300201416s
Received healthy response to inference request in 2.6707773208618164s
Received healthy response to inference request in 2.7106645107269287s
Received healthy response to inference request in 2.7416207790374756s
Received healthy response to inference request in 2.7956020832061768s
Received healthy response to inference request in 2.66906476020813s
Received healthy response to inference request in 3.0043554306030273s
Received healthy response to inference request in 2.7458152770996094s
Received healthy response to inference request in 2.760805130004883s
Received healthy response to inference request in 2.959873676300049s
admin requested tearing down of function_pilat_2026-03-01
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
Shutdown handler de-registered
function_pilat_2026-03-01 status is now torndown due to DeploymentManager action
Received healthy response to inference request in 2.6637470722198486s
30 requests
0 failed requests
5th percentile: 2.660860502719879
10th percentile: 2.664234161376953
20th percentile: 2.672732925415039
30th percentile: 2.680299711227417
40th percentile: 2.7243181705474853
50th percentile: 2.7466115951538086
60th percentile: 2.7679365158081053
70th percentile: 2.8137403249740602
80th percentile: 2.8649428844451905
90th percentile: 2.948610758781433
95th percentile: 2.9778836250305174
99th percentile: 3.00095187664032
mean time: 2.7742311080296833
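
Note on the summary above: the 50th percentile (2.7466115951538086) is the midpoint of the 15th and 16th sorted latencies, which is consistent with linearly interpolated percentiles over the 30 samples. The sketch below is one way to reproduce such a summary from the "Received healthy response" lines. It is not the StressChecker implementation; the names `percentile`, `latencies_from_log`, and `LATENCY` are hypothetical and chosen only for illustration.

```python
import re

# Assumed line shape, taken from the log lines above.
LATENCY = re.compile(r"Received healthy response to inference request in ([0-9.]+)s")


def latencies_from_log(lines):
    """Extract latency values (seconds) from matching log lines."""
    return [float(m.group(1)) for line in lines if (m := LATENCY.search(line))]


def percentile(samples, q):
    """Linearly interpolated percentile of samples, with q in [0, 100]."""
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("no samples")
    rank = (q / 100) * (len(ordered) - 1)   # fractional index into the sorted list
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(lines, quantiles=(5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99)):
    """Return request count, per-quantile latencies, and the mean."""
    values = latencies_from_log(lines)
    return {
        "requests": len(values),
        "percentiles": {q: percentile(values, q) for q in quantiles},
        "mean": sum(values) / len(values),
    }
```

Passing the lines of this log to `summarize` should give a request count of 30, assuming every healthy response is logged in the format shown.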
Pipeline stage StressChecker completed in 88.61s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.02s
Shutdown handler de-registered
chaiml-henry-cavill2603_87433_v1 status is now deployed due to DeploymentManager action
chaiml-henry-cavill2603_87433_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-henry-cavill2603_87433_v1 status is now torndown due to DeploymentManager action