Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-leon-1-27-26-mel-37100-v1-uploader
Waiting for job on chaiml-leon-1-27-26-mel-37100-v1-uploader to finish
chaiml-leon-1-27-26-mel-37100-v1-uploader: Using quantization_mode: fp8
chaiml-leon-1-27-26-mel-37100-v1-uploader: Checking if ChaiML/Leon-1-27-26_Melkor_Melkor_Melkor_Melk260223201356_sft-FP8 already exists in ChaiML
chaiml-leon-1-27-26-mel-37100-v1-uploader: Downloading snapshot of ChaiML/Leon-1-27-26_Melkor_Melkor_Melkor_Melk260223201356_sft...
chaiml-leon-1-27-26-mel-37100-v1-uploader: Downloaded in 140.592s
chaiml-leon-1-27-26-mel-37100-v1-uploader: Loading /tmp/model_input...
chaiml-leon-1-27-26-mel-37100-v1-uploader: The tokenizer you are loading from '/tmp/model_input' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
chaiml-leon-1-27-26-mel-37100-v1-uploader: `torch_dtype` is deprecated! Use `dtype` instead!
chaiml-leon-1-27-26-mel-37100-v1-uploader: Some parameters are on the meta device because they were offloaded to the cpu.
chaiml-leon-1-27-26-mel-37100-v1-uploader: Applying quantization...
chaiml-leon-1-27-26-mel-37100-v1-uploader: The tokenizer you are loading from '/tmp/model_input' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
chaiml-leon-1-27-26-mel-37100-v1-uploader: 2026-02-23T12:31:21.203729-0800 | reset | INFO - Compression lifecycle reset
chaiml-leon-1-27-26-mel-37100-v1-uploader: 2026-02-23T12:31:21.204625-0800 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-leon-1-27-26-mel-37100-v1-uploader: 2026-02-23T12:31:21.272097-0800 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-leon-1-27-26-mel-37100-v1-uploader: 2026-02-23T12:31:21.272357-0800 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-leon-1-27-26-mel-37100-v1-uploader: Some parameters are on the meta device because they were offloaded to the cpu.
chaiml-leon-1-27-26-mel-37100-v1-uploader: 2026-02-23T12:31:50.882425-0800 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-leon-1-27-26-mel-37100-v1-uploader: 2026-02-23T12:31:52.992629-0800 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-leon-1-27-26-mel-37100-v1-uploader: Saving to /dev/shm/model_output...
chaiml-leon-1-27-26-mel-37100-v1-uploader: 2026-02-23T12:31:53.019335-0800 | get_model_compressor | INFO - skip_sparsity_compression_stats set to True. Skipping sparsity compression statistic calculations. No sparsity compressor will be applied.
chaiml-leon-1-27-26-mel-37100-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-leon-1-27-26-mel-37100-v1-uploader: Pushing to ChaiML/Leon-1-27-26_Melkor_Melkor_Melkor_Melk260223201356_sft-FP8
chaiml-leon-1-27-26-mel-37100-v1-uploader: Checking if ChaiML/Leon-1-27-26_Melkor_Melkor_Melkor_Melk260223201356_sft-FP8 already exists in ChaiML
chaiml-leon-1-27-26-mel-37100-v1-uploader: Creating repo ChaiML/Leon-1-27-26_Melkor_Melkor_Melkor_Melk260223201356_sft-FP8 and uploading /dev/shm/model_output to it
chaiml-leon-1-27-26-mel-37100-v1-uploader: ---------- 2026-02-23 12:32:39 (0:00:00) ----------
chaiml-leon-1-27-26-mel-37100-v1-uploader: Files: hashed 4/13 (274.2K/24.9G) | pre-uploaded: 0/0 (0.0/24.9G) (+13 unsure) | committed: 0/13 (0.0/24.9G) | ignored: 0
chaiml-leon-1-27-26-mel-37100-v1-uploader: Workers: hashing: 13 | get upload mode: 0 | pre-uploading: 0 | committing: 0 | waiting: 113
chaiml-leon-1-27-26-mel-37100-v1-uploader: ---------------------------------------------------
chaiml-leon-1-27-26-mel-37100-v1-uploader:
chaiml-leon-1-27-26-mel-37100-v1-uploader: ---------- 2026-02-23 12:33:39 (0:01:00) ----------
chaiml-leon-1-27-26-mel-37100-v1-uploader: Files: hashed 13/13 (24.9G/24.9G) | pre-uploaded: 7/7 (24.9G/24.9G) | committed: 0/13 (0.0/24.9G) | ignored: 0
chaiml-leon-1-27-26-mel-37100-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 0 | committing: 1 | waiting: 125
chaiml-leon-1-27-26-mel-37100-v1-uploader: ---------------------------------------------------
chaiml-leon-1-27-26-mel-37100-v1-uploader: Processed model ChaiML/Leon-1-27-26_Melkor_Melkor_Melkor_Melk260223201356_sft in 310.914s
chaiml-leon-1-27-26-mel-37100-v1-uploader: creating bucket guanaco-vllm-models
chaiml-leon-1-27-26-mel-37100-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-leon-1-27-26-mel-37100-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-leon-1-27-26-mel-37100-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-leon-1-27-26-mel-37100-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-leon-1-27-26-mel-37100-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-leon-1-27-26-mel-37100-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-leon-1-27-26-mel-37100-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-leon-1-27-26-mel-37100-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-leon-1-27-26-mel-37100-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-leon-1-27-26-mel-37100-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-leon-1-27-26-mel-37100-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-leon-1-27-26-mel-37100-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-leon-1-27-26-mel-37100-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-leon-1-27-26-mel-37100-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-leon-1-27-26-mel-37100-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-leon-1-27-26-mel-37100-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-leon-1-27-26-mel-37100-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-leon-1-27-26-mel-37100-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/recipe.yaml
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/model.safetensors.index.json
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/generation_config.json
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/config.json
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/tokenizer_config.json
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/special_tokens_map.json
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/tokenizer.json
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/model-00006-of-00006.safetensors s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/model-00006-of-00006.safetensors
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/model-00005-of-00006.safetensors s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/model-00005-of-00006.safetensors
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/model-00002-of-00006.safetensors s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/model-00002-of-00006.safetensors
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/model-00001-of-00006.safetensors s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/model-00001-of-00006.safetensors
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/model-00003-of-00006.safetensors s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/model-00003-of-00006.safetensors
chaiml-leon-1-27-26-mel-37100-v1-uploader: cp /dev/shm/model_output/model-00004-of-00006.safetensors s3://guanaco-vllm-models/chaiml-leon-1-27-26-mel-37100-v1/default/model-00004-of-00006.safetensors
Job chaiml-leon-1-27-26-mel-37100-v1-uploader completed after 370.83s with status: succeeded
Stopping job with name chaiml-leon-1-27-26-mel-37100-v1-uploader
Pipeline stage VLLMUploader completed in 371.33s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-leon-1-27-26-mel-37100-v1
Waiting for inference service chaiml-leon-1-27-26-mel-37100-v1 to be ready
Inference service chaiml-leon-1-27-26-mel-37100-v1 ready after 161.12732696533203s
Pipeline stage VLLMDeployer completed in 161.68s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.3692235946655273s
Received healthy response to inference request in 1.4465904235839844s
Received healthy response to inference request in 2.195775032043457s
Received healthy response to inference request in 1.4705448150634766s
Received healthy response to inference request in 1.720796823501587s
Received healthy response to inference request in 1.7671630382537842s
Received healthy response to inference request in 1.3608779907226562s
Received healthy response to inference request in 2.001936674118042s
Received healthy response to inference request in 2.0168871879577637s
Received healthy response to inference request in 1.5195090770721436s
Received healthy response to inference request in 1.5239827632904053s
Received healthy response to inference request in 1.4212064743041992s
HTTP Request: %s %s "%s %d %s"
Received healthy response to inference request in 1.626157522201538s
Received healthy response to inference request in 1.3472528457641602s
Received healthy response to inference request in 1.3980450630187988s
Received healthy response to inference request in 2.1222939491271973s
Received healthy response to inference request in 1.3833460807800293s
Received healthy response to inference request in 1.6489520072937012s
Received healthy response to inference request in 1.3491671085357666s
Received healthy response to inference request in 1.543553352355957s
Received healthy response to inference request in 1.398606777191162s
Received healthy response to inference request in 1.5523619651794434s
Received healthy response to inference request in 1.4235656261444092s
Received healthy response to inference request in 1.3223581314086914s
Received healthy response to inference request in 1.4145748615264893s
Received healthy response to inference request in 1.3838651180267334s
Received healthy response to inference request in 1.330171823501587s
Received healthy response to inference request in 1.5514423847198486s
Received healthy response to inference request in 1.468127727508545s
Received healthy response to inference request in 1.3792126178741455s
30 requests
0 failed requests
5th percentile: 1.3378582835197448
10th percentile: 1.3489756822586059
20th percentile: 1.3772148132324218
30th percentile: 1.3937910795211792
40th percentile: 1.4185538291931152
50th percentile: 1.4573590755462646
60th percentile: 1.5212985515594482
70th percentile: 1.551718258857727
80th percentile: 1.6633209705352785
90th percentile: 2.003431725502014
95th percentile: 2.074860906600952
99th percentile: 2.1744655179977417
mean time: 1.5485849618911742
Pipeline stage StressChecker completed in 50.51s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.78s
Shutdown handler de-registered
chaiml-leon-1-27-26-mel_37100_v1 status is now deployed due to DeploymentManager action