Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-98p-2ff-chaiml-m-26775-v2-uploader
Waiting for job on chaiml-98p-2ff-chaiml-m-26775-v2-uploader to finish
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Using quantization_mode: fp8
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Checking if ChaiML/98p_2ff_chaiml_mistral_24b_2048_1_404_v1_cp624_merged-FP8 already exists in ChaiML
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Downloading snapshot of ChaiML/98p_2ff_chaiml_mistral_24b_2048_1_404_v1_cp624_merged...
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Downloaded in 47.593s
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Loading /tmp/model_input...
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: The tokenizer you are loading from '/tmp/model_input' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: `torch_dtype` is deprecated! Use `dtype` instead!
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Some parameters are on the meta device because they were offloaded to the cpu.
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Applying quantization...
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: The tokenizer you are loading from '/tmp/model_input' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: 2026-02-19T17:27:42.075127-0800 | reset | INFO - Compression lifecycle reset
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: 2026-02-19T17:27:42.076080-0800 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: 2026-02-19T17:27:42.165708-0800 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: 2026-02-19T17:27:42.166022-0800 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Some parameters are on the meta device because they were offloaded to the cpu.
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: 2026-02-19T17:28:24.002547-0800 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: 2026-02-19T17:28:26.198186-0800 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Saving to /dev/shm/model_output...
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: 2026-02-19T17:28:26.225344-0800 | get_model_compressor | INFO - skip_sparsity_compression_stats set to True. Skipping sparsity compression statistic calculations. No sparsity compressor will be applied.
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Pushing to ChaiML/98p_2ff_chaiml_mistral_24b_2048_1_404_v1_cp624_merged-FP8
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Checking if ChaiML/98p_2ff_chaiml_mistral_24b_2048_1_404_v1_cp624_merged-FP8 already exists in ChaiML
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Creating repo ChaiML/98p_2ff_chaiml_mistral_24b_2048_1_404_v1_cp624_merged-FP8 and uploading /dev/shm/model_output to it
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: ---------- 2026-02-19 17:29:19 (0:00:00) ----------
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Files: hashed 6/13 (276.0K/27.6G) | pre-uploaded: 0/0 (0.0/27.6G) (+13 unsure) | committed: 0/13 (0.0/27.6G) | ignored: 0
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Workers: hashing: 7 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 114
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: ---------------------------------------------------
chaiml-98p-2ff-chaiml-m-26775-v2-uploader:
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: ---------- 2026-02-19 17:30:19 (0:01:00) ----------
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Files: hashed 13/13 (27.6G/27.6G) | pre-uploaded: 7/7 (27.6G/27.6G) | committed: 0/13 (0.0/27.6G) | ignored: 0
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 0 | committing: 1 | waiting: 125
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: ---------------------------------------------------
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Processed model ChaiML/98p_2ff_chaiml_mistral_24b_2048_1_404_v1_cp624_merged in 226.389s
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: creating bucket guanaco-vllm-models
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/config.json
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/special_tokens_map.json
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/model.safetensors.index.json
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/tokenizer_config.json
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/recipe.yaml
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/generation_config.json
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/tokenizer.json
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/model-00006-of-00006.safetensors s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/model-00006-of-00006.safetensors
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/model-00005-of-00006.safetensors s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/model-00005-of-00006.safetensors
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/model-00003-of-00006.safetensors s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/model-00003-of-00006.safetensors
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/model-00004-of-00006.safetensors s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/model-00004-of-00006.safetensors
chaiml-98p-2ff-chaiml-m-26775-v2-uploader: cp /dev/shm/model_output/model-00002-of-00006.safetensors s3://guanaco-vllm-models/chaiml-98p-2ff-chaiml-m-26775-v2/default/model-00002-of-00006.safetensors
Job chaiml-98p-2ff-chaiml-m-26775-v2-uploader completed after 377.07s with status: succeeded
Stopping job with name chaiml-98p-2ff-chaiml-m-26775-v2-uploader
Pipeline stage VLLMUploader completed in 377.53s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.17s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-98p-2ff-chaiml-m-26775-v2
Waiting for inference service chaiml-98p-2ff-chaiml-m-26775-v2 to be ready
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Inference service chaiml-98p-2ff-chaiml-m-26775-v2 ready after 1045.642592906952s
Pipeline stage VLLMDeployer completed in 1046.22s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.9914591312408447s
Received healthy response to inference request in 1.655851125717163s
Received healthy response to inference request in 1.435760736465454s
Received healthy response to inference request in 1.384669303894043s
Received healthy response to inference request in 1.3693182468414307s
Received healthy response to inference request in 1.3993592262268066s
Received healthy response to inference request in 1.3900372982025146s
Received healthy response to inference request in 1.4599933624267578s
Received healthy response to inference request in 1.4944305419921875s
Received healthy response to inference request in 1.5826990604400635s
Received healthy response to inference request in 1.445671796798706s
Received healthy response to inference request in 1.483170509338379s
Received healthy response to inference request in 1.389512538909912s
Received healthy response to inference request in 1.4414825439453125s
Received healthy response to inference request in 1.3616392612457275s
Received healthy response to inference request in 1.4788947105407715s
Received healthy response to inference request in 1.50014328956604s
Received healthy response to inference request in 1.417872428894043s
Received healthy response to inference request in 1.4319391250610352s
Received healthy response to inference request in 1.4849879741668701s
Received healthy response to inference request in 1.3591134548187256s
Received healthy response to inference request in 1.4778776168823242s
Received healthy response to inference request in 1.537707805633545s
Received healthy response to inference request in 1.376356601715088s
Received healthy response to inference request in 1.3834195137023926s
Received healthy response to inference request in 1.424231767654419s
Received healthy response to inference request in 1.5405137538909912s
Received healthy response to inference request in 1.390059471130371s
Received healthy response to inference request in 1.4123122692108154s
Received healthy response to inference request in 1.3710308074951172s
30 requests
0 failed requests
5th percentile: 1.3650948047637939
10th percentile: 1.3708595514297486
20th percentile: 1.3844193458557128
30th percentile: 1.390052819252014
40th percentile: 1.4156483650207519
50th percentile: 1.4338499307632446
60th percentile: 1.4514004230499267
70th percentile: 1.4801774501800538
80th percentile: 1.495573091506958
90th percentile: 1.5447322845458984
95th percentile: 1.622932696342468
99th percentile: 1.8941328096389773
mean time: 1.4623838424682618
Pipeline stage StressChecker completed in 49.29s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.64s
Shutdown handler de-registered
chaiml-98p-2ff-chaiml-m_26775_v2 status is now deployed due to DeploymentManager action
chaiml-98p-2ff-chaiml-m_26775_v2 status is now inactive due to auto deactivation removed underperforming models