Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-v3a-q27b-lr-21575-v2-uploader
Waiting for job on chaiml-pony-v3a-q27b-lr-21575-v2-uploader to finish
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Using quantization_mode: fp8
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Checking if ChaiML/pony-v3a-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Downloading snapshot of ChaiML/pony-v3a-q27b-lr5e6ep2g8...
2026-03-28T13:08:38.362710+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Downloaded in 61.192s
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Loading /tmp/model_input...
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Applying quantization...
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: 2026-03-28T13:09:07.502449+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: 2026-03-28T13:09:09.536473+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: 2026-03-28T13:09:09.538734+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: 2026-03-28T13:09:09.587077+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: 2026-03-28T13:09:09.587353+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: 2026-03-28T13:09:09.600900+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: 2026-03-28T13:09:16.647502+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: 2026-03-28T13:09:16.647722+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Saving to /dev/shm/model_output...
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: warnings.warn(
2026-03-28T13:09:38.453117+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Pushing to ChaiML/pony-v3a-q27b-lr5e6ep2g8-FP8
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Checking if ChaiML/pony-v3a-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: ChaiML/pony-v3a-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Processed model ChaiML/pony-v3a-q27b-lr5e6ep2g8 in 124.677s
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: creating bucket guanaco-vllm-models
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v2/default
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v2/default/chat_template.jinja
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v2/default/recipe.yaml
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v2/default/config.json
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v2/default/tokenizer_config.json
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v2/default/generation_config.json
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v2/default/tokenizer.json
2026-03-28T13:10:38.548451+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
chaiml-pony-v3a-q27b-lr-21575-v2-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v2/default/model.safetensors
Job chaiml-pony-v3a-q27b-lr-21575-v2-uploader completed after 224.54s with status: succeeded
Stopping job with name chaiml-pony-v3a-q27b-lr-21575-v2-uploader
Pipeline stage VLLMUploader completed in 224.97s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.39s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-v3a-q27b-lr-21575-v2
Waiting for inference service chaiml-pony-v3a-q27b-lr-21575-v2 to be ready
2026-03-28T13:11:38.640530+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
2026-03-28T13:12:38.750131+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
2026-03-28T13:13:38.843571+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
Inference service chaiml-pony-v3a-q27b-lr-21575-v2 ready after 140.24989461898804s
Pipeline stage VLLMDeployer completed in 140.66s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T13:14:38.961696+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T13:15:39.051116+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
Received healthy response to inference request in 13.090837717056274s
Received healthy response to inference request in 4.811469078063965s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.9471909999847412s
Received healthy response to inference request in 13.14875602722168s
2026-03-28T13:16:39.144803+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.001145124435425s
Received healthy response to inference request in 2.7235708236694336s
Received healthy response to inference request in 1.792309284210205s
Received healthy response to inference request in 2.047187328338623s
Received healthy response to inference request in 2.02731990814209s
Received healthy response to inference request in 1.8322083950042725s
Received healthy response to inference request in 2.117239236831665s
Received healthy response to inference request in 2.2877285480499268s
Received healthy response to inference request in 2.2455334663391113s
Received healthy response to inference request in 1.979724645614624s
Received healthy response to inference request in 1.8790991306304932s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.9972589015960693s
Received healthy response to inference request in 2.0483953952789307s
Received healthy response to inference request in 2.0192298889160156s
Received healthy response to inference request in 4.103867292404175s
Received healthy response to inference request in 1.9851629734039307s
2026-03-28T13:17:39.265423+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
Received healthy response to inference request in 2.135166645050049s
Received healthy response to inference request in 2.3235225677490234s
30 requests
8 failed requests
5th percentile: 1.8533092260360717
10th percentile: 1.9403818130493165
20th percentile: 1.9948397159576416
30th percentile: 2.041227102279663
40th percentile: 2.1279956817626955
50th percentile: 2.305625557899475
60th percentile: 4.042233991622925
70th percentile: 13.108213210105896
80th percentile: 20.114809036254883
90th percentile: 20.13327000141144
95th percentile: 20.137815153598787
99th percentile: 20.149021022319793
mean time: 7.85271201133728
%s, retrying in %s seconds...
Received healthy response to inference request in 1.6692891120910645s
Received healthy response to inference request in 1.8219764232635498s
Received healthy response to inference request in 1.7257671356201172s
Received healthy response to inference request in 2.0601000785827637s
Received healthy response to inference request in 3.0830636024475098s
Received healthy response to inference request in 1.7941350936889648s
Received healthy response to inference request in 1.726379156112671s
Received healthy response to inference request in 1.83774733543396s
Received healthy response to inference request in 2.0224146842956543s
Received healthy response to inference request in 2.112178087234497s
Received healthy response to inference request in 2.411843776702881s
Received healthy response to inference request in 1.7546124458312988s
Received healthy response to inference request in 2.2850887775421143s
Received healthy response to inference request in 1.9410946369171143s
Received healthy response to inference request in 1.9120311737060547s
Received healthy response to inference request in 1.8673198223114014s
Received healthy response to inference request in 1.8663768768310547s
Received healthy response to inference request in 1.7555732727050781s
Received healthy response to inference request in 1.9014670848846436s
Received healthy response to inference request in 1.9320576190948486s
Received healthy response to inference request in 2.1716043949127197s
Received healthy response to inference request in 1.9914658069610596s
Received healthy response to inference request in 1.8567636013031006s
Received healthy response to inference request in 2.2537269592285156s
Received healthy response to inference request in 2.16109561920166s
Received healthy response to inference request in 2.166689157485962s
2026-03-28T13:18:39.369856+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v2
Received healthy response to inference request in 2.040570020675659s
Received healthy response to inference request in 1.8573405742645264s
Received healthy response to inference request in 1.985856056213379s
Received healthy response to inference request in 2.172727584838867s
30 requests
0 failed requests
5th percentile: 1.7260425448417664
10th percentile: 1.751789116859436
20th percentile: 1.8164081573486328
30th percentile: 1.8571674823760986
40th percentile: 1.8878081798553468
50th percentile: 1.9365761280059814
60th percentile: 2.0038453578948974
70th percentile: 2.0757234811782834
80th percentile: 2.1676722049713133
90th percentile: 2.2568631410598754
95th percentile: 2.3548040270805357
99th percentile: 2.888409852981568
mean time: 2.004611865679423
Pipeline stage StressChecker completed in 301.25s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.75s
Shutdown handler de-registered
chaiml-pony-v3a-q27b-lr_21575_v2 status is now deployed due to DeploymentManager action
chaiml-pony-v3a-q27b-lr_21575_v2 status is now inactive due to admin request
chaiml-pony-v3a-q27b-lr_21575_v2 status is now torndown due to DeploymentManager action