Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-v3-q27b-lr5-22882-v1-uploader
Waiting for job on chaiml-pony-v3-q27b-lr5-22882-v1-uploader to finish
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Using quantization_mode: fp8
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Checking if ChaiML/pony-v3-q27b-lr5e6ep1g8-FP8 already exists in ChaiML
2026-03-28T07:07:12.552262+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Downloaded in 53.996s
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Loading /tmp/model_input...
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Applying quantization...
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: 2026-03-28T07:07:49.501100+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: 2026-03-28T07:07:51.696480+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: 2026-03-28T07:07:51.698560+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: 2026-03-28T07:07:51.775190+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: 2026-03-28T07:07:51.775660+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: 2026-03-28T07:07:51.795840+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: 2026-03-28T07:08:00.307068+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: 2026-03-28T07:08:00.307265+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Saving to /dev/shm/model_output...
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: warnings.warn(
2026-03-28T07:08:12.636136+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Pushing to ChaiML/pony-v3-q27b-lr5e6ep1g8-FP8
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Checking if ChaiML/pony-v3-q27b-lr5e6ep1g8-FP8 already exists in ChaiML
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Creating repo ChaiML/pony-v3-q27b-lr5e6ep1g8-FP8 and uploading /dev/shm/model_output to it
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: - model.safetensors: 35.9GB
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Large files may slow down loading and processing.
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: ---------- 2026-03-28 07:08:50 (0:00:00) ----------
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Files: hashed 5/7 (34.1K/35.9G) | pre-uploaded: 0/0 (0.0/35.9G) (+7 unsure) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: ---------------------------------------------------
2026-03-28T07:09:12.730452+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: ---------- 2026-03-28 07:09:50 (0:01:00) ----------
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Files: hashed 7/7 (35.9G/35.9G) | pre-uploaded: 1/2 (20.0M/35.9G) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: ---------------------------------------------------
2026-03-28T07:10:12.823896+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Processed model ChaiML/pony-v3-q27b-lr5e6ep1g8 in 219.796s
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: creating bucket guanaco-vllm-models
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v1/default
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v1/default/chat_template.jinja
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v1/default/tokenizer_config.json
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v1/default/recipe.yaml
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v1/default/config.json
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v1/default/generation_config.json
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v1/default/tokenizer.json
2026-03-28T07:11:12.922615+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
chaiml-pony-v3-q27b-lr5-22882-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v1/default/model.safetensors
Job chaiml-pony-v3-q27b-lr5-22882-v1-uploader completed after 326.59s with status: succeeded
Stopping job with name chaiml-pony-v3-q27b-lr5-22882-v1-uploader
Pipeline stage VLLMUploader completed in 327.14s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 2.24s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-v3-q27b-lr5-22882-v1
Waiting for inference service chaiml-pony-v3-q27b-lr5-22882-v1 to be ready
2026-03-28T07:12:13.106383+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
2026-03-28T07:13:13.205293+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
2026-03-28T07:14:13.309654+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
Inference service chaiml-pony-v3-q27b-lr5-22882-v1 ready after 171.08755922317505s
Pipeline stage VLLMDeployer completed in 171.53s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:15:13.410185+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:16:13.542937+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.178226947784424s
Received healthy response to inference request in 4.6629579067230225s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 11.171021223068237s
Received healthy response to inference request in 1.850806474685669s
Received healthy response to inference request in 2.161139726638794s
2026-03-28T07:17:13.664744+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 2.3094890117645264s
Received healthy response to inference request in 2.0075268745422363s
Received healthy response to inference request in 1.9215033054351807s
Received healthy response to inference request in 2.1094393730163574s
Received healthy response to inference request in 1.7001101970672607s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.9456846714019775s
2026-03-28T07:18:13.767115+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.9588720798492432s
Received healthy response to inference request in 2.151106119155884s
Received healthy response to inference request in 2.499800205230713s
Received healthy response to inference request in 2.5145890712738037s
Received healthy response to inference request in 2.0085668563842773s
Received healthy response to inference request in 1.973271131515503s
Received healthy response to inference request in 2.2074670791625977s
Received healthy response to inference request in 2.1185011863708496s
Received healthy response to inference request in 2.0889346599578857s
30 requests
10 failed requests
5th percentile: 1.8826200485229492
10th percentile: 1.9432665348052978
20th percentile: 2.0006757259368895
30th percentile: 2.103287959098816
40th percentile: 2.15712628364563
50th percentile: 2.4046446084976196
60th percentile: 4.372119331359863
70th percentile: 20.132980632781983
80th percentile: 20.15094017982483
90th percentile: 20.166315460205077
95th percentile: 20.192514407634736
99th percentile: 20.214320528507233
mean time: 8.572175812721252
%s, retrying in %s seconds...
Received healthy response to inference request in 1.88643217086792s
Received healthy response to inference request in 1.7543985843658447s
Received healthy response to inference request in 1.789351463317871s
Received healthy response to inference request in 2.1844980716705322s
Received healthy response to inference request in 1.763411521911621s
Received healthy response to inference request in 1.9638729095458984s
Received healthy response to inference request in 1.8624792098999023s
Received healthy response to inference request in 1.9645190238952637s
Received healthy response to inference request in 2.3753082752227783s
2026-03-28T07:19:13.877215+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v1
Received healthy response to inference request in 1.9894111156463623s
Received healthy response to inference request in 1.9245193004608154s
Received healthy response to inference request in 1.9865026473999023s
Received healthy response to inference request in 1.909452199935913s
Received healthy response to inference request in 1.9433207511901855s
Received healthy response to inference request in 1.6397740840911865s
Received healthy response to inference request in 2.034402847290039s
Received healthy response to inference request in 1.8771934509277344s
Received healthy response to inference request in 1.9179236888885498s
Received healthy response to inference request in 2.4759209156036377s
Received healthy response to inference request in 2.008972406387329s
Received healthy response to inference request in 2.1348464488983154s
Received healthy response to inference request in 2.0430808067321777s
Received healthy response to inference request in 2.108795642852783s
Received healthy response to inference request in 1.870781660079956s
Received healthy response to inference request in 2.064337968826294s
Received healthy response to inference request in 1.9801921844482422s
Received healthy response to inference request in 2.0029819011688232s
Received healthy response to inference request in 2.1810336112976074s
Received healthy response to inference request in 2.176537275314331s
Received healthy response to inference request in 2.003074884414673s
30 requests
0 failed requests
5th percentile: 1.758454406261444
10th percentile: 1.786757469177246
20th percentile: 1.8759110927581788
30th percentile: 1.9153822422027589
40th percentile: 1.9556520462036133
50th percentile: 1.9833474159240723
60th percentile: 2.003019094467163
70th percentile: 2.0370062351226808
80th percentile: 2.1140058040618896
90th percentile: 2.1813800573349
95th percentile: 2.289443683624267
99th percentile: 2.4467432498931885
mean time: 1.9939109007517497
Pipeline stage StressChecker completed in 322.58s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 1.31s
Shutdown handler de-registered
chaiml-pony-v3-q27b-lr5_22882_v1 status is now deployed due to DeploymentManager action
chaiml-pony-v3-q27b-lr5_22882_v1 status is now inactive due to auto deactivation removed underperforming models