Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-v3b-q27b-lr-29466-v1-uploader
Waiting for job on chaiml-pony-v3b-q27b-lr-29466-v1-uploader to finish
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Using quantization_mode: fp8
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Checking if ChaiML/pony-v3b-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Downloading snapshot of ChaiML/pony-v3b-q27b-lr5e6ep2g8...
2026-03-30T07:34:38.164852+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Downloaded in 50.389s
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Loading /tmp/model_input...
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Applying quantization...
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:10.860631+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:12.909074+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:12.912931+0000 | norm_calibration_context | INFO - Found 161 offset-norm modules to convert
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:12.921828+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:12.966994+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:12.967230+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:12.979479+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:19.653923+0000 | norm_calibration_context | INFO - Restoring 161 norm modules to offset convention
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:19.660787+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: 2026-03-30T07:35:19.660890+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Saving to /dev/shm/model_output...
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: warnings.warn(
2026-03-30T07:35:38.304903+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Updating config in /dev/shm/model_output
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Pushing to ChaiML/pony-v3b-q27b-lr5e6ep2g8-FP8
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Checking if ChaiML/pony-v3b-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Creating repo ChaiML/pony-v3b-q27b-lr5e6ep2g8-FP8 and uploading /dev/shm/model_output to it
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: - model.safetensors: 35.9GB
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Large files may slow down loading and processing.
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: ---------- 2026-03-30 07:36:09 (0:00:00) ----------
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Files: hashed 5/7 (34.3K/35.9G) | pre-uploaded: 0/0 (0.0/35.9G) (+7 unsure) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: ---------------------------------------------------
2026-03-30T07:36:38.440283+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: ---------- 2026-03-30 07:37:09 (0:01:00) ----------
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Files: hashed 7/7 (35.9G/35.9G) | pre-uploaded: 1/2 (20.0M/35.9G) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: ---------------------------------------------------
2026-03-30T07:37:38.519681+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Processed model ChaiML/pony-v3b-q27b-lr5e6ep2g8 in 213.777s
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: creating bucket guanaco-vllm-models
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-v3b-q27b-lr-29466-v1/default
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-v3b-q27b-lr-29466-v1/default/tokenizer_config.json
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-v3b-q27b-lr-29466-v1/default/recipe.yaml
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-v3b-q27b-lr-29466-v1/default/generation_config.json
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-v3b-q27b-lr-29466-v1/default/chat_template.jinja
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-v3b-q27b-lr-29466-v1/default/config.json
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-v3b-q27b-lr-29466-v1/default/tokenizer.json
2026-03-30T07:38:38.604673+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
chaiml-pony-v3b-q27b-lr-29466-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-v3b-q27b-lr-29466-v1/default/model.safetensors
Job chaiml-pony-v3b-q27b-lr-29466-v1-uploader completed after 316.83s with status: succeeded
Stopping job with name chaiml-pony-v3b-q27b-lr-29466-v1-uploader
Pipeline stage VLLMUploader completed in 317.29s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 2.04s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-v3b-q27b-lr-29466-v1
Waiting for inference service chaiml-pony-v3b-q27b-lr-29466-v1 to be ready
2026-03-30T07:39:38.700214+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
2026-03-30T07:40:38.794980+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
2026-03-30T07:41:38.890666+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
Inference service chaiml-pony-v3b-q27b-lr-29466-v1 ready after 190.2277798652649s
Pipeline stage VLLMDeployer completed in 191.32s
run pipeline stage %s
Running pipeline stage StressChecker
Failed to get request counts for guanaco-submitter. Falling back to default
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-30T07:42:38.978940+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-30T07:43:39.089041+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 17.707504272460938s
Received healthy response to inference request in 4.3656861782073975s
Received healthy response to inference request in 1.8661737442016602s
Received healthy response to inference request in 2.258363962173462s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-30T07:44:39.178327+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.68421745300293s
Received healthy response to inference request in 1.4934170246124268s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 5.1899847984313965s
Received healthy response to inference request in 1.8958728313446045s
Received healthy response to inference request in 1.8946058750152588s
Received healthy response to inference request in 2.010617733001709s
Received healthy response to inference request in 1.9741523265838623s
2026-03-30T07:45:39.273562+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
Received healthy response to inference request in 2.0254790782928467s
Received healthy response to inference request in 2.1338140964508057s
Received healthy response to inference request in 2.020969867706299s
Received healthy response to inference request in 6.776648998260498s
Received healthy response to inference request in 2.27431321144104s
Received healthy response to inference request in 1.9736394882202148s
Received healthy response to inference request in 2.3186497688293457s
Received healthy response to inference request in 1.9633898735046387s
Received healthy response to inference request in 2.2094626426696777s
Received healthy response to inference request in 2.0073368549346924s
Received healthy response to inference request in 4.264887571334839s
30 requests
8 failed requests
5th percentile: 1.8789682030677795
10th percentile: 1.8957461357116698
20th percentile: 1.9740497589111328
30th percentile: 2.017864227294922
40th percentile: 2.179203224182129
50th percentile: 2.296481490135193
60th percentile: 4.49309868812561
70th percentile: 10.055905580520598
80th percentile: 20.113020753860475
90th percentile: 20.13238196372986
95th percentile: 20.18436415195465
99th percentile: 20.635075755119324
mean time: 7.902667109171549
%s, retrying in %s seconds...
Received healthy response to inference request in 2.289536237716675s
Received healthy response to inference request in 2.0161361694335938s
Received healthy response to inference request in 1.9351518154144287s
Received healthy response to inference request in 2.0175094604492188s
Received healthy response to inference request in 1.6771736145019531s
Received healthy response to inference request in 1.8559718132019043s
Received healthy response to inference request in 1.8192923069000244s
Received healthy response to inference request in 1.8281447887420654s
Received healthy response to inference request in 1.9075908660888672s
Received healthy response to inference request in 2.1694679260253906s
Received healthy response to inference request in 1.8718476295471191s
Received healthy response to inference request in 1.8267741203308105s
Received healthy response to inference request in 1.8010106086730957s
Received healthy response to inference request in 1.9735589027404785s
Received healthy response to inference request in 1.9863147735595703s
2026-03-30T07:46:39.373200+00:00 monitor updated for chaiml-pony-v3b-q27b-lr_29466_v1
Received healthy response to inference request in 1.8614170551300049s
Received healthy response to inference request in 1.707045316696167s
Received healthy response to inference request in 1.8944618701934814s
Received healthy response to inference request in 2.186121940612793s
Received healthy response to inference request in 2.615434169769287s
Received healthy response to inference request in 2.0123372077941895s
Received healthy response to inference request in 2.200591802597046s
Received healthy response to inference request in 2.0698423385620117s
Received healthy response to inference request in 1.9978630542755127s
Received healthy response to inference request in 1.9629027843475342s
Received healthy response to inference request in 2.105292797088623s
Received healthy response to inference request in 1.8889038562774658s
Received healthy response to inference request in 2.0318551063537598s
Received healthy response to inference request in 2.362511396408081s
Received healthy response to inference request in 2.3612327575683594s
30 requests
0 failed requests
5th percentile: 1.7493296980857849
10th percentile: 1.8174641370773315
20th percentile: 1.8504064083099365
30th percentile: 1.8837869882583618
40th percentile: 1.9241274356842042
50th percentile: 1.9799368381500244
60th percentile: 2.013856792449951
70th percentile: 2.0432512760162354
80th percentile: 2.172798728942871
90th percentile: 2.2967058897018435
95th percentile: 2.3619360089302064
99th percentile: 2.5420865654945377
mean time: 2.007776482899984
Pipeline stage StressChecker completed in 302.67s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.83s
Shutdown handler de-registered
chaiml-pony-v3b-q27b-lr_29466_v1 status is now deployed due to DeploymentManager action
chaiml-pony-v3b-q27b-lr_29466_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-pony-v3b-q27b-lr_29466_v1 status is now torndown due to DeploymentManager action