Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-v3-q27b-lr5-49140-v1-uploader
Waiting for job on chaiml-pony-v3-q27b-lr5-49140-v1-uploader to finish
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Using quantization_mode: fp8
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Checking if ChaiML/pony-v3-q27b-lr5e6ep2g8-30k-FP8 already exists in ChaiML
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Downloading snapshot of ChaiML/pony-v3-q27b-lr5e6ep2g8-30k...
2026-03-30T07:34:27.254479+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Downloaded in 47.001s
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Loading /tmp/model_input...
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Applying quantization...
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: 2026-03-30T07:34:59.979716+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: 2026-03-30T07:35:02.304790+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: 2026-03-30T07:35:02.308694+0000 | norm_calibration_context | INFO - Found 161 offset-norm modules to convert
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: 2026-03-30T07:35:02.318506+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: 2026-03-30T07:35:02.366466+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: 2026-03-30T07:35:02.366705+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: 2026-03-30T07:35:02.380144+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: warnings.warn(
2026-03-30T07:35:27.356474+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Updating config in /dev/shm/model_output
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Pushing to ChaiML/pony-v3-q27b-lr5e6ep2g8-30k-FP8
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Checking if ChaiML/pony-v3-q27b-lr5e6ep2g8-30k-FP8 already exists in ChaiML
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Creating repo ChaiML/pony-v3-q27b-lr5e6ep2g8-30k-FP8 and uploading /dev/shm/model_output to it
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: - model.safetensors: 35.9GB
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Large files may slow down loading and processing.
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: ---------- 2026-03-30 07:35:59 (0:00:00) ----------
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Files: hashed 5/7 (34.3K/35.9G) | pre-uploaded: 0/0 (0.0/35.9G) (+7 unsure) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: ---------------------------------------------------
2026-03-30T07:36:27.477570+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: ---------- 2026-03-30 07:36:59 (0:01:00) ----------
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Files: hashed 7/7 (35.9G/35.9G) | pre-uploaded: 1/2 (20.0M/35.9G) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: ---------------------------------------------------
2026-03-30T07:37:27.570668+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Processed model ChaiML/pony-v3-q27b-lr5e6ep2g8-30k in 212.579s
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: creating bucket guanaco-vllm-models
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-49140-v1/default
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-49140-v1/default/chat_template.jinja
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-49140-v1/default/recipe.yaml
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-49140-v1/default/generation_config.json
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-49140-v1/default/tokenizer_config.json
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-49140-v1/default/config.json
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-49140-v1/default/tokenizer.json
2026-03-30T07:38:27.800919+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
chaiml-pony-v3-q27b-lr5-49140-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-49140-v1/default/model.safetensors
Job chaiml-pony-v3-q27b-lr5-49140-v1-uploader completed after 349.17s with status: succeeded
Stopping job with name chaiml-pony-v3-q27b-lr5-49140-v1-uploader
Pipeline stage VLLMUploader completed in 349.61s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.09s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.71s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-v3-q27b-lr5-49140-v1
Waiting for inference service chaiml-pony-v3-q27b-lr5-49140-v1 to be ready
2026-03-30T07:39:27.891408+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
2026-03-30T07:40:27.989924+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
2026-03-30T07:41:33.424371+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
Inference service chaiml-pony-v3-q27b-lr5-49140-v1 ready after 180.54615950584412s
Pipeline stage VLLMDeployer completed in 181.12s
run pipeline stage %s
Running pipeline stage StressChecker
2026-03-30T07:42:33.953040+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
{"detail":"('http://chaiml-pony-v3-q27b-lr5-49140-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'request timeout')"}
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-30T07:43:34.057203+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.397862195968628s
Received healthy response to inference request in 2.385892629623413s
Received healthy response to inference request in 4.219558954238892s
2026-03-30T07:44:34.253465+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.0974204540252686s
Received healthy response to inference request in 2.1753344535827637s
2026-03-30T07:45:34.351461+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 6.080609560012817s
Received healthy response to inference request in 2.117716073989868s
Received healthy response to inference request in 2.0093204975128174s
Received healthy response to inference request in 1.918494462966919s
Received healthy response to inference request in 1.916581630706787s
Received healthy response to inference request in 2.658316135406494s
Received healthy response to inference request in 2.1482794284820557s
Received healthy response to inference request in 2.151067018508911s
Received healthy response to inference request in 3.054673433303833s
Failed to get request counts for guanaco-submitter. Falling back to default
Received healthy response to inference request in 2.190035820007324s
Received healthy response to inference request in 2.10196852684021s
Received healthy response to inference request in 2.1961169242858887s
Received healthy response to inference request in 2.5134646892547607s
Received healthy response to inference request in 2.4371087551116943s
Received healthy response to inference request in 2.0376641750335693s
Received healthy response to inference request in 2.52728009223938s
30 requests
9 failed requests
5th percentile: 1.9593661785125733
10th percentile: 2.0348298072814943
20th percentile: 2.142166757583618
30th percentile: 2.185625410079956
40th percentile: 2.4166223049163817
50th percentile: 2.592798113822937
60th percentile: 4.146275854110717
70th percentile: 10.287069606780966
80th percentile: 20.12474536895752
90th percentile: 20.139711117744447
95th percentile: 20.147939765453337
99th percentile: 20.464857850074768
mean time: 7.965722012519836
%s, retrying in %s seconds...
Received healthy response to inference request in 1.8243868350982666s
Received healthy response to inference request in 2.210336685180664s
Received healthy response to inference request in 1.8570268154144287s
Received healthy response to inference request in 1.7898728847503662s
Received healthy response to inference request in 2.58723521232605s
2026-03-30T07:46:34.454939+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_49140_v1
Received healthy response to inference request in 1.900228500366211s
Received healthy response to inference request in 1.820683479309082s
Received healthy response to inference request in 1.8286685943603516s
Received healthy response to inference request in 1.8310301303863525s
Received healthy response to inference request in 2.0676724910736084s
Received healthy response to inference request in 1.8811376094818115s
Received healthy response to inference request in 2.281165361404419s
Received healthy response to inference request in 2.0623810291290283s
Received healthy response to inference request in 1.9007573127746582s
Received healthy response to inference request in 1.9185290336608887s
Received healthy response to inference request in 2.1001241207122803s
Received healthy response to inference request in 2.0465760231018066s
Received healthy response to inference request in 1.996434211730957s
Received healthy response to inference request in 1.9103567600250244s
Received healthy response to inference request in 2.0111300945281982s
Received healthy response to inference request in 1.9977989196777344s
Received healthy response to inference request in 2.3575923442840576s
Received healthy response to inference request in 2.2021961212158203s
Received healthy response to inference request in 2.340047836303711s
Received healthy response to inference request in 1.98659348487854s
Received healthy response to inference request in 1.9654273986816406s
Received healthy response to inference request in 1.9840459823608398s
Received healthy response to inference request in 2.0703964233398438s
Received healthy response to inference request in 2.626730442047119s
Received healthy response to inference request in 2.005302667617798s
30 requests
0 failed requests
5th percentile: 1.822349989414215
10th percentile: 1.828240418434143
20th percentile: 1.876315450668335
30th percentile: 1.9074769258499145
40th percentile: 1.9765985488891602
50th percentile: 1.9971165657043457
60th percentile: 2.0253084659576417
70th percentile: 2.068489670753479
80th percentile: 2.2038242340087892
90th percentile: 2.3418022871017454
95th percentile: 2.483895921707153
99th percentile: 2.615276825428009
mean time: 2.0453954935073853
Pipeline stage StressChecker completed in 308.25s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 2.06s
Shutdown handler de-registered
chaiml-pony-v3-q27b-lr5_49140_v1 status is now deployed due to DeploymentManager action
chaiml-pony-v3-q27b-lr5_49140_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-pony-v3-q27b-lr5_49140_v1 status is now torndown due to DeploymentManager action