Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-qwen-bobo-19k-re-33926-v2-uploader
Waiting for job on chaiml-qwen-bobo-19k-re-33926-v2-uploader to finish
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Using quantization_mode: fp8
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Checking if ChaiML/qwen_bobo_19k_reward_dpo_10k_250_higher_dpo_2_merged-FP8 already exists in ChaiML
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Downloading snapshot of ChaiML/qwen_bobo_19k_reward_dpo_10k_250_higher_dpo_2_merged...
2026-03-25T20:43:29.621001+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Downloaded in 48.172s
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Loading /tmp/model_input...
chaiml-qwen-bobo-19k-re-33926-v2-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Applying quantization...
chaiml-qwen-bobo-19k-re-33926-v2-uploader: 2026-03-25T20:43:39.340271+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-qwen-bobo-19k-re-33926-v2-uploader: 2026-03-25T20:43:39.345675+0000 | reset | INFO - Compression lifecycle reset
chaiml-qwen-bobo-19k-re-33926-v2-uploader: 2026-03-25T20:43:39.347099+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-qwen-bobo-19k-re-33926-v2-uploader: 2026-03-25T20:43:39.391721+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-qwen-bobo-19k-re-33926-v2-uploader: 2026-03-25T20:43:39.391954+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-qwen-bobo-19k-re-33926-v2-uploader: 2026-03-25T20:43:39.404510+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-qwen-bobo-19k-re-33926-v2-uploader: 2026-03-25T20:43:46.375598+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-qwen-bobo-19k-re-33926-v2-uploader: 2026-03-25T20:43:46.375822+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Saving to /dev/shm/model_output...
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-qwen-bobo-19k-re-33926-v2-uploader: warnings.warn(
2026-03-25T20:44:29.843763+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Pushing to ChaiML/qwen_bobo_19k_reward_dpo_10k_250_higher_dpo_2_merged-FP8
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Checking if ChaiML/qwen_bobo_19k_reward_dpo_10k_250_higher_dpo_2_merged-FP8 already exists in ChaiML
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Creating repo ChaiML/qwen_bobo_19k_reward_dpo_10k_250_higher_dpo_2_merged-FP8 and uploading /dev/shm/model_output to it
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-qwen-bobo-19k-re-33926-v2-uploader: - model.safetensors: 35.9GB
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Large files may slow down loading and processing.
chaiml-qwen-bobo-19k-re-33926-v2-uploader: ---------- 2026-03-25 20:44:34 (0:00:00) ----------
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Files: hashed 5/7 (34.0K/35.9G) | pre-uploaded: 0/0 (0.0/35.9G) (+7 unsure) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-qwen-bobo-19k-re-33926-v2-uploader: ---------------------------------------------------
2026-03-25T20:45:30.026338+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
chaiml-qwen-bobo-19k-re-33926-v2-uploader: ---------- 2026-03-25 20:45:34 (0:01:00) ----------
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Files: hashed 7/7 (35.9G/35.9G) | pre-uploaded: 1/2 (20.0M/35.9G) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-qwen-bobo-19k-re-33926-v2-uploader: ---------------------------------------------------
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Processed model ChaiML/qwen_bobo_19k_reward_dpo_10k_250_higher_dpo_2_merged in 208.108s
chaiml-qwen-bobo-19k-re-33926-v2-uploader: creating bucket guanaco-vllm-models
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-qwen-bobo-19k-re-33926-v2-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-qwen-bobo-19k-re-33926-v2-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-qwen-bobo-19k-re-33926-v2-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-qwen-bobo-19k-re-33926-v2-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-qwen-bobo-19k-re-33926-v2-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-qwen-bobo-19k-re-33926-v2-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-qwen-bobo-19k-re-33926-v2-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-qwen-bobo-19k-re-33926-v2-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-qwen-bobo-19k-re-33926-v2-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-qwen-bobo-19k-re-33926-v2-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-qwen-bobo-19k-re-33926-v2-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-qwen-bobo-19k-re-33926-v2/default
chaiml-qwen-bobo-19k-re-33926-v2-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-qwen-bobo-19k-re-33926-v2/default/chat_template.jinja
chaiml-qwen-bobo-19k-re-33926-v2-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-qwen-bobo-19k-re-33926-v2/default/tokenizer_config.json
chaiml-qwen-bobo-19k-re-33926-v2-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-qwen-bobo-19k-re-33926-v2/default/generation_config.json
chaiml-qwen-bobo-19k-re-33926-v2-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-qwen-bobo-19k-re-33926-v2/default/recipe.yaml
chaiml-qwen-bobo-19k-re-33926-v2-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-qwen-bobo-19k-re-33926-v2/default/config.json
chaiml-qwen-bobo-19k-re-33926-v2-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-qwen-bobo-19k-re-33926-v2/default/tokenizer.json
2026-03-25T20:46:30.196177+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
chaiml-qwen-bobo-19k-re-33926-v2-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-qwen-bobo-19k-re-33926-v2/default/model.safetensors
Job chaiml-qwen-bobo-19k-re-33926-v2-uploader completed after 286.55s with status: succeeded
Stopping job with name chaiml-qwen-bobo-19k-re-33926-v2-uploader
Pipeline stage VLLMUploader completed in 287.69s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.04s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-qwen-bobo-19k-re-33926-v2
Waiting for inference service chaiml-qwen-bobo-19k-re-33926-v2 to be ready
2026-03-25T20:47:30.451047+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
2026-03-25T20:48:30.631315+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
2026-03-25T20:49:30.802116+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
Inference service chaiml-qwen-bobo-19k-re-33926-v2 ready after 172.16994976997375s
Pipeline stage VLLMDeployer completed in 173.42s
run pipeline stage %s
Running pipeline stage StressChecker
2026-03-25T20:50:30.996172+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
HTTPConnectionPool(host='chaiml-qwen-bobo-19k-re-33926-v2-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
HTTPConnectionPool(host='chaiml-qwen-bobo-19k-re-33926-v2-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
HTTPConnectionPool(host='chaiml-qwen-bobo-19k-re-33926-v2-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
2026-03-25T20:51:31.181761+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
HTTPConnectionPool(host='chaiml-qwen-bobo-19k-re-33926-v2-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
HTTPConnectionPool(host='chaiml-qwen-bobo-19k-re-33926-v2-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 17.540326595306396s
Received healthy response to inference request in 11.052357196807861s
Received healthy response to inference request in 5.7208733558654785s
2026-03-25T20:52:31.373663+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
Received healthy response to inference request in 11.153432607650757s
Received healthy response to inference request in 5.890111207962036s
Received healthy response to inference request in 5.607340574264526s
HTTPConnectionPool(host='chaiml-qwen-bobo-19k-re-33926-v2-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 5.510494947433472s
Received healthy response to inference request in 5.602447032928467s
2026-03-25T20:53:31.548550+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
HTTPConnectionPool(host='chaiml-qwen-bobo-19k-re-33926-v2-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=20.0)
Received unhealthy response to inference request!
Received healthy response to inference request in 6.758442640304565s
Received healthy response to inference request in 10.742146968841553s
Received healthy response to inference request in 5.374124526977539s
Received healthy response to inference request in 5.5403828620910645s
Received healthy response to inference request in 5.899421453475952s
Received healthy response to inference request in 5.632051229476929s
Received healthy response to inference request in 5.8774731159210205s
2026-03-25T20:54:31.740189+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
Received healthy response to inference request in 5.54792332649231s
Received healthy response to inference request in 5.564132213592529s
Received healthy response to inference request in 5.700692415237427s
Received healthy response to inference request in 5.525588274002075s
Received healthy response to inference request in 5.385369300842285s
Received healthy response to inference request in 5.7444517612457275s
Received healthy response to inference request in 5.730184078216553s
Received healthy response to inference request in 5.631296634674072s
30 requests
7 failed requests
5th percentile: 5.441675841808319
10th percentile: 5.5240789413452145
20th percentile: 5.560890436172485
30th percentile: 5.624109816551209
40th percentile: 5.712800979614258
50th percentile: 5.810962438583374
60th percentile: 6.243029928207396
70th percentile: 11.08267982006073
80th percentile: 20.412558698654173
90th percentile: 20.42017729282379
95th percentile: 20.46694267988205
99th percentile: 20.712529480457306
mean time: 10.070760989189148
%s, retrying in %s seconds...
2026-03-25T20:55:32.181488+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
Received healthy response to inference request in 5.680675268173218s
Received healthy response to inference request in 5.537509441375732s
Received healthy response to inference request in 5.5186381340026855s
Received healthy response to inference request in 5.728370189666748s
Received healthy response to inference request in 5.537822008132935s
Received healthy response to inference request in 5.956852197647095s
Received healthy response to inference request in 5.55864953994751s
Received healthy response to inference request in 5.3676841259002686s
Received healthy response to inference request in 5.807215690612793s
Received healthy response to inference request in 5.77208948135376s
2026-03-25T20:56:32.423697+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
Received healthy response to inference request in 5.7820823192596436s
Received healthy response to inference request in 5.462658405303955s
Received healthy response to inference request in 5.578701496124268s
Received healthy response to inference request in 5.714734077453613s
Received healthy response to inference request in 5.324531078338623s
Received healthy response to inference request in 5.920622825622559s
Received healthy response to inference request in 5.4794676303863525s
Received healthy response to inference request in 5.637181997299194s
Received healthy response to inference request in 5.552342891693115s
Received healthy response to inference request in 5.427395343780518s
Received healthy response to inference request in 5.689076900482178s
2026-03-25T20:57:32.666903+00:00 monitor updated for chaiml-qwen-bobo-19k-re_33926_v2
Received healthy response to inference request in 5.476512432098389s
Received healthy response to inference request in 5.522429943084717s
Received healthy response to inference request in 5.449671745300293s
Received healthy response to inference request in 5.629935026168823s
Received healthy response to inference request in 5.523674726486206s
Received healthy response to inference request in 5.568935394287109s
Received healthy response to inference request in 5.573929309844971s
Received healthy response to inference request in 5.622216463088989s
Received healthy response to inference request in 5.682653903961182s
30 requests
0 failed requests
5th percentile: 5.39455417394638
10th percentile: 5.447444105148316
20th percentile: 5.47887659072876
30th percentile: 5.523301291465759
40th percentile: 5.546534538269043
50th percentile: 5.57143235206604
60th percentile: 5.625303888320923
70th percentile: 5.681268858909607
80th percentile: 5.71746129989624
90th percentile: 5.784595656394958
95th percentile: 5.869589614868164
99th percentile: 5.946345679759979
mean time: 5.602808666229248
Pipeline stage StressChecker completed in 491.40s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.28s
Shutdown handler de-registered
chaiml-qwen-bobo-19k-re_33926_v2 status is now deployed due to DeploymentManager action
chaiml-qwen-bobo-19k-re_33926_v2 status is now inactive due to auto deactivation removed underperforming models
chaiml-qwen-bobo-19k-re_33926_v2 status is now torndown due to DeploymentManager action