Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-grpo-q235b-kimid-96451-v1-uploader
Waiting for job on chaiml-grpo-q235b-kimid-96451-v1-uploader to finish
HTTP Request: %s %s "%s %d %s"
chaiml-grpo-q235b-kimid-96451-v1-uploader: Checking if ChaiML/grpo-q235b-kimid-v5d-merged-chai-rm-mid-kl-averaged-loras-W4A16 already exists in ChaiML
chaiml-grpo-q235b-kimid-96451-v1-uploader: Downloading snapshot of ChaiML/grpo-q235b-kimid-v5d-merged-chai-rm-mid-kl-averaged-loras...
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
chaiml-grpo-q235b-kimid-96451-v1-uploader: Downloaded in 173.488s
chaiml-grpo-q235b-kimid-96451-v1-uploader: Applying quantization...
chaiml-grpo-q235b-kimid-96451-v1-uploader: The tokenizer you are loading from '/tmp/model_input' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
chaiml-grpo-q235b-kimid-96451-v1-uploader: [33;1m2026-02-19 14:30:24 WARNING modeling_utils.py L4670: `torch_dtype` is deprecated! Use `dtype` instead![0m
chaiml-grpo-q235b-kimid-96451-v1-uploader: [38;20m2026-02-19 14:30:37 INFO base.py L366: using torch.bfloat16 for quantization tuning[0m
chaiml-grpo-q235b-kimid-96451-v1-uploader: [38;20m2026-02-19 14:30:42 INFO base.py L1145: start to compute imatrix[0m
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/local/lib/python3.12/dist-packages/torch/backends/cuda/__init__.py:131: UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'. Old settings, e.g, torch.backends.cuda.matmul.allow_tf32 = True, torch.backends.cudnn.allow_tf32 = True, allowTF32CuDNN() and allowTF32CuBLAS() will be deprecated after Pytorch 2.9. Please see https://pytorch.org/docs/main/notes/cuda.html#tensorfloat-32-tf32-on-ampere-and-later-devices (Triggered internally at /pytorch/aten/src/ATen/Context.cpp:80.)
chaiml-grpo-q235b-kimid-96451-v1-uploader: return torch._C._get_cublas_allow_tf32()
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:43.735000 7 torch/_dynamo/convert_frame.py:1358] [6/8] torch._dynamo hit config.recompile_limit (8)
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:43.735000 7 torch/_dynamo/convert_frame.py:1358] [6/8] function: 'forward' (/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_moe/modeling_qwen3_moe.py:208)
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:43.735000 7 torch/_dynamo/convert_frame.py:1358] [6/8] last reason: 6/7: self._modules['up_proj'].imatrix_cnt == 1345 # module.imatrix_cnt += input.shape[0] # auto_round/compressors/base.py:1179 in get_imatrix_hook (HINT: torch.compile considers integer attributes of the nn.Module to be static. If you are observing recompilation, you might want to make this integer dynamic using torch._dynamo.config.allow_unspec_int_on_nn_module = True, or convert this integer into a tensor.)
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:43.735000 7 torch/_dynamo/convert_frame.py:1358] [6/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:43.735000 7 torch/_dynamo/convert_frame.py:1358] [6/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:46.632000 7 torch/_dynamo/convert_frame.py:1358] [3/8] torch._dynamo hit config.recompile_limit (8)
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:46.632000 7 torch/_dynamo/convert_frame.py:1358] [3/8] function: 'forward' (/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_moe/modeling_qwen3_moe.py:305)
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:46.632000 7 torch/_dynamo/convert_frame.py:1358] [3/8] last reason: 3/7: self._modules['self_attn']._modules['k_proj'].imatrix_cnt == 56 # module.imatrix_cnt += input.shape[0] # auto_round/compressors/base.py:1179 in get_imatrix_hook (HINT: torch.compile considers integer attributes of the nn.Module to be static. If you are observing recompilation, you might want to make this integer dynamic using torch._dynamo.config.allow_unspec_int_on_nn_module = True, or convert this integer into a tensor.)
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:46.632000 7 torch/_dynamo/convert_frame.py:1358] [3/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
chaiml-grpo-q235b-kimid-96451-v1-uploader: W0219 14:31:46.632000 7 torch/_dynamo/convert_frame.py:1358] [3/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/torch.compiler_troubleshooting.html
chaiml-grpo-q235b-kimid-96451-v1-uploader: [33;1m2026-02-19 14:32:04 WARNING gguf.py L297: please use more data via setting `nsamples` to improve accuracy as calibration activations contain 0[0m
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-opusd_12063_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-opusd-12063-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-opusd_12063_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-opusd-12063-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Retrying (%r) after connection broken by '%r': %s
Unable to record family friendly update due to error: Invalid JSON input: Expecting value: line 1 column 1 (char 0)
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
chaiml-grpo-q235b-kimid-96451-v1-uploader: Checking if ChaiML/grpo-q235b-kimid-v5d-merged-chai-rm-mid-kl-averaged-loras-W4A16 already exists in ChaiML
chaiml-grpo-q235b-kimid-96451-v1-uploader: Creating repo ChaiML/grpo-q235b-kimid-v5d-merged-chai-rm-mid-kl-averaged-loras-W4A16 and uploading /dev/shm/model_output to it
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------- 2026-02-19 15:44:31 (0:00:00) ----------
chaiml-grpo-q235b-kimid-96451-v1-uploader: Files: hashed 11/38 (26.1M/131.9G) | pre-uploaded: 0/1 (0.0/131.9G) (+27 unsure) | committed: 0/38 (0.0/131.9G) | ignored: 0
chaiml-grpo-q235b-kimid-96451-v1-uploader: Workers: hashing: 27 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 98
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------------------------------------------------
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Unable to record family friendly update due to error: Invalid JSON input: Expecting value: line 1 column 1 (char 0)
chaiml-grpo-q235b-kimid-96451-v1-uploader:
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------- 2026-02-19 15:45:31 (0:01:00) ----------
chaiml-grpo-q235b-kimid-96451-v1-uploader: Files: hashed 38/38 (131.9G/131.9G) | pre-uploaded: 2/28 (1.9G/131.9G) | committed: 0/38 (0.0/131.9G) | ignored: 0
chaiml-grpo-q235b-kimid-96451-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 26 | committing: 0 | waiting: 100
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------------------------------------------------
chaiml-grpo-q235b-kimid-96451-v1-uploader:
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------- 2026-02-19 15:46:31 (0:02:00) ----------
chaiml-grpo-q235b-kimid-96451-v1-uploader: Files: hashed 38/38 (131.9G/131.9G) | pre-uploaded: 10/28 (41.9G/131.9G) | committed: 0/38 (0.0/131.9G) | ignored: 0
chaiml-grpo-q235b-kimid-96451-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 18 | committing: 0 | waiting: 108
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------------------------------------------------
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
chaiml-grpo-q235b-kimid-96451-v1-uploader:
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------- 2026-02-19 15:47:31 (0:03:00) ----------
chaiml-grpo-q235b-kimid-96451-v1-uploader: Files: hashed 38/38 (131.9G/131.9G) | pre-uploaded: 18/28 (81.9G/131.9G) | committed: 0/38 (0.0/131.9G) | ignored: 0
chaiml-grpo-q235b-kimid-96451-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 10 | committing: 0 | waiting: 116
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------------------------------------------------
chaiml-grpo-q235b-kimid-96451-v1-uploader:
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------- 2026-02-19 15:48:31 (0:04:00) ----------
chaiml-grpo-q235b-kimid-96451-v1-uploader: Files: hashed 38/38 (131.9G/131.9G) | pre-uploaded: 26/28 (121.9G/131.9G) | committed: 0/38 (0.0/131.9G) | ignored: 0
chaiml-grpo-q235b-kimid-96451-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 2 | committing: 0 | waiting: 124
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------------------------------------------------
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
chaiml-grpo-q235b-kimid-96451-v1-uploader:
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
[K[F
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------- 2026-02-19 15:49:31 (0:05:00) ----------
chaiml-grpo-q235b-kimid-96451-v1-uploader: Files: hashed 38/38 (131.9G/131.9G) | pre-uploaded: 28/28 (131.9G/131.9G) | committed: 0/38 (0.0/131.9G) | ignored: 0
chaiml-grpo-q235b-kimid-96451-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 0 | committing: 1 | waiting: 125
chaiml-grpo-q235b-kimid-96451-v1-uploader: ---------------------------------------------------
chaiml-grpo-q235b-kimid-96451-v1-uploader: Processed model ChaiML/grpo-q235b-kimid-v5d-merged-chai-rm-mid-kl-averaged-loras in 4935.543s
chaiml-grpo-q235b-kimid-96451-v1-uploader: creating bucket guanaco-vllm-models
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-grpo-q235b-kimid-96451-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-grpo-q235b-kimid-96451-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-grpo-q235b-kimid-96451-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-grpo-q235b-kimid-96451-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-grpo-q235b-kimid-96451-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-grpo-q235b-kimid-96451-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-grpo-q235b-kimid-96451-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-grpo-q235b-kimid-96451-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-grpo-q235b-kimid-96451-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-grpo-q235b-kimid-96451-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-grpo-q235b-kimid-96451-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/quantization_config.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/quantization_config.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/vocab.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/vocab.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/config.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/special_tokens_map.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/merges.txt s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/merges.txt
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/generation_config.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/chat_template.jinja
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/tokenizer.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/tokenizer_config.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model.safetensors.index.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/added_tokens.json s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/added_tokens.json
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00027-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00027-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00009-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00009-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00017-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00017-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00019-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00019-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00008-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00008-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00012-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00012-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00025-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00025-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00002-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00002-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00014-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00014-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00013-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00013-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00003-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00003-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00016-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00016-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00004-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00004-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00026-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00026-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00020-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00020-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00010-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00010-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00006-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00006-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00007-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00007-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00023-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00023-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00021-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00021-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00015-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00015-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00011-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00011-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00005-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00005-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00001-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00001-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00022-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00022-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00024-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00024-of-00027.safetensors
chaiml-grpo-q235b-kimid-96451-v1-uploader: cp /dev/shm/model_output/model-00018-of-00027.safetensors s3://guanaco-vllm-models/chaiml-grpo-q235b-kimid-96451-v1/default/model-00018-of-00027.safetensors
Job chaiml-grpo-q235b-kimid-96451-v1-uploader completed after 5023.94s with status: succeeded
Stopping job with name chaiml-grpo-q235b-kimid-96451-v1-uploader
Pipeline stage VLLMUploader completed in 5024.46s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.17s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-grpo-q235b-kimid-96451-v1
Waiting for inference service chaiml-grpo-q235b-kimid-96451-v1 to be ready
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Inference service chaiml-grpo-q235b-kimid-96451-v1 ready after 381.7727298736572s
Pipeline stage VLLMDeployer completed in 382.89s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.2309257984161377s
Received healthy response to inference request in 2.1155664920806885s
Received healthy response to inference request in 1.8999230861663818s
Received healthy response to inference request in 1.9127564430236816s
Received healthy response to inference request in 2.189225435256958s
Received healthy response to inference request in 1.9869685173034668s
Received healthy response to inference request in 2.0001914501190186s
Received healthy response to inference request in 2.0587358474731445s
Received healthy response to inference request in 1.9256489276885986s
Received healthy response to inference request in 1.9663078784942627s
Received healthy response to inference request in 1.9543135166168213s
Received healthy response to inference request in 1.9786176681518555s
Received healthy response to inference request in 2.1051459312438965s
Received healthy response to inference request in 1.9027442932128906s
Received healthy response to inference request in 2.2020699977874756s
Received healthy response to inference request in 2.008690357208252s
Received healthy response to inference request in 2.272955894470215s
Received healthy response to inference request in 2.109945058822632s
Received healthy response to inference request in 1.9631495475769043s
Received healthy response to inference request in 1.8987576961517334s
Received healthy response to inference request in 1.9130330085754395s
Received healthy response to inference request in 1.807990550994873s
Received healthy response to inference request in 1.939152479171753s
Received healthy response to inference request in 2.0684549808502197s
Received healthy response to inference request in 2.0000500679016113s
Received healthy response to inference request in 2.213254451751709s
Received healthy response to inference request in 2.1614162921905518s
Received healthy response to inference request in 2.1591813564300537s
Received healthy response to inference request in 2.0178849697113037s
Received healthy response to inference request in 1.9870200157165527s
30 requests
0 failed requests
5th percentile: 1.899282121658325
10th percentile: 1.9024621725082398
20th percentile: 1.9231257438659668
30th percentile: 1.9604987382888794
40th percentile: 1.9836281776428222
50th percentile: 2.000120759010315
60th percentile: 2.03422532081604
70th percentile: 2.106585669517517
80th percentile: 2.1596283435821535
90th percentile: 2.203188443183899
95th percentile: 2.2229736924171446
99th percentile: 2.2607671666145324
mean time: 2.031669267018636
Pipeline stage StressChecker completed in 65.05s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.73s
Shutdown handler de-registered
chaiml-grpo-q235b-kimid_96451_v1 status is now deployed due to DeploymentManager action
chaiml-grpo-q235b-kimid_96451_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-grpo-q235b-kimid_96451_v1 status is now torndown due to DeploymentManager action