Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-bol-v1-opusdv1b-13427-v3-uploader
Waiting for job on chaiml-bol-v1-opusdv1b-13427-v3-uploader to finish
chaiml-bol-v1-opusdv1b-13427-v3-uploader: bash: cannot set terminal process group (-1): Inappropriate ioctl for device
chaiml-bol-v1-opusdv1b-13427-v3-uploader: bash: no job control in this shell
chaiml-bol-v1-opusdv1b-13427-v3-uploader: /root/miniconda3/envs/nvidia/lib/python3.11/site-packages/mk1/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
chaiml-bol-v1-opusdv1b-13427-v3-uploader: __import__('pkg_resources').declare_namespace(__name__)
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ╔═════════════════════════════════════════════════════════════════════╗
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ██████ ██████ █████ ████ ████ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ░░██████ ██████ ░░███ ███░ ░░███ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ░███░█████░███ ░███ ███ ░███ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ░███░░███ ░███ ░███████ ░███ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ░███ ░░░ ░███ ░███░░███ ░███ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ░███ ░███ ░███ ░░███ ░███ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ █████ █████ █████ ░░████ █████ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ░░░░░ ░░░░░ ░░░░░ ░░░░ ░░░░░ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ Version: 0.30.6+torch280 ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ Features: FLYWHEEL, CUDA ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ Copyright 2023-2025 MK ONE TECHNOLOGIES Inc. ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ https://mk1.ai ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ The license key for the current software has been verified as ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ belonging to: ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ Chai Research Corp. ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ Expiration: 2028-03-31 23:59:59 ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ║ ║
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ╚═════════════════════════════════════════════════════════════════════╝
chaiml-bol-v1-opusdv1b-13427-v3-uploader: Downloaded to shared memory in 137.683s
chaiml-bol-v1-opusdv1b-13427-v3-uploader: Processed model ChaiML/bol-v1-opusdv1b-lr5e6ep2r64g4b01-int4-mixed in 210.040s
chaiml-bol-v1-opusdv1b-13427-v3-uploader: creating bucket guanaco-vllm-models
chaiml-bol-v1-opusdv1b-13427-v3-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-bol-v1-opusdv1b-13427-v3-uploader: uploading /dev/shm/model_cache to s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/added_tokens.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/added_tokens.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/generation_config.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/generation_config.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/.gitattributes s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/.gitattributes
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/special_tokens_map.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/chat_template.jinja s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/chat_template.jinja
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/config.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/config.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/tokenizer_config.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/quantization_config.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/quantization_config.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/merges.txt s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/merges.txt
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/vocab.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/vocab.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/tokenizer.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model.safetensors.index.json
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG retryable error: RequestError: send request failed
chaiml-bol-v1-opusdv1b-13427-v3-uploader: caused by: Put "https://object.ord1.coreweave.com/guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00001-of-00027.safetensors?partNumber=9&uploadId=2~PtXQM6w8HVFyhSbgMm4qW3R2OFYcCjL": write tcp 10.0.1.140:36546->216.153.53.63:443: write: connection reset by peer
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ERROR "cp /dev/shm/model_cache/model-00001-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00001-of-00027.safetensors": MultipartUpload: upload multipart failed upload id: 2~PtXQM6w8HVFyhSbgMm4qW3R2OFYcCjL caused by: SignatureDoesNotMatch: status code: 403, request id: tx00000f763ec32a4c42fa3-006974d461-149d0ff886-default, host id:
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG retryable error: RequestError: send request failed
chaiml-bol-v1-opusdv1b-13427-v3-uploader: caused by: Put "https://object.ord1.coreweave.com/guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00007-of-00027.safetensors?partNumber=40&uploadId=2~8RqQwSPapc0XrDhjzRYi4w05f4tHi25": write tcp 10.0.1.140:40882->216.153.53.63:443: write: connection reset by peer
chaiml-bol-v1-opusdv1b-13427-v3-uploader: ERROR "cp /dev/shm/model_cache/model-00007-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00007-of-00027.safetensors": MultipartUpload: upload multipart failed upload id: 2~8RqQwSPapc0XrDhjzRYi4w05f4tHi25 caused by: SignatureDoesNotMatch: status code: 403, request id: tx000009603f1b461d5a118-006974d508-14a26732ad-default, host id:
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00027-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00027-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00024-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00024-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00008-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00008-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00015-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00015-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00014-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00014-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00020-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00020-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00004-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00004-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00011-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00011-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00010-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00010-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00026-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00026-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00019-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00019-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00006-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00006-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00022-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00022-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00013-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00013-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00017-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00017-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00002-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00002-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00023-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00023-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00025-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00025-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00018-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00018-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00005-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00005-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00003-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00003-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00021-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00021-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00012-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00012-of-00027.safetensors
Job chaiml-bol-v1-opusdv1b-13427-v3-uploader completed after 696.16s with status: failed
Stopping job with name chaiml-bol-v1-opusdv1b-13427-v3-uploader
%s, retrying in %s seconds...
Starting job with name chaiml-bol-v1-opusdv1b-13427-v3-uploader
Waiting for job on chaiml-bol-v1-opusdv1b-13427-v3-uploader to finish
chaiml-bol-v1-opusdv1b-13427-v3-uploader: bash: cannot set terminal process group (-1): Inappropriate ioctl for device
chaiml-bol-v1-opusdv1b-13427-v3-uploader: bash: no job control in this shell
chaiml-bol-v1-opusdv1b-13427-v3-uploader: /root/miniconda3/envs/nvidia/lib/python3.11/site-packages/mk1/__init__.py:1: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
chaiml-bol-v1-opusdv1b-13427-v3-uploader: __import__('pkg_resources').declare_namespace(__name__)
Failed to get response for submission chaiml-csfs-v3-3-dpo-lr_86358_v2: ('http://guanaco-model-mesh-load-balancer.model-mesh.k2.chaiverse.com/models/chaiml-csfs-v3-3-dpo-lr_86358_v2/predict', '{"detail":"1 validation error for RuntimeResponse\\npredictions\\n Field required [type=missing, input_value={\'detail\': \\"503, message=...o-lr_86358_v2/predict\'\\"}, input_type=dict]\\n For further information visit https://errors.pydantic.dev/2.12/v/missing"}')
chaiml-bol-v1-opusdv1b-13427-v3-uploader: Downloaded to shared memory in 133.367s
chaiml-bol-v1-opusdv1b-13427-v3-uploader: Processed model ChaiML/bol-v1-opusdv1b-lr5e6ep2r64g4b01-int4-mixed in 190.256s
chaiml-bol-v1-opusdv1b-13427-v3-uploader: creating bucket guanaco-vllm-models
chaiml-bol-v1-opusdv1b-13427-v3-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-bol-v1-opusdv1b-13427-v3-uploader: uploading /dev/shm/model_cache to s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/.gitattributes s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/.gitattributes": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/added_tokens.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/added_tokens.json": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/chat_template.jinja s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/chat_template.jinja": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/config.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/config.json": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/generation_config.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/generation_config.json": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/merges.txt s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/merges.txt": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00002-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00002-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00003-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00003-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00004-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00004-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00005-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00005-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00006-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00006-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00008-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00008-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00009-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00009-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00010-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00010-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00011-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00011-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00012-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00012-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00013-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00013-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00014-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00014-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00015-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00015-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00016-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00016-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00017-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00017-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00018-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00018-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00019-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00019-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00020-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00020-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00021-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00021-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00022-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00022-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00023-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00023-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00024-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00024-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00025-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00025-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00026-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00026-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model-00027-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00027-of-00027.safetensors": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model.safetensors.index.json": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/quantization_config.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/quantization_config.json": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/special_tokens_map.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/special_tokens_map.json": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/tokenizer.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/tokenizer.json": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/tokenizer_config.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/tokenizer_config.json": object size matches
chaiml-bol-v1-opusdv1b-13427-v3-uploader: DEBUG "sync /dev/shm/model_cache/vocab.json s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/vocab.json": object size matches
Failed to get response for submission blend_litaf_2026-01-22: HTTPConnectionPool(host='guanaco-model-mesh-load-balancer.model-mesh.k2.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00007-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00007-of-00027.safetensors
chaiml-bol-v1-opusdv1b-13427-v3-uploader: cp /dev/shm/model_cache/model-00001-of-00027.safetensors s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3/model-00001-of-00027.safetensors
Job chaiml-bol-v1-opusdv1b-13427-v3-uploader completed after 299.3s with status: succeeded
Stopping job with name chaiml-bol-v1-opusdv1b-13427-v3-uploader
Pipeline stage VLLMUploader completed in 996.69s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-bol-v1-opusdv1b-13427-v3
Waiting for inference service chaiml-bol-v1-opusdv1b-13427-v3 to be ready
Failed to get response for submission blend_samot_2026-01-22: HTTPConnectionPool(host='guanaco-model-mesh-load-balancer.model-mesh.k2.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Tearing down inference service chaiml-bol-v1-opusdv1b-13427-v3
clean up pipeline due to error=DeploymentError('Timeout to start the InferenceService chaiml-bol-v1-opusdv1b-13427-v3. The InferenceService is as following: {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'kind\': \'InferenceService\', \'metadata\': {\'annotations\': {\'autoscaling.knative.dev/class\': \'hpa.autoscaling.knative.dev\', \'autoscaling.knative.dev/container-concurrency-target-percentage\': \'70\', \'autoscaling.knative.dev/initial-scale\': \'5\', \'autoscaling.knative.dev/max-scale-down-rate\': \'1.1\', \'autoscaling.knative.dev/max-scale-up-rate\': \'2\', \'autoscaling.knative.dev/metric\': \'mean_pod_latency_ms_v2\', \'autoscaling.knative.dev/panic-threshold-percentage\': \'650\', \'autoscaling.knative.dev/panic-window-percentage\': \'35\', \'autoscaling.knative.dev/scale-down-delay\': \'30s\', \'autoscaling.knative.dev/scale-to-zero-grace-period\': \'10m\', \'autoscaling.knative.dev/stable-window\': \'180s\', \'autoscaling.knative.dev/target\': \'4000\', \'autoscaling.knative.dev/target-burst-capacity\': \'-1\', \'autoscaling.knative.dev/tick-interval\': \'15s\', \'features.knative.dev/http-full-duplex\': \'Enabled\', \'networking.knative.dev/ingress-class\': \'istio.ingress.networking.knative.dev\'}, \'creationTimestamp\': \'2026-01-24T14:29:16Z\', \'finalizers\': [\'inferenceservice.finalizers\'], \'generation\': 1, \'labels\': {\'knative.coreweave.cloud/ingress\': \'istio.ingress.networking.knative.dev\', \'prometheus.k.chaiverse.com\': \'true\', \'qos.coreweave.cloud/latency\': \'low\'}, \'managedFields\': [{\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:metadata\': {\'f:annotations\': {\'.\': {}, \'f:autoscaling.knative.dev/class\': {}, \'f:autoscaling.knative.dev/container-concurrency-target-percentage\': {}, \'f:autoscaling.knative.dev/initial-scale\': {}, \'f:autoscaling.knative.dev/max-scale-down-rate\': {}, \'f:autoscaling.knative.dev/max-scale-up-rate\': {}, 
\'f:autoscaling.knative.dev/metric\': {}, \'f:autoscaling.knative.dev/panic-threshold-percentage\': {}, \'f:autoscaling.knative.dev/panic-window-percentage\': {}, \'f:autoscaling.knative.dev/scale-down-delay\': {}, \'f:autoscaling.knative.dev/scale-to-zero-grace-period\': {}, \'f:autoscaling.knative.dev/stable-window\': {}, \'f:autoscaling.knative.dev/target\': {}, \'f:autoscaling.knative.dev/target-burst-capacity\': {}, \'f:autoscaling.knative.dev/tick-interval\': {}, \'f:features.knative.dev/http-full-duplex\': {}, \'f:networking.knative.dev/ingress-class\': {}}, \'f:labels\': {\'.\': {}, \'f:knative.coreweave.cloud/ingress\': {}, \'f:prometheus.k.chaiverse.com\': {}, \'f:qos.coreweave.cloud/latency\': {}}}, \'f:spec\': {\'.\': {}, \'f:predictor\': {\'.\': {}, \'f:affinity\': {\'.\': {}, \'f:nodeAffinity\': {\'.\': {}, \'f:tion\': {}, \'f:requiredDuringSchedulingIgnoredDuringExecution\': {}}}, \'f:containerConcurrency\': {}, \'f:containers\': {}, \'f:imagePullSecrets\': {}, \'f:maxReplicas\': {}, \'f:minReplicas\': {}, \'f:priorityClassName\': {}, \'f:timeout\': {}, \'f:volumes\': {}}}}, \'manager\': \'OpenAPI-Generator\', \'operation\': \'Update\', \'time\': \'2026-01-24T14:29:16Z\'}, {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:metadata\': {\'f:finalizers\': {\'.\': {}, \'v:"inferenceservice.finalizers"\': {}}}}, \'manager\': \'manager\', \'operation\': \'Update\', \'time\': \'2026-01-24T14:29:16Z\'}, {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:status\': {\'.\': {}, \'f:components\': {\'.\': {}, \'f:predictor\': {\'.\': {}, \'f:latestCreatedRevision\': {}}}, \'f:conditions\': {}, \'f:modelStatus\': {\'.\': {}, \'f:states\': {\'.\': {}, \'f:activeModelState\': {}, \'f:targetModelState\': {}}, \'f:transitionStatus\': {}}, \'f:observedGeneration\': {}}}, \'manager\': \'manager\', \'operation\': \'Update\', \'subresource\': \'status\', \'time\': 
\'2026-01-24T14:39:18Z\'}], \'name\': \'chaiml-bol-v1-opusdv1b-13427-v3\', \'namespace\': \'tenant-chaiml-guanaco\', \'resourceVersion\': \'381529497\', \'uid\': \'3569b6d8-773a-4d7e-8423-204e08e0fac8\'}, \'spec\': {\'predictor\': {\'affinity\': {\'nodeAffinity\': {\'tion\': [{\'preference\': {\'matchExpressions\': [{\'key\': \'gpu.nvidia.com/class\', \'operator\': \'In\', \'values\': [\'A100_NVLINK_80GB\']}]}, \'weight\': 5}], \'requiredDuringSchedulingIgnoredDuringExecution\': {\'nodeSelectorTerms\': [{\'matchExpressions\': [{\'key\': \'gpu.nvidia.com/class\', \'operator\': \'In\', \'values\': [\'A100_NVLINK_80GB\']}]}]}}}, \'containerConcurrency\': 0, \'containers\': [{\'args\': [\'serve\', \'s3://guanaco-vllm-models/chaiml-bol-v1-opusdv1b-13427-v3\', \'--port\', \'8080\', \'--tensor-parallel-size\', \'2\', \'--max-model-len\', \'10240\', \'--max-num-batched-tokens\', \'10240\', \'--max-num-seqs\', \'64\', \'--gpu-memory-utilization\', \'0.92\', \'--trust-remote-code\', \'--load-format\', \'runai_streamer\', \'--served-model-name\', \'ChaiML/bol-v1-opusdv1b-lr5e6ep2r64g4b01-int4-mixed\'], \'env\': [{\'name\': \'RESERVE_MEMORY\', \'value\': \'2048\'}, {\'name\': \'DOWNLOAD_TO_LOCAL\', \'value\': \'/dev/shm/model_cache\'}, {\'name\': \'NUM_GPUS\', \'value\': \'2\'}, {\'name\': \'VLLM_ASSETS_CACHE\', \'value\': \'/code/vllm_assets_cache\'}, {\'name\': \'RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING\', \'value\': \'0\'}, {\'name\': \'AWS_ACCESS_KEY_ID\', \'value\': \'LETMTTRMLFFAMTBK\'}, {\'name\': \'AWS_SECRET_ACCESS_KEY\', \'value\': \'VwwZaqefOOoaouNxUk03oUmK9pVEfruJhjBHPGdgycK\'}, {\'name\': \'AWS_ENDPOINT_URL\', \'value\': \'https://object.ord1.coreweave.com\'}, {\'name\': \'HF_TOKEN\', \'valueFrom\': {\'secretKeyRef\': {\'key\': \'token\', \'name\': \'hf-token\'}}}], \'image\': \'gcr.io/chai-959f8/vllm:v0.13.0\', \'imagePullPolicy\': \'IfNotPresent\', \'name\': \'kserve-container\', \'readinessProbe\': {\'failureThreshold\': 1, \'httpGet\': {\'path\': 
\'/v1/models\', \'port\': 8080}, \'initialDelaySeconds\': 60, \'periodSeconds\': 10, \'successThreshold\': 1, \'timeoutSeconds\': 5}, \'resources\': {\'limits\': {\'cpu\': \'4\', \'memory\': \'128Gi\', \'nvidia.com/gpu\': \'2\'}, \'requests\': {\'cpu\': \'4\', \'memory\': \'128Gi\', \'nvidia.com/gpu\': \'2\'}}, \'volumeMounts\': [{\'mountPath\': \'/dev/shm\', \'name\': \'shared-memory-cache\'}, {\'mountPath\': \'/root/.cache\', \'name\': \'cache-volume\'}]}], \'imagePullSecrets\': [{\'name\': \'docker-creds\'}], \'maxReplicas\': 40, \'minReplicas\': 0, \'priorityClassName\': \'creator-studio\', \'timeout\': 60, \'volumes\': [{\'emptyDir\': {\'medium\': \'Memory\', \'sizeLimit\': \'128Gi\'}, \'name\': \'shared-memory-cache\'}, {\'name\': \'cache-volume\', \'persistentVolumeClaim\': {\'claimName\': \'cache-pvc\'}}]}}, \'status\': {\'components\': {\'predictor\': {\'latestCreatedRevision\': \'chaiml-bol-v1-opusdv1b-13427-v3-predictor-00001\'}}, \'conditions\': [{\'lastTransitionTime\': \'2026-01-24T14:29:17Z\', \'reason\': \'PredictorConfigurationReady not ready\', \'severity\': \'Info\', \'status\': \'False\', \'type\': \'LatestDeploymentReady\'}, {\'lastTransitionTime\': \'2026-01-24T14:39:18Z\', \'message\': \'Revision "chaiml-bol-v1-opusdv1b-13427-v3-predictor-00001" failed with message: Container failed with: pid=1)\\x1b[0;0m ^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m self.engine_core = EngineCoreClient.make_async_mp_client(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m return AsyncMPClient(*client_args)\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer 
pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 820, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m super().__init__(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 477, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m with launch_core_engines(vllm_config, executor_class, log_stats) as (\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m next(self.gen)\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 903, in launch_core_engines\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m wait_for_engine_startup(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 960, in wait_for_engine_startup\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m raise RuntimeError(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m RuntimeError: Engine core initialization failed. See root cause above. 
Failed core proc(s): {}\\n/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown\\n warnings.warn(\\\'resource_tracker: There appear to be %!!(MISSING)d(MISSING) \\\'\\n.\', \'reason\': \'RevisionFailed\', \'severity\': \'Info\', \'status\': \'False\', \'type\': \'PredictorConfigurationReady\'}, {\'lastTransitionTime\': \'2026-01-24T14:29:17Z\', \'message\': \'Configuration "chaiml-bol-v1-opusdv1b-13427-v3-predictor" does not have any ready Revision.\', \'reason\': \'RevisionMissing\', \'status\': \'False\', \'type\': \'PredictorReady\'}, {\'lastTransitionTime\': \'2026-01-24T14:29:17Z\', \'message\': \'Configuration "chaiml-bol-v1-opusdv1b-13427-v3-predictor" does not have any ready Revision.\', \'reason\': \'RevisionMissing\', \'severity\': \'Info\', \'status\': \'False\', \'type\': \'PredictorRouteReady\'}, {\'lastTransitionTime\': \'2026-01-24T14:29:17Z\', \'message\': \'Configuration "chaiml-bol-v1-opusdv1b-13427-v3-predictor" does not have any ready Revision.\', \'reason\': \'RevisionMissing\', \'status\': \'False\', \'type\': \'Ready\'}, {\'lastTransitionTime\': \'2026-01-24T14:29:17Z\', \'reason\': \'PredictorRouteReady not ready\', \'severity\': \'Info\', \'status\': \'False\', \'type\': \'RoutesReady\'}], \'modelStatus\': {\'states\': {\'activeModelState\': \'\', \'targetModelState\': \'Pending\'}, \'transitionStatus\': \'InProgress\'}, \'observedGeneration\': 1}}')
run pipeline stage %s
Running pipeline stage VLLMDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage VLLMDeleter completed in 0.22s
Shutdown handler de-registered
chaiml-bol-v1-opusdv1b-_13427_v3 status is now failed due to DeploymentManager action