Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-muster-v0-q235b-52842-v7-uploader
Waiting for job on chaiml-muster-v0-q235b-52842-v7-uploader to finish
chaiml-muster-v0-q235b-52842-v7-uploader: Using quantization_mode: w4a16
chaiml-muster-v0-q235b-52842-v7-uploader: Checking if ChaiML/muster-v0-q235b-lr1e4ep2r64g4-W4A16 already exists in ChaiML
chaiml-muster-v0-q235b-52842-v7-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-muster-v0-q235b-52842-v7-uploader: Downloading snapshot of ChaiML/muster-v0-q235b-lr1e4ep2r64g4-W4A16...
chaiml-muster-v0-q235b-52842-v7-uploader:
Fetching 39 files: 0%| | 0/39 [00:00<?, ?it/s]
Fetching 39 files: 3%|▎ | 1/39 [00:00<00:10, 3.64it/s]
Fetching 39 files: 18%|█▊ | 7/39 [00:14<01:07, 2.10s/it]
Fetching 39 files: 21%|██ | 8/39 [00:43<03:33, 6.90s/it]
Fetching 39 files: 74%|███████▍ | 29/39 [00:45<00:11, 1.18s/it]
Fetching 39 files: 77%|███████▋ | 30/39 [00:46<00:10, 1.15s/it]
Fetching 39 files: 79%|███████▉ | 31/39 [00:47<00:09, 1.14s/it]
Fetching 39 files: 100%|██████████| 39/39 [00:47<00:00, 1.21s/it]
chaiml-muster-v0-q235b-52842-v7-uploader: Downloaded in 47.210s
chaiml-muster-v0-q235b-52842-v7-uploader: Processed model ChaiML/muster-v0-q235b-lr1e4ep2r64g4 in 47.734s
chaiml-muster-v0-q235b-52842-v7-uploader: creating bucket guanaco-vllm-models
chaiml-muster-v0-q235b-52842-v7-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v7-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-muster-v0-q235b-52842-v7-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-muster-v0-q235b-52842-v7-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-muster-v0-q235b-52842-v7-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v7-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-muster-v0-q235b-52842-v7-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v7-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-muster-v0-q235b-52842-v7-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v7-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-muster-v0-q235b-52842-v7-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-muster-v0-q235b-52842-v7-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-muster-v0-q235b-52842-v7-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-muster-v0-q235b-52842-v7-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-muster-v0-q235b-52842-v7-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-muster-v0-q235b-52842-v7-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
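The SyntaxWarning lines above are emitted while Python compiles s3cmd's own modules (under /usr/lib/python3/dist-packages/S3/). Since Python 3.12, a backslash escape such as `\.` or `\s` that is not valid in an ordinary string literal produces a SyntaxWarning; earlier versions emitted a DeprecationWarning, which is hidden by default. The warnings do not affect the upload, and the conventional fix is to use raw string literals. The following is a minimal sketch built from two of the quoted patterns. It shows that the raw forms yield identical regex text and compile without warnings. The name BUCKET_CHARS is hypothetical, because s3cmd passes that pattern inline.

```python
import re
import warnings

# Two patterns quoted in the warnings above, as written in s3cmd (plain literals).
# In these literals "\." is an invalid escape; "\\-" and "\\+" are valid ones.
LEGACY_SOURCE = r'''
RE_S3_DATESTRING = '\.[0-9]*(?:[Z\\-\\+]*?)'
BUCKET_CHARS = "([^a-z0-9\.-])"
'''

# The same patterns as raw literals: backslashes are kept verbatim, no warning.
FIXED_SOURCE = r'''
RE_S3_DATESTRING = r'\.[0-9]*(?:[Z\-\+]*?)'
BUCKET_CHARS = r"([^a-z0-9\.-])"
'''


def compile_warnings(source: str) -> list:
    """Compile `source` and return the categories of any warnings emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<s3cmd-snippet>", "exec")
    return [w.category for w in caught]


def pattern_strings(source: str) -> dict:
    """Execute `source` and return the upper-case names it defines."""
    namespace: dict = {}
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # the legacy source would warn again here
        exec(source, namespace)
    return {k: v for k, v in namespace.items() if k.isupper()}


legacy, fixed = pattern_strings(LEGACY_SOURCE), pattern_strings(FIXED_SOURCE)
print(legacy == fixed)                           # True
print(compile_warnings(FIXED_SOURCE))            # []
print(len(compile_warnings(LEGACY_SOURCE)) > 0)  # True
print(re.search(fixed["BUCKET_CHARS"], "bad_bucket").group(1))  # _
```

Because the regex text does not change, switching to raw strings is a behavior-preserving cleanup. It only silences the compile-time noise.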
chaiml-muster-v0-q235b-52842-v7-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-muster-v0-q235b-52842-v7-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/added_tokens.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/added_tokens.json
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/.gitattributes
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/special_tokens_map.json
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/config.json
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/generation_config.json
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/quantization_config.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/quantization_config.json
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/merges.txt s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/merges.txt
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/chat_template.jinja
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/tokenizer_config.json
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/tokenizer.json
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/vocab.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/vocab.json
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model.safetensors.index.json
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
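urllib3 logs the `Connection pool is full, discarding connection: %s. Connection pool size: %s` template when a connection is returned to a per-host pool that already holds its maximum number of idle connections. In that case the extra connection is closed rather than kept for reuse. The warning means more requests ran concurrently against one host than the pool retains. That costs reconnects but not correctness. The log does not show which client in the pipeline emits it. The following is a stdlib-only simulation of that return-to-pool rule, modeled on urllib3's non-blocking put into a LIFO queue. The class and function names are hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass
class FakeConnection:
    """Stand-in for an HTTP connection; only tracks whether it was closed."""
    ident: int
    closed: bool = False

    def close(self) -> None:
        self.closed = True


class BoundedPool:
    """Hypothetical per-host pool keeping at most `maxsize` idle connections."""

    def __init__(self, host: str, maxsize: int) -> None:
        self.host = host
        self.idle: queue.LifoQueue = queue.LifoQueue(maxsize)
        self.created = 0
        self.discarded = 0

    def get_conn(self) -> FakeConnection:
        # Reuse the most recently returned idle connection, else open a new one.
        try:
            return self.idle.get(block=False)
        except queue.Empty:
            self.created += 1
            return FakeConnection(self.created)

    def put_conn(self, conn: FakeConnection) -> None:
        # Non-blocking return: if the pool is already full, close the extra connection.
        try:
            self.idle.put(conn, block=False)
        except queue.Full:
            self.discarded += 1
            print(f"Connection pool is full, discarding connection: {self.host}. "
                  f"Connection pool size: {self.idle.qsize()}")
            conn.close()


def burst(pool: BoundedPool, concurrency: int) -> None:
    """Check out `concurrency` connections at once, then return them all."""
    in_flight = [pool.get_conn() for _ in range(concurrency)]
    for conn in in_flight:
        pool.put_conn(conn)


pool = BoundedPool("example.invalid", maxsize=10)
burst(pool, concurrency=16)
print(pool.created, pool.discarded)  # 16 6
```

With 16 connections in flight and room for 10, six are discarded, which matches the bursty pattern of warnings above. If the client is `requests`, the usual remedy is to mount a `requests.adapters.HTTPAdapter` with a larger `pool_maxsize` (the default is 10).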
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00027-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00027-of-00027.safetensors
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
HTTP Request: %s %s "%s %d %s"
admin requested tearing down of junhua024-chai-19cl-full-0022_v1
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
HTTP Request: %s %s "%s %d %s"
Running pipeline stage MKMLDeleter
%s, retrying in %s seconds...
%s, retrying in %s seconds...
clean up pipeline due to error=TeardownError('401
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Audit-Id': '0c094c28-520c-499b-8110-610fc1f9045f', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Sat, 07 Feb 2026 15:12:04 GMT', 'Content-Length': '129'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}\n'
Original traceback:
  File "/root/miniconda3/envs/guanaco/lib/python3.11/site-packages/kubernetes/dynamic/client.py", line 55, in inner
    resp = func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/guanaco/lib/python3.11/site-packages/kubernetes/dynamic/client.py", line 273, in request
    api_response = self.client.call_api(
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/guanaco/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/guanaco/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/root/miniconda3/envs/guanaco/lib/python3.11/site-packages/kubernetes/client/api_client.py", line 373, in request
    return self.rest_client.GET(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/guanaco/lib/python3.11/site-packages/kubernetes/client/rest.py", line 244, in GET
    return self.request("GET", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/guanaco/lib/python3.11/site-packages/kubernetes/client/rest.py", line 238, in request
    raise ApiException(http_resp=r)
')
Shutdown handler de-registered
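The interleaved MKMLDeleter stage for junhua024-chai-19cl-full-0022_v1 logs `%s, retrying in %s seconds...` twice. It then cleans up with a TeardownError that wraps an HTTP 401 from the Kubernetes API. The retry helper itself does not appear in the log. The following is a minimal sketch of such a loop, assuming a fixed attempt budget with exponential backoff. The function name, attempt count, and delays are hypothetical.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("pipeline")


def retry_with_backoff(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds, sleeping base_delay * 2**i after failure i.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller clean up
            delay = base_delay * 2 ** attempt
            logger.warning("%s, retrying in %s seconds...", exc, delay)
            sleep(delay)
    # Only reached when attempts < 1, because the loop never ran.
    raise ValueError("attempts must be at least 1")


delays: list = []


def always_unauthorized() -> None:
    raise PermissionError("401 Unauthorized")


try:
    retry_with_backoff(always_unauthorized, attempts=3, sleep=delays.append)
except PermissionError as err:
    print(f"gave up: {err}; slept {delays}")  # gave up: 401 Unauthorized; slept [1.0, 2.0]
```

Three attempts produce two retry warnings before the error propagates, which matches the shape of the log above. Retrying helps only with transient faults. A 401 generally means the API server rejected the credentials, so the teardown is unlikely to succeed until the token or kubeconfig is refreshed.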
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00019-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00019-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00024-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00024-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00013-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00013-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00025-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00025-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00023-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00023-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00011-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00011-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00007-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00007-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00009-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00009-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00018-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00018-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00002-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00002-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00016-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00016-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00026-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00026-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00005-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00005-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00006-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00006-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00020-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00020-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00008-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00008-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00015-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00015-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00017-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00017-of-00027.safetensors
chaiml-muster-v0-q235b-52842-v7-uploader: cp /dev/shm/model_output/model-00012-of-00027.safetensors s3://guanaco-vllm-models/chaiml-muster-v0-q235b-52842-v7/model-00012-of-00027.safetensors
Job chaiml-muster-v0-q235b-52842-v7-uploader completed after 678.43s with status: succeeded
Stopping job with name chaiml-muster-v0-q235b-52842-v7-uploader
Pipeline stage VLLMUploader completed in 680.46s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-muster-v0-q235b-52842-v7
Waiting for inference service chaiml-muster-v0-q235b-52842-v7 to be ready
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Failed to get response for submission chaiml-02f4-69d4-linear-w01_v8: HTTPConnectionPool(host='guanaco-model-mesh-load-balancer.model-mesh.k2.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
HTTP Request: %s %s "%s %d %s"
Inference service chaiml-muster-v0-q235b-52842-v7 ready after 727.6461608409882s
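The deployer creates the inference service and then blocks until the service reports ready. Here that took about 728 s of the 731.92 s that the VLLMDeployer stage took overall. The log does not show how readiness is checked. The following is a minimal sketch of a deadline-bounded polling loop. `is_ready`, `timeout`, and `interval` are hypothetical parameters, and the clock and sleep functions are injectable so that the loop can be exercised without actually waiting.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: float,
    interval: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll is_ready() until it returns True; return the seconds waited.

    Raises TimeoutError if the service is still not ready after `timeout` seconds.
    """
    start = clock()
    while True:
        if is_ready():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TimeoutError(f"not ready after {elapsed:.1f}s")
        # Sleep one interval, but never past the deadline.
        sleep(min(interval, timeout - elapsed))


class FakeClock:
    """Simulated time: sleep() advances now() instantly."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


fake = FakeClock()
waited = wait_until_ready(lambda: fake.t >= 25, timeout=900, clock=fake.now, sleep=fake.sleep)
print(waited)  # 30.0
```

Because readiness is only observed at poll boundaries, a wait of about 728 s can overshoot the moment the service actually became ready by up to one interval.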
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Pipeline stage VLLMDeployer completed in 731.92s
Connection pool is full, discarding connection: %s. Connection pool size: %s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.3486998081207275s
Received healthy response to inference request in 2.1219558715820312s
Received healthy response to inference request in 2.416314125061035s
Received healthy response to inference request in 3.011425018310547s
Received healthy response to inference request in 2.047943592071533s
Received healthy response to inference request in 2.3093819618225098s
Received healthy response to inference request in 2.6195106506347656s
Received healthy response to inference request in 2.384087562561035s
Received healthy response to inference request in 2.0301969051361084s
Received healthy response to inference request in 2.5694522857666016s
Received healthy response to inference request in 2.3051486015319824s
Received healthy response to inference request in 2.7998530864715576s
Received healthy response to inference request in 3.327118158340454s
Received healthy response to inference request in 2.6860668659210205s
Received healthy response to inference request in 2.271235942840576s
Received healthy response to inference request in 2.1230480670928955s
Received healthy response to inference request in 2.423630714416504s
Received healthy response to inference request in 2.206883430480957s
Received healthy response to inference request in 2.243450403213501s
Received healthy response to inference request in 2.4180941581726074s
Received healthy response to inference request in 2.439068555831909s
Received healthy response to inference request in 1.9867370128631592s
Received healthy response to inference request in 2.269137144088745s
Received healthy response to inference request in 2.243640184402466s
Received healthy response to inference request in 2.034334897994995s
Received healthy response to inference request in 2.4698610305786133s
Received healthy response to inference request in 1.9266605377197266s
Received healthy response to inference request in 2.0196750164031982s
Received healthy response to inference request in 1.9711079597473145s
Received healthy response to inference request in 2.416160821914673s
30 requests
0 failed requests
5th percentile: 1.9781410336494445
10th percentile: 2.0163812160491945
20th percentile: 2.0452218532562254
30th percentile: 2.1817328214645384
40th percentile: 2.258938360214233
50th percentile: 2.307265281677246
60th percentile: 2.3969168663024902
70th percentile: 2.4197551250457763
80th percentile: 2.489779281616211
90th percentile: 2.6974454879760743
95th percentile: 2.916217648983001
99th percentile: 3.235567147731781
mean time: 2.3479960123697916
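The StressChecker summary reduces 30 latency samples to a set of percentiles and a mean. The reported figures are consistent with linear interpolation between order statistics, which is NumPy's default percentile method. For example, the 5th percentile sits at rank 0.05 × 29 = 1.45, which is 45% of the way from the 2nd-smallest to the 3rd-smallest sample. The following stdlib sketch recomputes the summary from the logged samples. It assumes this interpolation method; the StressChecker's actual implementation is not shown.

```python
import math
import statistics

# Latencies (seconds) from the "Received healthy response" lines above, in log order.
LATENCIES = [
    2.3486998081207275, 2.1219558715820312, 2.416314125061035, 3.011425018310547,
    2.047943592071533, 2.3093819618225098, 2.6195106506347656, 2.384087562561035,
    2.0301969051361084, 2.5694522857666016, 2.3051486015319824, 2.7998530864715576,
    3.327118158340454, 2.6860668659210205, 2.271235942840576, 2.1230480670928955,
    2.423630714416504, 2.206883430480957, 2.243450403213501, 2.4180941581726074,
    2.439068555831909, 1.9867370128631592, 2.269137144088745, 2.243640184402466,
    2.034334897994995, 2.4698610305786133, 1.9266605377197266, 2.0196750164031982,
    1.9711079597473145, 2.416160821914673,
]


def percentile(samples: list, q: float) -> float:
    """q-th percentile (0-100) by linear interpolation between adjacent ranks."""
    ordered = sorted(samples)
    rank = (len(ordered) - 1) * q / 100
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


print(f"{len(LATENCIES)} requests")
for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(LATENCIES, q)}")
print(f"mean time: {statistics.fmean(LATENCIES)}")
```

The printed values should agree with the logged summary to within floating-point rounding. On Python 3.8+, `statistics.quantiles(LATENCIES, n=100, method="inclusive")` computes the same interpolation for all 99 cut points in a single call.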
Pipeline stage StressChecker completed in 85.41s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.69s
Shutdown handler de-registered
chaiml-muster-v0-q235b-_52842_v7 status is now deployed due to DeploymentManager action
chaiml-muster-v0-q235b-_52842_v7 status is now inactive due to auto deactivation removed underperforming models
chaiml-muster-v0-q235b-_52842_v7 status is now torndown due to DeploymentManager action