Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-2fe5-c13f-linear-57126-v3-uploader
Waiting for job on chaiml-2fe5-c13f-linear-57126-v3-uploader to finish
Connection pool is full, discarding connection: %s. Connection pool size: %s
chaiml-2fe5-c13f-linear-57126-v3-uploader: Using quantization_mode: none
chaiml-2fe5-c13f-linear-57126-v3-uploader: Downloading snapshot of ChaiML/2fe5-c13f-linear-w01-FP8...
chaiml-2fe5-c13f-linear-57126-v3-uploader:
Fetching 12 files: 0%| | 0/12 [00:00<?, ?it/s]
Fetching 12 files: 8%|▊ | 1/12 [00:00<00:04, 2.25it/s]
Fetching 12 files: 42%|████▏ | 5/12 [00:07<00:11, 1.65s/it]
Fetching 12 files: 50%|█████ | 6/12 [00:08<00:08, 1.38s/it]
Fetching 12 files: 100%|██████████| 12/12 [00:08<00:00, 1.43it/s]
chaiml-2fe5-c13f-linear-57126-v3-uploader: Downloaded in 8.508s
chaiml-2fe5-c13f-linear-57126-v3-uploader: Processed model ChaiML/2fe5-c13f-linear-w01-FP8 in 13.438s
chaiml-2fe5-c13f-linear-57126-v3-uploader: creating bucket guanaco-vllm-models
chaiml-2fe5-c13f-linear-57126-v3-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-57126-v3-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-2fe5-c13f-linear-57126-v3-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-2fe5-c13f-linear-57126-v3-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-2fe5-c13f-linear-57126-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-57126-v3-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-2fe5-c13f-linear-57126-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-57126-v3-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-2fe5-c13f-linear-57126-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-57126-v3-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-2fe5-c13f-linear-57126-v3-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-2fe5-c13f-linear-57126-v3-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-2fe5-c13f-linear-57126-v3-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-2fe5-c13f-linear-57126-v3-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-2fe5-c13f-linear-57126-v3-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-2fe5-c13f-linear-57126-v3-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-2fe5-c13f-linear-57126-v3-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-2fe5-c13f-linear-57126-v3-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/.gitattributes
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/generation_config.json
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/config.json
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/special_tokens_map.json
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/recipe.yaml
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/model.safetensors.index.json
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/chat_template.jinja
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/tokenizer_config.json
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/tokenizer.json
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/model-00003-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/model-00003-of-00003.safetensors
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/model-00001-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/model-00001-of-00003.safetensors
chaiml-2fe5-c13f-linear-57126-v3-uploader: cp /dev/shm/model_output/model-00002-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v3/model-00002-of-00003.safetensors
Job chaiml-2fe5-c13f-linear-57126-v3-uploader completed after 226.78s with status: succeeded
Stopping job with name chaiml-2fe5-c13f-linear-57126-v3-uploader
Pipeline stage VLLMUploader completed in 227.34s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.13s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-2fe5-c13f-linear-57126-v3
Waiting for inference service chaiml-2fe5-c13f-linear-57126-v3 to be ready
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Connection pool is full, discarding connection: %s. Connection pool size: %s
HTTP Request: %s %s "%s %d %s"
Inference service chaiml-2fe5-c13f-linear-57126-v3 ready after 1138.8703644275665s
Pipeline stage VLLMDeployer completed in 1139.52s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.7291395664215088s
Received healthy response to inference request in 1.8874437808990479s
Received healthy response to inference request in 1.714986801147461s
Received healthy response to inference request in 1.7157156467437744s
Received healthy response to inference request in 1.9122354984283447s
Received healthy response to inference request in 2.0548133850097656s
Received healthy response to inference request in 1.630842924118042s
Received healthy response to inference request in 1.830979347229004s
Received healthy response to inference request in 1.7034282684326172s
Received healthy response to inference request in 1.6238765716552734s
Received healthy response to inference request in 1.7802789211273193s
Received healthy response to inference request in 2.020712375640869s
Received healthy response to inference request in 1.6296629905700684s
Received healthy response to inference request in 1.8338661193847656s
Received healthy response to inference request in 2.8274781703948975s
Received healthy response to inference request in 1.7525455951690674s
Received healthy response to inference request in 1.8281528949737549s
Received healthy response to inference request in 1.9001948833465576s
Received healthy response to inference request in 1.7593994140625s
Received healthy response to inference request in 1.9905619621276855s
Received healthy response to inference request in 2.1913399696350098s
Received healthy response to inference request in 1.7331883907318115s
Received healthy response to inference request in 2.0902562141418457s
Received healthy response to inference request in 1.8115684986114502s
Received healthy response to inference request in 1.814558506011963s
Received healthy response to inference request in 1.6388678550720215s
Received healthy response to inference request in 1.8034570217132568s
Received healthy response to inference request in 1.914551019668579s
HTTP Request: %s %s "%s %d %s"
Received healthy response to inference request in 1.714935302734375s
30 requests
1 failed requests
5th percentile: 1.6301939606666564
10th percentile: 1.6380653619766234
20th percentile: 1.7149765014648437
30th percentile: 1.7319737434387208
40th percentile: 1.7719271183013916
50th percentile: 1.8130635023117065
60th percentile: 1.8321340560913086
70th percentile: 1.9038070678710937
80th percentile: 1.9965920448303223
90th percentile: 2.1003645896911625
95th percentile: 2.541215980052946
99th percentile: 15.112517249584211
mean time: 2.465646266937256
%s, retrying in %s seconds...
Received healthy response to inference request in 1.7993643283843994s
Received healthy response to inference request in 1.9157803058624268s
Received healthy response to inference request in 1.7958850860595703s
Received healthy response to inference request in 1.7980985641479492s
Received healthy response to inference request in 1.7972030639648438s
Received healthy response to inference request in 1.9970002174377441s
Received healthy response to inference request in 1.8239500522613525s
Received healthy response to inference request in 2.0660791397094727s
Received healthy response to inference request in 1.785902738571167s
Received healthy response to inference request in 1.998765230178833s
Received healthy response to inference request in 1.7993197441101074s
Received healthy response to inference request in 1.6839840412139893s
Received healthy response to inference request in 2.1525299549102783s
Received healthy response to inference request in 1.6796472072601318s
Received healthy response to inference request in 1.8192603588104248s
Received healthy response to inference request in 1.984381914138794s
Received healthy response to inference request in 1.8132987022399902s
Received healthy response to inference request in 1.607182264328003s
Received healthy response to inference request in 1.7197372913360596s
Received healthy response to inference request in 1.6247296333312988s
Received healthy response to inference request in 1.5853862762451172s
Received healthy response to inference request in 2.158994197845459s
Received healthy response to inference request in 1.5921504497528076s
Received healthy response to inference request in 1.636763572692871s
Received healthy response to inference request in 1.7668156623840332s
Received healthy response to inference request in 1.8528931140899658s
Received healthy response to inference request in 1.5933091640472412s
Received healthy response to inference request in 1.6587703227996826s
Received healthy response to inference request in 1.958470344543457s
Received healthy response to inference request in 1.6959772109985352s
30 requests
0 failed requests
5th percentile: 1.5926718711853027
10th percentile: 1.6057949542999268
20th percentile: 1.6543689727783204
30th percentile: 1.6923792600631713
40th percentile: 1.7782679080963135
50th percentile: 1.7976508140563965
60th percentile: 1.8049380779266357
70th percentile: 1.8326329708099365
80th percentile: 1.9636526584625245
90th percentile: 2.005496621131897
95th percentile: 2.1136270880699155
99th percentile: 2.1571195673942567
mean time: 1.8053876717885335
Pipeline stage StressChecker completed in 134.19s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.70s
Shutdown handler de-registered
chaiml-2fe5-c13f-linear_57126_v3 status is now deployed due to DeploymentManager action
chaiml-2fe5-c13f-linear_57126_v3 status is now inactive due to auto deactivation removed underperforming models
chaiml-2fe5-c13f-linear_57126_v3 status is now torndown due to DeploymentManager action