Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-2fe5-c13f-linear-w01-v41-uploader
Waiting for job on chaiml-2fe5-c13f-linear-w01-v41-uploader to finish
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
chaiml-2fe5-c13f-linear-w01-v41-uploader: Using quantization_mode: none
chaiml-2fe5-c13f-linear-w01-v41-uploader: Downloading snapshot of ChaiML/2fe5-c13f-linear-w01...
chaiml-2fe5-c13f-linear-w01-v41-uploader: Fetching 14 files: 100%|██████████| 14/14 [00:11<00:00, 1.21it/s]
chaiml-2fe5-c13f-linear-w01-v41-uploader: Downloaded in 11.657s
chaiml-2fe5-c13f-linear-w01-v41-uploader: cp /dev/shm/model_output/model-00002-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v41/model-00002-of-00005.safetensors
chaiml-2fe5-c13f-linear-w01-v41-uploader: cp /dev/shm/model_output/model-00004-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v41/model-00004-of-00005.safetensors
chaiml-2fe5-c13f-linear-w01-v41-uploader: cp /dev/shm/model_output/model-00001-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v41/model-00001-of-00005.safetensors
chaiml-2fe5-c13f-linear-w01-v41-uploader: cp /dev/shm/model_output/model-00005-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v41/model-00005-of-00005.safetensors
chaiml-2fe5-c13f-linear-w01-v41-uploader: cp /dev/shm/model_output/model-00003-of-00005.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v41/model-00003-of-00005.safetensors
Job chaiml-2fe5-c13f-linear-w01-v41-uploader completed after 191.04s with status: succeeded
Stopping job with name chaiml-2fe5-c13f-linear-w01-v41-uploader
Pipeline stage VLLMUploader completed in 193.24s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.14s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-2fe5-c13f-linear-w01-v41
Waiting for inference service chaiml-2fe5-c13f-linear-w01-v41 to be ready
Inference service chaiml-2fe5-c13f-linear-w01-v41 ready after 171.11400699615479s
Pipeline stage VLLMDeployer completed in 175.94s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.146657943725586s
Received healthy response to inference request in 2.128891706466675s
Received healthy response to inference request in 1.4461455345153809s
Received healthy response to inference request in 2.16194224357605s
Received healthy response to inference request in 2.4394326210021973s
Received healthy response to inference request in 2.475464344024658s
Received healthy response to inference request in 2.6241395473480225s
Received healthy response to inference request in 2.289330244064331s
Received healthy response to inference request in 3.0175440311431885s
Received healthy response to inference request in 1.6160457134246826s
Received healthy response to inference request in 2.212921142578125s
Received healthy response to inference request in 2.774818181991577s
Received healthy response to inference request in 2.075639486312866s
Received healthy response to inference request in 2.62902569770813s
Received healthy response to inference request in 3.134507894515991s
Received healthy response to inference request in 2.316896915435791s
Received healthy response to inference request in 2.742427349090576s
Received healthy response to inference request in 2.912755250930786s
Received healthy response to inference request in 2.780473232269287s
HTTP Request: %s %s "%s %d %s"
Received healthy response to inference request in 2.9002747535705566s
Received healthy response to inference request in 2.36657452583313s
Received healthy response to inference request in 2.113779067993164s
Received healthy response to inference request in 1.4494407176971436s
Received healthy response to inference request in 1.6293601989746094s
Received healthy response to inference request in 1.5475609302520752s
Received healthy response to inference request in 1.858046054840088s
Received healthy response to inference request in 1.9216439723968506s
Received healthy response to inference request in 2.1546578407287598s
Received healthy response to inference request in 2.1237618923187256s
Received healthy response to inference request in 1.453615665435791s
30 requests
0 failed requests
5th percentile: 1.451319444179535
10th percentile: 1.5381664037704468
20th percentile: 1.8123088836669923
30th percentile: 2.102337193489075
40th percentile: 2.1395514488220213
50th percentile: 2.1874316930770874
60th percentile: 2.3367679595947264
70th percentile: 2.520066905021667
80th percentile: 2.7489055156707765
90th percentile: 2.9015228033065794
95th percentile: 2.970389080047607
99th percentile: 3.1005883741378786
mean time: 2.2481258233388264
Pipeline stage StressChecker completed in 106.56s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.60s
Shutdown handler de-registered
chaiml-2fe5-c13f-linear-w01_v41 status is now deployed due to DeploymentManager action
chaiml-2fe5-c13f-linear-w01_v41 status is now inactive due to auto deactivation removed underperforming models
chaiml-2fe5-c13f-linear-w01_v41 status is now torndown due to DeploymentManager action