Shutdown handler not registered because Python interpreter is not running in the main thread
Running pipeline stage VLLMUploader
Starting job with name chaiml-2fe5-c13f-linear-57126-v7-uploader
Waiting for job on chaiml-2fe5-c13f-linear-57126-v7-uploader to finish
chaiml-2fe5-c13f-linear-57126-v7-uploader: Using quantization_mode: fp8
chaiml-2fe5-c13f-linear-57126-v7-uploader: Repo ChaiML/2fe5-c13f-linear-w01-FP8 already ends in FP8. Skipping...
chaiml-2fe5-c13f-linear-57126-v7-uploader: Checking if ChaiML/2fe5-c13f-linear-w01-FP8 already exists in ChaiML
chaiml-2fe5-c13f-linear-57126-v7-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-2fe5-c13f-linear-57126-v7-uploader: Downloading snapshot of ChaiML/2fe5-c13f-linear-w01-FP8...
chaiml-2fe5-c13f-linear-57126-v7-uploader: Downloaded in 8.819s
chaiml-2fe5-c13f-linear-57126-v7-uploader: Processed model ChaiML/2fe5-c13f-linear-w01-FP8 in 12.284s
chaiml-2fe5-c13f-linear-57126-v7-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/.gitattributes
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/special_tokens_map.json
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/generation_config.json
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/model.safetensors.index.json
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/config.json
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/recipe.yaml
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/chat_template.jinja
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/tokenizer_config.json
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/tokenizer.json
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/model-00003-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/model-00003-of-00003.safetensors
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/model-00002-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/model-00002-of-00003.safetensors
chaiml-2fe5-c13f-linear-57126-v7-uploader: cp /dev/shm/model_output/model-00001-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-57126-v7/default/model-00001-of-00003.safetensors
Job chaiml-2fe5-c13f-linear-57126-v7-uploader completed after 71.99s with status: succeeded
Stopping job with name chaiml-2fe5-c13f-linear-57126-v7-uploader
Pipeline stage VLLMUploader completed in 72.46s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.11s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-2fe5-c13f-linear-57126-v7
Waiting for inference service chaiml-2fe5-c13f-linear-57126-v7 to be ready
Inference service chaiml-2fe5-c13f-linear-57126-v7 ready after 160.45669555664062s
Pipeline stage VLLMDeployer completed in 160.98s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.669705867767334s
Received healthy response to inference request in 1.7756013870239258s
Received healthy response to inference request in 2.8788669109344482s
Received healthy response to inference request in 1.6841034889221191s
Received healthy response to inference request in 1.8953444957733154s
Received healthy response to inference request in 1.7251582145690918s
Received healthy response to inference request in 1.802703857421875s
Received healthy response to inference request in 2.1158390045166016s
Received healthy response to inference request in 2.0699267387390137s
Received healthy response to inference request in 1.6412293910980225s
Received healthy response to inference request in 2.8339576721191406s
Received healthy response to inference request in 1.6041896343231201s
Received healthy response to inference request in 1.863058090209961s
Received healthy response to inference request in 1.7407581806182861s
Received healthy response to inference request in 2.014237880706787s
Received healthy response to inference request in 1.673382043838501s
Received healthy response to inference request in 1.5719170570373535s
Received healthy response to inference request in 2.1586692333221436s
Received healthy response to inference request in 1.7572674751281738s
Received healthy response to inference request in 1.990027666091919s
Received healthy response to inference request in 1.839001178741455s
Received healthy response to inference request in 1.6079678535461426s
Received healthy response to inference request in 1.7201392650604248s
Received healthy response to inference request in 1.6801726818084717s
Received healthy response to inference request in 2.1090595722198486s
Received healthy response to inference request in 2.0613393783569336s
Received healthy response to inference request in 2.0045289993286133s
Received healthy response to inference request in 2.123623847961426s
Received healthy response to inference request in 2.253316640853882s
Received healthy response to inference request in 1.9639489650726318s
30 requests
0 failed requests
5th percentile: 1.6058898329734803
10th percentile: 1.6379032373428344
20th percentile: 1.6788145542144775
30th percentile: 1.7236525297164917
40th percentile: 1.768267822265625
50th percentile: 1.851029634475708
60th percentile: 1.9743804454803466
70th percentile: 2.0283683300018307
80th percentile: 2.110415458679199
90th percentile: 2.1681339740753174
95th percentile: 2.5726692080497724
99th percentile: 2.865843231678009
mean time: 1.9276347557703655
Pipeline stage StressChecker completed in 60.68s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.69s
Shutdown handler de-registered
chaiml-2fe5-c13f-linear_57126_v7 status is now deployed due to DeploymentManager action
chaiml-2fe5-c13f-linear_57126_v7 status is now inactive due to system request