Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-2fe5-c13f-linear-w01-v46-uploader
Waiting for job on chaiml-2fe5-c13f-linear-w01-v46-uploader to finish
chaiml-2fe5-c13f-linear-w01-v46-uploader: Using quantization_mode: fp8
chaiml-2fe5-c13f-linear-w01-v46-uploader: Checking if ChaiML/2fe5-c13f-linear-w01-FP8 already exists in ChaiML
chaiml-2fe5-c13f-linear-w01-v46-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-2fe5-c13f-linear-w01-v46-uploader: Downloading snapshot of ChaiML/2fe5-c13f-linear-w01-FP8...
chaiml-2fe5-c13f-linear-w01-v46-uploader: Downloaded in 8.478s
chaiml-2fe5-c13f-linear-w01-v46-uploader: Processed model ChaiML/2fe5-c13f-linear-w01 in 12.061s
chaiml-2fe5-c13f-linear-w01-v46-uploader: creating bucket guanaco-vllm-models
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/.gitattributes
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/config.json
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/recipe.yaml
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/generation_config.json
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/chat_template.jinja
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/special_tokens_map.json
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/model.safetensors.index.json
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/tokenizer_config.json
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/tokenizer.json
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/model-00003-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/model-00003-of-00003.safetensors
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/model-00001-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/model-00001-of-00003.safetensors
chaiml-2fe5-c13f-linear-w01-v46-uploader: cp /dev/shm/model_output/model-00002-of-00003.safetensors s3://guanaco-vllm-models/chaiml-2fe5-c13f-linear-w01-v46/default/model-00002-of-00003.safetensors
Job chaiml-2fe5-c13f-linear-w01-v46-uploader completed after 124.1s with status: succeeded
Stopping job with name chaiml-2fe5-c13f-linear-w01-v46-uploader
Pipeline stage VLLMUploader completed in 124.74s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.14s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-2fe5-c13f-linear-w01-v46
Waiting for inference service chaiml-2fe5-c13f-linear-w01-v46 to be ready
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Inference service chaiml-2fe5-c13f-linear-w01-v46 ready after 161.15126872062683s
Pipeline stage VLLMDeployer completed in 161.65s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.690535306930542s
Received healthy response to inference request in 1.625490665435791s
Received healthy response to inference request in 2.1019203662872314s
Received healthy response to inference request in 1.7451541423797607s
Received healthy response to inference request in 1.6903352737426758s
Received healthy response to inference request in 2.2559797763824463s
Received healthy response to inference request in 1.6755976676940918s
Received healthy response to inference request in 2.0515806674957275s
Received healthy response to inference request in 1.7327401638031006s
Received healthy response to inference request in 1.6284708976745605s
Received healthy response to inference request in 1.6251511573791504s
Received healthy response to inference request in 1.9411804676055908s
Received healthy response to inference request in 1.659912347793579s
Received healthy response to inference request in 1.636441946029663s
Received healthy response to inference request in 1.8835008144378662s
Received healthy response to inference request in 1.619764804840088s
Received healthy response to inference request in 1.630279302597046s
Received healthy response to inference request in 2.239990234375s
Received healthy response to inference request in 1.7717974185943604s
Received healthy response to inference request in 1.644090175628662s
Received healthy response to inference request in 2.8015215396881104s
Received healthy response to inference request in 1.654052495956421s
Received healthy response to inference request in 1.5916781425476074s
Received healthy response to inference request in 1.5883605480194092s
Received healthy response to inference request in 1.8340668678283691s
Received healthy response to inference request in 1.6160731315612793s
Received healthy response to inference request in 1.6167781352996826s
Received healthy response to inference request in 1.623664140701294s
Received healthy response to inference request in 1.6168816089630127s
Received healthy response to inference request in 2.4239485263824463s
30 requests
0 failed requests
5th percentile: 1.6026558876037598
10th percentile: 1.6167076349258422
20th percentile: 1.6228842735290527
30th percentile: 1.6275768280029297
40th percentile: 1.6410308837890626
50th percentile: 1.6677550077438354
60th percentile: 1.7074172496795654
70th percentile: 1.790478253364563
80th percentile: 1.9632605075836185
90th percentile: 2.2415891885757446
95th percentile: 2.3483625888824458
99th percentile: 2.692025365829468
mean time: 1.8072312911351522
Pipeline stage StressChecker completed in 57.16s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.58s
Shutdown handler de-registered
chaiml-2fe5-c13f-linear-w01_v46 status is now deployed due to DeploymentManager action