Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-4d70-fd43-linear-w01-v36-uploader
Waiting for job on chaiml-4d70-fd43-linear-w01-v36-uploader to finish
chaiml-4d70-fd43-linear-w01-v36-uploader: Using quantization_mode: fp8
chaiml-4d70-fd43-linear-w01-v36-uploader: Checking if ChaiML/4d70-fd43-linear-w01-FP8 already exists in ChaiML
chaiml-4d70-fd43-linear-w01-v36-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-4d70-fd43-linear-w01-v36-uploader: Downloading snapshot of ChaiML/4d70-fd43-linear-w01-FP8...
chaiml-4d70-fd43-linear-w01-v36-uploader: Downloaded in 8.195s
chaiml-4d70-fd43-linear-w01-v36-uploader: Processed model ChaiML/4d70-fd43-linear-w01 in 11.812s
chaiml-4d70-fd43-linear-w01-v36-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-4d70-fd43-linear-w01-v36-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-4d70-fd43-linear-w01-v36-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-4d70-fd43-linear-w01-v36-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/.gitattributes
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/special_tokens_map.json
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/chat_template.jinja
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/generation_config.json
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/recipe.yaml
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/config.json
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/model.safetensors.index.json
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/tokenizer_config.json
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/tokenizer.json
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/model-00003-of-00003.safetensors s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/model-00003-of-00003.safetensors
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/model-00002-of-00003.safetensors s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/model-00002-of-00003.safetensors
chaiml-4d70-fd43-linear-w01-v36-uploader: cp /dev/shm/model_output/model-00001-of-00003.safetensors s3://guanaco-vllm-models/chaiml-4d70-fd43-linear-w01-v36/default/model-00001-of-00003.safetensors
Job chaiml-4d70-fd43-linear-w01-v36-uploader completed after 103.29s with status: succeeded
Stopping job with name chaiml-4d70-fd43-linear-w01-v36-uploader
Pipeline stage VLLMUploader completed in 103.88s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-4d70-fd43-linear-w01-v36
Waiting for inference service chaiml-4d70-fd43-linear-w01-v36 to be ready
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
HTTP Request: %s %s "%s %d %s"
Inference service chaiml-4d70-fd43-linear-w01-v36 ready after 151.16785287857056s
Pipeline stage VLLMDeployer completed in 151.82s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.9363057613372803s
Received healthy response to inference request in 1.9036874771118164s
Received healthy response to inference request in 1.8547394275665283s
Received healthy response to inference request in 1.9922089576721191s
Received healthy response to inference request in 1.8774511814117432s
Received healthy response to inference request in 2.096795082092285s
Received healthy response to inference request in 1.9510219097137451s
Received healthy response to inference request in 1.8676550388336182s
Received healthy response to inference request in 1.8698670864105225s
Received healthy response to inference request in 1.8914430141448975s
Received healthy response to inference request in 2.0095067024230957s
Received healthy response to inference request in 1.856358289718628s
Received healthy response to inference request in 1.90336012840271s
Received healthy response to inference request in 1.8812615871429443s
Received healthy response to inference request in 1.9746673107147217s
Received healthy response to inference request in 1.911226749420166s
Received healthy response to inference request in 2.1883559226989746s
Received healthy response to inference request in 1.8899261951446533s
Received healthy response to inference request in 1.8816478252410889s
Received healthy response to inference request in 1.9347970485687256s
Received healthy response to inference request in 1.8652703762054443s
Received healthy response to inference request in 1.954056739807129s
Received healthy response to inference request in 1.9965341091156006s
Received healthy response to inference request in 2.0269601345062256s
Received healthy response to inference request in 1.88551664352417s
Received healthy response to inference request in 1.8904459476470947s
Received healthy response to inference request in 2.0481491088867188s
Received healthy response to inference request in 1.8508484363555908s
Received healthy response to inference request in 1.8613336086273193s
Received healthy response to inference request in 2.22000789642334s
30 requests
0 failed requests
5th percentile: 1.8554679155349731
10th percentile: 1.8608360767364502
20th percentile: 1.8694246768951417
30th percentile: 1.8815319538116455
40th percentile: 1.8902380466461182
50th percentile: 1.9035238027572632
60th percentile: 1.9354005336761475
70th percentile: 1.9602399110794066
80th percentile: 1.9991286277770997
90th percentile: 2.0530137062072753
95th percentile: 2.147153544425964
99th percentile: 2.210828824043274
mean time: 1.9423801898956299
Pipeline stage StressChecker completed in 61.92s
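Note on the StressChecker summary above: the logged percentiles are consistent with linear interpolation between order statistics over the 30 latencies, which is the default behaviour of `numpy.percentile`. Whether StressChecker actually uses NumPy is an assumption, since the log does not say. The sketch below recomputes the summary in plain Python from the values in the log. The helper name `percentile` is hypothetical. The 5th, 50th and 99th percentiles and the mean it produces agree with the logged figures to within floating-point rounding.

```python
from statistics import mean

# Latencies in seconds, copied from the 30 "Received healthy response" lines.
latencies = [
    1.9363057613372803, 1.9036874771118164, 1.8547394275665283,
    1.9922089576721191, 1.8774511814117432, 2.096795082092285,
    1.9510219097137451, 1.8676550388336182, 1.8698670864105225,
    1.8914430141448975, 2.0095067024230957, 1.856358289718628,
    1.90336012840271, 1.8812615871429443, 1.9746673107147217,
    1.911226749420166, 2.1883559226989746, 1.8899261951446533,
    1.8816478252410889, 1.9347970485687256, 1.8652703762054443,
    1.954056739807129, 1.9965341091156006, 2.0269601345062256,
    1.88551664352417, 1.8904459476470947, 2.0481491088867188,
    1.8508484363555908, 1.8613336086273193, 2.22000789642334,
]


def percentile(values, q):
    """Return the q-th percentile (0..100) by linear interpolation.

    Hypothetical helper: the fractional rank (n - 1) * q / 100 is split into
    an integer index and a remainder, and the result is interpolated between
    the two neighbouring sorted values.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (rank - lower) * (ordered[upper] - ordered[lower])


if __name__ == "__main__":
    print(f"{len(latencies)} requests")
    for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        print(f"{q}th percentile: {percentile(latencies, q)}")
    print(f"mean time: {mean(latencies)}")
```

With 30 samples, the 99th percentile sits between the two slowest requests (2.188 s and 2.220 s), so a single outlier largely determines it.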
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.69s
Shutdown handler de-registered
chaiml-4d70-fd43-linear-w01_v36 status is now deployed due to DeploymentManager action
chaiml-4d70-fd43-linear-w01_v36 status is now inactive due to auto deactivation removed underperforming models