Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-02f4-69d4-linear-30131-v9-uploader
Waiting for job on chaiml-02f4-69d4-linear-30131-v9-uploader to finish
chaiml-02f4-69d4-linear-30131-v9-uploader: Using quantization_mode: fp8
chaiml-02f4-69d4-linear-30131-v9-uploader: Repo ChaiML/02f4-69d4-linear-w01-FP8 already ends in FP8. Skipping...
chaiml-02f4-69d4-linear-30131-v9-uploader: Checking if ChaiML/02f4-69d4-linear-w01-FP8 already exists in ChaiML
chaiml-02f4-69d4-linear-30131-v9-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-02f4-69d4-linear-30131-v9-uploader: Downloading snapshot of ChaiML/02f4-69d4-linear-w01-FP8...
chaiml-02f4-69d4-linear-30131-v9-uploader: Downloaded in 12.145s
chaiml-02f4-69d4-linear-30131-v9-uploader: Processed model ChaiML/02f4-69d4-linear-w01-FP8 in 15.618s
chaiml-02f4-69d4-linear-30131-v9-uploader: creating bucket guanaco-vllm-models
chaiml-02f4-69d4-linear-30131-v9-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-30131-v9-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-02f4-69d4-linear-30131-v9-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-02f4-69d4-linear-30131-v9-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-02f4-69d4-linear-30131-v9-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-30131-v9-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-02f4-69d4-linear-30131-v9-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-30131-v9-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-02f4-69d4-linear-30131-v9-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-30131-v9-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-02f4-69d4-linear-30131-v9-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-30131-v9-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-02f4-69d4-linear-30131-v9-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-02f4-69d4-linear-30131-v9-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-02f4-69d4-linear-30131-v9-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-02f4-69d4-linear-30131-v9-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
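The SyntaxWarning lines above are emitted while Python compiles s3cmd's own modules under /usr/lib/python3/dist-packages/S3/. They do not appear to affect the job. An unrecognized escape such as `\.` keeps its backslash, so the patterns still match as intended, and the bucket is created on the next line. Since Python 3.12, such escapes in ordinary string literals raise SyntaxWarning. Earlier versions raised DeprecationWarning, which is hidden by default. This suggests the uploader image runs Python 3.12 or newer. The usual fix is raw string literals. Below is a minimal sketch using the double-dot bucket check from S3/Utils.py:257; both helper names are hypothetical.

```python
import re
import warnings

# Raw string: the regex engine receives \.\. exactly, and the compiler
# emits no invalid-escape warning for the literal.
DOUBLE_DOT = re.compile(r"\.\.")


def has_consecutive_dots(bucket: str) -> bool:
    """Return True if the bucket name contains '..' (hypothetical helper)."""
    return DOUBLE_DOT.search(bucket) is not None


def compiles_without_warning(source: str) -> bool:
    """Compile `source` as an expression with every warning escalated.

    CPython converts an escalated invalid-escape warning into a SyntaxError,
    so False means the literal would have produced a warning.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            compile(source, "<check>", "eval")
        except (SyntaxError, Warning):
            return False
    return True


if __name__ == "__main__":
    print(has_consecutive_dots("guanaco-vllm-models"))  # False
    print(compiles_without_warning(r'"\.\."'))          # False: invalid escapes
    print(compiles_without_warning(r'r"\.\."'))         # True: raw string
```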
chaiml-02f4-69d4-linear-30131-v9-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-02f4-69d4-linear-30131-v9-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/.gitattributes
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/model.safetensors.index.json
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/recipe.yaml
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/generation_config.json
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/special_tokens_map.json
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/config.json
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/tokenizer_config.json
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/tokenizer.json
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/model-00006-of-00006.safetensors s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/model-00006-of-00006.safetensors
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/model-00001-of-00006.safetensors s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/model-00001-of-00006.safetensors
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/model-00002-of-00006.safetensors s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/model-00002-of-00006.safetensors
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/model-00003-of-00006.safetensors s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/model-00003-of-00006.safetensors
chaiml-02f4-69d4-linear-30131-v9-uploader: cp /dev/shm/model_output/model-00004-of-00006.safetensors s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-30131-v9/default/model-00004-of-00006.safetensors
Job chaiml-02f4-69d4-linear-30131-v9-uploader completed after 73.02s with status: succeeded
Stopping job with name chaiml-02f4-69d4-linear-30131-v9-uploader
Pipeline stage VLLMUploader completed in 74.08s
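The cp lines in this stage list the shards model-00001 to model-00004 and model-00006 (of 00006). There is no cp line for model-00005-of-00006.safetensors, even though the job reports success. The log alone does not show whether that shard was uploaded without being logged or was skipped. The sketch below checks the uploaded file names for completeness. It assumes the Hugging Face-style `model-XXXXX-of-YYYYY.safetensors` naming seen here; `missing_shards` is a hypothetical helper.

```python
import re

# Matches sharded checkpoint names such as model-00001-of-00006.safetensors.
SHARD_RE = re.compile(r"model-(\d{5})-of-(\d{5})\.safetensors$")


def missing_shards(filenames):
    """Return the sorted 1-based shard indices absent from `filenames`.

    The expected total is read from the "-of-NNNNN" part of any shard name;
    non-shard files (config.json, tokenizer.json, ...) are ignored.
    """
    seen = set()
    total = 0
    for name in filenames:
        match = SHARD_RE.search(name)
        if match:
            seen.add(int(match.group(1)))
            total = max(total, int(match.group(2)))
    return [i for i in range(1, total + 1) if i not in seen]


# A subset of the file names from the cp lines in this log.
uploaded = [
    "config.json",
    "model.safetensors.index.json",
    "model-00006-of-00006.safetensors",
    "model-00001-of-00006.safetensors",
    "model-00002-of-00006.safetensors",
    "model-00003-of-00006.safetensors",
    "model-00004-of-00006.safetensors",
]
print(missing_shards(uploaded))  # [5]
```

A stronger check would compare against model.safetensors.index.json. Its `weight_map` names the shard file for every tensor, so it does not rely on the file-name pattern.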
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.35s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-02f4-69d4-linear-30131-v9
Waiting for inference service chaiml-02f4-69d4-linear-30131-v9 to be ready
Inference service chaiml-02f4-69d4-linear-30131-v9 ready after 150.84839344024658s
Pipeline stage VLLMDeployer completed in 151.44s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.1628825664520264s
Received healthy response to inference request in 3.115839719772339s
Received healthy response to inference request in 3.050811529159546s
Received healthy response to inference request in 2.7939343452453613s
Received healthy response to inference request in 3.322934150695801s
Received healthy response to inference request in 3.31838059425354s
Received healthy response to inference request in 2.8441505432128906s
Received healthy response to inference request in 2.7654359340667725s
Received healthy response to inference request in 2.8914763927459717s
Received healthy response to inference request in 3.3730924129486084s
Received healthy response to inference request in 3.2066314220428467s
Received healthy response to inference request in 3.1939010620117188s
Received healthy response to inference request in 3.044445037841797s
Received healthy response to inference request in 3.301299810409546s
Received healthy response to inference request in 2.9640045166015625s
Received healthy response to inference request in 2.769827127456665s
Received healthy response to inference request in 2.9488863945007324s
Received healthy response to inference request in 2.970794916152954s
Received healthy response to inference request in 3.6924123764038086s
Received healthy response to inference request in 2.8315677642822266s
Received healthy response to inference request in 2.814931631088257s
Received healthy response to inference request in 2.8236711025238037s
Received healthy response to inference request in 2.7880921363830566s
Received healthy response to inference request in 3.310490608215332s
Received healthy response to inference request in 2.8531203269958496s
Received healthy response to inference request in 3.0832314491271973s
Received healthy response to inference request in 2.852487325668335s
Received healthy response to inference request in 3.17779803276062s
Received healthy response to inference request in 2.924337387084961s
30 requests
1 failed request
5th percentile: 2.7780463814735414
10th percentile: 2.7933501243591308
20th percentile: 2.829988431930542
30th percentile: 2.852930426597595
40th percentile: 2.9390667915344237
50th percentile: 3.0076199769973755
60th percentile: 3.096274757385254
70th percentile: 3.1826289415359494
80th percentile: 3.3031379699707033
90th percentile: 3.3279499769210816
95th percentile: 3.5487183928489676
99th percentile: 15.350788140296949
mean time: 3.6101176182429
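The first run's 99th percentile (15.35s) and mean (3.61s) lie far above every healthy latency; the largest is 3.69s. Both figures are consistent with the timed-out request being counted as one of the 30 samples. Its elapsed time would be just over the 20-second read timeout in the HTTPConnectionPool error. The sketch below back-solves that hidden value in two ways. It assumes linear interpolation between sorted samples, which reproduces the second run's percentiles exactly, and a plain mean over all 30 requests. The function names are hypothetical.

```python
# Healthy latencies (seconds) from the first StressChecker run: 29 of 30 requests.
FIRST_RUN_HEALTHY = [
    3.1628825664520264, 3.115839719772339, 3.050811529159546,
    2.7939343452453613, 3.322934150695801, 3.31838059425354,
    2.8441505432128906, 2.7654359340667725, 2.8914763927459717,
    3.3730924129486084, 3.2066314220428467, 3.1939010620117188,
    3.044445037841797, 3.301299810409546, 2.9640045166015625,
    2.769827127456665, 2.9488863945007324, 2.970794916152954,
    3.6924123764038086, 2.8315677642822266, 2.814931631088257,
    2.8236711025238037, 2.7880921363830566, 3.310490608215332,
    2.8531203269958496, 3.0832314491271973, 2.852487325668335,
    3.17779803276062, 2.924337387084961,
]
REPORTED_MEAN = 3.6101176182429
REPORTED_P99 = 15.350788140296949
N_REQUESTS = 30


def implied_from_mean(healthy, mean, n):
    """Value of the one missing sample if `mean` was taken over all n samples."""
    return mean * n - sum(healthy)


def implied_from_p99(healthy, p99, n):
    """Largest sample implied by a linearly interpolated 99th percentile.

    Assumes the missing sample is the maximum, so the 99th percentile
    interpolates between the largest healthy value and it.
    """
    pos = (n - 1) * 0.99          # fractional rank among n sorted samples
    lo = int(pos)
    if lo != n - 2:
        raise ValueError("p99 does not fall between the two largest samples")
    top_healthy = max(healthy)
    return top_healthy + (p99 - top_healthy) / (pos - lo)


if __name__ == "__main__":
    a = implied_from_mean(FIRST_RUN_HEALTHY, REPORTED_MEAN, N_REQUESTS)
    b = implied_from_p99(FIRST_RUN_HEALTHY, REPORTED_P99, N_REQUESTS)
    print(f"from mean: {a:.3f}s, from p99: {b:.3f}s")
```

Both estimates come out at about 20.11 seconds and agree closely. This supports the reading that the failed request's duration is included in the first run's statistics rather than dropped.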
%s, retrying in %s seconds...
Received healthy response to inference request in 3.814924955368042s
Received healthy response to inference request in 3.002901792526245s
Received healthy response to inference request in 3.128438949584961s
Received healthy response to inference request in 3.312206268310547s
Received healthy response to inference request in 2.8908464908599854s
Received healthy response to inference request in 3.3044018745422363s
Received healthy response to inference request in 2.698438882827759s
Received healthy response to inference request in 3.3596951961517334s
Received healthy response to inference request in 2.8538119792938232s
Received healthy response to inference request in 3.3133888244628906s
Received healthy response to inference request in 3.5931949615478516s
Received healthy response to inference request in 2.7866556644439697s
Received healthy response to inference request in 3.397477149963379s
Received healthy response to inference request in 2.9303243160247803s
Received healthy response to inference request in 3.067275285720825s
Received healthy response to inference request in 2.8406472206115723s
Received healthy response to inference request in 2.8842527866363525s
Received healthy response to inference request in 3.7047524452209473s
Received healthy response to inference request in 2.8113739490509033s
Received healthy response to inference request in 3.6250855922698975s
Received healthy response to inference request in 3.103358030319214s
Received healthy response to inference request in 3.124271869659424s
Received healthy response to inference request in 3.291361093521118s
Received healthy response to inference request in 2.7167012691497803s
Received healthy response to inference request in 2.852800130844116s
Received healthy response to inference request in 2.721257209777832s
Received healthy response to inference request in 2.77217173576355s
Received healthy response to inference request in 2.797985792160034s
Received healthy response to inference request in 2.9032435417175293s
Received healthy response to inference request in 3.088075876235962s
30 requests
0 failed requests
5th percentile: 2.7187514424324037
10th percentile: 2.767080283164978
20th percentile: 2.8086963176727293
30th percentile: 2.8535084247589113
40th percentile: 2.8982847213745115
50th percentile: 3.035088539123535
60th percentile: 3.1117235660552978
70th percentile: 3.2952733278274535
80th percentile: 3.322650098800659
90th percentile: 3.5963840246200562
95th percentile: 3.6689023613929748
99th percentile: 3.7829749274253848
mean time: 3.0897107044855754
Pipeline stage StressChecker completed in 210.45s
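The percentile lines in both summaries match linear interpolation between sorted samples, which is the default method of `numpy.percentile`. In the second run, for example, the 50th percentile 3.035088539123535 is the average of the 15th and 16th fastest of the 30 latencies. Below is a stdlib-only sketch that reproduces the second run's summary under that assumption. The StressChecker's own code does not appear in this log, and `percentile` and `summarize` are hypothetical names.

```python
import math

# Healthy latencies (seconds) from the second StressChecker run, in log order.
SECOND_RUN = [
    3.814924955368042, 3.002901792526245, 3.128438949584961,
    3.312206268310547, 2.8908464908599854, 3.3044018745422363,
    2.698438882827759, 3.3596951961517334, 2.8538119792938232,
    3.3133888244628906, 3.5931949615478516, 2.7866556644439697,
    3.397477149963379, 2.9303243160247803, 3.067275285720825,
    2.8406472206115723, 2.8842527866363525, 3.7047524452209473,
    2.8113739490509033, 3.6250855922698975, 3.103358030319214,
    3.124271869659424, 3.291361093521118, 2.7167012691497803,
    2.852800130844116, 2.721257209777832, 2.77217173576355,
    2.797985792160034, 2.9032435417175293, 3.088075876235962,
]


def percentile(samples, q):
    """q-th percentile (0-100) by linear interpolation between sorted samples."""
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * q / 100   # fractional rank
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summarize(samples, qs=(5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99)):
    """Return the percentiles and mean, keyed like the StressChecker log lines."""
    stats = {f"{q}th percentile": percentile(samples, q) for q in qs}
    stats["mean time"] = sum(samples) / len(samples)
    return stats


if __name__ == "__main__":
    print(f"{len(SECOND_RUN)} requests")
    for label, value in summarize(SECOND_RUN).items():
        print(f"{label}: {value}")
```

The printed values can differ from the log in the last few digits, because NumPy orders its floating-point operations slightly differently.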
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.78s
Shutdown handler de-registered
chaiml-02f4-69d4-linear_30131_v9 status is now deployed due to DeploymentManager action
chaiml-02f4-69d4-linear_30131_v9 status is now inactive due to admin request
chaiml-02f4-69d4-linear_30131_v9 status is now torndown due to DeploymentManager action