Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-v1-q235b-lr-99625-v5-uploader
Waiting for job on chaiml-pony-v1-q235b-lr-99625-v5-uploader to finish
chaiml-pony-v1-q235b-lr-99625-v5-uploader: Using quantization_mode: w4a16
chaiml-pony-v1-q235b-lr-99625-v5-uploader: Checking if ChaiML/pony-v1-q235b-lr1e4ep1r64g4-W4A16 already exists in ChaiML
chaiml-pony-v1-q235b-lr-99625-v5-uploader: Model already exists. Downloading to /dev/shm/model_output...
chaiml-pony-v1-q235b-lr-99625-v5-uploader: Downloading snapshot of ChaiML/pony-v1-q235b-lr1e4ep1r64g4-W4A16...
chaiml-pony-v1-q235b-lr-99625-v5-uploader: Downloaded in 44.423s
chaiml-pony-v1-q235b-lr-99625-v5-uploader: Processed model ChaiML/pony-v1-q235b-lr1e4ep1r64g4 in 45.086s
chaiml-pony-v1-q235b-lr-99625-v5-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-v1-q235b-lr-99625-v5-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/.gitattributes
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/tokenizer_config.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/generation_config.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/special_tokens_map.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/special_tokens_map.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/added_tokens.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/added_tokens.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/quantization_config.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/quantization_config.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/merges.txt s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/merges.txt
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/config.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/chat_template.jinja
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model.safetensors.index.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/vocab.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/vocab.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/tokenizer.json
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00027-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00027-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00008-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00008-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00025-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00025-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00009-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00009-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00005-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00005-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00015-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00015-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00016-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00016-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00011-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00011-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00026-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00026-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00006-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00006-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00003-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00003-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00021-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00021-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00023-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00023-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00002-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00002-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00007-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00007-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00010-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00010-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00001-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00001-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00013-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00013-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00004-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00004-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00022-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00022-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00020-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00020-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00018-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00018-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00014-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00014-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00012-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00012-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00024-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00024-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00019-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00019-of-00027.safetensors
chaiml-pony-v1-q235b-lr-99625-v5-uploader: cp /dev/shm/model_output/model-00017-of-00027.safetensors s3://guanaco-vllm-models/chaiml-pony-v1-q235b-lr-99625-v5/default/model-00017-of-00027.safetensors
Job chaiml-pony-v1-q235b-lr-99625-v5-uploader completed after 245.01s with status: succeeded
Stopping job with name chaiml-pony-v1-q235b-lr-99625-v5-uploader
Pipeline stage VLLMUploader completed in 258.02s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-v1-q235b-lr-99625-v5
Waiting for inference service chaiml-pony-v1-q235b-lr-99625-v5 to be ready
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-mistral-24b-2048_15988_v1: ('http://chaiml-mistral-24b-2048-15988-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048_54327_v6: ('http://chaiml-mistral-24b-2048-54327-v6-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-mistral-24b-2048-_2678_v3: ('http://chaiml-mistral-24b-2048-2678-v3-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Failed to get response for submission chaiml-grpo-q235b-kimid_37540_v1: HTTPConnectionPool(host='chaiml-grpo-q235b-kimid-37540-v1-predictor.tenant-chaiml-guanaco.kchai-coreweave-us-east-04a.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Inference service chaiml-pony-v1-q235b-lr-99625-v5 ready after 513.447169303894s
Pipeline stage VLLMDeployer completed in 519.44s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 6.348655939102173s
Received healthy response to inference request in 5.207133531570435s
Received healthy response to inference request in 6.744817495346069s
Received healthy response to inference request in 3.4303953647613525s
Received healthy response to inference request in 2.2174031734466553s
Received healthy response to inference request in 4.142759561538696s
Received healthy response to inference request in 4.023093938827515s
Received healthy response to inference request in 2.5358786582946777s
Received healthy response to inference request in 3.3880016803741455s
Received healthy response to inference request in 3.3999626636505127s
Received healthy response to inference request in 3.1108789443969727s
Received healthy response to inference request in 5.1338841915130615s
Received healthy response to inference request in 7.035696268081665s
Received healthy response to inference request in 5.191941499710083s
HTTP Request: %s %s "%s %d %s"
Received healthy response to inference request in 5.975740909576416s
Received healthy response to inference request in 6.050364017486572s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Received healthy response to inference request in 6.412262439727783s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Received healthy response to inference request in 6.406656980514526s
Received healthy response to inference request in 2.1045114994049072s
Received healthy response to inference request in 2.335050582885742s
Received healthy response to inference request in 2.018406629562378s
Received healthy response to inference request in 2.0883266925811768s
Received healthy response to inference request in 1.9774372577667236s
Received healthy response to inference request in 2.019571542739868s
Received healthy response to inference request in 2.070319890975952s
Received healthy response to inference request in 2.057976007461548s
Received healthy response to inference request in 2.3724279403686523s
Received healthy response to inference request in 1.9750261306762695s
Received healthy response to inference request in 2.176828145980835s
Received healthy response to inference request in 2.3448503017425537s
30 requests
0 failed requests
5th percentile: 1.995873475074768
10th percentile: 2.0194550514221192
20th percentile: 2.084725332260132
30th percentile: 2.2052306652069094
40th percentile: 2.361396884918213
50th percentile: 3.249440312385559
60th percentile: 3.6674747943878163
70th percentile: 5.151301383972168
80th percentile: 5.990665531158448
90th percentile: 6.407217526435852
95th percentile: 6.59516772031784
99th percentile: 6.951341423988342
mean time: 3.743208662668864
Pipeline stage StressChecker completed in 255.24s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.63s
Shutdown handler de-registered
chaiml-pony-v1-q235b-lr_99625_v5 status is now deployed due to DeploymentManager action
chaiml-pony-v1-q235b-lr_99625_v5 status is now inactive due to auto deactivation removed underperforming models
chaiml-pony-v1-q235b-lr_99625_v5 status is now torndown due to DeploymentManager action