Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name rirv938-llama-8b-big-ret-4805-v4-mkmlizer
Waiting for job on rirv938-llama-8b-big-ret-4805-v4-mkmlizer to finish
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ _____ __ __ ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ /___/ ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ Version: 0.11.12 ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ https://mk1.ai ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ The license key for the current software has been verified as ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ belonging to: ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ Chai Research Corp. ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ Expiration: 2024-10-15 23:59:59 ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ║ ║
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: Downloaded to shared memory in 20.495s
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: quantizing model to /dev/shm/model_cache, profile:t0, folder:/tmp/tmp_r4uc05h, device:0
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: Saving flywheel model at /dev/shm/model_cache
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: quantized model in 82.803s
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: Processed model rirv938/llama_8b_big_retune_6m_4392 in 103.299s
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: creating bucket guanaco-mkml-models
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/rirv938-llama-8b-big-ret-4805-v4
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/rirv938-llama-8b-big-ret-4805-v4/config.json
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/rirv938-llama-8b-big-ret-4805-v4/special_tokens_map.json
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/rirv938-llama-8b-big-ret-4805-v4/tokenizer_config.json
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/rirv938-llama-8b-big-ret-4805-v4/tokenizer.json
rirv938-llama-8b-big-ret-4805-v4-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/rirv938-llama-8b-big-ret-4805-v4/flywheel_model.0.safetensors
rirv938-llama-8b-big-ret-4805-v4-mkmlizer:
Loading 0: 0%| | 0/291 [00:00<?, ?it/s]
Loading 0: 99%|█████████▉| 288/291 [01:11<00:00, 3.73it/s]
Job rirv938-llama-8b-big-ret-4805-v4-mkmlizer completed after 123.79s with status: succeeded
Stopping job with name rirv938-llama-8b-big-ret-4805-v4-mkmlizer
Pipeline stage MKMLizer completed in 125.02s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.08s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service rirv938-llama-8b-big-ret-4805-v4
Waiting for inference service rirv938-llama-8b-big-ret-4805-v4 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Failed to get response for submission rirv938-llama-8b-big-ret_4805_v2: ('http://rirv938-llama-8b-big-ret-4805-v2-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Failed to get response for submission chaiml-chaiml-sft-horror_v2: ('http://chaiml-chaiml-sft-horror-v2-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', 'read tcp 127.0.0.1:33304->127.0.0.1:8080: read: connection reset by peer\n')
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service rirv938-llama-8b-big-ret-4805-v4 ready after 220.72666931152344s
Pipeline stage MKMLDeployer completed in 221.11s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 3.6806135177612305s
Received healthy response to inference request in 3.209927558898926s
Received healthy response to inference request in 4.5189502239227295s
Received healthy response to inference request in 4.869812726974487s
Failed to get response for submission rirv938-llama-8b-big-ret_4805_v2: ('http://rirv938-llama-8b-big-ret-4805-v2-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Received healthy response to inference request in 3.2872719764709473s
5 requests
0 failed requests
5th percentile: 3.22539644241333
10th percentile: 3.2408653259277345
20th percentile: 3.271803092956543
30th percentile: 3.365940284729004
40th percentile: 3.5232769012451173
50th percentile: 3.6806135177612305
60th percentile: 4.01594820022583
70th percentile: 4.3512828826904295
80th percentile: 4.589122724533081
90th percentile: 4.729467725753784
95th percentile: 4.799640226364136
99th percentile: 4.855778226852417
mean time: 3.913315200805664
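Note: the percentile and mean figures above can be reproduced from the five healthy response times logged for this run. The sketch below assumes the StressChecker uses linear interpolation between sorted samples (the same default NumPy uses); the log does not confirm the actual implementation. The function name `percentile` and the variable `latencies` are illustrative only.

```python
def percentile(samples, q):
    """Return the q-th percentile (0-100) of samples.

    Assumption: linear interpolation between the two nearest
    order statistics, which matches the values printed in the log.
    """
    ordered = sorted(samples)
    if not ordered:
        raise ValueError("need at least one sample")
    # Fractional index into the sorted list.
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# The five healthy response times (seconds) from the first stress-check round.
latencies = [
    3.6806135177612305,
    3.209927558898926,
    4.5189502239227295,
    4.869812726974487,
    3.2872719764709473,
]

for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {sum(latencies) / len(latencies)}")
```

With only five samples, every percentile above the 75th is interpolated between the two slowest requests, so the 95th and 99th values say little about tail latency.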
%s, retrying in %s seconds...
Received healthy response to inference request in 3.5671589374542236s
Received healthy response to inference request in 3.9393439292907715s
Received healthy response to inference request in 3.215364694595337s
Received healthy response to inference request in 2.100525140762329s
Received healthy response to inference request in 3.028930425643921s
5 requests
0 failed requests
5th percentile: 2.2862061977386476
10th percentile: 2.4718872547149657
20th percentile: 2.8432493686676024
30th percentile: 3.066217279434204
40th percentile: 3.1407909870147703
50th percentile: 3.215364694595337
60th percentile: 3.356082391738892
70th percentile: 3.496800088882446
80th percentile: 3.6415959358215333
90th percentile: 3.790469932556152
95th percentile: 3.864906930923462
99th percentile: 3.9244565296173097
mean time: 3.1702646255493163
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Pipeline stage StressChecker completed in 39.47s
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 5.84s
Shutdown handler de-registered
rirv938-llama-8b-big-ret_4805_v4 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.14s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.13s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service rirv938-llama-8b-big-ret-4805-v4-profiler
Waiting for inference service rirv938-llama-8b-big-ret-4805-v4-profiler to be ready
Inference service rirv938-llama-8b-big-ret-4805-v4-profiler ready after 210.5146300792694s
Pipeline stage MKMLProfilerDeployer completed in 210.90s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/rirv938-llama-8b-big696822c093d7fea69edc3f71af3f5aec-deplokv9hs:/code/chaiverse_profiler_1727289099 --namespace tenant-chaiml-guanaco
kubectl exec -it rirv938-llama-8b-big696822c093d7fea69edc3f71af3f5aec-deplokv9hs --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1727289099 && python profiles.py profile --best_of_n 1 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 256 --output_tokens 1 --summary /code/chaiverse_profiler_1727289099/summary.json'
kubectl exec -it rirv938-llama-8b-big696822c093d7fea69edc3f71af3f5aec-deplokv9hs --namespace tenant-chaiml-guanaco -- bash -c 'cat /code/chaiverse_profiler_1727289099/summary.json'
Pipeline stage MKMLProfilerRunner completed in 1932.83s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service rirv938-llama-8b-big-ret-4805-v4-profiler is running
Tearing down inference service rirv938-llama-8b-big-ret-4805-v4-profiler
Service rirv938-llama-8b-big-ret-4805-v4-profiler has been torn down
Pipeline stage MKMLProfilerDeleter completed in 2.34s
Shutdown handler de-registered
rirv938-llama-8b-big-ret_4805_v4 status is now inactive due to auto deactivation removed underperforming models
rirv938-llama-8b-big-ret_4805_v4 status is now torndown due to DeploymentManager action