Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name rica40325-spin-first-v1-mkmlizer
Waiting for job on rica40325-spin-first-v1-mkmlizer to finish
rica40325-spin-first-v1-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
rica40325-spin-first-v1-mkmlizer: ║ _____ __ __ ║
rica40325-spin-first-v1-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
rica40325-spin-first-v1-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
rica40325-spin-first-v1-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
rica40325-spin-first-v1-mkmlizer: ║ /___/ ║
rica40325-spin-first-v1-mkmlizer: ║ ║
rica40325-spin-first-v1-mkmlizer: ║ Version: 0.10.1 ║
rica40325-spin-first-v1-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
rica40325-spin-first-v1-mkmlizer: ║ https://mk1.ai ║
rica40325-spin-first-v1-mkmlizer: ║ ║
rica40325-spin-first-v1-mkmlizer: ║ The license key for the current software has been verified as ║
rica40325-spin-first-v1-mkmlizer: ║ belonging to: ║
rica40325-spin-first-v1-mkmlizer: ║ ║
rica40325-spin-first-v1-mkmlizer: ║ Chai Research Corp. ║
rica40325-spin-first-v1-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
rica40325-spin-first-v1-mkmlizer: ║ Expiration: 2024-10-15 23:59:59 ║
rica40325-spin-first-v1-mkmlizer: ║ ║
rica40325-spin-first-v1-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
Failed to get response for submission blend_hokok_2024-09-09: ('http://neversleep-noromaid-v0-8068-v150-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Connection pool is full, discarding connection: %s. Connection pool size: %s
rica40325-spin-first-v1-mkmlizer: Downloaded to shared memory in 70.423s
rica40325-spin-first-v1-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpka2353ts, device:0
rica40325-spin-first-v1-mkmlizer: Saving flywheel model at /dev/shm/model_cache
rica40325-spin-first-v1-mkmlizer: quantized model in 28.821s
rica40325-spin-first-v1-mkmlizer: Processed model rica40325/spin_first in 99.244s
rica40325-spin-first-v1-mkmlizer: creating bucket guanaco-mkml-models
rica40325-spin-first-v1-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
rica40325-spin-first-v1-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/rica40325-spin-first-v1
rica40325-spin-first-v1-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/rica40325-spin-first-v1/config.json
rica40325-spin-first-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/rica40325-spin-first-v1/tokenizer.json
rica40325-spin-first-v1-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/rica40325-spin-first-v1/flywheel_model.0.safetensors
rica40325-spin-first-v1-mkmlizer:
Loading 0: 0%| | 0/291 [00:00<?, ?it/s]
Loading 0: 99%|█████████▉| 289/291 [00:14<00:00, 3.26it/s]
Job rica40325-spin-first-v1-mkmlizer completed after 117.61s with status: succeeded
Stopping job with name rica40325-spin-first-v1-mkmlizer
Pipeline stage MKMLizer completed in 118.58s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.11s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service rica40325-spin-first-v1
Waiting for inference service rica40325-spin-first-v1 to be ready
Failed to get response for submission blend_sudit_2024-09-14: ('http://sao10k-mn-12b-lyra-v4a1-v3-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', 'read tcp 127.0.0.1:43214->127.0.0.1:8080: read: connection reset by peer\n')
Failed to get response for submission blend_hokok_2024-09-09: ('http://neversleep-noromaid-v0-8068-v150-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service rica40325-spin-first-v1 ready after 170.88458728790283s
Pipeline stage MKMLDeployer completed in 171.34s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.9099202156066895s
Received healthy response to inference request in 1.447563886642456s
Failed to get response for submission blend_hokok_2024-09-09: ('http://neversleep-noromaid-v0-8068-v150-predictor.tenant-chaiml-guanaco.k.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', '')
Received healthy response to inference request in 2.1052515506744385s
Received healthy response to inference request in 2.979017734527588s
Received healthy response to inference request in 2.066833257675171s
5 requests
0 failed requests
5th percentile: 1.5400351524353026
10th percentile: 1.6325064182281495
20th percentile: 1.8174489498138429
30th percentile: 1.9413028240203858
40th percentile: 2.0040680408477782
50th percentile: 2.066833257675171
60th percentile: 2.0822005748748778
70th percentile: 2.097567892074585
80th percentile: 2.2800047874450686
90th percentile: 2.629511260986328
95th percentile: 2.8042644977569577
99th percentile: 2.9440670871734618
mean time: 2.1017173290252686
Pipeline stage StressChecker completed in 11.50s
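The StressChecker summary above can be reproduced from the five healthy-response latencies. The sketch below is not the pipeline's own code: it assumes the percentiles are computed by linear interpolation between closest ranks (the default in many numeric libraries), and the helper name `percentile` is hypothetical.

```python
from statistics import fmean


def percentile(values, q):
    """Return the q-th percentile (0-100) by linear interpolation between closest ranks.

    Assumption: this mirrors how the StressChecker derives its figures;
    the log itself does not state the method.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    pos = (len(ordered) - 1) * q / 100.0   # fractional index into the sorted list
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


# Latencies (seconds) copied from the "Received healthy response" lines.
latencies = [
    1.9099202156066895,
    1.447563886642456,
    2.1052515506744385,
    2.979017734527588,
    2.066833257675171,
]

for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {fmean(latencies)}")
```

Under that assumption the computed values agree with the logged ones to within floating-point rounding, for example the 5th percentile (about 1.5400), the median (about 2.0668), and the mean (about 2.1017).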
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 9.28s
Shutdown handler de-registered
rica40325-spin-first_v1 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.12s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.10s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service rica40325-spin-first-v1-profiler
Waiting for inference service rica40325-spin-first-v1-profiler to be ready
Inference service rica40325-spin-first-v1-profiler ready after 170.40556693077087s
Pipeline stage MKMLProfilerDeployer completed in 170.80s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/rica40325-spin-first-v1-profiler-predictor-00001-deploymen8hdpf:/code/chaiverse_profiler_1726546998 --namespace tenant-chaiml-guanaco
kubectl exec -it rica40325-spin-first-v1-profiler-predictor-00001-deploymen8hdpf --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1726546998 && python profiles.py profile --best_of_n 16 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 512 --output_tokens 64 --summary /code/chaiverse_profiler_1726546998/summary.json'
kubectl exec -it rica40325-spin-first-v1-profiler-predictor-00001-deploymen8hdpf --namespace tenant-chaiml-guanaco -- bash -c 'cat /code/chaiverse_profiler_1726546998/summary.json'
Pipeline stage MKMLProfilerRunner completed in 839.50s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service rica40325-spin-first-v1-profiler is running
Tearing down inference service rica40325-spin-first-v1-profiler
Service rica40325-spin-first-v1-profiler has been torn down
Pipeline stage MKMLProfilerDeleter completed in 2.10s
Shutdown handler de-registered
rica40325-spin-first_v1 status is now inactive due to auto deactivation removed underperforming models
rica40325-spin-first_v1 status is now torndown due to DeploymentManager action
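To see where the time went, the "Pipeline stage ... completed in ...s" lines can be pulled out of a log like this one and summed. The following is a minimal sketch, assuming that exact line shape; the names `STAGE_RE` and `stage_durations` are hypothetical, and the sample lines are the profiling-pipeline stages copied from above.

```python
import re

# Matches lines such as "Pipeline stage MKMLProfilerRunner completed in 839.50s".
STAGE_RE = re.compile(r"^Pipeline stage (\S+) completed in ([0-9.]+)s$")


def stage_durations(lines):
    """Return (stage_name, seconds) pairs in log order.

    A list is used rather than a dict because a stage name can repeat
    (MKMLProfilerDeleter runs both before and after profiling).
    """
    durations = []
    for line in lines:
        match = STAGE_RE.match(line.strip())
        if match:
            durations.append((match.group(1), float(match.group(2))))
    return durations


sample_log = [
    "Running pipeline stage MKMLProfilerDeleter",
    "Pipeline stage MKMLProfilerDeleter completed in 0.12s",
    "Pipeline stage MKMLProfilerTemplater completed in 0.10s",
    "Pipeline stage MKMLProfilerDeployer completed in 170.80s",
    "Pipeline stage MKMLProfilerRunner completed in 839.50s",
    "Pipeline stage MKMLProfilerDeleter completed in 2.10s",
]

durations = stage_durations(sample_log)
total = sum(seconds for _, seconds in durations)
for name, seconds in durations:
    print(f"{name:<24} {seconds:>8.2f}s  {100 * seconds / total:5.1f}%")
print(f"{'total':<24} {total:>8.2f}s")
```

On these lines the profiler run itself (839.50s) dominates the profiling pipeline, with service deployment (170.80s) a distant second.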