Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer
Waiting for job on bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer to finish
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ _____ __ __ ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ /___/ ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ Version: 0.11.12 ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ https://mk1.ai ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ The license key for the current software has been verified as ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ belonging to: ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ Chai Research Corp. ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ Expiration: 2024-10-15 23:59:59 ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ║ ║
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
Connection pool is full, discarding connection: %s. Connection pool size: %s
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: Downloaded to shared memory in 48.024s
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpbez96dhw, device:0
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: Saving flywheel model at /dev/shm/model_cache
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: quantized model in 36.369s
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: Processed model BBChicago/Nana-nemo-12B_v1.0 in 84.393s
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: creating bucket guanaco-mkml-models
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/bbchicago-nana-nemo-12b-v1-0-v6
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/bbchicago-nana-nemo-12b-v1-0-v6/config.json
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/bbchicago-nana-nemo-12b-v1-0-v6/special_tokens_map.json
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/bbchicago-nana-nemo-12b-v1-0-v6/tokenizer_config.json
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/bbchicago-nana-nemo-12b-v1-0-v6/tokenizer.json
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/bbchicago-nana-nemo-12b-v1-0-v6/flywheel_model.0.safetensors
bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer:
Loading 0: 100%|█████████▉| 362/363 [00:15<00:00, 33.45it/s]
Job bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer completed after 104.98s with status: succeeded
Stopping job with name bbchicago-nana-nemo-12b-v1-0-v6-mkmlizer
Pipeline stage MKMLizer completed in 105.99s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.09s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service bbchicago-nana-nemo-12b-v1-0-v6
Waiting for inference service bbchicago-nana-nemo-12b-v1-0-v6 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service bbchicago-nana-nemo-12b-v1-0-v6 ready after 211.8186583518982s
Pipeline stage MKMLDeployer completed in 212.25s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 3.0784730911254883s
Received healthy response to inference request in 3.1527035236358643s
Failed to get response for submission zonemercy-lexical-nemov8_5966_v9: ('http://zonemercy-lexical-nemov8-5966-v9-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/models/GPT-J-6B-lit-v2:predict', 'read tcp 127.0.0.1:46320->127.0.0.1:8080: read: connection reset by peer\n')
Received healthy response to inference request in 3.4683377742767334s
Received healthy response to inference request in 2.1776537895202637s
Received healthy response to inference request in 2.6913270950317383s
5 requests
0 failed requests
5th percentile: 2.2803884506225587
10th percentile: 2.3831231117248537
20th percentile: 2.5885924339294433
30th percentile: 2.768756294250488
40th percentile: 2.9236146926879885
50th percentile: 3.0784730911254883
60th percentile: 3.1081652641296387
70th percentile: 3.137857437133789
80th percentile: 3.2158303737640384
90th percentile: 3.3420840740203857
95th percentile: 3.4052109241485593
99th percentile: 3.4557124042510985
mean time: 2.9136990547180175
Pipeline stage StressChecker completed in 17.92s
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 6.17s
Shutdown handler de-registered
bbchicago-nana-nemo-12b-v1-0_v6 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.14s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.13s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service bbchicago-nana-nemo-12b-v1-0-v6-profiler
Waiting for inference service bbchicago-nana-nemo-12b-v1-0-v6-profiler to be ready
Inference service bbchicago-nana-nemo-12b-v1-0-v6-profiler ready after 210.56793451309204s
Pipeline stage MKMLProfilerDeployer completed in 210.95s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/bbchicago-nana-nemo-d31f55556f29d4d37aa2340ba4413514-deplo9z6mt:/code/chaiverse_profiler_1727240467 --namespace tenant-chaiml-guanaco
kubectl exec -it bbchicago-nana-nemo-d31f55556f29d4d37aa2340ba4413514-deplo9z6mt --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1727240467 && python profiles.py profile --best_of_n 8 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 1024 --output_tokens 64 --summary /code/chaiverse_profiler_1727240467/summary.json'
kubectl exec -it bbchicago-nana-nemo-d31f55556f29d4d37aa2340ba4413514-deplo9z6mt --namespace tenant-chaiml-guanaco -- bash -c 'cat /code/chaiverse_profiler_1727240467/summary.json'
Pipeline stage MKMLProfilerRunner completed in 1167.98s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service bbchicago-nana-nemo-12b-v1-0-v6-profiler is running
Tearing down inference service bbchicago-nana-nemo-12b-v1-0-v6-profiler
Service bbchicago-nana-nemo-12b-v1-0-v6-profiler has been torndown
Pipeline stage MKMLProfilerDeleter completed in 2.36s
Shutdown handler de-registered
bbchicago-nana-nemo-12b-v1-0_v6 status is now inactive due to auto deactivation removed underperforming models
admin requested tearing down of blend_rofur_2024-10-03
bbchicago-nana-nemo-12b-v1-0_v6 status is now torndown due to DeploymentManager action
Shutdown handler not registered because Python interpreter is not running in the main thread