Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name koboldai-llama2-13b-estopia-v1-mkmlizer
Waiting for job on koboldai-llama2-13b-estopia-v1-mkmlizer to finish
koboldai-llama2-13b-estopia-v1-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ _____ __ __ ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ /___/ ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ Version: 0.11.12 ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ https://mk1.ai ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ The license key for the current software has been verified as ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ belonging to: ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ Chai Research Corp. ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ Expiration: 2024-10-15 23:59:59 ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ║ ║
koboldai-llama2-13b-estopia-v1-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
Connection pool is full, discarding connection: %s. Connection pool size: %s
koboldai-llama2-13b-estopia-v1-mkmlizer: Downloaded to shared memory in 56.967s
koboldai-llama2-13b-estopia-v1-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpqkwptls3, device:0
koboldai-llama2-13b-estopia-v1-mkmlizer: Saving flywheel model at /dev/shm/model_cache
Connection pool is full, discarding connection: %s. Connection pool size: %s
koboldai-llama2-13b-estopia-v1-mkmlizer: creating bucket guanaco-mkml-models
koboldai-llama2-13b-estopia-v1-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
koboldai-llama2-13b-estopia-v1-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/koboldai-llama2-13b-estopia-v1
koboldai-llama2-13b-estopia-v1-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/koboldai-llama2-13b-estopia-v1/config.json
koboldai-llama2-13b-estopia-v1-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/koboldai-llama2-13b-estopia-v1/special_tokens_map.json
koboldai-llama2-13b-estopia-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/koboldai-llama2-13b-estopia-v1/tokenizer_config.json
koboldai-llama2-13b-estopia-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.model s3://guanaco-mkml-models/koboldai-llama2-13b-estopia-v1/tokenizer.model
koboldai-llama2-13b-estopia-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/koboldai-llama2-13b-estopia-v1/tokenizer.json
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name arliai-mistral-nemo-12b-9104-v3-mkmlizer
Waiting for job on arliai-mistral-nemo-12b-9104-v3-mkmlizer to finish
koboldai-llama2-13b-estopia-v1-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/koboldai-llama2-13b-estopia-v1/flywheel_model.0.safetensors
Job koboldai-llama2-13b-estopia-v1-mkmlizer completed after 117.29s with status: succeeded
Stopping job with name koboldai-llama2-13b-estopia-v1-mkmlizer
Pipeline stage MKMLizer completed in 118.43s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.16s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service koboldai-llama2-13b-estopia-v1
Waiting for inference service koboldai-llama2-13b-estopia-v1 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Downloaded to shared memory in 30.437s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpm8jro_qu, device:0
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Saving flywheel model at /dev/shm/model_cache
Connection pool is full, discarding connection: %s. Connection pool size: %s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: quantized model in 36.589s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Processed model ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.1 in 67.026s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: creating bucket guanaco-mkml-models
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
arliai-mistral-nemo-12b-9104-v3-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3
arliai-mistral-nemo-12b-9104-v3-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3/config.json
arliai-mistral-nemo-12b-9104-v3-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3/special_tokens_map.json
arliai-mistral-nemo-12b-9104-v3-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3/tokenizer_config.json
Connection pool is full, discarding connection: %s. Connection pool size: %s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3/flywheel_model.0.safetensors
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Loading 0: 99%|█████████▉| 361/363 [00:15<00:00, 28.87it/s]
Job arliai-mistral-nemo-12b-9104-v3-mkmlizer completed after 100.02s with status: succeeded
Stopping job with name arliai-mistral-nemo-12b-9104-v3-mkmlizer
Pipeline stage MKMLizer completed in 100.69s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.35s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service arliai-mistral-nemo-12b-9104-v3
Waiting for inference service arliai-mistral-nemo-12b-9104-v3 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service koboldai-llama2-13b-estopia-v1 ready after 211.71182775497437s
Pipeline stage MKMLDeployer completed in 213.75s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.860642671585083s
Received healthy response to inference request in 2.537536859512329s
Received healthy response to inference request in 2.086594581604004s
Received healthy response to inference request in 3.2676620483398438s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Received healthy response to inference request in 1.7225141525268555s
5 requests
0 failed requests
5th percentile: 1.7953302383422851
10th percentile: 1.8681463241577148
20th percentile: 2.013778495788574
30th percentile: 2.176783037185669
40th percentile: 2.357159948348999
50th percentile: 2.537536859512329
60th percentile: 2.6667791843414306
70th percentile: 2.796021509170532
80th percentile: 2.942046546936035
90th percentile: 3.1048542976379396
95th percentile: 3.1862581729888917
99th percentile: 3.2513812732696534
mean time: 2.4949900627136232
Pipeline stage StressChecker completed in 15.39s
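Note: the StressChecker percentiles above are consistent with linear interpolation between the sorted latencies (the default behaviour of numpy.percentile), although the log does not say which routine produced them. A minimal sketch under that assumption follows; the helper name `percentile` is illustrative and not taken from the pipeline.

```python
# Latencies (seconds) copied from the five "Received healthy response" lines.
latencies = [
    2.860642671585083,
    2.537536859512329,
    2.086594581604004,
    3.2676620483398438,
    1.7225141525268555,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation.

    Assumption: this mirrors the method behind the logged figures;
    the log itself does not name the routine.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0   # fractional index into the sorted list
    lower = int(rank)                        # index of the sample at or below the rank
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


for q in (5, 50, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q):.6f}")
print(f"mean time: {sum(latencies) / len(latencies):.6f}")
```

With only five samples, the high percentiles are interpolated almost entirely from the two slowest requests, so they say little about tail latency under real load.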
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 8.27s
Shutdown handler de-registered
koboldai-llama2-13b-estopia_v1 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.18s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.16s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service koboldai-llama2-13b-estopia-v1-profiler
Waiting for inference service koboldai-llama2-13b-estopia-v1-profiler to be ready
Inference service koboldai-llama2-13b-estopia-v1-profiler ready after 210.649188041687s
Pipeline stage MKMLProfilerDeployer completed in 211.10s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/koboldai-llama2-13b-7f67e32badd9718bf06a5ab6c0c4e20d-deplov6c8l:/code/chaiverse_profiler_1727235771 --namespace tenant-chaiml-guanaco
kubectl exec -it koboldai-llama2-13b-7f67e32badd9718bf06a5ab6c0c4e20d-deplov6c8l --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1727235771 && python profiles.py profile --best_of_n 8 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 1024 --output_tokens 64 --summary /code/chaiverse_profiler_1727235771/summary.json'
kubectl exec -it koboldai-llama2-13b-7f67e32badd9718bf06a5ab6c0c4e20d-deplov6c8l --namespace tenant-chaiml-guanaco -- bash -c 'cat /code/chaiverse_profiler_1727235771/summary.json'
Pipeline stage MKMLProfilerRunner completed in 1380.35s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service koboldai-llama2-13b-estopia-v1-profiler is running
Tearing down inference service koboldai-llama2-13b-estopia-v1-profiler
Service koboldai-llama2-13b-estopia-v1-profiler has been torn down
Pipeline stage MKMLProfilerDeleter completed in 2.36s
Shutdown handler de-registered
koboldai-llama2-13b-estopia_v1 status is now inactive due to auto deactivation removed underperforming models
run pipeline stage %s
admin requested tearing down of koboldai-llama2-13b-estopia_v1
Running pipeline stage MKMLDeleter
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLDeleter
%s, retrying in %s seconds...
%s, retrying in %s seconds...
clean up pipeline due to error=TeardownError("module 'kubernetes.config' has no attribute 'load_kube_config'")
Shutdown handler de-registered
koboldai-llama2-13b-estopia_v1 status is now torndown due to DeploymentManager action