Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name rica40325-mistral-13b-2452-v3-mkmlizer
Waiting for job on rica40325-mistral-13b-2452-v3-mkmlizer to finish
rica40325-mistral-13b-2452-v3-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
rica40325-mistral-13b-2452-v3-mkmlizer: ║ _____ __ __ ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ /___/ ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ Version: 0.11.12 ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ https://mk1.ai ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ The license key for the current software has been verified as ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ belonging to: ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ Chai Research Corp. ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ Expiration: 2024-10-15 23:59:59 ║
rica40325-mistral-13b-2452-v3-mkmlizer: ║ ║
rica40325-mistral-13b-2452-v3-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
rica40325-mistral-13b-2452-v3-mkmlizer: Downloaded to shared memory in 29.157s
rica40325-mistral-13b-2452-v3-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmp8aea62vf, device:0
rica40325-mistral-13b-2452-v3-mkmlizer: Saving flywheel model at /dev/shm/model_cache
rica40325-mistral-13b-2452-v3-mkmlizer: /opt/conda/lib/python3.10/site-packages/mk1/flywheel/functional/loader.py:55: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
rica40325-mistral-13b-2452-v3-mkmlizer: tensors = torch.load(model_shard_filename, map_location=torch.device(self.device), mmap=True)
Connection pool is full, discarding connection: %s. Connection pool size: %s
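The "Connection pool is full, discarding connection" message is urllib3's benign warning that more concurrent connections were opened than the pool retains; the surplus connection is simply closed. If the noise matters, the pool can be sized for the expected concurrency (a sketch; the pipeline's actual pool configuration is not shown in this log):

```python
import urllib3

# maxsize controls how many connections per host the pool keeps alive;
# raising it above the peak concurrency silences the warning.
http = urllib3.PoolManager(num_pools=4, maxsize=32)
```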
rica40325-mistral-13b-2452-v3-mkmlizer: quantized model in 36.532s
rica40325-mistral-13b-2452-v3-mkmlizer: Processed model rica40325/mistral-13B-2452 in 65.690s
rica40325-mistral-13b-2452-v3-mkmlizer: creating bucket guanaco-mkml-models
rica40325-mistral-13b-2452-v3-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
rica40325-mistral-13b-2452-v3-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/rica40325-mistral-13b-2452-v3
rica40325-mistral-13b-2452-v3-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/rica40325-mistral-13b-2452-v3/config.json
rica40325-mistral-13b-2452-v3-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/rica40325-mistral-13b-2452-v3/special_tokens_map.json
rica40325-mistral-13b-2452-v3-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/rica40325-mistral-13b-2452-v3/tokenizer_config.json
rica40325-mistral-13b-2452-v3-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/rica40325-mistral-13b-2452-v3/tokenizer.json
rica40325-mistral-13b-2452-v3-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/rica40325-mistral-13b-2452-v3/flywheel_model.0.safetensors
Job rica40325-mistral-13b-2452-v3-mkmlizer completed after 95.25s with status: succeeded
Stopping job with name rica40325-mistral-13b-2452-v3-mkmlizer
Pipeline stage MKMLizer completed in 96.80s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.13s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service rica40325-mistral-13b-2452-v3
Waiting for inference service rica40325-mistral-13b-2452-v3 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service rica40325-mistral-13b-2452-v3 ready after 212.6531002521515s
Pipeline stage MKMLDeployer completed in 214.27s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 6.606102466583252s
Received healthy response to inference request in 3.847417116165161s
Received healthy response to inference request in 5.133509635925293s
Received healthy response to inference request in 2.399578094482422s
Received healthy response to inference request in 2.693202018737793s
5 requests
0 failed requests
5th percentile: 2.458302879333496
10th percentile: 2.51702766418457
20th percentile: 2.634477233886719
30th percentile: 2.924045038223267
40th percentile: 3.385731077194214
50th percentile: 3.847417116165161
60th percentile: 4.361854124069214
70th percentile: 4.876291131973266
80th percentile: 5.428028202056885
90th percentile: 6.017065334320068
95th percentile: 6.31158390045166
99th percentile: 6.547198753356933
mean time: 4.135961866378784
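The StressChecker statistics above can be reproduced from the five logged latencies; the reported values match `numpy.percentile` with its default linear interpolation (a sketch of the arithmetic, not necessarily the checker's actual implementation):

```python
import numpy as np

# The five healthy-response latencies (seconds) logged above.
times = [6.606102466583252, 3.847417116165161, 5.133509635925293,
         2.399578094482422, 2.693202018737793]

# Default (linear-interpolation) percentiles reproduce the logged values,
# e.g. 5th percentile 2.4583..., 99th percentile 6.5471...
for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {np.percentile(times, p)}")
print(f"mean time: {np.mean(times)}")
```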
%s, retrying in %s seconds...
Received healthy response to inference request in 4.019929885864258s
Received healthy response to inference request in 2.1169087886810303s
Received healthy response to inference request in 2.2906832695007324s
Received healthy response to inference request in 3.2996935844421387s
Received healthy response to inference request in 2.1893463134765625s
5 requests
0 failed requests
5th percentile: 2.1313962936401367
10th percentile: 2.145883798599243
20th percentile: 2.174858808517456
30th percentile: 2.2096137046813964
40th percentile: 2.2501484870910646
50th percentile: 2.2906832695007324
60th percentile: 2.694287395477295
70th percentile: 3.097891521453857
80th percentile: 3.4437408447265625
90th percentile: 3.73183536529541
95th percentile: 3.875882625579834
99th percentile: 3.991120433807373
mean time: 2.783312368392944
Pipeline stage StressChecker completed in 40.38s
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 5.09s
Shutdown handler de-registered
rica40325-mistral-13b-2452_v3 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.13s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.12s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service rica40325-mistral-13b-2452-v3-profiler
Waiting for inference service rica40325-mistral-13b-2452-v3-profiler to be ready
Inference service rica40325-mistral-13b-2452-v3-profiler ready after 220.57646894454956s
Pipeline stage MKMLProfilerDeployer completed in 220.98s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/rica40325-mistral-13def8aacf8fb5f22ee1bdc94d68e47f6b-deploj2j7x:/code/chaiverse_profiler_1727252465 --namespace tenant-chaiml-guanaco
kubectl exec -it rica40325-mistral-13def8aacf8fb5f22ee1bdc94d68e47f6b-deploj2j7x --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1727252465 && python profiles.py profile --best_of_n 8 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 1024 --output_tokens 64 --summary /code/chaiverse_profiler_1727252465/summary.json'
kubectl exec -it rica40325-mistral-13def8aacf8fb5f22ee1bdc94d68e47f6b-deploj2j7x --namespace tenant-chaiml-guanaco -- bash -c 'cat /code/chaiverse_profiler_1727252465/summary.json'
Pipeline stage MKMLProfilerRunner completed in 1152.90s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service rica40325-mistral-13b-2452-v3-profiler is running
Tearing down inference service rica40325-mistral-13b-2452-v3-profiler
Service rica40325-mistral-13b-2452-v3-profiler has been torn down
Pipeline stage MKMLProfilerDeleter completed in 2.20s
Shutdown handler de-registered
rica40325-mistral-13b-2452_v3 status is now inactive due to auto deactivation (underperforming models removed)
Pipeline stage ProductionBlendMKMLTemplater completed in 4.89s
admin requested tearing down of rica40325-mistral-13b-2452_v3
run pipeline stage %s
Shutdown handler not registered because Python interpreter is not running in the main thread
admin requested tearing down of blend_rofur_2024-10-03
Running pipeline stage MKMLDeployer
run pipeline %s
Shutdown handler not registered because Python interpreter is not running in the main thread
Creating inference service blend-rofur-2024-10-03
run pipeline stage %s
run pipeline %s
Ignoring service blend-rofur-2024-10-03 already deployed
Running pipeline stage MKMLDeleter
run pipeline stage %s
Waiting for inference service blend-rofur-2024-10-03 to be ready
rica40325-mistral-13b-2452_v3 status is now torndown due to DeploymentManager action