princeton-nlp-gemma-2-9b_1853

submission_id: princeton-nlp-gemma-2-9b_1853_v1
developer_uid: Meliodia
formatter: {'memory_template': '<|im_start|>system\n{memory}<|im_end|>\n', 'prompt_template': '<|im_start|>user\n{prompt}<|im_end|>\n', 'bot_template': '<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n', 'user_template': '<|im_start|>user\n{user_name}: {message}<|im_end|>\n', 'response_template': '<|im_start|>assistant\n{bot_name}:', 'truncate_by_message': True}
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 1, 'max_output_tokens': 64}
model_name: princeton-nlp-gemma-2-9b_1853_v1
model_repo: princeton-nlp/gemma-2-9b-it-SimPO
status: torndown
timestamp: 2024-09-10T21:39:02+00:00
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name princeton-nlp-gemma-2-9b-1853-v1-mkmlizer
Waiting for job on princeton-nlp-gemma-2-9b-1853-v1-mkmlizer to finish
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║     _____            __           __                                ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║    / _/ /_ ___    __/ /  ___ ___ / /                                ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║   / _/ / // / |/|/ / _ \/ -_) -_) /                                 ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  /_//_/\_, /|__,__/_//_/\__/\__/_/                                  ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║       /___/                                                         ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║                                                                     ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  Version: 0.10.1                                                    ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  Copyright 2023 MK ONE TECHNOLOGIES Inc.                            ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  https://mk1.ai                                                     ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║                                                                     ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  The license key for the current software has been verified as      ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  belonging to:                                                      ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║                                                                     ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  Chai Research Corp.                                                ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f                   ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║  Expiration: 2024-10-15 23:59:59                                    ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ║                                                                     ║
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: Downloaded to shared memory in 31.389s
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpqyfqq3qc, device:0
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: Saving flywheel model at /dev/shm/model_cache
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: quantized model in 33.221s
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: Processed model princeton-nlp/gemma-2-9b-it-SimPO in 64.611s
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: creating bucket guanaco-mkml-models
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/princeton-nlp-gemma-2-9b-1853-v1
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/princeton-nlp-gemma-2-9b-1853-v1/tokenizer_config.json
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/princeton-nlp-gemma-2-9b-1853-v1/special_tokens_map.json
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/princeton-nlp-gemma-2-9b-1853-v1/config.json
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.model s3://guanaco-mkml-models/princeton-nlp-gemma-2-9b-1853-v1/tokenizer.model
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/princeton-nlp-gemma-2-9b-1853-v1/tokenizer.json
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/princeton-nlp-gemma-2-9b-1853-v1/flywheel_model.0.safetensors
princeton-nlp-gemma-2-9b-1853-v1-mkmlizer: 
Loading 0: 100%|█████████▉| 463/464 [00:05<00:00, 86.23it/s]
Job princeton-nlp-gemma-2-9b-1853-v1-mkmlizer completed after 85.86s with status: succeeded
Stopping job with name princeton-nlp-gemma-2-9b-1853-v1-mkmlizer
Pipeline stage MKMLizer completed in 87.02s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.09s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service princeton-nlp-gemma-2-9b-1853-v1
Waiting for inference service princeton-nlp-gemma-2-9b-1853-v1 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service princeton-nlp-gemma-2-9b-1853-v1 ready after 151.30811285972595s
Pipeline stage MKMLDeployer completed in 151.81s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Connection pool is full, discarding connection: %s. Connection pool size: %s
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
5 requests
5 failed requests
5th percentile: 20.082931756973267
10th percentile: 20.084768533706665
20th percentile: 20.088442087173462
30th percentile: 20.100537919998168
40th percentile: 20.121056032180785
50th percentile: 20.141574144363403
60th percentile: 20.150653409957886
70th percentile: 20.15973267555237
80th percentile: 20.209852123260497
90th percentile: 20.301011753082275
95th percentile: 20.346591567993165
99th percentile: 20.383055419921874
mean time: 20.173878335952757
%s, retrying in %s seconds...
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
5 requests
5 failed requests
5th percentile: 20.288898134231566
10th percentile: 20.3505211353302
20th percentile: 20.473767137527467
30th percentile: 20.551531505584716
40th percentile: 20.583814239501955
50th percentile: 20.61609697341919
60th percentile: 20.623627376556396
70th percentile: 20.631157779693602
80th percentile: 20.66568202972412
90th percentile: 20.72720012664795
95th percentile: 20.757959175109864
99th percentile: 20.782566413879394
mean time: 20.560480690002443
%s, retrying in %s seconds...
Connection pool is full, discarding connection: %s. Connection pool size: %s
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPSConnectionPool(host='guanaco-submitter.chai-research.com', port=443): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
5 requests
5 failed requests
5th percentile: 20.105470848083495
10th percentile: 20.112556600570677
20th percentile: 20.126728105545045
30th percentile: 20.134281158447266
40th percentile: 20.135215759277344
50th percentile: 20.136150360107422
60th percentile: 20.14346570968628
70th percentile: 20.150781059265135
80th percentile: 20.1573016166687
90th percentile: 20.163027381896974
95th percentile: 20.16589026451111
99th percentile: 20.168180570602416
mean time: 20.138308238983154
clean up pipeline due to error=%s
Shutdown handler de-registered
princeton-nlp-gemma-2-9b_1853_v1 status is now failed due to DeploymentManager action
princeton-nlp-gemma-2-9b_1853_v1 status is now torndown due to DeploymentManager action