princeton-nlp-mistral-7b_6234

developer_uid: Meliodia

submission_id: princeton-nlp-mistral-7b_6234_v1

model_name: princeton-nlp-mistral-7b_6234_v1

model_group: princeton-nlp/Mistral-7B

status: torndown

timestamp: 2024-09-12T19:41:09+00:00

num_battles: 13063

num_wins: 4259

celo_rating: 1120.47

family_friendly_score: 0.0

submission_type: basic

model_repo: princeton-nlp/Mistral-7B-Instruct-SimPO

model_architecture: MistralForCausalLM

model_num_parameters: 7241732096.0

best_of: 1

max_input_tokens: 1024

max_output_tokens: 64

latencies: [{'batch_size': 1, 'throughput': 0.9784578547293852, 'latency_mean': 1.0219559752941132, 'latency_p50': 1.0239979028701782, 'latency_p90': 1.120844340324402}, {'batch_size': 5, 'throughput': 2.706995340623484, 'latency_mean': 1.8428882372379303, 'latency_p50': 1.8374921083450317, 'latency_p90': 2.077126121520996}, {'batch_size': 10, 'throughput': 3.550382053393241, 'latency_mean': 2.7872896218299865, 'latency_p50': 2.796810269355774, 'latency_p90': 3.085711121559143}, {'batch_size': 15, 'throughput': 3.8956569577829305, 'latency_mean': 3.8043550562858583, 'latency_p50': 3.7631568908691406, 'latency_p90': 4.396522808074951}, {'batch_size': 20, 'throughput': 4.060436386375764, 'latency_mean': 4.860944818258286, 'latency_p50': 4.810726523399353, 'latency_p90': 5.649369025230408}, {'batch_size': 25, 'throughput': 4.0423102250738605, 'latency_mean': 6.048988724946976, 'latency_p50': 6.050798058509827, 'latency_p90': 6.794694662094116}]
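
The `latencies` value is a Python-style list of dicts, so it can be read back with `ast.literal_eval`. The sketch below is only an illustration: the numbers are copied from the field above but rounded to four decimals, only three of the five keys are kept, and the variable names are made up for the example.

```python
import ast

# `latencies` field from the record above, rounded to 4 decimals (illustrative copy).
raw = """[
 {'batch_size': 1,  'throughput': 0.9785, 'latency_mean': 1.0220},
 {'batch_size': 5,  'throughput': 2.7070, 'latency_mean': 1.8429},
 {'batch_size': 10, 'throughput': 3.5504, 'latency_mean': 2.7873},
 {'batch_size': 15, 'throughput': 3.8957, 'latency_mean': 3.8044},
 {'batch_size': 20, 'throughput': 4.0604, 'latency_mean': 4.8609},
 {'batch_size': 25, 'throughput': 4.0423, 'latency_mean': 6.0490},
]"""

# literal_eval safely parses Python literals (lists, dicts, numbers, strings).
rows = ast.literal_eval(raw)

# Batch size with the highest measured throughput.
peak = max(rows, key=lambda row: row["throughput"])

for row in rows:
    marker = " <- peak" if row is peak else ""
    print(f"batch={row['batch_size']:>2}  "
          f"throughput={row['throughput']:.2f}  "
          f"latency_mean={row['latency_mean']:.2f}s{marker}")
```

In this record throughput stops improving between batch sizes 20 and 25 while mean latency keeps rising.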

gpu_counts: {'NVIDIA RTX A5000': 1}

display_name: princeton-nlp-mistral-7b_6234_v1

is_internal_developer: True

language_model: princeton-nlp/Mistral-7B-Instruct-SimPO

model_size: 7B

ranking_group: single

throughput_3p7s: 3.92

us_pacific_date: 2024-09-12

win_ratio: 0.3260353670672893
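
The `win_ratio` field is consistent with `num_wins` divided by `num_battles`. A minimal check, using the two counts reported above (the variable names simply mirror the field names):

```python
# Counts copied from the record above.
num_battles = 13063
num_wins = 4259

# Fraction of battles won.
win_ratio = num_wins / num_battles
print(f"win_ratio = {win_ratio:.4f}")
```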

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 1, 'max_output_tokens': 64}

formatter: {'memory_template': "{bot_name}'s Persona: {memory}\n####\n", 'prompt_template': '{prompt}\n<START>\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '{bot_name}:', 'truncate_by_message': False}
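
One plausible way the `formatter` templates combine into a single prompt is sketched below. The template strings are taken from the field above; the concatenation order (memory, prompt, conversation turns, response prefix), the `build_prompt` helper, and the sample names and messages are assumptions made for illustration. Truncation to `max_input_tokens` is not modeled.

```python
# Template strings copied from the `formatter` field above.
formatter = {
    "memory_template": "{bot_name}'s Persona: {memory}\n####\n",
    "prompt_template": "{prompt}\n<START>\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "{bot_name}:",
}


def build_prompt(bot_name, user_name, memory, prompt, turns):
    """Hypothetical helper: join the templates in an assumed order.

    `turns` is a list of (speaker, message) pairs, speaker being "bot" or "user".
    """
    parts = [
        formatter["memory_template"].format(bot_name=bot_name, memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(formatter["bot_template"].format(
                bot_name=bot_name, message=message))
        else:
            parts.append(formatter["user_template"].format(
                user_name=user_name, message=message))
    # The prompt ends with the bot's name so the model continues as the bot.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Made-up persona and conversation, for illustration only.
text = build_prompt(
    bot_name="Ava",
    user_name="Sam",
    memory="A friendly guide.",
    prompt="Ava greets Sam.",
    turns=[("user", "Hi"), ("bot", "Hello!"), ("user", "How are you?")],
)
print(text)
```

Because `stopping_words` in `generation_params` is `['\n']`, a reply generated from such a prompt would presumably end at the first newline, which fits the one-message-per-line layout of the templates.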


Shutdown handler not registered because Python interpreter is not running in the main thread
Connection pool is full, discarding connection: %s. Connection pool size: %s
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name princeton-nlp-mistral-7b-6234-v1-mkmlizer
Waiting for job on princeton-nlp-mistral-7b-6234-v1-mkmlizer to finish
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║     _____            __           __                                ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║    / _/ /_ ___    __/ /  ___ ___ / /                                ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║   / _/ / // / |/|/ / _ \/ -_) -_) /                                 ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  /_//_/\_, /|__,__/_//_/\__/\__/_/                                  ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║       /___/                                                         ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║                                                                     ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  Version: 0.10.1                                                    ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  Copyright 2023 MK ONE TECHNOLOGIES Inc.                            ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  https://mk1.ai                                                     ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║                                                                     ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  The license key for the current software has been verified as      ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  belonging to:                                                      ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║                                                                     ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  Chai Research Corp.                                                ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f                   ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║  Expiration: 2024-10-15 23:59:59                                    ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ║                                                                     ║
princeton-nlp-mistral-7b-6234-v1-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
princeton-nlp-mistral-7b-6234-v1-mkmlizer: Downloaded to shared memory in 32.515s
princeton-nlp-mistral-7b-6234-v1-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmp4bx6awyy, device:0
princeton-nlp-mistral-7b-6234-v1-mkmlizer: Saving flywheel model at /dev/shm/model_cache
princeton-nlp-mistral-7b-6234-v1-mkmlizer: quantized model in 16.935s
princeton-nlp-mistral-7b-6234-v1-mkmlizer: Processed model princeton-nlp/Mistral-7B-Instruct-SimPO in 49.450s
princeton-nlp-mistral-7b-6234-v1-mkmlizer: creating bucket guanaco-mkml-models
princeton-nlp-mistral-7b-6234-v1-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
princeton-nlp-mistral-7b-6234-v1-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/princeton-nlp-mistral-7b-6234-v1
princeton-nlp-mistral-7b-6234-v1-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/princeton-nlp-mistral-7b-6234-v1/config.json
princeton-nlp-mistral-7b-6234-v1-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/princeton-nlp-mistral-7b-6234-v1/special_tokens_map.json
princeton-nlp-mistral-7b-6234-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/princeton-nlp-mistral-7b-6234-v1/tokenizer_config.json
princeton-nlp-mistral-7b-6234-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.model s3://guanaco-mkml-models/princeton-nlp-mistral-7b-6234-v1/tokenizer.model
princeton-nlp-mistral-7b-6234-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/princeton-nlp-mistral-7b-6234-v1/tokenizer.json
princeton-nlp-mistral-7b-6234-v1-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/princeton-nlp-mistral-7b-6234-v1/flywheel_model.0.safetensors
princeton-nlp-mistral-7b-6234-v1-mkmlizer: 
Loading 0:  98%|█████████▊| 286/291 [00:07<00:00, 45.14it/s]
Job princeton-nlp-mistral-7b-6234-v1-mkmlizer completed after 73.32s with status: succeeded
Stopping job with name princeton-nlp-mistral-7b-6234-v1-mkmlizer
Pipeline stage MKMLizer completed in 74.62s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.10s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service princeton-nlp-mistral-7b-6234-v1
Waiting for inference service princeton-nlp-mistral-7b-6234-v1 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service princeton-nlp-mistral-7b-6234-v1 ready after 171.58997440338135s
Pipeline stage MKMLDeployer completed in 172.04s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 1.768230676651001s
Received healthy response to inference request in 0.7698352336883545s
Received healthy response to inference request in 0.8999876976013184s
Received healthy response to inference request in 0.9173283576965332s
Received healthy response to inference request in 1.0742487907409668s
5 requests
0 failed requests
5th percentile: 0.7958657264709472
10th percentile: 0.8218962192535401
20th percentile: 0.8739572048187256
30th percentile: 0.9034558296203613
40th percentile: 0.9103920936584473
50th percentile: 0.9173283576965332
60th percentile: 0.9800965309143066
70th percentile: 1.04286470413208
80th percentile: 1.2130451679229737
90th percentile: 1.4906379222869874
95th percentile: 1.629434299468994
99th percentile: 1.7404714012145996
mean time: 1.0859261512756349
Pipeline stage StressChecker completed in 9.77s
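
The StressChecker percentiles above can be reproduced from the five response times with linear interpolation between sorted samples, which is also the default method of NumPy's `percentile`. Whether the pipeline itself uses NumPy is an assumption; the sketch below uses plain Python, and the `percentile` helper is written for this example.

```python
# The five healthy response times logged above, in seconds.
samples = sorted([
    1.768230676651001,
    0.7698352336883545,
    0.8999876976013184,
    0.9173283576965332,
    1.0742487907409668,
])


def percentile(sorted_values, q):
    """q-th percentile (0-100) by linear interpolation between neighbors."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)                      # index of the sample at or below
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower                # how far toward the next sample
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower])


mean_time = sum(samples) / len(samples)

for q in (5, 50, 90, 99):
    print(f"{q}th percentile: {percentile(samples, q):.4f}")
print(f"mean time: {mean_time:.4f}")
```

With only five samples, the high percentiles are dominated by the single slowest request (about 1.77 s), so they say little about steady-state tail latency.
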
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 7.36s
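
The per-stage timings of this first pipeline run can be pulled out of the log with a regular expression. The sketch below copies the five "completed in" lines from above; the pattern and variable names are illustrative and not part of the pipeline.

```python
import re

# "completed in" lines copied from the log above.
log = """\
Pipeline stage MKMLizer completed in 74.62s
Pipeline stage MKMLTemplater completed in 0.10s
Pipeline stage MKMLDeployer completed in 172.04s
Pipeline stage StressChecker completed in 9.77s
Pipeline stage TriggerMKMLProfilingPipeline completed in 7.36s
"""

# Capture the stage name and its duration in seconds.
pattern = re.compile(r"Pipeline stage (\w+) completed in ([\d.]+)s")
durations = {name: float(seconds) for name, seconds in pattern.findall(log)}

total = sum(durations.values())
slowest = max(durations, key=durations.get)

print(f"total: {total:.2f}s, slowest stage: {slowest} ({durations[slowest]:.2f}s)")
```

Waiting for the inference service to become ready accounts for most of the time in this run.
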
Shutdown handler de-registered
princeton-nlp-mistral-7b_6234_v1 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.12s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.11s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service princeton-nlp-mistral-7b-6234-v1-profiler
Waiting for inference service princeton-nlp-mistral-7b-6234-v1-profiler to be ready
Inference service princeton-nlp-mistral-7b-6234-v1-profiler ready after 170.39926862716675s
Pipeline stage MKMLProfilerDeployer completed in 170.77s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/princeton-nlp-mistra99b325770b60c404a4ae10a9b51df7cd-deplorplkv:/code/chaiverse_profiler_1726170552 --namespace tenant-chaiml-guanaco
kubectl exec -it princeton-nlp-mistra99b325770b60c404a4ae10a9b51df7cd-deplorplkv --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1726170552 && python profiles.py profile --best_of_n 1 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 1024 --output_tokens 64 --summary /code/chaiverse_profiler_1726170552/summary.json'
kubectl exec -it princeton-nlp-mistra99b325770b60c404a4ae10a9b51df7cd-deplorplkv --namespace tenant-chaiml-guanaco -- bash -c 'cat /code/chaiverse_profiler_1726170552/summary.json'
Pipeline stage MKMLProfilerRunner completed in 489.86s
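
The `--batches` argument passed to `profiles.py` above follows a regular pattern: batch size 1, then every multiple of 5 up to 195. A short sketch that rebuilds the same comma-separated string (the variable names are made up for the example):

```python
# Batch size 1, followed by 5, 10, ..., 195.
batches = [1] + list(range(5, 200, 5))

# Comma-separated form, as it appears after --batches in the command above.
batches_arg = ",".join(str(b) for b in batches)
print(batches_arg)
```
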
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service princeton-nlp-mistral-7b-6234-v1-profiler is running
Tearing down inference service princeton-nlp-mistral-7b-6234-v1-profiler
Service princeton-nlp-mistral-7b-6234-v1-profiler has been torndown
Pipeline stage MKMLProfilerDeleter completed in 2.01s
Shutdown handler de-registered
princeton-nlp-mistral-7b_6234_v1 status is now inactive due to auto deactivation removed underperforming models
princeton-nlp-mistral-7b_6234_v1 status is now torndown due to DeploymentManager action