developer_uid: azuruce
submission_id: arliai-mistral-nemo-12b-_9104_v3
model_name: arliai-mistral-nemo-12b-_9104_v3
model_group: ArliAI/Mistral-Nemo-12B-
status: torndown
timestamp: 2024-09-25T03:33:59+00:00
num_battles: 2897
num_wins: 1438
celo_rating: 1248.5
family_friendly_score: 0.5628785179907374
family_friendly_standard_error: 0.00933691244975738
submission_type: basic
model_repo: ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.1
model_architecture: MistralForCausalLM
model_num_parameters: 12772070400.0
best_of: 8
max_input_tokens: 1024
max_output_tokens: 64
latencies: [{'batch_size': 1, 'throughput': 0.6157859908425753, 'latency_mean': 1.6238423347473145, 'latency_p50': 1.636804461479187, 'latency_p90': 1.7942851543426515}, {'batch_size': 3, 'throughput': 1.0763038045360662, 'latency_mean': 2.7774524056911467, 'latency_p50': 2.7757405042648315, 'latency_p90': 3.0594329595565797}, {'batch_size': 5, 'throughput': 1.2333855993815805, 'latency_mean': 4.026047124862671, 'latency_p50': 4.070652008056641, 'latency_p90': 4.598918008804321}, {'batch_size': 6, 'throughput': 1.2637781988847825, 'latency_mean': 4.737699222564697, 'latency_p50': 4.749259948730469, 'latency_p90': 5.315864777565002}, {'batch_size': 8, 'throughput': 1.2436186956222737, 'latency_mean': 6.393603932857514, 'latency_p50': 6.41321063041687, 'latency_p90': 7.273434567451477}, {'batch_size': 10, 'throughput': 1.213283275169671, 'latency_mean': 8.211547563076019, 'latency_p50': 8.195226430892944, 'latency_p90': 9.268311667442322}]
gpu_counts: {'NVIDIA RTX A5000': 1}
display_name: arliai-mistral-nemo-12b-_9104_v3
ineligible_reason: num_battles<5000
is_internal_developer: True
language_model: ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.1
model_size: 13B
ranking_group: single
throughput_3p7s: 1.21
us_pacific_date: 2024-09-24
win_ratio: 0.49637556092509494
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n', '<|eot_id|>', '<|end_of_text|>'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '', 'prompt_template': '', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:', 'truncate_by_message': False}
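The derived fields above can be reproduced from the raw values: win_ratio is num_wins / num_battles, and throughput_3p7s is consistent with reading the latency table at a 3.7 s mean latency. The sketch below assumes simple linear interpolation for the latter, which is not stated anywhere in this log.

```python
# Minimal sketch reproducing the derived fields from the raw values above.
# The linear interpolation for throughput_3p7s is an assumption, not the
# platform's documented method.

num_battles = 2897
num_wins = 1438
print(num_wins / num_battles)  # 0.4963755609... -> matches the win_ratio field

# (batch_size, latency_mean, throughput) copied from the `latencies` field.
latencies = [
    (1, 1.6238, 0.6158),
    (3, 2.7775, 1.0763),
    (5, 4.0260, 1.2334),
    (6, 4.7377, 1.2638),
    (8, 6.3936, 1.2436),
    (10, 8.2115, 1.2133),
]

def throughput_at(target_latency, table):
    """Linearly interpolate throughput at a target mean latency (seconds)."""
    for (_, l0, t0), (_, l1, t1) in zip(table, table[1:]):
        if l0 <= target_latency <= l1:
            return t0 + (target_latency - l0) / (l1 - l0) * (t1 - t0)
    raise ValueError("target latency outside the measured range")

print(round(throughput_at(3.7, latencies), 2))  # ~1.19, close to the reported throughput_3p7s of 1.21
```

The formatter templates describe how a conversation is flattened into the prompt. A hypothetical rendering is shown below; the sample conversation and the assembly loop are invented for illustration, and only the template strings come from the formatter field.

```python
# Template strings copied from the `formatter` field; the sample conversation
# and the assembly loop are illustrative assumptions.
bot_template = "{bot_name}: {message}\n"
user_template = "{user_name}: {message}\n"
response_template = "### Response:\n{bot_name}:"

history = [
    ("user", "Alice", "Hi there!"),
    ("bot", "Nemo", "Hello! How can I help?"),
    ("user", "Alice", "Tell me a story."),
]

prompt = ""
for role, name, message in history:
    template = user_template if role == "user" else bot_template
    prompt += template.format(user_name=name, bot_name=name, message=message)
prompt += response_template.format(bot_name="Nemo")
print(prompt)
# Alice: Hi there!
# Nemo: Hello! How can I help?
# Alice: Tell me a story.
# ### Response:
# Nemo:
```

Generation then runs with the parameters shown in generation_params (temperature 1.0, top_k 40, best_of 8, stopping on newline and the end-of-text tokens), truncated to 1024 input tokens and 64 output tokens.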
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name arliai-mistral-nemo-12b-9104-v3-mkmlizer
Waiting for job on arliai-mistral-nemo-12b-9104-v3-mkmlizer to finish
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ _____ __ __ ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ /___/ ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ Version: 0.11.12 ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ https://mk1.ai ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ The license key for the current software has been verified as ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ belonging to: ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ Chai Research Corp. ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ Expiration: 2024-10-15 23:59:59 ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ║ ║
arliai-mistral-nemo-12b-9104-v3-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Downloaded to shared memory in 30.437s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpm8jro_qu, device:0
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Saving flywheel model at /dev/shm/model_cache
Connection pool is full, discarding connection: %s. Connection pool size: %s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: quantized model in 36.589s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Processed model ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.1 in 67.026s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: creating bucket guanaco-mkml-models
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
arliai-mistral-nemo-12b-9104-v3-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3
arliai-mistral-nemo-12b-9104-v3-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3/config.json
arliai-mistral-nemo-12b-9104-v3-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3/special_tokens_map.json
arliai-mistral-nemo-12b-9104-v3-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3/tokenizer_config.json
Connection pool is full, discarding connection: %s. Connection pool size: %s
arliai-mistral-nemo-12b-9104-v3-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/arliai-mistral-nemo-12b-9104-v3/flywheel_model.0.safetensors
arliai-mistral-nemo-12b-9104-v3-mkmlizer: Loading 0: 0%| | 0/363 [00:00<?, ?it/s] ... Loading 0: 99%|█████████▉| 361/363 [00:15<00:00, 28.87it/s]
Job arliai-mistral-nemo-12b-9104-v3-mkmlizer completed after 100.02s with status: succeeded
Stopping job with name arliai-mistral-nemo-12b-9104-v3-mkmlizer
Pipeline stage MKMLizer completed in 100.69s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.35s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service arliai-mistral-nemo-12b-9104-v3
Waiting for inference service arliai-mistral-nemo-12b-9104-v3 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service arliai-mistral-nemo-12b-9104-v3 ready after 211.83311486244202s
Pipeline stage MKMLDeployer completed in 212.31s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 5.268962621688843s
Received healthy response to inference request in 2.064613103866577s
Received healthy response to inference request in 1.9792077541351318s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Connection pool is full, discarding connection: %s. Connection pool size: %s
Received healthy response to inference request in 1.5111405849456787s
Received healthy response to inference request in 2.8335697650909424s
5 requests
0 failed requests
5th percentile: 1.6047540187835694
10th percentile: 1.69836745262146
20th percentile: 1.8855943202972412
30th percentile: 1.996288824081421
40th percentile: 2.030450963973999
50th percentile: 2.064613103866577
60th percentile: 2.372195768356323
70th percentile: 2.679778432846069
80th percentile: 3.320648336410523
90th percentile: 4.294805479049683
95th percentile: 4.781884050369262
99th percentile: 5.1715469074249265
mean time: 2.7314987659454344
Pipeline stage StressChecker completed in 16.93s
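The percentile lines above are consistent with linear-interpolation percentiles over the five measured response times. A minimal sketch, assuming numpy's default percentile method (the log does not state which method the stress checker uses):

```python
import numpy as np

# The five healthy response times reported above, in seconds.
times = [5.268962621688843, 2.064613103866577, 1.9792077541351318,
         1.5111405849456787, 2.8335697650909424]

# Linear-interpolation percentiles (numpy's default) reproduce the log values,
# e.g. 5th -> 1.6047..., 50th -> 2.0646..., 99th -> 5.1715...
for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {np.percentile(times, p)}")
print(f"mean time: {np.mean(times)}")  # ~2.7314987659, matching the mean time above
```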
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 1.90s
Shutdown handler de-registered
arliai-mistral-nemo-12b-_9104_v3 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.16s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.13s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service arliai-mistral-nemo-12b-9104-v3-profiler
Waiting for inference service arliai-mistral-nemo-12b-9104-v3-profiler to be ready
Inference service arliai-mistral-nemo-12b-9104-v3-profiler ready after 210.48657155036926s
Pipeline stage MKMLProfilerDeployer completed in 210.88s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/arliai-mistral-nemo-181f03cb7d11d5aaec6ed913d5d27fcb-deplo5pq64:/code/chaiverse_profiler_1727235829 --namespace tenant-chaiml-guanaco
kubectl exec -it arliai-mistral-nemo-181f03cb7d11d5aaec6ed913d5d27fcb-deplo5pq64 --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1727235829 && python profiles.py profile --best_of_n 8 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 1024 --output_tokens 64 --summary /code/chaiverse_profiler_1727235829/summary.json'
kubectl exec -it arliai-mistral-nemo-181f03cb7d11d5aaec6ed913d5d27fcb-deplo5pq64 --namespace tenant-chaiml-guanaco -- bash -c 'cat /code/chaiverse_profiler_1727235829/summary.json'
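The two kubectl exec calls above run the profiling sweep inside the deployed pod and read back its summary. Below is a minimal sketch of driving the same commands from a script; the pod name, namespace, and working directory are copied verbatim from this run's log, and the wrapper itself is illustrative rather than part of the pipeline.

```python
import subprocess

# Identifiers copied verbatim from the log above; they are specific to this run.
NAMESPACE = "tenant-chaiml-guanaco"
POD = "arliai-mistral-nemo-181f03cb7d11d5aaec6ed913d5d27fcb-deplo5pq64"
WORKDIR = "/code/chaiverse_profiler_1727235829"

def kubectl_exec(command: str) -> str:
    """Run a shell command inside the profiler pod and return its stdout.

    The interactive -it flags from the logged command are dropped because this
    wrapper is non-interactive.
    """
    result = subprocess.run(
        ["kubectl", "exec", POD, "--namespace", NAMESPACE, "--", "sh", "-c", command],
        check=True, capture_output=True, text=True,
    )
    return result.stdout

# Same sweep as the logged command: best_of 8, batch sizes 1 and 5..195 in steps
# of 5, 200 samples, 1024 input tokens and 64 output tokens per request.
batches = ",".join(str(b) for b in [1] + list(range(5, 200, 5)))
kubectl_exec(
    f"cd {WORKDIR} && python profiles.py profile --best_of_n 8 --auto_batch 5 "
    f"--batches {batches} --samples 200 --input_tokens 1024 --output_tokens 64 "
    f"--summary {WORKDIR}/summary.json"
)
print(kubectl_exec(f"cat {WORKDIR}/summary.json"))
```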
Pipeline stage MKMLProfilerRunner completed in 1161.92s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service arliai-mistral-nemo-12b-9104-v3-profiler is running
Tearing down inference service arliai-mistral-nemo-12b-9104-v3-profiler
Service arliai-mistral-nemo-12b-9104-v3-profiler has been torndown
Pipeline stage MKMLProfilerDeleter completed in 2.12s
Shutdown handler de-registered
arliai-mistral-nemo-12b-_9104_v3 status is now inactive due to auto-deactivation of underperforming models
arliai-mistral-nemo-12b-_9104_v3 status is now torndown due to DeploymentManager action