submission_id: nousresearch-meta-llama_4939_v62
developer_uid: end_to_end_test
best_of: 4
celo_rating: 1191.38
display_name: nousresearch-meta-llama_4939_v62
family_friendly_score: 0.0
formatter: {'memory_template': "{bot_name}'s Persona: {memory}\n####\n", 'prompt_template': '{prompt}\n<START>\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '{bot_name}:', 'truncate_by_message': False}
generation_params: {'temperature': 1.0, 'top_p': 0.99, 'min_p': 0.1, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 512, 'best_of': 4, 'max_output_tokens': 64}
gpu_counts: {'NVIDIA RTX A5000': 1}
ineligible_reason: model is only for e2e test
is_internal_developer: True
language_model: NousResearch/Meta-Llama-3.1-8B-Instruct
latencies: [{'batch_size': 1, 'throughput': 1.0116507708892482, 'latency_mean': 0.9884189462661743, 'latency_p50': 0.984783411026001, 'latency_p90': 1.1027578592300415}, {'batch_size': 5, 'throughput': 3.0351526027293394, 'latency_mean': 1.6395308542251588, 'latency_p50': 1.6451306343078613, 'latency_p90': 1.810835313796997}, {'batch_size': 10, 'throughput': 4.041847532016118, 'latency_mean': 2.4544256854057314, 'latency_p50': 2.430687189102173, 'latency_p90': 2.786658501625061}, {'batch_size': 15, 'throughput': 4.344980111010546, 'latency_mean': 3.404390046596527, 'latency_p50': 3.4223259687423706, 'latency_p90': 3.8944753885269163}, {'batch_size': 20, 'throughput': 4.466962523441733, 'latency_mean': 4.4018342602252964, 'latency_p50': 4.3704530000686646, 'latency_p90': 5.056978917121887}, {'batch_size': 25, 'throughput': 4.508898239247515, 'latency_mean': 5.42511438369751, 'latency_p50': 5.418269634246826, 'latency_p90': 6.275020909309387}]
max_input_tokens: 512
max_output_tokens: 64
model_architecture: LlamaForCausalLM
model_group: NousResearch/Meta-Llama-
model_name: nousresearch-meta-llama_4939_v62
model_num_parameters: 8030261248.0
model_repo: NousResearch/Meta-Llama-3.1-8B-Instruct
model_size: 8B
num_battles: 12922
num_wins: 5544
ranking_group: single
status: torndown
submission_type: basic
throughput_3p7s: 4.45
timestamp: 2024-09-04T20:50:52+00:00
us_pacific_date: 2024-09-04
win_ratio: 0.42903575297941493
Download Preference Data
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline MKMLDeploymentPipeline
run pipeline stage upload
Running pipeline stage MKMLizer
Starting job with name nousresearch-meta-llama-4939-v62-mkmlizer
Waiting for job on nousresearch-meta-llama-4939-v62-mkmlizer to finish
nousresearch-meta-llama-4939-v62-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
nousresearch-meta-llama-4939-v62-mkmlizer: ║ _____ __ __ ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ /___/ ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ Version: 0.10.1 ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ https://mk1.ai ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ The license key for the current software has been verified as ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ belonging to: ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ Chai Research Corp. ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ Expiration: 2024-10-15 23:59:59 ║
nousresearch-meta-llama-4939-v62-mkmlizer: ║ ║
nousresearch-meta-llama-4939-v62-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
nousresearch-meta-llama-4939-v62-mkmlizer: Downloaded to shared memory in 40.653s
nousresearch-meta-llama-4939-v62-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpaypunw1g, device:0
nousresearch-meta-llama-4939-v62-mkmlizer: Saving flywheel model at /dev/shm/model_cache
nousresearch-meta-llama-4939-v62-mkmlizer: quantized model in 26.356s
nousresearch-meta-llama-4939-v62-mkmlizer: Processed model NousResearch/Meta-Llama-3.1-8B-Instruct in 67.010s
nousresearch-meta-llama-4939-v62-mkmlizer: creating bucket guanaco-mkml-models
nousresearch-meta-llama-4939-v62-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
nousresearch-meta-llama-4939-v62-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/nousresearch-meta-llama-4939-v62
nousresearch-meta-llama-4939-v62-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/nousresearch-meta-llama-4939-v62/config.json
nousresearch-meta-llama-4939-v62-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/nousresearch-meta-llama-4939-v62/special_tokens_map.json
nousresearch-meta-llama-4939-v62-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/nousresearch-meta-llama-4939-v62/tokenizer_config.json
nousresearch-meta-llama-4939-v62-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/nousresearch-meta-llama-4939-v62/tokenizer.json
nousresearch-meta-llama-4939-v62-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/nousresearch-meta-llama-4939-v62/flywheel_model.0.safetensors
nousresearch-meta-llama-4939-v62-mkmlizer: Loading 0: 0%| | 0/291 [00:00<?, ?it/s] Loading 0: 2%|▏ | 5/291 [00:00<00:08, 35.73it/s] Loading 0: 5%|▍ | 14/291 [00:00<00:05, 48.77it/s] Loading 0: 8%|▊ | 23/291 [00:00<00:05, 49.76it/s] Loading 0: 11%|█ | 31/291 [00:00<00:04, 56.12it/s] Loading 0: 13%|█▎ | 37/291 [00:00<00:04, 52.60it/s] Loading 0: 15%|█▌ | 44/291 [00:00<00:04, 56.65it/s] Loading 0: 17%|█▋ | 50/291 [00:01<00:05, 47.16it/s] Loading 0: 20%|█▉ | 58/291 [00:01<00:04, 53.78it/s] Loading 0: 22%|██▏ | 64/291 [00:01<00:04, 46.69it/s] Loading 0: 24%|██▍ | 70/291 [00:01<00:04, 47.03it/s] Loading 0: 26%|██▌ | 76/291 [00:01<00:04, 50.13it/s] Loading 0: 28%|██▊ | 82/291 [00:01<00:04, 47.77it/s] Loading 0: 30%|██▉ | 87/291 [00:01<00:05, 34.03it/s] Loading 0: 32%|███▏ | 94/291 [00:02<00:04, 40.89it/s] Loading 0: 34%|███▍ | 100/291 [00:02<00:04, 39.34it/s] Loading 0: 36%|███▌ | 105/291 [00:02<00:04, 40.39it/s] Loading 0: 38%|███▊ | 112/291 [00:02<00:03, 46.56it/s] Loading 0: 41%|████ | 118/291 [00:02<00:03, 45.73it/s] Loading 0: 42%|████▏ | 123/291 [00:02<00:03, 45.04it/s] Loading 0: 45%|████▍ | 130/291 [00:02<00:03, 49.84it/s] Loading 0: 47%|████▋ | 136/291 [00:02<00:03, 47.50it/s] Loading 0: 48%|████▊ | 141/291 [00:03<00:03, 47.24it/s] Loading 0: 51%|█████ | 148/291 [00:03<00:02, 51.28it/s] Loading 0: 53%|█████▎ | 154/291 [00:03<00:02, 45.97it/s] Loading 0: 55%|█████▍ | 159/291 [00:03<00:02, 44.57it/s] Loading 0: 57%|█████▋ | 166/291 [00:03<00:02, 48.91it/s] Loading 0: 59%|█████▉ | 172/291 [00:03<00:02, 46.59it/s] Loading 0: 62%|██████▏ | 179/291 [00:03<00:02, 50.42it/s] Loading 0: 64%|██████▎ | 185/291 [00:03<00:02, 50.79it/s] Loading 0: 66%|██████▌ | 191/291 [00:04<00:02, 34.35it/s] Loading 0: 67%|██████▋ | 196/291 [00:04<00:02, 36.16it/s] Loading 0: 69%|██████▉ | 202/291 [00:04<00:02, 41.03it/s] Loading 0: 71%|███████▏ | 208/291 [00:04<00:02, 41.20it/s] Loading 0: 73%|███████▎ | 213/291 [00:04<00:01, 39.99it/s] Loading 0: 75%|███████▌ | 219/291 [00:04<00:01, 
44.18it/s] Loading 0: 77%|███████▋ | 224/291 [00:04<00:01, 45.59it/s] Loading 0: 79%|███████▉ | 230/291 [00:05<00:01, 41.58it/s] Loading 0: 81%|████████▏ | 237/291 [00:05<00:01, 47.31it/s] Loading 0: 84%|████████▎ | 243/291 [00:05<00:00, 48.03it/s] Loading 0: 86%|████████▌ | 249/291 [00:05<00:00, 42.89it/s] Loading 0: 88%|████████▊ | 255/291 [00:05<00:00, 45.44it/s] Loading 0: 90%|████████▉ | 261/291 [00:05<00:00, 48.48it/s] Loading 0: 92%|█████████▏| 267/291 [00:05<00:00, 42.05it/s] Loading 0: 94%|█████████▍| 273/291 [00:06<00:00, 45.92it/s] Loading 0: 96%|█████████▌| 278/291 [00:06<00:00, 46.59it/s] Loading 0: 97%|█████████▋| 283/291 [00:06<00:00, 39.64it/s] Loading 0: 99%|█████████▉| 288/291 [00:11<00:00, 3.12it/s]
Job nousresearch-meta-llama-4939-v62-mkmlizer completed after 97.44s with status: succeeded
Stopping job with name nousresearch-meta-llama-4939-v62-mkmlizer
Pipeline stage MKMLizer completed in 98.41s
run pipeline stage kube_config
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.25s
run pipeline stage deploy_isvc
Running pipeline stage MKMLDeployer
Creating inference service nousresearch-meta-llama-4939-v62
Waiting for inference service nousresearch-meta-llama-4939-v62 to be ready
Inference service nousresearch-meta-llama-4939-v62 ready after 151.47615098953247s
Pipeline stage MKMLDeployer completed in 152.24s
run pipeline stage stress_check
Running pipeline stage StressChecker
Received healthy response to inference request in 3.0780069828033447s
Received healthy response to inference request in 1.4435858726501465s
Received healthy response to inference request in 1.1186127662658691s
Received healthy response to inference request in 1.0355000495910645s
Received healthy response to inference request in 1.6247730255126953s
5 requests
0 failed requests
5th percentile: 1.0521225929260254
10th percentile: 1.0687451362609863
20th percentile: 1.1019902229309082
30th percentile: 1.1836073875427247
40th percentile: 1.3135966300964355
50th percentile: 1.4435858726501465
60th percentile: 1.516060733795166
70th percentile: 1.5885355949401856
80th percentile: 1.9154198169708254
90th percentile: 2.496713399887085
95th percentile: 2.7873601913452144
99th percentile: 3.0198776245117185
mean time: 1.660095739364624
Pipeline stage StressChecker completed in 11.42s
run pipeline stage triggering_profiling_pipeline
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud {'submission_id': 'nousresearch-meta-llama_4939_v62', 'pipeline_name': 'profiling_pipeline', 'only_stage': None, 'timeout': 60}
starting trigger_guanaco_pipeline args=--submission_id=nousresearch-meta-llama_4939_v62,--pipeline=profiling_pipeline,--timeout=60,--local
Pipeline stage TriggerMKMLProfilingPipeline completed in 6.79s
Shutdown handler de-registered
nousresearch-meta-llama_4939_v62 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline MKMLProfilingPipeline
run pipeline stage MKMLProfilerDeleter
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.12s
run pipeline stage MKMLProfilerTemplater
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.11s
run pipeline stage MKMLProfilerDeployer
Running pipeline stage MKMLProfilerDeployer
Creating inference service nousresearch-meta-llama-4939-v62-profiler
Waiting for inference service nousresearch-meta-llama-4939-v62-profiler to be ready
Inference service nousresearch-meta-llama-4939-v62-profiler ready after 150.36316466331482s
Pipeline stage MKMLProfilerDeployer completed in 150.72s
run pipeline stage MKMLProfilerRunner
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/nousresearch-meta-ll22417cd0b001bada237d2bdbcc4e56df-deplojcfcx:/code/chaiverse_profiler_1725483519 --namespace tenant-chaiml-guanaco
kubectl exec -it nousresearch-meta-ll22417cd0b001bada237d2bdbcc4e56df-deplojcfcx --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1725483519 && python profiles.py profile --best_of_n 4 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 512 --output_tokens 64 --summary /code/chaiverse_profiler_1725483519/summary.json'
kubectl exec -it nousresearch-meta-ll22417cd0b001bada237d2bdbcc4e56df-deplojcfcx --namespace tenant-chaiml-guanaco -- bash -c 'cat /code/chaiverse_profiler_1725483519/summary.json'
Pipeline stage MKMLProfilerRunner completed in 453.89s
run pipeline stage MKMLProfilerDeleter
Running pipeline stage MKMLProfilerDeleter
Checking if service nousresearch-meta-llama-4939-v62-profiler is running
Tearing down inference service nousresearch-meta-llama-4939-v62-profiler
Service nousresearch-meta-llama-4939-v62-profiler has been torn down
Pipeline stage MKMLProfilerDeleter completed in 1.60s
Shutdown handler de-registered
nousresearch-meta-llama_4939_v62 status is now inactive due to auto deactivation (removal of underperforming models)
nousresearch-meta-llama_4939_v62 status is now torndown due to DeploymentManager action