zzyabc-llama-3-qlora

developer_uid: ZZYABC

submission_id: zzyabc-llama-3-qlora_v1

model_name: zzyabc-llama-3-qlora_v1

model_group: ZZYABC/llama-3-qlora

status: torndown

timestamp: 2024-12-20T08:12:35+00:00

num_battles: 9683

num_wins: 4118

celo_rating: 1207.8

family_friendly_score: 0.5816

family_friendly_standard_error: 0.006976266049972578
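
The reported standard error is consistent with the usual binomial formula sqrt(p * (1 - p) / n) applied to the score above. The sample size is not stated anywhere in this record; the value `n_samples = 5000` below is an assumption, chosen because it reproduces the reported figure. A minimal sketch:

```python
import math

# Value copied from the metadata above.
family_friendly_score = 0.5816

# ASSUMPTION: the number of scored samples is not given in the record.
# 5000 is inferred because it reproduces the reported standard error.
n_samples = 5000

# Binomial standard error of a proportion.
standard_error = math.sqrt(
    family_friendly_score * (1.0 - family_friendly_score) / n_samples
)

print(f"standard error: {standard_error:.6f}")
```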

submission_type: basic

model_repo: ZZYABC/llama-3-qlora

model_architecture: LlamaForCausalLM

model_num_parameters: 8030261248.0

best_of: 8

max_input_tokens: 1024

max_output_tokens: 64

reward_model: default

latencies: [
  {'batch_size': 1, 'throughput': 0.8677359664258186, 'latency_mean': 1.1523356139659882, 'latency_p50': 1.1571705341339111, 'latency_p90': 1.2692844867706299},
  {'batch_size': 4, 'throughput': 1.8666569480554278, 'latency_mean': 2.1382188594341276, 'latency_p50': 2.142951488494873, 'latency_p90': 2.3761706829071043},
  {'batch_size': 5, 'throughput': 2.003945848411457, 'latency_mean': 2.4738527369499206, 'latency_p50': 2.4773805141448975, 'latency_p90': 2.7649242639541627},
  {'batch_size': 8, 'throughput': 2.2365161958600517, 'latency_mean': 3.5502326464653016, 'latency_p50': 3.561653971672058, 'latency_p90': 3.981434965133667},
  {'batch_size': 10, 'throughput': 2.3015827958213704, 'latency_mean': 4.312681185007095, 'latency_p50': 4.287150740623474, 'latency_p90': 4.931620478630066},
  {'batch_size': 12, 'throughput': 2.3391420848357614, 'latency_mean': 5.084839828014374, 'latency_p50': 5.136856913566589, 'latency_p90': 5.767050504684448},
  {'batch_size': 15, 'throughput': 2.358770056932134, 'latency_mean': 6.272888153791428, 'latency_p50': 6.305419325828552, 'latency_p90': 7.019041967391968}
]

gpu_counts: {'NVIDIA RTX A5000': 1}

display_name: zzyabc-llama-3-qlora_v1

ineligible_reason: num_battles<10000

is_internal_developer: False

language_model: ZZYABC/llama-3-qlora

model_size: 8B

ranking_group: single

throughput_3p7s: 2.27

us_pacific_date: 2024-12-20

win_ratio: 0.42528142104719613
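
The win ratio is simply `num_wins / num_battles`, and `ineligible_reason` follows from the battle count falling short of 10000. The sketch below recomputes both from the fields listed earlier. The name `MIN_BATTLES` and the strict less-than comparison are assumptions read off the `num_battles<10000` string, not a documented rule.

```python
# Values copied from the metadata above.
num_battles = 9683
num_wins = 4118

# ASSUMPTION: threshold and comparison inferred from the
# ineligible_reason string "num_battles<10000".
MIN_BATTLES = 10000

win_ratio = num_wins / num_battles
is_ineligible = num_battles < MIN_BATTLES

print(f"win_ratio: {win_ratio:.6f}")
print(f"ineligible (too few battles): {is_ineligible}")
```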

generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': "{bot_name}'s Persona: {memory}\n####\n", 'prompt_template': '{prompt}\n<START>\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '{bot_name}:', 'truncate_by_message': False}
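
To illustrate how the formatter templates could combine into a single model input, here is a minimal sketch. The template strings are copied from the `formatter` field. The function `build_prompt`, the ordering of the pieces (memory, prompt, chat turns, response prefix), and the sample names are assumptions for illustration, not the platform's actual implementation. Truncation (`truncate_by_message`, `max_input_tokens`) is not modeled.

```python
# Template strings copied from the formatter field above.
FORMATTER = {
    "memory_template": "{bot_name}'s Persona: {memory}\n####\n",
    "prompt_template": "{prompt}\n<START>\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "{bot_name}:",
}


def build_prompt(bot_name, user_name, memory, prompt, turns):
    """Hypothetical helper: join the templates in an assumed order.

    `turns` is a list of (speaker, message) pairs where speaker is
    either "bot" or "user".
    """
    parts = [
        FORMATTER["memory_template"].format(bot_name=bot_name, memory=memory),
        FORMATTER["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                FORMATTER["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                FORMATTER["user_template"].format(user_name=user_name, message=message)
            )
    # The response prefix leaves the model to complete the bot's next line.
    parts.append(FORMATTER["response_template"].format(bot_name=bot_name))
    return "".join(parts)


example = build_prompt(
    bot_name="Bot",
    user_name="User",
    memory="A friendly guide.",
    prompt="A casual chat.",
    turns=[("user", "Hello!"), ("bot", "Hi there."), ("user", "How are you?")],
)
print(example)
```

Because `stopping_words` is `['\n']`, generation would end at the first newline, so each completion is a single line following the `{bot_name}:` prefix.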


Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name zzyabc-llama-3-qlora-v1-mkmlizer
Waiting for job on zzyabc-llama-3-qlora-v1-mkmlizer to finish
zzyabc-llama-3-qlora-v1-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
zzyabc-llama-3-qlora-v1-mkmlizer: ║     _____            __           __                                ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║    / _/ /_ ___    __/ /  ___ ___ / /                                ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║   / _/ / // / |/|/ / _ \/ -_) -_) /                                 ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  /_//_/\_, /|__,__/_//_/\__/\__/_/                                  ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║       /___/                                                         ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║                                                                     ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  Version: 0.11.12                                                   ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  Copyright 2023 MK ONE TECHNOLOGIES Inc.                            ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  https://mk1.ai                                                     ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║                                                                     ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  The license key for the current software has been verified as      ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  belonging to:                                                      ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║                                                                     ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  Chai Research Corp.                                                ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f                   ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║  Expiration: 2025-01-15 23:59:59                                    ║
zzyabc-llama-3-qlora-v1-mkmlizer: ║                                                                     ║
zzyabc-llama-3-qlora-v1-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
zzyabc-llama-3-qlora-v1-mkmlizer: Downloaded to shared memory in 34.580s
zzyabc-llama-3-qlora-v1-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpqq1gno58, device:0
zzyabc-llama-3-qlora-v1-mkmlizer: Saving flywheel model at /dev/shm/model_cache
zzyabc-llama-3-qlora-v1-mkmlizer: /opt/conda/lib/python3.10/site-packages/mk1/flywheel/functional/loader.py:55: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
zzyabc-llama-3-qlora-v1-mkmlizer:   tensors = torch.load(model_shard_filename, map_location=torch.device(self.device), mmap=True)
zzyabc-llama-3-qlora-v1-mkmlizer: quantized model in 27.309s
zzyabc-llama-3-qlora-v1-mkmlizer: Processed model ZZYABC/llama-3-qlora in 61.889s
zzyabc-llama-3-qlora-v1-mkmlizer: creating bucket guanaco-mkml-models
zzyabc-llama-3-qlora-v1-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
zzyabc-llama-3-qlora-v1-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/zzyabc-llama-3-qlora-v1
zzyabc-llama-3-qlora-v1-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/zzyabc-llama-3-qlora-v1/config.json
zzyabc-llama-3-qlora-v1-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/zzyabc-llama-3-qlora-v1/special_tokens_map.json
zzyabc-llama-3-qlora-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/zzyabc-llama-3-qlora-v1/tokenizer_config.json
zzyabc-llama-3-qlora-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/zzyabc-llama-3-qlora-v1/tokenizer.json
zzyabc-llama-3-qlora-v1-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/zzyabc-llama-3-qlora-v1/flywheel_model.0.safetensors
zzyabc-llama-3-qlora-v1-mkmlizer: 
Loading 0:  98%|█████████▊| 286/291 [00:06<00:00, 70.36it/s]
Job zzyabc-llama-3-qlora-v1-mkmlizer completed after 85.37s with status: succeeded
Stopping job with name zzyabc-llama-3-qlora-v1-mkmlizer
Pipeline stage MKMLizer completed in 85.98s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.17s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service zzyabc-llama-3-qlora-v1
Waiting for inference service zzyabc-llama-3-qlora-v1 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Inference service zzyabc-llama-3-qlora-v1 ready after 240.9515199661255s
Pipeline stage MKMLDeployer completed in 241.43s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter.guanaco-backend.k2.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.717430591583252s
Received healthy response to inference request in 1.2095191478729248s
Received healthy response to inference request in 1.9418749809265137s
Received healthy response to inference request in 1.225426197052002s
5 requests
1 failed requests
5th percentile: 1.2127005577087402
10th percentile: 1.2158819675445556
20th percentile: 1.2222447872161866
30th percentile: 1.323827075958252
40th percentile: 1.520628833770752
50th percentile: 1.717430591583252
60th percentile: 1.8072083473205567
70th percentile: 1.8969861030578612
80th percentile: 5.578790521621707
90th percentile: 12.852621603012086
95th percentile: 16.489537143707274
99th percentile: 19.39906957626343
mean time: 5.244140720367431
%s, retrying in %s seconds...
Received healthy response to inference request in 1.25022554397583s
Received healthy response to inference request in 1.7481441497802734s
Received healthy response to inference request in 1.1330430507659912s
Received healthy response to inference request in 1.7848215103149414s
Received healthy response to inference request in 1.225825309753418s
5 requests
0 failed requests
5th percentile: 1.1515995025634767
10th percentile: 1.1701559543609619
20th percentile: 1.2072688579559325
30th percentile: 1.2307053565979005
40th percentile: 1.2404654502868653
50th percentile: 1.25022554397583
60th percentile: 1.4493929862976074
70th percentile: 1.6485604286193847
80th percentile: 1.755479621887207
90th percentile: 1.7701505661010741
95th percentile: 1.7774860382080078
99th percentile: 1.7833544158935546
mean time: 1.4284119129180908
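
The percentile and mean figures in the second StressChecker run above are consistent with linear interpolation between the sorted samples (the default behaviour of, for example, `numpy.percentile`). The sketch below recomputes them from the five logged response times. The helper `percentile` is written for illustration and is an assumption about how the checker computes its statistics.

```python
# Response times (seconds) copied from the second StressChecker run.
response_times = [
    1.25022554397583,
    1.7481441497802734,
    1.1330430507659912,
    1.7848215103149414,
    1.225825309753418,
]


def percentile(values, q):
    """Percentile with linear interpolation between sorted samples.

    Hypothetical helper; q is given in percent (0 to 100).
    """
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + (xs[hi] - xs[lo]) * frac


p5 = percentile(response_times, 5)
p50 = percentile(response_times, 50)
p90 = percentile(response_times, 90)
mean_time = sum(response_times) / len(response_times)

for label, value in [("5th", p5), ("50th", p50), ("90th", p90)]:
    print(f"{label} percentile: {value}")
print(f"mean time: {mean_time}")
```

With only five samples, a single slow request dominates the upper percentiles, which would explain the first run reporting an 80th percentile above 5 s after one request hit the 20 s read timeout.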
Pipeline stage StressChecker completed in 36.06s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 2.18s
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 2.29s
Shutdown handler de-registered
zzyabc-llama-3-qlora_v1 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Pipeline stage OfflineFamilyFriendlyScorer completed in 2189.03s
Shutdown handler de-registered
zzyabc-llama-3-qlora_v1 status is now inactive due to auto deactivation removed underperforming models
zzyabc-llama-3-qlora_v1 status is now torndown due to DeploymentManager action