submission_id: anhnv125-llama-op-v17-1_v27
developer_uid: chai_backend_admin
status: deployed
model_repo: anhnv125/llama-op-v17.1
reward_repo: ChaiML/reward_models_100_170000000_cp_498032
generation_params: {'temperature': 1.1, 'top_p': 1.0, 'top_k': 20, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n', '</s>', '<|im_end|>'], 'max_input_tokens': 1024, 'best_of': 4, 'max_output_tokens': 64}
formatter: {'memory_template': "### Instruction:\nAs the assistant, your task is to fully embody the given character, creating immersive, captivating narratives. Stay true to the character's personality and background, generating responses that not only reflect their core traits but are also accurate to their character. Your responses should evoke emotion, suspense, and anticipation in the user. The more detailed and descriptive your response, the more vivid the narrative becomes. Aim to create a fertile environment for ongoing interaction – introduce new elements, offer choices, or ask questions to invite the user to participate more fully in the conversation. This conversation is a dance, always continuing, always evolving.\nYour character: {bot_name}.\nContext: {memory}\n", 'prompt_template': '### Input:\n{prompt}\n', 'bot_template': '{bot_name}: {message}\n', 'user_template': '{user_name}: {message}\n', 'response_template': '### Response:\n{bot_name}:'}
timestamp: 2023-12-18T13:01:25+00:00
model_name: anhnv125-llama-op-v17-1_v27
safety_score: 0.98
entertaining: None
stay_in_character: None
user_preference: None
double_thumbs_up: 4466
thumbs_up: 7711
thumbs_down: 3294
num_battles: 245789
num_wins: 114388
win_ratio: 0.46539104679216725
celo_rating: 1127.54
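
The `formatter` field above defines five templates, but this log does not show how they are combined into a model input. The Python sketch below is one plausible assembly, not the platform's confirmed implementation. It assumes the order is memory, prompt, chat turns, then response header. The helper name `build_prompt`, the example character, and the example messages are hypothetical. The long instruction text in `memory_template` is shortened here, and truncation to `max_input_tokens` is not modeled.

```python
# Templates copied from the `formatter` field; the instruction paragraph in
# memory_template is abbreviated (assumption: only the placeholders matter here).
FORMATTER = {
    "memory_template": (
        "### Instruction:\n(instruction text shortened)\n"
        "Your character: {bot_name}.\nContext: {memory}\n"
    ),
    "prompt_template": "### Input:\n{prompt}\n",
    "bot_template": "{bot_name}: {message}\n",
    "user_template": "{user_name}: {message}\n",
    "response_template": "### Response:\n{bot_name}:",
}


def build_prompt(formatter, bot_name, user_name, memory, prompt, turns):
    """Join the templates into one string.

    The concatenation order is an assumption, not taken from the log.
    `turns` is a list of (speaker, message) pairs, speaker being "bot" or "user".
    """
    parts = [
        formatter["memory_template"].format(bot_name=bot_name, memory=memory),
        formatter["prompt_template"].format(prompt=prompt),
    ]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                formatter["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(
                formatter["user_template"].format(user_name=user_name, message=message)
            )
    # The response header ends with "{bot_name}:" so the model continues as the bot.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Hypothetical conversation used only for illustration.
example = build_prompt(
    FORMATTER,
    bot_name="Aria",
    user_name="Sam",
    memory="Aria is a ship's navigator.",
    prompt="Aria greets a new crew member.",
    turns=[("user", "Hello!"), ("bot", "Welcome aboard."), ("user", "Where to?")],
)
print(example)
```

Because `stopping_words` in `generation_params` includes `'\n'`, generation would presumably stop at the first newline, which fits the one-line-per-turn shape of `bot_template` and `user_template`.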
Resubmit model
Running pipeline stage MKMLizer
Starting job with name anhnv125-llama-op-v17-1-mkmlizer
Waiting for job on anhnv125-llama-op-v17-1-mkmlizer to finish
anhnv125-llama-op-v17-1-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
anhnv125-llama-op-v17-1-mkmlizer: ║ _______ __ __ _______ _____ ║
anhnv125-llama-op-v17-1-mkmlizer: ║ | | | |/ | | | |_ ║
anhnv125-llama-op-v17-1-mkmlizer: ║ | | <| | | ║
anhnv125-llama-op-v17-1-mkmlizer: ║ |__|_|__|__|\__|__|_|__|_______| ║
anhnv125-llama-op-v17-1-mkmlizer: ║ ║
anhnv125-llama-op-v17-1-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
anhnv125-llama-op-v17-1-mkmlizer: ║ ║
anhnv125-llama-op-v17-1-mkmlizer: ║ The license key for the current software has been verified as ║
anhnv125-llama-op-v17-1-mkmlizer: ║ belonging to: ║
anhnv125-llama-op-v17-1-mkmlizer: ║ ║
anhnv125-llama-op-v17-1-mkmlizer: ║ Chai Research Corp ║
anhnv125-llama-op-v17-1-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
anhnv125-llama-op-v17-1-mkmlizer: ║ Expiration: 2024-01-08 23:59:59 ║
anhnv125-llama-op-v17-1-mkmlizer: ║ ║
anhnv125-llama-op-v17-1-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
anhnv125-llama-op-v17-1-mkmlizer: loading model from anhnv125/llama-op-v17.1
anhnv125-llama-op-v17-1-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1067: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
anhnv125-llama-op-v17-1-mkmlizer: warnings.warn(
anhnv125-llama-op-v17-1-mkmlizer: config.json: 100%|██████████| 654/654 [00:00<00:00, 5.06MB/s]
anhnv125-llama-op-v17-1-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:690: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
anhnv125-llama-op-v17-1-mkmlizer: warnings.warn(
anhnv125-llama-op-v17-1-mkmlizer: tokenizer_config.json: 100%|██████████| 749/749 [00:00<00:00, 9.27MB/s]
anhnv125-llama-op-v17-1-mkmlizer: tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 9.87MB/s]
anhnv125-llama-op-v17-1-mkmlizer: added_tokens.json: 100%|██████████| 21.0/21.0 [00:00<00:00, 334kB/s]
anhnv125-llama-op-v17-1-mkmlizer: special_tokens_map.json: 100%|██████████| 438/438 [00:00<00:00, 3.52MB/s]
anhnv125-llama-op-v17-1-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:472: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
anhnv125-llama-op-v17-1-mkmlizer: warnings.warn(
anhnv125-llama-op-v17-1-mkmlizer: model.safetensors.index.json: 100%|██████████| 29.9k/29.9k [00:00<00:00, 116MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 0%| | 0/13 [00:00<?, ?it/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 0%| | 0.00/2.09G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 1%| | 10.5M/2.09G [00:00<01:02, 33.2MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 2%|▏ | 31.5M/2.09G [00:00<00:28, 71.8MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 4%|▍ | 83.9M/2.09G [00:00<00:10, 187MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 9%|▉ | 189M/2.09G [00:00<00:04, 417MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 15%|█▌ | 315M/2.09G [00:00<00:02, 607MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 25%|██▍ | 514M/2.09G [00:00<00:01, 961MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 37%|███▋ | 776M/2.09G [00:01<00:00, 1.41GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 45%|████▌ | 944M/2.09G [00:01<00:01, 1.06GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 52%|█████▏ | 1.09G/2.09G [00:01<00:00, 1.13GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 61%|██████ | 1.27G/2.09G [00:01<00:00, 1.06GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 67%|██████▋ | 1.40G/2.09G [00:02<00:01, 562MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00001-of-00013.safetensors: 73%|███████▎ | 1.53G/2.09G [00:02<00:00, 671MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 31%|███▏ | 640M/2.04G [00:00<00:01, 1.03GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 42%|████▏ | 849M/2.04G [00:01<00:00, 1.24GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 48%|████▊ | 986M/2.04G [00:01<00:00, 1.09GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 55%|█████▍ | 1.12G/2.04G [00:01<00:00, 1.11GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 61%|██████ | 1.25G/2.04G [00:01<00:01, 601MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 71%|███████ | 1.45G/2.04G [00:01<00:00, 775MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 76%|███████▋ | 1.56G/2.04G [00:02<00:00, 619MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 81%|████████ | 1.66G/2.04G [00:02<00:00, 639MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00002-of-00013.safetensors: 85%|████████▌ | 1.74G/2.04G [00:02<00:00, 646MB/s] model-00002-of-00013.safetensors: 100%|█████████▉| 2.04G/2.04G [00:02<00:00, 792MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 15%|█▌ | 2/13 [00:06<00:33, 3.02s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 0%| | 0.00/2.06G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 1%| | 10.5M/2.06G [00:00<00:48, 41.9MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 4%|▎ | 73.4M/2.06G [00:00<00:07, 254MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 7%|▋ | 147M/2.06G [00:00<00:04, 412MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 13%|█▎ | 262M/2.06G [00:00<00:03, 580MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 16%|█▋ | 336M/2.06G [00:00<00:02, 622MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 20%|█▉ | 409M/2.06G [00:00<00:02, 612MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 24%|██▍ | 503M/2.06G [00:00<00:02, 657MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 33%|███▎ | 671M/2.06G [00:01<00:01, 930MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 54%|█████▍ | 1.11G/2.06G [00:01<00:00, 1.88GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 64%|██████▍ | 1.32G/2.06G [00:01<00:00, 1.53GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 73%|███████▎ | 1.49G/2.06G [00:01<00:00, 1.49GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 84%|████████▎ | 1.72G/2.06G [00:01<00:00, 1.68GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00003-of-00013.safetensors: 93%|█████████▎| 1.91G/2.06G [00:01<00:00, 1.25GB/s] model-00003-of-00013.safetensors: 100%|█████████▉| 2.06G/2.06G [00:01<00:00, 1.06GB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 23%|██▎ | 3/13 [00:08<00:26, 2.67s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 0%| | 0.00/1.96G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 1%| | 10.5M/1.96G [00:00<00:54, 35.8MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 2%|▏ | 41.9M/1.96G [00:00<00:17, 107MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 5%|▍ | 94.4M/1.96G [00:00<00:08, 212MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 6%|▋ | 126M/1.96G [00:00<00:10, 172MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 11%|█ | 210M/1.96G [00:00<00:05, 317MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 16%|█▌ | 315M/1.96G [00:01<00:03, 478MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 19%|█▉ | 377M/1.96G [00:01<00:03, 512MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 28%|██▊ | 545M/1.96G [00:01<00:01, 795MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 38%|███▊ | 734M/1.96G [00:01<00:01, 1.07GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 48%|████▊ | 944M/1.96G [00:01<00:00, 1.35GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 59%|█████▉ | 1.15G/1.96G [00:01<00:00, 1.53GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 71%|███████ | 1.38G/1.96G [00:01<00:00, 1.67GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00004-of-00013.safetensors: 84%|████████▍ | 1.65G/1.96G [00:01<00:00, 1.90GB/s] model-00004-of-00013.safetensors: 100%|█████████▉| 1.96G/1.96G [00:05<00:00, 369MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 31%|███ | 4/13 [00:13<00:34, 3.84s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00005-of-00013.safetensors: 64%|██████▎ | 1.30G/2.04G [00:01<00:00, 841MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00005-of-00013.safetensors: 69%|██████▉ | 1.42G/2.04G [00:02<00:01, 567MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00005-of-00013.safetensors: 74%|███████▍ | 1.51G/2.04G [00:02<00:00, 581MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00005-of-00013.safetensors: 78%|███████▊ | 1.59G/2.04G [00:02<00:00, 573MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00005-of-00013.safetensors: 82%|████████▏ | 1.67G/2.04G [00:02<00:00, 569MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00005-of-00013.safetensors: 87%|████████▋ | 1.78G/2.04G [00:02<00:00, 635MB/s] model-00005-of-00013.safetensors: 100%|█████████▉| 2.04G/2.04G [00:02<00:00, 740MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 38%|███▊ | 5/13 [00:16<00:28, 3.55s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 0%| | 0.00/2.04G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 1%| | 10.5M/2.04G [00:00<01:01, 33.2MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 2%|▏ | 41.9M/2.04G [00:00<00:17, 116MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 4%|▎ | 73.4M/2.04G [00:00<00:14, 140MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 6%|▌ | 115M/2.04G [00:00<00:11, 169MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 8%|▊ | 168M/2.04G [00:00<00:07, 246MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 16%|█▌ | 325M/2.04G [00:01<00:03, 519MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 19%|█▉ | 388M/2.04G [00:01<00:03, 519MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 22%|██▏ | 451M/2.04G [00:01<00:02, 543MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 26%|██▌ | 535M/2.04G [00:01<00:02, 590MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 36%|███▋ | 744M/2.04G [00:01<00:01, 982MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 51%|█████▏ | 1.05G/2.04G [00:01<00:00, 1.54GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 65%|██████▍ | 1.32G/2.04G [00:01<00:00, 1.83GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 74%|███████▍ | 1.52G/2.04G [00:01<00:00, 1.72GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 85%|████████▍ | 1.73G/2.04G [00:01<00:00, 1.82GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00006-of-00013.safetensors: 100%|█████████▉| 2.04G/2.04G [00:02<00:00, 1.74GB/s] model-00006-of-00013.safetensors: 100%|█████████▉| 2.04G/2.04G [00:02<00:00, 937MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 46%|████▌ | 6/13 [00:19<00:22, 3.21s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 0%| | 0.00/2.04G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 1%| | 10.5M/2.04G [00:00<01:23, 24.3MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 7%|▋ | 136M/2.04G [00:00<00:05, 324MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 17%|█▋ | 346M/2.04G [00:00<00:02, 766MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 24%|██▎ | 482M/2.04G [00:00<00:01, 909MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 30%|███ | 619M/2.04G [00:00<00:01, 1.02GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 37%|███▋ | 755M/2.04G [00:00<00:01, 1.06GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 43%|████▎ | 881M/2.04G [00:01<00:01, 1.03GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 49%|████▊ | 996M/2.04G [00:01<00:01, 835MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 56%|█████▋ | 1.15G/2.04G [00:01<00:00, 937MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 62%|██████▏ | 1.26G/2.04G [00:01<00:01, 490MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 71%|███████ | 1.45G/2.04G [00:02<00:00, 637MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 75%|███████▌ | 1.54G/2.04G [00:02<00:00, 554MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 81%|████████ | 1.65G/2.04G [00:02<00:00, 593MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00007-of-00013.safetensors: 85%|████████▍ | 1.73G/2.04G [00:02<00:00, 617MB/s] model-00007-of-00013.safetensors: 100%|█████████▉| 2.04G/2.04G [00:02<00:00, 746MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 54%|█████▍ | 7/13 [00:22<00:19, 3.21s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 0%| | 0.00/2.06G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 1%| | 10.5M/2.06G [00:00<01:13, 27.7MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 1%| | 21.0M/2.06G [00:00<00:50, 40.8MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 4%|▎ | 73.4M/2.06G [00:00<00:14, 139MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 5%|▌ | 105M/2.06G [00:00<00:11, 169MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 8%|▊ | 157M/2.06G [00:00<00:07, 251MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 11%|█ | 220M/2.06G [00:01<00:05, 324MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 17%|█▋ | 346M/2.06G [00:01<00:03, 545MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 21%|██ | 430M/2.06G [00:01<00:02, 608MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 32%|███▏ | 650M/2.06G [00:01<00:01, 1.03GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 44%|████▍ | 902M/2.06G [00:01<00:00, 1.44GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 54%|█████▍ | 1.11G/2.06G [00:01<00:00, 1.62GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 63%|██████▎ | 1.29G/2.06G [00:01<00:00, 1.50GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 74%|███████▎ | 1.52G/2.06G [00:01<00:00, 1.67GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 82%|████████▏ | 1.69G/2.06G [00:01<00:00, 1.59GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00008-of-00013.safetensors: 100%|█████████▉| 2.06G/2.06G [00:02<00:00, 746MB/s]  model-00008-of-00013.safetensors: 100%|█████████▉| 2.06G/2.06G [00:02<00:00, 736MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 1%| | 10.5M/2.04G [00:00<00:48, 42.1MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 3%|▎ | 52.4M/2.04G [00:00<00:13, 148MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 8%|▊ | 157M/2.04G [00:00<00:04, 410MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 12%|█▏ | 252M/2.04G [00:00<00:03, 562MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 16%|█▌ | 325M/2.04G [00:00<00:03, 563MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 23%|██▎ | 461M/2.04G [00:00<00:02, 754MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 29%|██▉ | 598M/2.04G [00:00<00:01, 907MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 45%|████▌ | 923M/2.04G [00:01<00:00, 1.55GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 53%|█████▎ | 1.09G/2.04G [00:01<00:00, 1.23GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 61%|██████ | 1.24G/2.04G [00:01<00:00, 1.12GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 67%|██████▋ | 1.36G/2.04G [00:01<00:00, 879MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 72%|███████▏ | 1.47G/2.04G [00:01<00:00, 801MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 76%|███████▋ | 1.56G/2.04G [00:02<00:00, 642MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 81%|████████ | 1.65G/2.04G [00:02<00:00, 603MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 85%|████████▍ | 1.73G/2.04G [00:02<00:00, 642MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00010-of-00013.safetensors: 89%|████████▉ | 1.82G/2.04G [00:02<00:00, 700MB/s] model-00010-of-00013.safetensors: 100%|█████████▉| 2.04G/2.04G [00:02<00:00, 796MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 77%|███████▋ | 10/13 [00:31<00:08, 2.90s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 0%| | 0.00/2.04G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 1%| | 10.5M/2.04G [00:00<00:52, 38.5MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 2%|▏ | 31.5M/2.04G [00:00<00:23, 87.2MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 5%|▍ | 94.4M/2.04G [00:00<00:07, 245MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 7%|▋ | 147M/2.04G [00:00<00:06, 309MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 10%|▉ | 199M/2.04G [00:00<00:04, 370MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 17%|█▋ | 346M/2.04G [00:00<00:02, 684MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 24%|██▎ | 482M/2.04G [00:00<00:01, 823MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 35%|███▍ | 713M/2.04G [00:01<00:01, 1.23GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 49%|████▊ | 996M/2.04G [00:01<00:00, 1.66GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 57%|█████▋ | 1.17G/2.04G [00:01<00:00, 1.58GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 67%|██████▋ | 1.37G/2.04G [00:01<00:00, 1.69GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 76%|███████▌ | 1.55G/2.04G [00:01<00:00, 1.22GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 83%|████████▎ | 1.70G/2.04G [00:02<00:00, 781MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00011-of-00013.safetensors: 90%|█████████ | 1.85G/2.04G [00:02<00:00, 816MB/s] model-00011-of-00013.safetensors: 100%|█████████▉| 2.04G/2.04G [00:02<00:00, 896MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 85%|████████▍ | 11/13 [00:33<00:05, 2.81s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 0%| | 0.00/2.04G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 1%| | 10.5M/2.04G [00:00<00:58, 35.0MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 2%|▏ | 31.5M/2.04G [00:00<00:22, 88.5MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 3%|▎ | 52.4M/2.04G [00:00<00:16, 121MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 8%|▊ | 157M/2.04G [00:00<00:04, 392MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 15%|█▌ | 315M/2.04G [00:00<00:02, 631MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 23%|██▎ | 461M/2.04G [00:00<00:01, 852MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 34%|███▍ | 703M/2.04G [00:00<00:01, 1.27GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 44%|████▍ | 902M/2.04G [00:01<00:00, 1.44GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 52%|█████▏ | 1.06G/2.04G [00:01<00:00, 1.04GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 58%|█████▊ | 1.18G/2.04G [00:01<00:00, 950MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 64%|██████▎ | 1.30G/2.04G [00:01<00:00, 765MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 68%|██████▊ | 1.39G/2.04G [00:01<00:00, 712MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 72%|███████▏ | 1.48G/2.04G [00:02<00:01, 504MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 76%|███████▌ | 1.55G/2.04G [00:02<00:01, 457MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 83%|████████▎ | 1.69G/2.04G [00:02<00:00, 579MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00012-of-00013.safetensors: 86%|████████▌ | 1.76G/2.04G [00:02<00:00, 585MB/s] model-00012-of-00013.safetensors: 100%|█████████▉| 2.04G/2.04G [00:02<00:00, 719MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Downloading shards: 92%|█████████▏| 12/13 [00:36<00:02, 2.92s/it]
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 0%| | 0.00/1.60G [00:00<?, ?B/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 1%| | 10.5M/1.60G [00:00<00:46, 33.9MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 6%|▌ | 94.4M/1.60G [00:00<00:05, 287MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 9%|▉ | 147M/1.60G [00:00<00:04, 309MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 15%|█▌ | 241M/1.60G [00:00<00:02, 470MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 26%|██▋ | 419M/1.60G [00:00<00:01, 823MB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 36%|███▌ | 577M/1.60G [00:00<00:00, 1.03GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 53%|█████▎ | 839M/1.60G [00:01<00:00, 1.30GB/s]
anhnv125-llama-op-v17-1-mkmlizer: model-00013-of-00013.safetensors: 61%|██████ | 978M/1.60G [00:01<00:01, 611MB/s] 
anhnv125-llama-op-v17-1-mkmlizer: Loading checkpoint shards: 100%|██████████| 13/13 [00:03<00:00, 3.44it/s]
anhnv125-llama-op-v17-1-mkmlizer: loaded model in 44.712s
anhnv125-llama-op-v17-1-mkmlizer: saved to disk in 106.101s
anhnv125-llama-op-v17-1-mkmlizer: quantizing model to /tmp/model_cache
anhnv125-llama-op-v17-1-mkmlizer: Saving mkml model at /tmp/model_cache
anhnv125-llama-op-v17-1-mkmlizer: Reading /tmp/tmpdm1mx6i1/model.safetensors.index.json
anhnv125-llama-op-v17-1-mkmlizer: Profiling: 65%|██████▍ | 235/363 [02:24<01:11, 1.80it/s] Profiling: 65%|██████▌ | 236/363 
[02:25<01:35, 1.33it/s] Profiling: 65%|██████▌ | 237/363 [02:26<01:26, 1.46it/s] Profiling: 66%|██████▌ | 238/363 [02:26<01:19, 1.58it/s] Profiling: 66%|██████▌ | 239/363 [02:27<01:13, 1.68it/s] Profiling: 66%|██████▌ | 240/363 [02:28<01:10, 1.75it/s] Profiling: 66%|██████▋ | 241/363 [02:28<00:52, 2.31it/s] Profiling: 67%|██████▋ | 242/363 [02:29<01:21, 1.49it/s] Profiling: 67%|██████▋ | 243/363 [02:30<01:41, 1.19it/s] Profiling: 68%|██████▊ | 246/363 [02:31<01:10, 1.66it/s] Profiling: 68%|██████▊ | 247/363 [02:33<01:25, 1.35it/s] Profiling: 68%|██████▊ | 248/363 [02:34<01:38, 1.17it/s] Profiling: 69%|██████▉ | 250/363 [02:34<01:10, 1.61it/s] Profiling: 69%|██████▉ | 251/363 [02:35<01:06, 1.67it/s] Profiling: 69%|██████▉ | 252/363 [02:35<01:04, 1.72it/s] Profiling: 70%|██████▉ | 253/363 [02:36<01:01, 1.78it/s] Profiling: 70%|███████ | 255/363 [02:37<01:03, 1.69it/s] Profiling: 71%|███████ | 256/363 [02:38<01:20, 1.34it/s] Profiling: 71%|███████ | 257/363 [02:40<01:32, 1.14it/s] Profiling: 71%|███████▏ | 259/363 [02:40<01:04, 1.61it/s] Profiling: 72%|███████▏ | 260/363 [02:41<01:01, 1.68it/s] Profiling: 72%|███████▏ | 261/363 [02:41<00:58, 1.74it/s] Profiling: 72%|███████▏ | 262/363 [02:42<00:56, 1.79it/s] Profiling: 73%|███████▎ | 264/363 [02:43<00:58, 1.70it/s] Profiling: 73%|███████▎ | 265/363 [02:44<01:12, 1.35it/s] Profiling: 73%|███████▎ | 266/363 [02:45<01:23, 1.16it/s] Profiling: 74%|███████▍ | 268/363 [02:46<00:56, 1.70it/s] Profiling: 74%|███████▍ | 269/363 [02:46<00:49, 1.89it/s] Profiling: 74%|███████▍ | 270/363 [02:47<00:44, 2.09it/s] Profiling: 75%|███████▍ | 271/363 [02:47<00:40, 2.27it/s] Profiling: 75%|███████▍ | 272/363 [02:47<00:37, 2.42it/s] Profiling: 75%|███████▌ | 273/363 [02:48<00:35, 2.56it/s] Profiling: 75%|███████▌ | 274/363 [02:48<00:33, 2.67it/s] Profiling: 76%|███████▌ | 275/363 [02:48<00:31, 2.77it/s] Profiling: 76%|███████▋ | 277/363 [02:49<00:33, 2.60it/s] Profiling: 77%|███████▋ | 278/363 [02:50<00:38, 2.19it/s] Profiling: 
77%|███████▋ | 279/363 [02:50<00:42, 1.96it/s] Profiling: 78%|███████▊ | 282/363 [02:51<00:29, 2.79it/s] Profiling: 78%|███████▊ | 283/363 [02:52<00:33, 2.37it/s] Profiling: 78%|███████▊ | 284/363 [02:52<00:37, 2.09it/s] Profiling: 79%|███████▉ | 286/363 [02:53<00:26, 2.93it/s] Profiling: 79%|███████▉ | 287/363 [02:53<00:24, 3.11it/s] Profiling: 79%|███████▉ | 288/363 [02:53<00:24, 3.11it/s] Profiling: 80%|███████▉ | 289/363 [02:54<00:23, 3.09it/s] Profiling: 80%|████████ | 291/363 [02:54<00:24, 2.94it/s] Profiling: 80%|████████ | 292/363 [02:55<00:30, 2.33it/s] Profiling: 81%|████████ | 293/363 [02:56<00:35, 1.98it/s] Profiling: 81%|████████▏ | 295/363 [02:56<00:24, 2.74it/s] Profiling: 82%|████████▏ | 296/363 [02:56<00:23, 2.80it/s] Profiling: 82%|████████▏ | 297/363 [02:57<00:23, 2.87it/s] Profiling: 82%|████████▏ | 298/363 [02:57<00:22, 2.91it/s] Profiling: 83%|████████▎ | 300/363 [02:58<00:22, 2.80it/s] Profiling: 83%|████████▎ | 301/363 [02:59<00:27, 2.23it/s] Profiling: 83%|████████▎ | 302/363 [02:59<00:32, 1.91it/s] Profiling: 84%|████████▎ | 304/363 [03:00<00:22, 2.67it/s] Profiling: 84%|████████▍ | 305/363 [03:00<00:21, 2.75it/s] Profiling: 84%|████████▍ | 306/363 [03:00<00:20, 2.82it/s] Profiling: 85%|████████▍ | 307/363 [03:01<00:19, 2.87it/s] Profiling: 85%|████████▍ | 308/363 [03:01<00:18, 2.92it/s] Profiling: 85%|████████▌ | 309/363 [03:01<00:18, 2.95it/s] Profiling: 85%|████████▌ | 310/363 [03:02<00:17, 2.99it/s] Profiling: 86%|████████▌ | 312/363 [03:02<00:18, 2.72it/s] Profiling: 86%|████████▌ | 313/363 [03:03<00:22, 2.19it/s] Profiling: 87%|████████▋ | 314/363 [03:04<00:25, 1.90it/s] Profiling: 87%|████████▋ | 316/363 [03:04<00:17, 2.68it/s] Profiling: 88%|████████▊ | 318/363 [03:05<00:16, 2.70it/s] Profiling: 88%|████████▊ | 319/363 [03:06<00:19, 2.24it/s] Profiling: 88%|████████▊ | 320/363 [03:06<00:22, 1.94it/s] Profiling: 89%|████████▊ | 322/363 [03:07<00:15, 2.65it/s] Profiling: 89%|████████▉ | 323/363 [03:07<00:14, 2.73it/s] Profiling: 
89%|████████▉ | 324/363 [03:07<00:13, 2.79it/s] Profiling: 90%|████████▉ | 325/363 [03:08<00:13, 2.87it/s] Profiling: 90%|█████████ | 327/363 [03:08<00:12, 2.81it/s] Profiling: 90%|█████████ | 328/363 [03:09<00:15, 2.27it/s] Profiling: 91%|█████████ | 329/363 [03:10<00:17, 1.95it/s] Profiling: 91%|█████████ | 331/363 [03:10<00:11, 2.71it/s] Profiling: 91%|█████████▏| 332/363 [03:11<00:11, 2.79it/s] Profiling: 92%|█████████▏| 333/363 [03:11<00:10, 2.84it/s] Profiling: 92%|█████████▏| 334/363 [03:11<00:10, 2.90it/s] Profiling: 93%|█████████▎| 336/363 [03:12<00:09, 2.81it/s] Profiling: 93%|█████████▎| 337/363 [03:13<00:11, 2.26it/s] Profiling: 93%|█████████▎| 338/363 [03:13<00:12, 1.93it/s] Profiling: 94%|█████████▎| 340/363 [03:14<00:08, 2.69it/s] Profiling: 94%|█████████▍| 341/363 [03:14<00:07, 2.75it/s] Profiling: 94%|█████████▍| 342/363 [03:14<00:07, 2.71it/s] Profiling: 94%|█████████▍| 343/363 [03:15<00:07, 2.78it/s] Profiling: 95%|█████████▍| 344/363 [03:15<00:06, 2.84it/s] Profiling: 95%|█████████▌| 345/363 [03:15<00:06, 2.90it/s] Profiling: 95%|█████████▌| 346/363 [03:17<00:12, 1.36it/s] Profiling: 96%|█████████▌| 348/363 [03:18<00:08, 1.76it/s] Profiling: 96%|█████████▌| 349/363 [03:19<00:08, 1.67it/s] Profiling: 96%|█████████▋| 350/363 [03:19<00:08, 1.57it/s] Profiling: 97%|█████████▋| 352/363 [03:20<00:04, 2.28it/s] Profiling: 97%|█████████▋| 353/363 [03:20<00:04, 2.41it/s] Profiling: 98%|█████████▊| 355/363 [03:21<00:03, 2.53it/s] Profiling: 98%|█████████▊| 356/363 [03:22<00:03, 2.13it/s] Profiling: 98%|█████████▊| 357/363 [03:22<00:03, 1.88it/s] Profiling: 99%|█████████▉| 359/363 [03:23<00:01, 2.62it/s] Profiling: 99%|█████████▉| 360/363 [03:23<00:01, 2.69it/s] Profiling: 99%|█████████▉| 361/363 [03:23<00:00, 2.76it/s] Profiling: 100%|█████████▉| 362/363 [03:24<00:00, 2.82it/s] Profiling: 100%|██████████| 363/363 [03:24<00:00, 1.78it/s]
anhnv125-llama-op-v17-1-mkmlizer: quantized model in 254.563s
anhnv125-llama-op-v17-1-mkmlizer: Processed model anhnv125/llama-op-v17.1 in 405.379s
anhnv125-llama-op-v17-1-mkmlizer: creating bucket guanaco-mkml-models
anhnv125-llama-op-v17-1-mkmlizer: cp /tmp/model_cache/mkml_model.tensors s3://guanaco-mkml-models/anhnv125-llama-op-v17-1-v27/mkml_model.tensors
anhnv125-llama-op-v17-1-mkmlizer: loading reward model from ChaiML/reward_models_100_170000000_cp_498032
anhnv125-llama-op-v17-1-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:1067: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
anhnv125-llama-op-v17-1-mkmlizer: warnings.warn(
anhnv125-llama-op-v17-1-mkmlizer: config.json: 100%|██████████| 1.03k/1.03k [00:00<00:00, 8.63MB/s]
anhnv125-llama-op-v17-1-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:690: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
anhnv125-llama-op-v17-1-mkmlizer: warnings.warn(
anhnv125-llama-op-v17-1-mkmlizer: tokenizer_config.json: 100%|██████████| 234/234 [00:00<00:00, 1.32MB/s]
anhnv125-llama-op-v17-1-mkmlizer: vocab.json: 100%|██████████| 798k/798k [00:00<00:00, 8.32MB/s]
anhnv125-llama-op-v17-1-mkmlizer: merges.txt: 100%|██████████| 456k/456k [00:00<00:00, 44.7MB/s]
anhnv125-llama-op-v17-1-mkmlizer: tokenizer.json: 100%|██████████| 2.11M/2.11M [00:00<00:00, 48.1MB/s]
anhnv125-llama-op-v17-1-mkmlizer: special_tokens_map.json: 100%|██████████| 99.0/99.0 [00:00<00:00, 785kB/s]
anhnv125-llama-op-v17-1-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:472: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
anhnv125-llama-op-v17-1-mkmlizer: warnings.warn(
anhnv125-llama-op-v17-1-mkmlizer: pytorch_model.bin: 100%|█████████▉| 510M/510M [00:00<00:00, 963MB/s]
anhnv125-llama-op-v17-1-mkmlizer: Saving model to /tmp/reward_cache/reward.tensors
anhnv125-llama-op-v17-1-mkmlizer: Saving duration: 0.101s
anhnv125-llama-op-v17-1-mkmlizer: Processed model ChaiML/reward_models_100_170000000_cp_498032 in 3.076s
anhnv125-llama-op-v17-1-mkmlizer: creating bucket guanaco-reward-models
anhnv125-llama-op-v17-1-mkmlizer: Bucket 's3://guanaco-reward-models/' created
anhnv125-llama-op-v17-1-mkmlizer: uploading /tmp/reward_cache to s3://guanaco-reward-models/anhnv125-llama-op-v17-1-v27_reward
anhnv125-llama-op-v17-1-mkmlizer: cp /tmp/reward_cache/special_tokens_map.json s3://guanaco-reward-models/anhnv125-llama-op-v17-1-v27_reward/special_tokens_map.json
anhnv125-llama-op-v17-1-mkmlizer: cp /tmp/reward_cache/config.json s3://guanaco-reward-models/anhnv125-llama-op-v17-1-v27_reward/config.json
anhnv125-llama-op-v17-1-mkmlizer: cp /tmp/reward_cache/tokenizer_config.json s3://guanaco-reward-models/anhnv125-llama-op-v17-1-v27_reward/tokenizer_config.json
anhnv125-llama-op-v17-1-mkmlizer: cp /tmp/reward_cache/merges.txt s3://guanaco-reward-models/anhnv125-llama-op-v17-1-v27_reward/merges.txt
anhnv125-llama-op-v17-1-mkmlizer: cp /tmp/reward_cache/vocab.json s3://guanaco-reward-models/anhnv125-llama-op-v17-1-v27_reward/vocab.json
anhnv125-llama-op-v17-1-mkmlizer: cp /tmp/reward_cache/tokenizer.json s3://guanaco-reward-models/anhnv125-llama-op-v17-1-v27_reward/tokenizer.json
anhnv125-llama-op-v17-1-mkmlizer: cp /tmp/reward_cache/reward.tensors s3://guanaco-reward-models/anhnv125-llama-op-v17-1-v27_reward/reward.tensors
Job anhnv125-llama-op-v17-1-mkmlizer completed after 438.01s with status: succeeded
Stopping job with name anhnv125-llama-op-v17-1-mkmlizer
Running pipeline stage MKMLKubeTemplater
Running pipeline stage ISVCDeployer
Creating inference service anhnv125-llama-op-v17-1-v27
Waiting for inference service anhnv125-llama-op-v17-1-v27 to be ready
Tearing down inference service anhnv125-llama-op-v17-1-v27
%s, retrying in %s seconds...
Creating inference service anhnv125-llama-op-v17-1-v27
Waiting for inference service anhnv125-llama-op-v17-1-v27 to be ready
Inference service anhnv125-llama-op-v17-1-v27 ready after 160.95137739181519s
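Note on the `%s, retrying in %s seconds...` line in the ISVCDeployer stage above: the literal `%s` placeholders suggest the message template was emitted without its arguments, so the log does not say what failed or how long the wait was. The sketch below shows one way a retry loop could log the filled-in message. It assumes Python's standard `logging` module; the `retry` helper, the logger name `deployer`, and all parameters are hypothetical and are not taken from the pipeline's actual code.

```python
import logging
import time

# Hypothetical logger name; the real pipeline's logger is not known.
logger = logging.getLogger("deployer")


def retry(action, attempts=3, delay_seconds=0.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` is exhausted.

    Passing the error and the delay as logging arguments lets the logging
    module substitute them into the template, so the emitted line reads
    "<error>, retrying in <delay> seconds..." instead of raw placeholders.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt == attempts:
                break
            logger.warning("%s, retrying in %s seconds...", error, delay_seconds)
            sleep(delay_seconds)
    raise last_error
```

The `sleep` parameter is injectable only so that the wait can be skipped when trying the helper out.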
Running pipeline stage StressChecker
Received healthy response to inference request with status code 200 in 2.9635775089263916s
Received healthy response to inference request with status code 200 in 2.234947919845581s
Received healthy response to inference request with status code 200 in 1.7072243690490723s
Received healthy response to inference request with status code 200 in 1.7588975429534912s
Received healthy response to inference request with status code 200 in 1.767899990081787s
Received healthy response to inference request with status code 200 in 1.8045425415039062s
Received healthy response to inference request with status code 200 in 1.7901091575622559s
Received healthy response to inference request with status code 200 in 2.1416382789611816s
Received healthy response to inference request with status code 200 in 2.1554510593414307s
Received healthy response to inference request with status code 200 in 1.6895396709442139s
Received healthy response to inference request with status code 200 in 2.101287364959717s
Received healthy response to inference request with status code 200 in 1.9326531887054443s
Received healthy response to inference request with status code 200 in 2.1944074630737305s
Received healthy response to inference request with status code 200 in 2.137819290161133s
Received healthy response to inference request with status code 200 in 1.687147855758667s
Received healthy response to inference request with status code 200 in 1.9822678565979004s
Received healthy response to inference request with status code 200 in 2.1653878688812256s
Received healthy response to inference request with status code 200 in 2.1278507709503174s
Received healthy response to inference request with status code 200 in 1.721447229385376s
Received healthy response to inference request with status code 200 in 2.2079484462738037s
Received healthy response to inference request with status code 200 in 1.907257318496704s
Received healthy response to inference request with status code 200 in 2.227952241897583s
Received healthy response to inference request with status code 200 in 2.198387861251831s
Received healthy response to inference request with status code 200 in 1.8496291637420654s
Received healthy response to inference request with status code 200 in 1.6833360195159912s
Received healthy response to inference request with status code 200 in 2.205404758453369s
Received healthy response to inference request with status code 200 in 1.939793586730957s
Received healthy response to inference request with status code 200 in 2.1704084873199463s
Received healthy response to inference request with status code 200 in 1.999824047088623s
Received healthy response to inference request with status code 200 in 2.2203710079193115s
Received healthy response to inference request with status code 200 in 2.2022860050201416s
Received healthy response to inference request with status code 200 in 2.111144781112671s
Received healthy response to inference request with status code 200 in 2.19240140914917s
Received healthy response to inference request with status code 200 in 2.0138847827911377s
Received healthy response to inference request with status code 200 in 2.221705675125122s
Received healthy response to inference request with status code 200 in 2.210235834121704s
Received healthy response to inference request with status code 200 in 2.2030439376831055s
Received healthy response to inference request with status code 200 in 2.185753583908081s
Received healthy response to inference request with status code 200 in 2.2364282608032227s
Received healthy response to inference request with status code 200 in 2.1992344856262207s
Received healthy response to inference request with status code 200 in 1.8434693813323975s
Received healthy response to inference request with status code 200 in 1.8116862773895264s
Received healthy response to inference request with status code 200 in 1.86061429977417s
Received healthy response to inference request with status code 200 in 1.7621679306030273s
Received healthy response to inference request with status code 200 in 2.202341318130493s
Received healthy response to inference request with status code 200 in 1.9151949882507324s
Received healthy response to inference request with status code 200 in 2.208231210708618s
Received healthy response to inference request with status code 200 in 2.2029175758361816s
Received healthy response to inference request with status code 200 in 2.0253443717956543s
Received healthy response to inference request with status code 200 in 1.9665930271148682s
Received healthy response to inference request with status code 200 in 1.7504079341888428s
Received healthy response to inference request with status code 200 in 2.208203077316284s
Received healthy response to inference request with status code 200 in 2.209193229675293s
Received healthy response to inference request with status code 200 in 1.7837700843811035s
Received healthy response to inference request with status code 200 in 2.3075380325317383s
Received healthy response to inference request with status code 200 in 1.9123647212982178s
Received healthy response to inference request with status code 200 in 2.1232852935791016s
Received healthy response to inference request with status code 200 in 2.1905903816223145s
Received healthy response to inference request with status code 200 in 1.7263388633728027s
Received healthy response to inference request with status code 200 in 1.4363558292388916s
Received healthy response to inference request with status code 200 in 1.349313735961914s
Received healthy response to inference request with status code 200 in 1.7134015560150146s
Received healthy response to inference request with status code 200 in 2.213904619216919s
Received healthy response to inference request with status code 200 in 1.7230281829833984s
Received healthy response to inference request with status code 200 in 2.2111477851867676s
Received healthy response to inference request with status code 200 in 1.6806082725524902s
Received healthy response to inference request with status code 200 in 1.7394659519195557s
Received healthy response to inference request with status code 200 in 2.3076531887054443s
Received healthy response to inference request with status code 200 in 1.8132786750793457s
Received healthy response to inference request with status code 200 in 2.2506096363067627s
Received healthy response to inference request with status code 200 in 2.220632791519165s
Received healthy response to inference request with status code 200 in 1.7317638397216797s
Received healthy response to inference request with status code 200 in 2.0321316719055176s
Received healthy response to inference request with status code 200 in 2.127609968185425s
Received healthy response to inference request with status code 200 in 2.222471237182617s
Received healthy response to inference request with status code 200 in 2.2022900581359863s
Received healthy response to inference request with status code 200 in 2.2560338973999023s
Received healthy response to inference request with status code 200 in 1.8397297859191895s
Received healthy response to inference request with status code 200 in 2.2371881008148193s
Received healthy response to inference request with status code 200 in 2.0230712890625s
Received healthy response to inference request with status code 200 in 1.8366844654083252s
Received healthy response to inference request with status code 200 in 1.6659953594207764s
Received healthy response to inference request with status code 200 in 2.2200236320495605s
Received healthy response to inference request with status code 200 in 2.190105676651001s
Received healthy response to inference request with status code 200 in 1.6273925304412842s
Received healthy response to inference request with status code 200 in 3.2460243701934814s
Received healthy response to inference request with status code 200 in 2.33561372756958s
Received healthy response to inference request with status code 200 in 2.060673236846924s
Received healthy response to inference request with status code 200 in 2.1962783336639404s
Received healthy response to inference request with status code 200 in 1.5615253448486328s
Received healthy response to inference request with status code 200 in 2.2395899295806885s
Received healthy response to inference request with status code 200 in 2.214076042175293s
Received healthy response to inference request with status code 200 in 2.242001533508301s
Received healthy response to inference request with status code 200 in 2.144906520843506s
Received healthy response to inference request with status code 200 in 1.7340359687805176s
Received healthy response to inference request with status code 200 in 2.1950109004974365s
Received healthy response to inference request with status code 200 in 1.7102994918823242s
Received healthy response to inference request with status code 200 in 2.2226266860961914s
Received healthy response to inference request with status code 200 in 3.0500881671905518s
Received healthy response to inference request with status code 200 in 2.0077266693115234s
100 requests
0 failed requests
5th percentile: 1.6798776268959046
10th percentile: 1.7099919795989988
20th percentile: 1.76151385307312
30th percentile: 1.847781229019165
40th percentile: 2.004565620422363
50th percentile: 2.127730369567871
60th percentile: 2.1913147926330567
70th percentile: 2.202955484390259
80th percentile: 2.2152655601501463
90th percentile: 2.2374282836914063
95th percentile: 2.3075437903404237
99th percentile: 3.052047529220582
mean time: 2.044931492805481
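Note on the StressChecker summary above: the reported 99th percentile (3.052047529220582) is consistent with linear interpolation between the two slowest responses (3.0500881671905518 and 3.2460243701934814). The sketch below shows how such a summary could be recomputed from the "Received healthy response" lines. It is an illustration under that assumption, not the StressChecker's actual implementation; `parse_latencies`, `percentile`, and `summarize` are hypothetical names.

```python
import re
import statistics

# Matches lines such as:
# "Received healthy response to inference request with status code 200 in 1.70s"
LATENCY_RE = re.compile(r"status code (\d+) in ([0-9.]+)s")


def parse_latencies(lines):
    """Return the latency in seconds of every line reporting status 200."""
    latencies = []
    for line in lines:
        match = LATENCY_RE.search(line)
        if match and match.group(1) == "200":
            latencies.append(float(match.group(2)))
    return latencies


def percentile(values, q):
    """q-th percentile using linear interpolation between closest ranks."""
    if not values:
        raise ValueError("percentile of an empty sequence")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)                       # index at or below the rank
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(lines, quantiles=(5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99)):
    """Build a summary with the request count, percentiles, and mean time."""
    latencies = parse_latencies(lines)
    return {
        "requests": len(latencies),
        "percentiles": {q: percentile(latencies, q) for q in quantiles},
        "mean": statistics.fmean(latencies),
    }
```

Feeding the 100 response lines of this stage to `summarize` would be one way to cross-check the figures printed in the log.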
Running pipeline stage SafetyScorer
anhnv125-llama-op-v17-1_v27 status is now inactive due to auto deactivation removed underperforming models
anhnv125-llama-op-v17-1_v27 status is now deployed due to admin request

Usage Metrics

Latency Metrics