submission_id: jellywibble-lora-120k-p_2801_v11
developer_uid: Jellywibble
best_of: 2
celo_rating: 1212.1
display_name: nitral-ai-hathor-l3-8b-v-01_v1
family_friendly_score: 0.0
formatter: {'memory_template': "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{bot_name}'s Persona: {memory}\n\n", 'prompt_template': '{prompt}<|eot_id|>', 'bot_template': '<|start_header_id|>assistant<|end_header_id|>\n\n{bot_name}: {message}<|eot_id|>', 'user_template': '<|start_header_id|>user<|end_header_id|>\n\n{user_name}: {message}<|eot_id|>', 'response_template': '<|start_header_id|>assistant<|end_header_id|>\n\n{bot_name}:', 'truncate_by_message': False}
generation_params: {'temperature': 0.95, 'top_p': 1.0, 'min_p': 0.08, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['\n', '<|eot_id|>'], 'max_input_tokens': 512, 'best_of': 2, 'max_output_tokens': 64}
is_internal_developer: True
language_model: Jellywibble/lora_120k_pref_data_ep3_stacked_elo_alignment
max_input_tokens: 512
max_output_tokens: 64
model_architecture: LlamaForCausalLM
model_group: Jellywibble/lora_120k_pr
model_name: nitral-ai-hathor-l3-8b-v-01_v1
model_num_parameters: 8030261248.0
model_repo: Jellywibble/lora_120k_pref_data_ep3_stacked_elo_alignment
model_size: 8B
num_battles: 91203
num_wins: 47391
ranking_group: single
reward_formatter: {'bot_template': '{bot_name}: {message}\n', 'memory_template': "{bot_name}'s Persona: {memory}\n####\n", 'prompt_template': '{prompt}\n<START>\n', 'response_template': '{bot_name}:', 'truncate_by_message': False, 'user_template': '{user_name}: {message}\n'}
reward_repo: ChaiML/gpt2_xl_pairwise_89m_step_347634
status: torndown
submission_type: basic
timestamp: 2024-07-14T03:51:54+00:00
us_pacific_date: 2024-07-13
win_ratio: 0.5196210650965428
Running pipeline stage MKMLizer
Starting job with name jellywibble-lora-120k-p-2801-v11-mkmlizer
Waiting for job on jellywibble-lora-120k-p-2801-v11-mkmlizer to finish
jellywibble-lora-120k-p-2801-v11-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ _____ __ __ ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ /___/ ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ Version: 0.9.5.post2 ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ https://mk1.ai ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ The license key for the current software has been verified as ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ belonging to: ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ Chai Research Corp. ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ Expiration: 2024-10-15 23:59:59 ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ║ ║
jellywibble-lora-120k-p-2801-v11-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
jellywibble-lora-120k-p-2801-v11-mkmlizer: Downloaded to shared memory in 55.065s
jellywibble-lora-120k-p-2801-v11-mkmlizer: quantizing model to /dev/shm/model_cache
jellywibble-lora-120k-p-2801-v11-mkmlizer: Saving flywheel model at /dev/shm/model_cache
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.embed_tokens.weight torch.Size([139542528])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.0.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.0.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.0.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.0.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.0.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.0.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.1.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.1.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.1.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.1.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.1.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.1.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.2.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.2.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.2.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.2.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.2.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.2.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.3.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.3.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.3.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.3.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.4.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.4.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.4.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.4.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.4.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.4.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.5.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.5.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.5.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.5.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.5.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.5.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.6.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.6.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.6.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.6.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.6.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.6.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.7.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.7.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.7.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.7.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.7.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.7.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.8.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.8.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.8.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.10.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.10.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.10.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.10.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.10.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.10.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.11.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.11.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.11.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.11.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.11.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.11.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.12.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.12.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.12.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.12.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.12.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.12.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.13.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.13.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.13.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.13.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.13.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.13.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.14.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.14.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.8.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.8.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.8.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.9.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.9.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.9.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.9.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.9.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.9.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.14.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.14.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.14.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.14.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.15.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.15.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.15.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.15.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.15.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.15.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.16.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.16.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.16.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.16.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.16.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.16.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.17.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.17.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.17.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.17.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.17.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.17.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.18.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.18.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.18.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.18.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.18.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.18.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.19.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.19.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.19.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.19.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.19.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.19.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.20.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.20.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.20.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.20.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.20.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.20.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.21.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.21.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.21.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.21.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.21.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.21.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.22.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.22.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.22.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.22.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.22.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: Loading 0: 56%|█████▋ | 164/291 [00:10<00:02, 43.84it/s]
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.22.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.23.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.23.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.23.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.23.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.23.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.23.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.24.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.24.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.24.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.24.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.24.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.24.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.25.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.25.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.25.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.25.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.25.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.25.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.26.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.26.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.26.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.26.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.26.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.26.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.27.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.27.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.27.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.27.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.27.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.27.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.28.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.28.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.28.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.28.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.28.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.28.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.29.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.29.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.29.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.29.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.29.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.29.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.30.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.30.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.30.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.30.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.30.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.30.self_attn.qkv_proj.weight torch.Size([5111808])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.31.self_attn.o_proj.weight torch.Size([3407872])
jellywibble-lora-120k-p-2801-v11-mkmlizer: lm_head.weight torch.Size([139542528])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.31.input_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.31.mlp.down_proj.weight torch.Size([11927552])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.31.mlp.up_gate_proj.weight torch.Size([23855104])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.layers.31.post_attention_layernorm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: model.norm.weight torch.Size([4096])
jellywibble-lora-120k-p-2801-v11-mkmlizer: Loading 0: 100%|█████████▉| 290/291 [00:21<00:00, 2.83it/s]
jellywibble-lora-120k-p-2801-v11-mkmlizer: Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
jellywibble-lora-120k-p-2801-v11-mkmlizer: quantized model in 43.948s
jellywibble-lora-120k-p-2801-v11-mkmlizer: Processed model Jellywibble/lora_120k_pref_data_ep3_stacked_elo_alignment in 99.013s
jellywibble-lora-120k-p-2801-v11-mkmlizer: creating bucket guanaco-mkml-models
jellywibble-lora-120k-p-2801-v11-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
jellywibble-lora-120k-p-2801-v11-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/jellywibble-lora-120k-p-2801-v11
jellywibble-lora-120k-p-2801-v11-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/jellywibble-lora-120k-p-2801-v11/special_tokens_map.json
jellywibble-lora-120k-p-2801-v11-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/jellywibble-lora-120k-p-2801-v11/config.json
jellywibble-lora-120k-p-2801-v11-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/jellywibble-lora-120k-p-2801-v11/tokenizer_config.json
jellywibble-lora-120k-p-2801-v11-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/jellywibble-lora-120k-p-2801-v11/tokenizer.json
jellywibble-lora-120k-p-2801-v11-mkmlizer: loading reward model from ChaiML/gpt2_xl_pairwise_89m_step_347634
jellywibble-lora-120k-p-2801-v11-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py:950: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
jellywibble-lora-120k-p-2801-v11-mkmlizer: warnings.warn(
jellywibble-lora-120k-p-2801-v11-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:778: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
jellywibble-lora-120k-p-2801-v11-mkmlizer: warnings.warn(
jellywibble-lora-120k-p-2801-v11-mkmlizer: /opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py:469: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
jellywibble-lora-120k-p-2801-v11-mkmlizer: warnings.warn(
jellywibble-lora-120k-p-2801-v11-mkmlizer: Downloading shards: 100%|██████████| 2/2 [00:10<00:00, 5.46s/it]
jellywibble-lora-120k-p-2801-v11-mkmlizer: Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.99it/s]
jellywibble-lora-120k-p-2801-v11-mkmlizer: Saving model to /tmp/reward_cache/reward.tensors
jellywibble-lora-120k-p-2801-v11-mkmlizer: Saving duration: 3.105s
jellywibble-lora-120k-p-2801-v11-mkmlizer: Processed model ChaiML/gpt2_xl_pairwise_89m_step_347634 in 16.678s
jellywibble-lora-120k-p-2801-v11-mkmlizer: cp /tmp/reward_cache/reward.tensors s3://guanaco-reward-models/jellywibble-lora-120k-p-2801-v11_reward/reward.tensors
Job jellywibble-lora-120k-p-2801-v11-mkmlizer completed after 213.04s with status: succeeded
Stopping job with name jellywibble-lora-120k-p-2801-v11-mkmlizer
Pipeline stage MKMLizer completed in 213.99s
Running pipeline stage MKMLKubeTemplater
Pipeline stage MKMLKubeTemplater completed in 0.14s
Running pipeline stage ISVCDeployer
Creating inference service jellywibble-lora-120k-p-2801-v11
Waiting for inference service jellywibble-lora-120k-p-2801-v11 to be ready
Inference service jellywibble-lora-120k-p-2801-v11 ready after 50.2209255695343s
Pipeline stage ISVCDeployer completed in 56.88s
Running pipeline stage StressChecker
Received healthy response to inference request in 2.23925518989563s
Received healthy response to inference request in 1.1102869510650635s
Received healthy response to inference request in 1.1102674007415771s
Received healthy response to inference request in 1.1016948223114014s
Received healthy response to inference request in 1.1020264625549316s
5 requests
0 failed requests
5th percentile: 1.1017611503601075
10th percentile: 1.1018274784088136
20th percentile: 1.1019601345062255
30th percentile: 1.1036746501922607
40th percentile: 1.106971025466919
50th percentile: 1.1102674007415771
60th percentile: 1.1102752208709716
70th percentile: 1.1102830410003661
80th percentile: 1.336080598831177
90th percentile: 1.7876678943634035
95th percentile: 2.0134615421295163
99th percentile: 2.1940964603424074
mean time: 1.3327061653137207
Pipeline stage StressChecker completed in 7.78s
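Note: the StressChecker percentiles and mean are consistent with linear interpolation between the sorted response times (the default behaviour of `numpy.percentile`). Whether StressChecker itself uses NumPy is an assumption; the log does not say. A minimal sketch, with hypothetical helper names `percentile` and `summarize`, that reproduces the figures from the five logged response times:

```python
import math

# Response times in seconds, copied from the five
# "Received healthy response" lines of the StressChecker stage.
latencies = [
    2.23925518989563,
    1.1102869510650635,
    1.1102674007415771,
    1.1016948223114014,
    1.1020264625549316,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) using linear interpolation.

    Assumption: this mirrors the interpolation the log figures imply;
    it is not taken from the StressChecker source.
    """
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into sorted data
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def summarize(values, quantiles=(5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99)):
    """Build a report with the same labels the log uses."""
    report = {f"{q}th percentile": percentile(values, q) for q in quantiles}
    report["mean time"] = sum(values) / len(values)
    return report


if __name__ == "__main__":
    for label, value in summarize(latencies).items():
        print(f"{label}: {value}")
```

The first request (2.24 s) took roughly twice as long as the other four, which is what pulls the mean (1.33 s) above the median (1.11 s). A cold-start effect is a plausible but unconfirmed explanation.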
jellywibble-lora-120k-p_2801_v11 status is now deployed due to DeploymentManager action
jellywibble-lora-120k-p_2801_v11 status is now inactive due to auto deactivation removed underperforming models
admin requested tearing down of jellywibble-lora-120k-p_2801_v11
Running pipeline stage ISVCDeleter
Checking if service jellywibble-lora-120k-p-2801-v11 is running
Skipping teardown as no inference service was found
Pipeline stage ISVCDeleter completed in 5.22s
Running pipeline stage MKMLModelDeleter
Cleaning model data from S3
Cleaning model data from model cache
Deleting key jellywibble-lora-120k-p-2801-v11/config.json from bucket guanaco-mkml-models
Deleting key jellywibble-lora-120k-p-2801-v11/flywheel_model.0.safetensors from bucket guanaco-mkml-models
Deleting key jellywibble-lora-120k-p-2801-v11/special_tokens_map.json from bucket guanaco-mkml-models
Deleting key jellywibble-lora-120k-p-2801-v11/tokenizer.json from bucket guanaco-mkml-models
Deleting key jellywibble-lora-120k-p-2801-v11/tokenizer_config.json from bucket guanaco-mkml-models
Cleaning model data from model cache
Deleting key jellywibble-lora-120k-p-2801-v11_reward/config.json from bucket guanaco-reward-models
Deleting key jellywibble-lora-120k-p-2801-v11_reward/merges.txt from bucket guanaco-reward-models
Deleting key jellywibble-lora-120k-p-2801-v11_reward/reward.tensors from bucket guanaco-reward-models
Deleting key jellywibble-lora-120k-p-2801-v11_reward/special_tokens_map.json from bucket guanaco-reward-models
Deleting key jellywibble-lora-120k-p-2801-v11_reward/tokenizer.json from bucket guanaco-reward-models
Deleting key jellywibble-lora-120k-p-2801-v11_reward/tokenizer_config.json from bucket guanaco-reward-models
Deleting key jellywibble-lora-120k-p-2801-v11_reward/vocab.json from bucket guanaco-reward-models
Pipeline stage MKMLModelDeleter completed in 6.90s
jellywibble-lora-120k-p_2801_v11 status is now torndown due to DeploymentManager action