qwen-qwen2-5-14b-instruct-1m

developer_uid: azuruce

submission_id: qwen-qwen2-5-14b-instruct-1m_v6

model_name: nis-best-config

model_group: Qwen/Qwen2.5-14B-Instruc

status: torndown

timestamp: 2025-04-11T19:35:51+00:00

num_battles: 6998

num_wins: 3087

celo_rating: 1230.1

family_friendly_score: 0.7792

family_friendly_standard_error: 0.005865958745166897
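
The standard error above is numerically consistent with a binomial standard error, sqrt(p * (1 - p) / n), if the score is treated as a proportion over 5000 samples. The sample count is not stated anywhere in this record; it is an assumption chosen because it reproduces the reported value. A minimal sketch of that check:

```python
import math

# Values copied from the metadata above.
family_friendly_score = 0.7792
reported_standard_error = 0.005865958745166897

# ASSUMPTION: the number of scored samples is not given in the record.
# 5000 is a guess that happens to reproduce the reported standard error.
assumed_num_samples = 5000


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1.0 - p) / n)


estimated = binomial_standard_error(family_friendly_score, assumed_num_samples)
print(f"estimated={estimated:.12f} reported={reported_standard_error:.12f}")
```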

submission_type: basic

model_repo: Qwen/Qwen2.5-14B-Instruct-1M

model_architecture: Qwen2ForCausalLM

model_num_parameters: 14769689600.0

best_of: 8

max_input_tokens: 1024

max_output_tokens: 64

reward_model: default

latencies: [{'batch_size': 1, 'throughput': 0.4743663422495993, 'latency_mean': 2.108000874519348, 'latency_p50': 2.101341724395752, 'latency_p90': 2.3259138822555543}, {'batch_size': 3, 'throughput': 0.8692029243847861, 'latency_mean': 3.4445513665676115, 'latency_p50': 3.452815890312195, 'latency_p90': 3.7490571975708007}, {'batch_size': 5, 'throughput': 1.068200672985902, 'latency_mean': 4.66530706524849, 'latency_p50': 4.6324944496154785, 'latency_p90': 5.177451753616333}, {'batch_size': 6, 'throughput': 1.1181427360558769, 'latency_mean': 5.346534450054168, 'latency_p50': 5.347013235092163, 'latency_p90': 5.886440968513488}, {'batch_size': 10, 'throughput': 1.2039400235157165, 'latency_mean': 8.247620249986648, 'latency_p50': 8.195510983467102, 'latency_p90': 9.343454146385191}]
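
As a rough reading aid for the latency table above, the reported throughput at each batch size is close to, but not exactly, batch_size / latency_mean. How the profiler actually measures throughput is not documented here, so the formula below is only an approximation used for comparison:

```python
# Subset of the fields from the latencies entry above (values copied verbatim).
latencies = [
    {"batch_size": 1, "throughput": 0.4743663422495993, "latency_mean": 2.108000874519348},
    {"batch_size": 3, "throughput": 0.8692029243847861, "latency_mean": 3.4445513665676115},
    {"batch_size": 5, "throughput": 1.068200672985902, "latency_mean": 4.66530706524849},
    {"batch_size": 6, "throughput": 1.1181427360558769, "latency_mean": 5.346534450054168},
    {"batch_size": 10, "throughput": 1.2039400235157165, "latency_mean": 8.247620249986648},
]

relative_gaps = []
for row in latencies:
    # ASSUMPTION: requests per second approximated as batch_size / mean latency.
    approx = row["batch_size"] / row["latency_mean"]
    gap = abs(approx - row["throughput"]) / row["throughput"]
    relative_gaps.append(gap)
    print(f"batch={row['batch_size']:>2} reported={row['throughput']:.4f} "
          f"approx={approx:.4f} gap={gap:.2%}")
```

Throughput grows much more slowly than batch size (about 2.5x from batch 1 to batch 10), while mean latency roughly quadruples over the same range.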

gpu_counts: {'NVIDIA RTX A5000': 1}

display_name: nis-best-config

ineligible_reason: num_battles<10000

is_internal_developer: False

language_model: Qwen/Qwen2.5-14B-Instruct-1M

model_size: 15B

ranking_group: single

throughput_3p7s: 0.92

us_pacific_date: 2025-04-11

win_ratio: 0.44112603601028866
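
The win_ratio above is num_wins divided by num_battles, which can be checked directly from the two counts in this record:

```python
# Values copied from the metadata above.
num_battles = 6998
num_wins = 3087
reported_win_ratio = 0.44112603601028866

win_ratio = num_wins / num_battles
print(f"win_ratio={win_ratio:.6f}")
```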

generation_params: {'temperature': 0.75, 'top_p': 0.96, 'min_p': 0.02, 'top_k': 60, 'presence_penalty': 0.2, 'frequency_penalty': 0.2, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}

formatter: {'memory_template': '', 'prompt_template': '', 'bot_template': '<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n', 'user_template': '<|im_start|>user\n{user_name}: {message}<|im_end|>\n', 'response_template': '<|im_start|>assistant\n{bot_name}:', 'truncate_by_message': False}
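
To illustrate how the formatter templates above might combine into a single prompt, here is a minimal sketch. The assembly order (memory, prompt, chat turns, then response_template) and the helper name build_prompt are assumptions for illustration; the platform's real formatting code is not shown in this record, and truncation to max_input_tokens is not modelled.

```python
# Templates copied from the formatter entry above.
formatter = {
    "memory_template": "",
    "prompt_template": "",
    "bot_template": "<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n",
    "user_template": "<|im_start|>user\n{user_name}: {message}<|im_end|>\n",
    "response_template": "<|im_start|>assistant\n{bot_name}:",
}


def build_prompt(turns, bot_name, user_name):
    """Hypothetical helper: join the templates into one prompt string.

    `turns` is a list of (speaker, message) pairs, speaker being "bot" or "user".
    """
    parts = [formatter["memory_template"], formatter["prompt_template"]]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(formatter["bot_template"].format(bot_name=bot_name, message=message))
        else:
            parts.append(formatter["user_template"].format(user_name=user_name, message=message))
    # The response template leaves the assistant turn open for the model to complete.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


prompt = build_prompt([("user", "Hi"), ("bot", "Hello")], bot_name="Bot", user_name="User")
print(prompt)
```

Because stopping_words contains '\n' and each turn is a single "{name}: {message}" line, generation under this layout would end at the first newline, yielding one line of reply.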

Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer
Waiting for job on qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer to finish
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: Downloaded to shared memory in 42.368s
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmptkrgjxfj, device:0
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: Saving flywheel model at /dev/shm/model_cache
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: quantized model in 38.119s
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: Processed model Qwen/Qwen2.5-14B-Instruct-1M in 80.488s
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: creating bucket guanaco-mkml-models
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/qwen-qwen2-5-14b-instruct-1m-v6
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/qwen-qwen2-5-14b-instruct-1m-v6/tokenizer_config.json
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: cp /dev/shm/model_cache/merges.txt s3://guanaco-mkml-models/qwen-qwen2-5-14b-instruct-1m-v6/merges.txt
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: cp /dev/shm/model_cache/vocab.json s3://guanaco-mkml-models/qwen-qwen2-5-14b-instruct-1m-v6/vocab.json
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/qwen-qwen2-5-14b-instruct-1m-v6/tokenizer.json
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: cp /dev/shm/model_cache/flywheel_model.1.safetensors s3://guanaco-mkml-models/qwen-qwen2-5-14b-instruct-1m-v6/flywheel_model.1.safetensors
qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/qwen-qwen2-5-14b-instruct-1m-v6/flywheel_model.0.safetensors
Job qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer completed after 114.73s with status: succeeded
Stopping job with name qwen-qwen2-5-14b-instruct-1m-v6-mkmlizer
Pipeline stage MKMLizer completed in 115.17s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.13s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service qwen-qwen2-5-14b-instruct-1m-v6
Waiting for inference service qwen-qwen2-5-14b-instruct-1m-v6 to be ready
Failed to get response for submission jellywibble-tyler-james_92283_v1: HTTPConnectionPool(host='jellywibble-tyler-james-92283-v1-predictor.tenant-chaiml-guanaco.k.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Inference service qwen-qwen2-5-14b-instruct-1m-v6 ready after 110.84032917022705s
Pipeline stage MKMLDeployer completed in 111.21s
run pipeline stage %s
Running pipeline stage StressChecker
Failed to get response for submission jellywibble-tyler-james_92283_v1: HTTPConnectionPool(host='jellywibble-tyler-james-92283-v1-predictor.tenant-chaiml-guanaco.k.chaiverse.com', port=80): Read timed out. (read timeout=12.0)
Received healthy response to inference request in 2.420598030090332s
Received healthy response to inference request in 1.8213560581207275s
Received healthy response to inference request in 1.1934974193572998s
Received healthy response to inference request in 1.3280742168426514s
Received healthy response to inference request in 1.3398830890655518s
5 requests
0 failed requests
5th percentile: 1.2204127788543702
10th percentile: 1.2473281383514405
20th percentile: 1.301158857345581
30th percentile: 1.3304359912872314
40th percentile: 1.3351595401763916
50th percentile: 1.3398830890655518
60th percentile: 1.532472276687622
70th percentile: 1.7250614643096922
80th percentile: 1.9412044525146486
90th percentile: 2.1809012413024904
95th percentile: 2.300749635696411
99th percentile: 2.396628351211548
mean time: 1.6206817626953125
Pipeline stage StressChecker completed in 9.37s
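
The StressChecker percentiles and mean above can be reproduced from the five response times using linear interpolation between sorted samples. The helper below is a plain-Python sketch of that interpolation, not the checker's own code, which is not shown in this log:

```python
# Response times (seconds) copied from the five "healthy response" log lines above.
response_times = [
    2.420598030090332,
    1.8213560581207275,
    1.1934974193572998,
    1.3280742168426514,
    1.3398830890655518,
]


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted samples."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(response_times, q)}")

mean_time = sum(response_times) / len(response_times)
print(f"mean time: {mean_time}")
```

With only five samples, every percentile above the 75th is an interpolation between the two slowest requests, so the tail figures say little about real tail latency.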
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.65s
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 0.66s
Shutdown handler de-registered
qwen-qwen2-5-14b-instruct-1m_v6 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.10s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.10s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service qwen-qwen2-5-14b-instruct-1m-v6-profiler
Waiting for inference service qwen-qwen2-5-14b-instruct-1m-v6-profiler to be ready
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyScorer
Evaluating %s Family Friendly Score with %s threads
Pipeline stage OfflineFamilyFriendlyScorer completed in 2719.12s
Shutdown handler de-registered
qwen-qwen2-5-14b-instruct-1m_v6 status is now inactive due to auto deactivation removed underperforming models
qwen-qwen2-5-14b-instruct-1m_v6 status is now torndown due to DeploymentManager action