developer_uid: zonemercy
submission_id: chaiml-pony-d3b-mv1-top2_9386_v1
model_name: chaiml-pony-d3b-mv1-top2_9386_v1
model_group: ChaiML/pony-d3b-mv1-top2
status: torndown
timestamp: 2026-03-30T08:27:24+00:00
num_battles: 10364
num_wins: 5527
celo_rating: 1317.99
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: basic
model_repo: ChaiML/pony-d3b-mv1-top2-q35b-lr5e6ep2g8
model_architecture: Qwen3_5MoeForConditionalGeneration
model_num_parameters: 33753909248.0
best_of: 16
max_input_tokens: 2048
max_output_tokens: 80
reward_model: default
display_name: chaiml-pony-d3b-mv1-top2_9386_v1
ineligible_reason: max_output_tokens!=64
is_internal_developer: True
language_model: ChaiML/pony-d3b-mv1-top2-q35b-lr5e6ep2g8
model_size: 34B
ranking_group: single
us_pacific_date: 2026-03-27
win_ratio: 0.5332883056734852
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.8, 'frequency_penalty': 0.0, 'stopping_words': ['####', '<|assistant|>', '</s>', '<|user|>', '<|im_end|>'], 'max_input_tokens': 2048, 'best_of': 16, 'max_output_tokens': 80}
formatter: {'memory_template': "<|im_start|>system\n{bot_name}'s persona: {memory}<|im_end|>\n", 'prompt_template': '', 'bot_template': '<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n', 'user_template': '<|im_start|>user\n{message}<|im_end|>\n', 'response_template': '<|im_start|>assistant\n{bot_name}:', 'truncate_by_message': True}
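The formatter above can be read as a set of string templates joined in conversation order. The sketch below is an illustration only: `build_prompt`, the `turns` structure, and the sample names are hypothetical and do not come from this log, and the real serving stack may assemble or truncate prompts differently (`truncate_by_message` and `max_input_tokens` are not modelled here).

```python
# Template strings copied from the formatter entry above.
formatter = {
    "memory_template": "<|im_start|>system\n{bot_name}'s persona: {memory}<|im_end|>\n",
    "prompt_template": "",
    "bot_template": "<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n",
    "user_template": "<|im_start|>user\n{message}<|im_end|>\n",
    "response_template": "<|im_start|>assistant\n{bot_name}:",
}


def build_prompt(bot_name, memory, turns):
    """Hypothetical helper: join persona, chat turns, and the response cue.

    `turns` is a list of (speaker, message) pairs, where speaker is
    "bot" or "user". Truncation is deliberately left out.
    """
    parts = [formatter["memory_template"].format(bot_name=bot_name, memory=memory)]
    for speaker, message in turns:
        if speaker == "bot":
            parts.append(
                formatter["bot_template"].format(bot_name=bot_name, message=message)
            )
        else:
            parts.append(formatter["user_template"].format(message=message))
    # The response template ends with "{bot_name}:" so the model continues as the bot.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


prompt = build_prompt("Ava", "friendly", [("user", "Hi")])
print(prompt)
```

The empty `prompt_template` contributes nothing to the output, which is why the sketch skips it.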
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-d3b-mv1-top2-9386-v1-uploader
Waiting for job on chaiml-pony-d3b-mv1-top2-9386-v1-uploader to finish
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Using quantization_mode: fp8
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Checking if ChaiML/pony-d3b-mv1-top2-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Downloading snapshot of ChaiML/pony-d3b-mv1-top2-q35b-lr5e6ep2g8...
2026-03-27T06:49:51.404538+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Downloaded in 31.701s
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Loading /tmp/model_input...
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Applying quantization...
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:18.909900+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:21.246822+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:21.249585+0000 | moe_calibration_context | INFO - Found 40 MoE modules to replace
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:35.094548+0000 | moe_calibration_context | INFO - Replaced 40 MoE modules for calibration
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:35.094809+0000 | moe_calibration_context | INFO - 40/40 modules will remain in calibration form (permanent)
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:35.094923+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:38.317698+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:38.318227+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:50:38.597064+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
2026-03-27T06:50:51.491493+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:51:07.053367+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: 2026-03-27T06:51:07.053621+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Saving to /dev/shm/model_output...
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: warnings.warn(
2026-03-27T06:51:51.583967+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Pushing to ChaiML/pony-d3b-mv1-top2-q35b-lr5e6ep2g8-FP8
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Checking if ChaiML/pony-d3b-mv1-top2-q35b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Creating repo ChaiML/pony-d3b-mv1-top2-q35b-lr5e6ep2g8-FP8 and uploading /dev/shm/model_output to it
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: - model.safetensors: 37.7GB
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Large files may slow down loading and processing.
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: ---------- 2026-03-27 06:52:01 (0:00:00) ----------
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Files: hashed 5/7 (32.5K/37.7G) | pre-uploaded: 0/0 (0.0/37.7G) (+7 unsure) | committed: 0/7 (0.0/37.7G) | ignored: 0
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: ---------------------------------------------------
2026-03-27T06:52:51.676382+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: ---------- 2026-03-27 06:53:01 (0:01:00) ----------
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Files: hashed 7/7 (37.7G/37.7G) | pre-uploaded: 1/2 (20.0M/37.7G) | committed: 0/7 (0.0/37.7G) | ignored: 0
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: ---------------------------------------------------
2026-03-27T06:53:51.787108+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Processed model ChaiML/pony-d3b-mv1-top2-q35b-lr5e6ep2g8 in 269.100s
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: creating bucket guanaco-vllm-models
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-top2-9386-v1/default
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-top2-9386-v1/default/chat_template.jinja
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-top2-9386-v1/default/generation_config.json
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-top2-9386-v1/default/tokenizer_config.json
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-top2-9386-v1/default/recipe.yaml
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-top2-9386-v1/default/config.json
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-top2-9386-v1/default/tokenizer.json
2026-03-27T06:54:51.891018+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
chaiml-pony-d3b-mv1-top2-9386-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-d3b-mv1-top2-9386-v1/default/model.safetensors
Job chaiml-pony-d3b-mv1-top2-9386-v1-uploader completed after 386.58s with status: succeeded
Stopping job with name chaiml-pony-d3b-mv1-top2-9386-v1-uploader
Pipeline stage VLLMUploader completed in 387.01s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.10s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 2.84s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-d3b-mv1-top2-9386-v1
Waiting for inference service chaiml-pony-d3b-mv1-top2-9386-v1 to be ready
2026-03-27T06:55:51.996695+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
2026-03-27T06:56:52.090109+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
2026-03-27T06:57:52.179854+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
Inference service chaiml-pony-d3b-mv1-top2-9386-v1 ready after 200.47467350959778s
Pipeline stage VLLMDeployer completed in 200.90s
run pipeline stage %s
Running pipeline stage StressChecker
2026-03-27T06:58:52.271736+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-27T06:59:52.364827+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
Received healthy response to inference request in 11.853856325149536s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 3.338172674179077s
Received healthy response to inference request in 6.383633852005005s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-27T07:00:52.454335+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 5.659064054489136s
Received healthy response to inference request in 1.2956857681274414s
Received healthy response to inference request in 1.3082308769226074s
Received healthy response to inference request in 1.3623981475830078s
Received healthy response to inference request in 1.4939136505126953s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 1.2164011001586914s
Received healthy response to inference request in 1.3395278453826904s
Received healthy response to inference request in 1.3303279876708984s
Received healthy response to inference request in 3.748291015625s
Received healthy response to inference request in 1.5568561553955078s
Received healthy response to inference request in 1.381500005722046s
Received healthy response to inference request in 1.3848938941955566s
Received healthy response to inference request in 1.4390332698822021s
Received healthy response to inference request in 1.4100499153137207s
2026-03-27T07:01:52.546162+00:00 monitor updated for chaiml-pony-d3b-mv1-top2_9386_v1
Received healthy response to inference request in 1.3784291744232178s
Received healthy response to inference request in 1.7335846424102783s
Received healthy response to inference request in 1.3242485523223877s
Received healthy response to inference request in 1.2577855587005615s
Received healthy response to inference request in 1.4405977725982666s
Received healthy response to inference request in 1.6737220287322998s
30 requests
7 failed requests
5th percentile: 1.2748406529426575
10th percentile: 1.3069763660430909
20th percentile: 1.337687873840332
30th percentile: 1.3805787563323975
40th percentile: 1.4274399280548096
50th percentile: 1.5253849029541016
60th percentile: 2.3754198551177956
70th percentile: 5.876434993743894
80th percentile: 20.121099138259886
90th percentile: 20.130852794647218
95th percentile: 20.143041944503786
99th percentile: 20.15667011976242
mean time: 6.575281731287638
%s, retrying in %s seconds...
Received healthy response to inference request in 1.1660404205322266s
Received healthy response to inference request in 1.2710773944854736s
Received healthy response to inference request in 1.2232913970947266s
Received healthy response to inference request in 1.1988599300384521s
Received healthy response to inference request in 1.2871594429016113s
Received healthy response to inference request in 1.4296045303344727s
Received healthy response to inference request in 1.4683003425598145s
Received healthy response to inference request in 1.276752233505249s
Received healthy response to inference request in 1.245032548904419s
Received healthy response to inference request in 1.38377046585083s
Received healthy response to inference request in 1.247020959854126s
Received healthy response to inference request in 1.8597052097320557s
Received healthy response to inference request in 1.2814569473266602s
Received healthy response to inference request in 1.2459709644317627s
Received healthy response to inference request in 1.3294103145599365s
Received healthy response to inference request in 1.2252857685089111s
Received healthy response to inference request in 1.3209881782531738s
Received healthy response to inference request in 1.2557923793792725s
Received healthy response to inference request in 1.465860366821289s
Received healthy response to inference request in 1.3567771911621094s
Received healthy response to inference request in 1.2021803855895996s
Received healthy response to inference request in 1.8565900325775146s
Received healthy response to inference request in 1.2791407108306885s
Received healthy response to inference request in 1.4116497039794922s
Received healthy response to inference request in 1.2902755737304688s
Received healthy response to inference request in 1.4233198165893555s
Received healthy response to inference request in 1.314455270767212s
Received healthy response to inference request in 1.3655104637145996s
Received healthy response to inference request in 1.6358246803283691s
Received healthy response to inference request in 1.32667875289917s
30 requests
0 failed requests
5th percentile: 1.2003541350364686
10th percentile: 1.221180295944214
20th percentile: 1.245783281326294
30th percentile: 1.2664918899536133
40th percentile: 1.2805304527282715
50th percentile: 1.3023654222488403
60th percentile: 1.3277713775634765
70th percentile: 1.3709884643554686
80th percentile: 1.424576759338379
90th percentile: 1.48505277633667
95th percentile: 1.7572456240653986
99th percentile: 1.8588018083572388
mean time: 1.3547927459081015
Pipeline stage StressChecker completed in 243.37s
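The percentile and mean figures of the second stress-check run can be recomputed from the 30 healthy latencies logged above. The sketch below assumes linear interpolation between closest ranks (the default method in NumPy's `percentile`); the log does not say how the StressChecker computes its summary, and the `percentile` helper is written here for illustration.

```python
import math

# Latencies in seconds of the 30 healthy responses in the second run above.
latencies = [
    1.1660404205322266, 1.2710773944854736, 1.2232913970947266,
    1.1988599300384521, 1.2871594429016113, 1.4296045303344727,
    1.4683003425598145, 1.276752233505249, 1.245032548904419,
    1.38377046585083, 1.247020959854126, 1.8597052097320557,
    1.2814569473266602, 1.2459709644317627, 1.3294103145599365,
    1.2252857685089111, 1.3209881782531738, 1.2557923793792725,
    1.465860366821289, 1.3567771911621094, 1.2021803855895996,
    1.8565900325775146, 1.2791407108306885, 1.4116497039794922,
    1.2902755737304688, 1.4233198165893555, 1.314455270767212,
    1.3655104637145996, 1.6358246803283691, 1.32667875289917,
]


def percentile(values, q):
    """Return the q-th percentile (0-100) by linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


mean_time = sum(latencies) / len(latencies)

for q in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

In the first run, the seven timed-out requests appear to be counted at roughly the 20-second read timeout, which would explain why the 80th percentile and above sit near 20.1 s and the mean rises to about 6.58 s.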
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.78s
Shutdown handler de-registered
chaiml-pony-d3b-mv1-top2_9386_v1 status is now deployed due to DeploymentManager action
chaiml-pony-d3b-mv1-top2_9386_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-pony-d3b-mv1-top2_9386_v1 status is now torndown due to DeploymentManager action