developer_uid: zonemercy
submission_id: chaiml-pony-v3a-q27b-lr_21575_v1
model_name: chaiml-pony-v3a-q27b-lr_21575_v1
model_group: ChaiML/pony-v3a-q27b-lr5
status: torndown
timestamp: 2026-03-31T16:51:11+00:00
num_battles: 10359
num_wins: 5584
celo_rating: 1322.7
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: basic
model_repo: ChaiML/pony-v3a-q27b-lr5e6ep2g8
model_architecture: Qwen3_5ForConditionalGeneration
model_num_parameters: 23564784640.0
best_of: 8
max_input_tokens: 2048
max_output_tokens: 80
reward_model: default
display_name: chaiml-pony-v3a-q27b-lr_21575_v1
ineligible_reason: max_output_tokens!=64
is_internal_developer: True
language_model: ChaiML/pony-v3a-q27b-lr5e6ep2g8
model_size: 24B
ranking_group: single
us_pacific_date: 2026-03-28
win_ratio: 0.5390481706728448
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'stopping_words': ['<|im_end|>', '<|assistant|>', '</s>', '####', '<|user|>'], 'max_input_tokens': 2048, 'best_of': 8, 'max_output_tokens': 80}
formatter: {'memory_template': "<|im_start|>system\n{bot_name}'s persona: {memory}<|im_end|>\n", 'prompt_template': '', 'bot_template': '<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n', 'user_template': '<|im_start|>user\n{message}<|im_end|>\n', 'response_template': '<|im_start|>assistant\n{bot_name}:', 'truncate_by_message': True}
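The `formatter` entry above can be read as a set of string templates that are filled in and concatenated into a single prompt. The sketch below is only an illustration of that idea: the helper name `build_prompt`, the `(speaker, message)` turn format, and the concatenation order (memory, then chat turns, then the response prefix) are assumptions, not the platform's actual implementation. It leaves out the empty `prompt_template` and the `truncate_by_message` / `max_input_tokens` truncation.

```python
# Templates copied from the `formatter` entry in the metadata above.
FORMATTER = {
    "memory_template": "<|im_start|>system\n{bot_name}'s persona: {memory}<|im_end|>\n",
    "bot_template": "<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n",
    "user_template": "<|im_start|>user\n{message}<|im_end|>\n",
    "response_template": "<|im_start|>assistant\n{bot_name}:",
}


def build_prompt(bot_name, memory, turns, formatter=FORMATTER):
    """Hypothetical helper: render a conversation into one prompt string.

    `turns` is a list of (speaker, message) pairs, speaker being "user" or "bot".
    """
    # Persona / memory block comes first (assumed ordering).
    parts = [formatter["memory_template"].format(bot_name=bot_name, memory=memory)]
    # Each turn is rendered with the template matching its speaker.
    for speaker, message in turns:
        key = "bot_template" if speaker == "bot" else "user_template"
        parts.append(formatter[key].format(bot_name=bot_name, message=message))
    # The response prefix is left open so the model continues as the bot.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


prompt = build_prompt(
    "Pony",
    "A cheerful guide.",
    [("user", "Hi"), ("bot", "Hello!"), ("user", "How are you?")],
)
print(prompt)
```

Generation would then stop at one of the `stopping_words` listed in `generation_params`, such as `<|im_end|>`.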
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-v3a-q27b-lr-21575-v1-uploader
Waiting for job on chaiml-pony-v3a-q27b-lr-21575-v1-uploader to finish
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Using quantization_mode: fp8
Failed to get response for submission chaiml-glm-47-bobo-v1-s_16089_v2: ('http://chaiml-glm-47-bobo-v1-s-16089-v2-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Checking if ChaiML/pony-v3a-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Downloading snapshot of ChaiML/pony-v3a-q27b-lr5e6ep2g8...
2026-03-28T13:08:27.519222+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Downloaded in 55.196s
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Loading /tmp/model_input...
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Applying quantization...
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: 2026-03-28T13:08:45.498393+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: 2026-03-28T13:08:47.918100+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: 2026-03-28T13:08:47.920077+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: 2026-03-28T13:08:47.970720+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: 2026-03-28T13:08:47.970988+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: 2026-03-28T13:08:47.983631+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: 2026-03-28T13:08:54.588377+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: 2026-03-28T13:08:54.588590+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Saving to /dev/shm/model_output...
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: warnings.warn(
2026-03-28T13:09:27.641082+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Cleaning quantization config in /dev/shm/model_output
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Pushing to ChaiML/pony-v3a-q27b-lr5e6ep2g8-FP8
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Checking if ChaiML/pony-v3a-q27b-lr5e6ep2g8-FP8 already exists in ChaiML
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Creating repo ChaiML/pony-v3a-q27b-lr5e6ep2g8-FP8 and uploading /dev/shm/model_output to it
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Found 1 files larger than 20GB (recommended limit):
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: - model.safetensors: 35.9GB
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Large files may slow down loading and processing.
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: ---------- 2026-03-28 13:09:44 (0:00:00) ----------
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Files: hashed 5/7 (34.1K/35.9G) | pre-uploaded: 0/0 (0.0/35.9G) (+7 unsure) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Workers: hashing: 2 | get upload mode: 5 | pre-uploading: 0 | committing: 0 | waiting: 57
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: ---------------------------------------------------
2026-03-28T13:10:27.725324+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: ---------- 2026-03-28 13:10:44 (0:01:00) ----------
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Files: hashed 7/7 (35.9G/35.9G) | pre-uploaded: 1/2 (20.0M/35.9G) | committed: 0/7 (0.0/35.9G) | ignored: 0
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Workers: hashing: 0 | get upload mode: 0 | pre-uploading: 1 | committing: 0 | waiting: 63
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: ---------------------------------------------------
Failed to get response for submission chaiml-gspo-glm47-cas72_44260_v1: ('http://chaiml-gspo-glm47-cas72-44260-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
Failed to get response for submission chaiml-glm-47-bobo-v1-s_16089_v2: ('http://chaiml-glm-47-bobo-v1-s-16089-v2-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
2026-03-28T13:11:27.822400+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Processed model ChaiML/pony-v3a-q27b-lr5e6ep2g8 in 218.799s
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: creating bucket guanaco-vllm-models
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v1/default
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v1/default/chat_template.jinja
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v1/default/config.json
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v1/default/recipe.yaml
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v1/default/generation_config.json
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v1/default/tokenizer_config.json
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v1/default/tokenizer.json
chaiml-pony-v3a-q27b-lr-21575-v1-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-v3a-q27b-lr-21575-v1/default/model.safetensors
2026-03-28T13:12:27.927669+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
Job chaiml-pony-v3a-q27b-lr-21575-v1-uploader completed after 306.85s with status: succeeded
Stopping job with name chaiml-pony-v3a-q27b-lr-21575-v1-uploader
Pipeline stage VLLMUploader completed in 307.35s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.12s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.66s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-v3a-q27b-lr-21575-v1
Waiting for inference service chaiml-pony-v3a-q27b-lr-21575-v1 to be ready
2026-03-28T13:13:28.018350+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
Failed to get response for submission chaiml-gspo-glm47-cas72_44260_v1: ('http://chaiml-gspo-glm47-cas72-44260-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
2026-03-28T13:14:28.141940+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
Inference service chaiml-pony-v3a-q27b-lr-21575-v1 ready after 160.2878782749176s
Pipeline stage VLLMDeployer completed in 160.71s
run pipeline stage %s
Running pipeline stage StressChecker
2026-03-28T13:15:28.262772+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
Failed to get response for submission chaiml-gspo-glm47-chai-_76408_v1: ('http://chaiml-gspo-glm47-chai-76408-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T13:16:28.357330+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
Failed to get response for submission chaiml-gspo-glm47-cas72_44260_v1: ('http://chaiml-gspo-glm47-cas72-44260-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T13:17:28.493908+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.407217264175415s
Received healthy response to inference request in 1.7980608940124512s
Received healthy response to inference request in 4.644434690475464s
Received healthy response to inference request in 2.0264439582824707s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T13:18:28.869022+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.248419523239136s
Failed to get response for submission chaiml-gspo-glm47-chai-_76408_v1: ('http://chaiml-gspo-glm47-chai-76408-v1-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
Received healthy response to inference request in 4.690396547317505s
Received healthy response to inference request in 2.1771364212036133s
Received healthy response to inference request in 2.0146756172180176s
Received healthy response to inference request in 2.022326946258545s
Received healthy response to inference request in 1.9625840187072754s
Received healthy response to inference request in 2.051154375076294s
Received healthy response to inference request in 1.965296983718872s
Received healthy response to inference request in 1.909731149673462s
Received healthy response to inference request in 2.457899570465088s
Received healthy response to inference request in 5.9187235832214355s
Received healthy response to inference request in 2.764009475708008s
Received healthy response to inference request in 2.8070995807647705s
Received healthy response to inference request in 1.9794588088989258s
Received healthy response to inference request in 2.0171618461608887s
Received healthy response to inference request in 2.672569751739502s
Received healthy response to inference request in 2.0131454467773438s
30 requests
9 failed requests
5th percentile: 1.933514940738678
10th percentile: 1.9650256872177123
20th percentile: 2.014369583129883
30th percentile: 2.025208854675293
40th percentile: 2.3455943107604984
50th percentile: 2.785554528236389
60th percentile: 4.502104234695434
70th percentile: 10.179957914352377
80th percentile: 20.139870166778564
90th percentile: 20.167162704467774
95th percentile: 20.201291966438294
99th percentile: 20.335065016746523
mean time: 8.006185324986776
%s, retrying in %s seconds...
Received healthy response to inference request in 1.7648029327392578s
Received healthy response to inference request in 2.9756879806518555s
Received healthy response to inference request in 1.803589105606079s
2026-03-28T13:19:29.354370+00:00 monitor updated for chaiml-pony-v3a-q27b-lr_21575_v1
Received healthy response to inference request in 1.9230005741119385s
Received healthy response to inference request in 2.0849475860595703s
Received healthy response to inference request in 1.7981812953948975s
Received healthy response to inference request in 1.8041694164276123s
Received healthy response to inference request in 1.8018813133239746s
Received healthy response to inference request in 1.9436955451965332s
Received healthy response to inference request in 1.8457753658294678s
Received healthy response to inference request in 1.8922779560089111s
Received healthy response to inference request in 1.8673696517944336s
Received healthy response to inference request in 1.976747751235962s
Received healthy response to inference request in 1.9384636878967285s
Received healthy response to inference request in 2.6302504539489746s
Received healthy response to inference request in 1.9127764701843262s
Received healthy response to inference request in 1.9011056423187256s
Received healthy response to inference request in 1.8324110507965088s
Received healthy response to inference request in 2.008310317993164s
Received healthy response to inference request in 1.9351041316986084s
Received healthy response to inference request in 1.928729772567749s
Received healthy response to inference request in 1.9593112468719482s
Received healthy response to inference request in 2.5024454593658447s
Received healthy response to inference request in 2.3728818893432617s
Received healthy response to inference request in 2.005678415298462s
Received healthy response to inference request in 2.0702426433563232s
Received healthy response to inference request in 1.975886344909668s
Received healthy response to inference request in 2.0105669498443604s
Received healthy response to inference request in 2.0001957416534424s
Received healthy response to inference request in 2.035959243774414s
30 requests
0 failed requests
5th percentile: 1.7998463034629821
10th percentile: 1.8034183263778687
20th percentile: 1.843102502822876
30th percentile: 1.8984573364257813
40th percentile: 1.9264380931854248
50th percentile: 1.9410796165466309
60th percentile: 1.9762309074401856
70th percentile: 2.0064679861068724
80th percentile: 2.042815923690796
90th percentile: 2.38583824634552
95th percentile: 2.572738206386566
99th percentile: 2.8755110979080203
mean time: 2.0167481978734334
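The summary for this second stress run can be reproduced from the 30 latencies logged just above it. The sketch below assumes the StressChecker uses linear interpolation between order statistics (the same convention as NumPy's default percentile method); that is an inference from the logged numbers, not something the log states, and the `percentile` helper is hypothetical.

```python
import math

# The 30 healthy-response latencies (seconds) from the second stress run above.
LATENCIES = [
    1.7648029327392578, 2.9756879806518555, 1.803589105606079,
    1.9230005741119385, 2.0849475860595703, 1.7981812953948975,
    1.8041694164276123, 1.8018813133239746, 1.9436955451965332,
    1.8457753658294678, 1.8922779560089111, 1.8673696517944336,
    1.976747751235962, 1.9384636878967285, 2.6302504539489746,
    1.9127764701843262, 1.9011056423187256, 1.8324110507965088,
    2.008310317993164, 1.9351041316986084, 1.928729772567749,
    1.9593112468719482, 2.5024454593658447, 2.3728818893432617,
    2.005678415298462, 2.0702426433563232, 1.975886344909668,
    2.0105669498443604, 2.0001957416534424, 2.035959243774414,
]


def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between neighbors."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted list
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


for q in (5, 10, 50, 90, 99):
    print(f"{q}th percentile: {percentile(LATENCIES, q)}")
print(f"mean time: {sum(LATENCIES) / len(LATENCIES)}")
```

In the first stress run, the 80th percentile and above sit at roughly 20 s, which is consistent with the 9 failed requests being counted at the 20 s read timeout rather than being excluded.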
Pipeline stage StressChecker completed in 307.77s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.82s
Shutdown handler de-registered
chaiml-pony-v3a-q27b-lr_21575_v1 status is now deployed due to DeploymentManager action
chaiml-pony-v3a-q27b-lr_21575_v1 status is now inactive due to auto deactivation removed underperforming models
chaiml-pony-v3a-q27b-lr_21575_v1 status is now torndown due to DeploymentManager action