developer_uid: zonemercy
submission_id: chaiml-pony-v3-q27b-lr5_22882_v2
model_name: chaiml-pony-v3-q27b-lr5_22882_v2
model_group: ChaiML/pony-v3-q27b-lr5e
status: inactive
timestamp: 2026-03-28T10:17:37+00:00
num_battles: 10687
num_wins: 5437
celo_rating: 1301.5
family_friendly_score: 0.0
family_friendly_standard_error: 0.0
submission_type: basic
model_repo: ChaiML/pony-v3-q27b-lr5e6ep1g8
model_architecture: Qwen3_5ForConditionalGeneration
model_num_parameters: 23564784640.0
best_of: 8
max_input_tokens: 2048
max_output_tokens: 80
reward_model: default
display_name: chaiml-pony-v3-q27b-lr5_22882_v2
ineligible_reason: max_output_tokens!=64
is_internal_developer: True
language_model: ChaiML/pony-v3-q27b-lr5e6ep1g8
model_size: 24B
ranking_group: single
us_pacific_date: 2026-03-28
win_ratio: 0.5087489473191729
generation_params: {'temperature': 1.0, 'top_p': 1.0, 'min_p': 0.0, 'top_k': 40, 'presence_penalty': 0.8, 'frequency_penalty': 0.0, 'stopping_words': ['<|user|>', '<|assistant|>', '<|im_end|>', '####', '</s>'], 'max_input_tokens': 2048, 'best_of': 8, 'max_output_tokens': 80}
formatter: {'memory_template': "<|im_start|>system\n{bot_name}'s persona: {memory}<|im_end|>\n", 'prompt_template': '', 'bot_template': '<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n', 'user_template': '<|im_start|>user\n{message}<|im_end|>\n', 'response_template': '<|im_start|>assistant\n{bot_name}:', 'truncate_by_message': True}
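The `formatter` entry above defines ChatML-style templates for the persona, each chat turn, and the final response cue. A minimal sketch of how such templates might be assembled into a single prompt follows. The function name `render_prompt`, the `(speaker, message)` turn format, and the assembly order (memory, then turns, then response cue) are assumptions for illustration, not the platform's confirmed implementation. The empty `prompt_template` is skipped, and the truncation implied by `truncate_by_message` and `max_input_tokens` is not modelled.

```python
# Templates copied from the `formatter` field of the submission metadata.
formatter = {
    "memory_template": "<|im_start|>system\n{bot_name}'s persona: {memory}<|im_end|>\n",
    "prompt_template": "",
    "bot_template": "<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n",
    "user_template": "<|im_start|>user\n{message}<|im_end|>\n",
    "response_template": "<|im_start|>assistant\n{bot_name}:",
}


def render_prompt(formatter, bot_name, memory, turns):
    """Hypothetical helper: join persona, chat turns, and response cue.

    `turns` is a list of (speaker, message) pairs, where speaker is
    "bot" or "user". The ordering used here is an assumption.
    """
    parts = [formatter["memory_template"].format(bot_name=bot_name, memory=memory)]
    for speaker, message in turns:
        key = "bot_template" if speaker == "bot" else "user_template"
        # Unused keyword arguments are ignored by str.format, so the same
        # call works for the user template, which has no {bot_name} field.
        parts.append(formatter[key].format(bot_name=bot_name, message=message))
    # The response cue is left open so the model continues the bot's turn.
    parts.append(formatter["response_template"].format(bot_name=bot_name))
    return "".join(parts)


# Illustrative values only; they do not come from the log.
prompt = render_prompt(
    formatter,
    bot_name="Pony",
    memory="A cheerful guide.",
    turns=[("user", "Hi"), ("bot", "Hello!")],
)
print(prompt)
```

Because the prompt ends with an open assistant turn, generation would be expected to stop at one of the configured `stopping_words`, such as `<|im_end|>`.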
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-pony-v3-q27b-lr5-22882-v2-uploader
Waiting for job on chaiml-pony-v3-q27b-lr5-22882-v2-uploader to finish
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Using quantization_mode: fp8
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Checking if ChaiML/pony-v3-q27b-lr5e6ep1g8-FP8 already exists in ChaiML
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Downloading snapshot of ChaiML/pony-v3-q27b-lr5e6ep1g8...
2026-03-28T07:07:45.197240+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Downloaded in 49.010s
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Loading /tmp/model_input...
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: The fast path is not available because one of the required library is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Applying quantization...
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: 2026-03-28T07:08:10.220852+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: 2026-03-28T07:08:12.207725+0000 | reset | INFO - Compression lifecycle reset
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: 2026-03-28T07:08:12.209759+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: 2026-03-28T07:08:12.257498+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: 2026-03-28T07:08:12.257757+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: 2026-03-28T07:08:12.270501+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: 2026-03-28T07:08:18.973300+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: 2026-03-28T07:08:18.973531+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide`output_dir` as input arg.Ex. `oneshot(..., output_dir=...)`
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Saving to /dev/shm/model_output...
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: warnings.warn(
2026-03-28T07:08:45.290964+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Processed model ChaiML/pony-v3-q27b-lr5e6ep1g8 in 111.662s
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: creating bucket guanaco-vllm-models
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v2/default
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: cp /dev/shm/model_output/chat_template.jinja s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v2/default/chat_template.jinja
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v2/default/config.json
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v2/default/recipe.yaml
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v2/default/generation_config.json
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: cp /dev/shm/model_output/tokenizer_config.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v2/default/tokenizer_config.json
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v2/default/tokenizer.json
2026-03-28T07:09:45.393129+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
chaiml-pony-v3-q27b-lr5-22882-v2-uploader: cp /dev/shm/model_output/model.safetensors s3://guanaco-vllm-models/chaiml-pony-v3-q27b-lr5-22882-v2/default/model.safetensors
Job chaiml-pony-v3-q27b-lr5-22882-v2-uploader completed after 214.83s with status: succeeded
Stopping job with name chaiml-pony-v3-q27b-lr5-22882-v2-uploader
Pipeline stage VLLMUploader completed in 224.27s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 2.76s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 3.66s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-pony-v3-q27b-lr5-22882-v2
Waiting for inference service chaiml-pony-v3-q27b-lr5-22882-v2 to be ready
2026-03-28T07:10:45.485785+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
2026-03-28T07:11:45.578746+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
2026-03-28T07:12:45.670338+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
Inference service chaiml-pony-v3-q27b-lr5-22882-v2 ready after 150.9703459739685s
Pipeline stage VLLMDeployer completed in 151.40s
run pipeline stage %s
Running pipeline stage StressChecker
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:13:45.838241+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Failed to get request counts for guanaco-submitter. Falling back to default
Received healthy response to inference request in 6.373048543930054s
Received healthy response to inference request in 7.66948127746582s
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
2026-03-28T07:14:45.943924+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
Received healthy response to inference request in 4.951267242431641s
Received healthy response to inference request in 4.949799537658691s
Received healthy response to inference request in 4.884439706802368s
Received healthy response to inference request in 19.652069568634033s
Received healthy response to inference request in 4.742084741592407s
Received healthy response to inference request in 3.9805517196655273s
2026-03-28T07:15:46.428522+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
HTTPConnectionPool(host='guanaco-submitter-v2.guanaco-backend.kchai-google-us-east4.chaiverse.com', port=80): Read timed out. (read timeout=20)
Received unhealthy response to inference request!
Received healthy response to inference request in 4.49804425239563s
Received healthy response to inference request in 4.7025861740112305s
Received healthy response to inference request in 4.960691690444946s
Received healthy response to inference request in 4.862971782684326s
Received healthy response to inference request in 4.89315128326416s
Received healthy response to inference request in 10.07646632194519s
Received healthy response to inference request in 4.670250654220581s
Received healthy response to inference request in 5.004351377487183s
Received healthy response to inference request in 4.274174451828003s
Received healthy response to inference request in 4.703656911849976s
Received healthy response to inference request in 4.943145036697388s
2026-03-28T07:16:46.548051+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
Received healthy response to inference request in 4.7525317668914795s
Received healthy response to inference request in 5.173712730407715s
Received healthy response to inference request in 4.995667219161987s
Received healthy response to inference request in 4.002584218978882s
Received healthy response to inference request in 4.791147947311401s
Received healthy response to inference request in 4.533114194869995s
30 requests
5 failed requests
5th percentile: 4.124799823760986
10th percentile: 4.475657272338867
20th percentile: 4.696119070053101
30th percentile: 4.749397659301758
40th percentile: 4.8758525371551515
50th percentile: 4.9464722871780396
60th percentile: 4.9746819019317625
70th percentile: 5.533513474464413
80th percentile: 11.991586971282986
90th percentile: 20.16251096725464
95th percentile: 20.20322757959366
99th percentile: 20.21013969182968
mean time: 8.132191451390584
%s, retrying in %s seconds...
Received healthy response to inference request in 4.745532989501953s
Received healthy response to inference request in 4.729236125946045s
Received healthy response to inference request in 4.871150493621826s
Received healthy response to inference request in 4.789826154708862s
Received healthy response to inference request in 4.647080659866333s
Received healthy response to inference request in 4.8841423988342285s
2026-03-28T07:17:46.650377+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
Received healthy response to inference request in 4.658332109451294s
Received healthy response to inference request in 4.804731369018555s
Received healthy response to inference request in 5.110602378845215s
Received healthy response to inference request in 4.826892137527466s
Received healthy response to inference request in 4.821890592575073s
Received healthy response to inference request in 4.513208627700806s
Received healthy response to inference request in 4.767880916595459s
Received healthy response to inference request in 5.1366355419158936s
Received healthy response to inference request in 5.480038404464722s
Received healthy response to inference request in 4.9539573192596436s
Received healthy response to inference request in 4.794221878051758s
Received healthy response to inference request in 4.867861986160278s
2026-03-28T07:18:46.808785+00:00 monitor updated for chaiml-pony-v3-q27b-lr5_22882_v2
Received healthy response to inference request in 4.79236626625061s
Received healthy response to inference request in 5.072187662124634s
Received healthy response to inference request in 4.821602821350098s
Received healthy response to inference request in 5.228905916213989s
Received healthy response to inference request in 4.87674880027771s
Received healthy response to inference request in 5.033249616622925s
Received healthy response to inference request in 4.773717403411865s
Received healthy response to inference request in 4.792651414871216s
Received healthy response to inference request in 4.769345283508301s
Received healthy response to inference request in 4.774690389633179s
Received healthy response to inference request in 4.799765348434448s
Received healthy response to inference request in 5.137390613555908s
30 requests
0 failed requests
5th percentile: 4.652143812179565
10th percentile: 4.72214572429657
20th percentile: 4.769052410125733
30th percentile: 4.785285425186157
40th percentile: 4.793593692779541
50th percentile: 4.813167095184326
60th percentile: 4.843280076980591
70th percentile: 4.878966879844666
80th percentile: 5.041037225723267
90th percentile: 5.136711049079895
95th percentile: 5.187724030017852
99th percentile: 5.4072099828720095
mean time: 4.87586145401001
Pipeline stage StressChecker completed in 397.11s
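The percentile and mean figures reported for the second StressChecker run (30 requests, 0 failed) can be reproduced from the logged response times. The sketch below assumes percentiles are computed by linear interpolation between sorted values, which is the default behaviour of `numpy.percentile`; whether the pipeline actually uses NumPy is not stated in the log. The helper name `percentile` is hypothetical.

```python
import math

# Response times (seconds) of the 30 healthy requests in the second run,
# copied from the log in the order they were received.
latencies = [
    4.745532989501953, 4.729236125946045, 4.871150493621826,
    4.789826154708862, 4.647080659866333, 4.8841423988342285,
    4.658332109451294, 4.804731369018555, 5.110602378845215,
    4.826892137527466, 4.821890592575073, 4.513208627700806,
    4.767880916595459, 5.1366355419158936, 5.480038404464722,
    4.9539573192596436, 4.794221878051758, 4.867861986160278,
    4.79236626625061, 5.072187662124634, 4.821602821350098,
    5.228905916213989, 4.87674880027771, 5.033249616622925,
    4.773717403411865, 4.792651414871216, 4.769345283508301,
    4.774690389633179, 4.799765348434448, 5.137390613555908,
]


def percentile(values, q):
    """Return the q-th percentile (0 <= q <= 100) by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    rank = (q / 100) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


mean_time = sum(latencies) / len(latencies)
for q in (5, 50, 95, 99):
    print(f"{q}th percentile: {percentile(latencies, q)}")
print(f"mean time: {mean_time}")
```

Under this assumption the computed median and mean agree with the logged `50th percentile` and `mean time` values for the second run to within floating-point rounding.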
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 0.76s
Shutdown handler de-registered
chaiml-pony-v3-q27b-lr5_22882_v2 status is now deployed due to DeploymentManager action
chaiml-pony-v3-q27b-lr5_22882_v2 status is now inactive due to auto deactivation removed underperforming models