Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
2026-04-13T21:47:21.924046+00:00 monitor updated for chaiml-pony-v2-g46-lr1_80834_v33
Running pipeline stage VLLMUploader
Starting job with name qwen-qwen3-235b-a22b-in-31992-v1-uploader
Waiting for job on qwen-qwen3-235b-a22b-in-31992-v1-uploader to finish
2026-04-13T21:48:21.565873+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
2026-04-13T21:48:22.698653+00:00 monitor updated for chaiml-pony-v2-g46-lr1_80834_v33
Tearing down inference service chaiml-pony-v2-g46-lr1-80834-v33
clean up pipeline due to error=DeploymentError('Timeout to start the InferenceService chaiml-pony-v2-g46-lr1-80834-v33. The InferenceService is as following: {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'kind\': \'InferenceService\', \'metadata\': {\'annotations\': {\'autoscaling.knative.dev/class\': \'hpa.autoscaling.knative.dev\', \'autoscaling.knative.dev/container-concurrency-target-percentage\': \'70\', \'autoscaling.knative.dev/initial-scale\': \'5\', \'autoscaling.knative.dev/max-scale-down-rate\': \'1.1\', \'autoscaling.knative.dev/max-scale-up-rate\': \'2\', \'autoscaling.knative.dev/metric\': \'mean_pod_latency_ms_v2\', \'autoscaling.knative.dev/panic-threshold-percentage\': \'650\', \'autoscaling.knative.dev/panic-window-percentage\': \'35\', \'autoscaling.knative.dev/scale-down-delay\': \'30s\', \'autoscaling.knative.dev/scale-to-zero-grace-period\': \'10m\', \'autoscaling.knative.dev/stable-window\': \'180s\', \'autoscaling.knative.dev/target\': \'4000\', \'autoscaling.knative.dev/target-burst-capacity\': \'-1\', \'autoscaling.knative.dev/tick-interval\': \'15s\', \'features.knative.dev/http-full-duplex\': \'Enabled\', \'networking.knative.dev/ingress-class\': \'istio.ingress.networking.knative.dev\', \'serving.knative.dev/progress-deadline\': \'40m\'}, \'creationTimestamp\': \'2026-04-13T21:08:47Z\', \'finalizers\': [\'inferenceservice.finalizers\'], \'generation\': 1, \'labels\': {\'istio.io/rev\': \'prod-canary\', \'knative.coreweave.cloud/ingress\': \'istio.ingress.networking.knative.dev\', \'prometheus.k.chaiverse.com\': \'true\', \'qos.coreweave.cloud/latency\': \'low\'}, \'managedFields\': [{\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:metadata\': {\'f:annotations\': {\'.\': {}, \'f:autoscaling.knative.dev/class\': {}, \'f:autoscaling.knative.dev/container-concurrency-target-percentage\': {}, \'f:autoscaling.knative.dev/initial-scale\': {}, \'f:autoscaling.knative.dev/max-scale-down-rate\': 
{}, \'f:autoscaling.knative.dev/max-scale-up-rate\': {}, \'f:autoscaling.knative.dev/metric\': {}, \'f:autoscaling.knative.dev/panic-threshold-percentage\': {}, \'f:autoscaling.knative.dev/panic-window-percentage\': {}, \'f:autoscaling.knative.dev/scale-down-delay\': {}, \'f:autoscaling.knative.dev/scale-to-zero-grace-period\': {}, \'f:autoscaling.knative.dev/stable-window\': {}, \'f:autoscaling.knative.dev/target\': {}, \'f:autoscaling.knative.dev/target-burst-capacity\': {}, \'f:autoscaling.knative.dev/tick-interval\': {}, \'f:features.knative.dev/http-full-duplex\': {}, \'f:networking.knative.dev/ingress-class\': {}, \'f:serving.knative.dev/progress-deadline\': {}}, \'f:labels\': {\'.\': {}, \'f:istio.io/rev\': {}, \'f:knative.coreweave.cloud/ingress\': {}, \'f:prometheus.k.chaiverse.com\': {}, \'f:qos.coreweave.cloud/latency\': {}}}, \'f:spec\': {\'.\': {}, \'f:predictor\': {\'.\': {}, \'f:affinity\': {\'.\': {}, \'f:nodeAffinity\': {\'.\': {}, \'f:preferredDuringSchedulingIgnoredDuringExecution\': {}, \'f:requiredDuringSchedulingIgnoredDuringExecution\': {}}, \'f:podAffinity\': {\'.\': {}, \'f:preferredDuringSchedulingIgnoredDuringExecution\': {}}}, \'f:containerConcurrency\': {}, \'f:containers\': {}, \'f:imagePullSecrets\': {}, \'f:maxReplicas\': {}, \'f:minReplicas\': {}, \'f:priorityClassName\': {}, \'f:timeout\': {}, \'f:volumes\': {}}}}, \'manager\': \'OpenAPI-Generator\', \'operation\': \'Update\', \'time\': \'2026-04-13T21:08:47Z\'}, {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:metadata\': {\'f:finalizers\': {\'.\': {}, \'v:"inferenceservice.finalizers"\': {}}}}, \'manager\': \'manager\', \'operation\': \'Update\', \'time\': \'2026-04-13T21:08:47Z\'}, {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:status\': {\'.\': {}, \'f:components\': {\'.\': {}, \'f:predictor\': {\'.\': {}, \'f:latestCreatedRevision\': {}}}, \'f:conditions\': {}, \'f:modelStatus\': {\'.\': {}, \'f:states\': {\'.\': {}, \'f:activeModelState\': {}, 
\'f:targetModelState\': {}}, \'f:transitionStatus\': {}}, \'f:observedGeneration\': {}}}, \'manager\': \'manager\', \'operation\': \'Update\', \'subresource\': \'status\', \'time\': \'2026-04-13T21:09:49Z\'}], \'name\': \'chaiml-pony-v2-g46-lr1-80834-v33\', \'namespace\': \'tenant-chaiml-guanaco\', \'resourceVersion\': \'1351079795\', \'uid\': \'6e09e3ef-d32f-4ce4-9f36-4b785e0145b0\'}, \'spec\': {\'predictor\': {\'affinity\': {\'nodeAffinity\': {\'preferredDuringSchedulingIgnoredDuringExecution\': [{\'preference\': {\'matchExpressions\': [{\'key\': \'gpu.nvidia.com/class\', \'operator\': \'In\', \'values\': [\'A100_NVLINK_80GB\']}]}, \'weight\': 5}], \'requiredDuringSchedulingIgnoredDuringExecution\': {\'nodeSelectorTerms\': [{\'matchExpressions\': [{\'key\': \'gpu.nvidia.com/class\', \'operator\': \'In\', \'values\': [\'A100_NVLINK_80GB\']}]}]}}, \'podAffinity\': {\'preferredDuringSchedulingIgnoredDuringExecution\': [{\'podAffinityTerm\': {\'labelSelector\': {\'matchLabels\': {\'serving.kserve.io/inferenceservice\': \'chaiml-pony-v2-g46-lr1-80834-v33\'}}, \'topologyKey\': \'kubernetes.io/hostname\'}, \'weight\': 100}]}}, \'containerConcurrency\': 0, \'containers\': [{\'args\': [\'serve\', \'s3://guanaco-vllm-models/chaiml-pony-v2-g46-lr1-80834-v33/default\', \'--port\', \'8080\', \'--tensor-parallel-size\', \'8\', \'--max-model-len\', \'10240\', \'--max-num-batched-tokens\', \'10240\', \'--max-num-seqs\', \'64\', \'--gpu-memory-utilization\', \'0.9\', \'--trust-remote-code\', \'--load-format\', \'runai_streamer\', \'--served-model-name\', \'ChaiML/pony-v2-g46-lr1e4ep1r64b16\', \'--model-loader-extra-config\', \'{"distributed": true, "concurrency": 2}\'], \'env\': [{\'name\': \'RESERVE_MEMORY\', \'value\': \'2048\'}, {\'name\': \'DOWNLOAD_TO_LOCAL\', \'value\': \'/dev/shm/model_cache\'}, {\'name\': \'NUM_GPUS\', \'value\': \'8\'}, {\'name\': \'VLLM_ASSETS_CACHE\', \'value\': \'/code/vllm_assets_cache\'}, {\'name\': \'RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING\', \'value\': \'1\'}, {\'name\': \'RUNAI_STREAMER_CONCURRENCY\', \'value\': \'1\'}, 
{\'name\': \'AWS_EC2_METADATA_DISABLED\', \'value\': \'true\'}, {\'name\': \'AWS_ACCESS_KEY_ID\', \'value\': \'CWZAGMHZXKZRFGJK\'}, {\'name\': \'AWS_SECRET_ACCESS_KEY\', \'value\': \'cwoAeWzp46q4O0sTNXOEuZ1MvZzKEFlS9DtEhnTldKp\'}, {\'name\': \'AWS_ENDPOINT_URL\', \'value\': \'https://cwobject.com\'}, {\'name\': \'HF_TOKEN\', \'valueFrom\': {\'secretKeyRef\': {\'key\': \'token\', \'name\': \'hf-token\'}}}, {\'name\': \'RUNAI_STREAMER_S3_REQUEST_TIMEOUT_MS\', \'value\': \'30000\'}, {\'name\': \'VLLM_ROCM_USE_AITER\', \'value\': \'1\'}, {\'name\': \'VLLM_ROCM_USE_AITER_MOE\', \'value\': \'1\'}, {\'name\': \'VLLM_USE_TRITON_FLASH_ATTN\', \'value\': \'0\'}], \'image\': \'gcr.io/chai-959f8/vllm:v0.19.0\', \'imagePullPolicy\': \'IfNotPresent\', \'name\': \'kserve-container\', \'readinessProbe\': {\'failureThreshold\': 1, \'httpGet\': {\'path\': \'/v1/models\', \'port\': 8080}, \'initialDelaySeconds\': 60, \'periodSeconds\': 10, \'successThreshold\': 1, \'timeoutSeconds\': 5}, \'resources\': {\'limits\': {\'cpu\': \'16\', \'memory\': \'386Gi\', \'nvidia.com/gpu\': \'8\'}, \'requests\': {\'cpu\': \'16\', \'memory\': \'386Gi\', \'nvidia.com/gpu\': \'8\'}}, \'volumeMounts\': [{\'mountPath\': \'/dev/shm\', \'name\': \'shared-memory-cache\'}]}], \'imagePullSecrets\': [{\'name\': \'docker-creds\'}], \'maxReplicas\': 5, \'minReplicas\': 0, \'priorityClassName\': \'chaiverse\', \'timeout\': 20, \'volumes\': [{\'emptyDir\': {\'medium\': \'Memory\', \'sizeLimit\': \'386Gi\'}, \'name\': \'shared-memory-cache\'}]}}, \'status\': {\'components\': {\'predictor\': {\'latestCreatedRevision\': \'chaiml-pony-v2-g46-lr1-80834-v33-predictor-00001\'}}, \'conditions\': [{\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'reason\': \'PredictorConfigurationReady not ready\', \'severity\': \'Info\', \'status\': \'Unknown\', \'type\': \'LatestDeploymentReady\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'severity\': \'Info\', \'status\': \'Unknown\', \'type\': 
\'PredictorConfigurationReady\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'message\': \'Configuration "chaiml-pony-v2-g46-lr1-80834-v33-predictor" is waiting for a Revision to become ready.\', \'reason\': \'RevisionMissing\', \'status\': \'Unknown\', \'type\': \'PredictorReady\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'message\': \'Configuration "chaiml-pony-v2-g46-lr1-80834-v33-predictor" is waiting for a Revision to become ready.\', \'reason\': \'RevisionMissing\', \'severity\': \'Info\', \'status\': \'Unknown\', \'type\': \'PredictorRouteReady\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'message\': \'Configuration "chaiml-pony-v2-g46-lr1-80834-v33-predictor" is waiting for a Revision to become ready.\', \'reason\': \'RevisionMissing\', \'status\': \'Unknown\', \'type\': \'Ready\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'reason\': \'PredictorRouteReady not ready\', \'severity\': \'Info\', \'status\': \'Unknown\', \'type\': \'RoutesReady\'}], \'modelStatus\': {\'states\': {\'activeModelState\': \'\', \'targetModelState\': \'Pending\'}, \'transitionStatus\': \'InProgress\'}, \'observedGeneration\': 1}}')
run pipeline stage %s
Running pipeline stage VLLMDeleter
Checking if service chaiml-pony-v2-g46-lr1-80834-v33 is running
Skipping teardown as no inference service was found
Pipeline stage VLLMDeleter completed in 1.92s
run pipeline stage %s
Running pipeline stage VLLMModelDeleter
Cleaning model data from S3
Cleaning model data from model cache
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/.gitattributes from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/chat_template.jinja from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/config.json from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/generation_config.json from bucket guanaco-vllm-models
2026-04-13T21:49:21.994848+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00001-of-00072.safetensors from bucket guanaco-vllm-models
2026-04-13T21:49:23.085407+00:00 monitor updated for chaiml-pony-v2-g46-lr1_80834_v33
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00002-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00003-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00004-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00005-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00006-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00007-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00008-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00009-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00010-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00011-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00012-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00013-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00014-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00015-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00016-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00017-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00018-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00019-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00020-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00021-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00022-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00023-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00024-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00025-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00026-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00027-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00028-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00029-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00030-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00031-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00032-of-00072.safetensors from bucket guanaco-vllm-models
qwen-qwen3-235b-a22b-in-31992-v1-uploader: Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 is already quantized
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00033-of-00072.safetensors from bucket guanaco-vllm-models
qwen-qwen3-235b-a22b-in-31992-v1-uploader: Using quantization_mode: none
qwen-qwen3-235b-a22b-in-31992-v1-uploader: Downloading snapshot of Qwen/Qwen3-235B-A22B-Instruct-2507-FP8...
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00034-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00035-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00036-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00037-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00038-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00039-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00040-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00041-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00042-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00043-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00044-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00045-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00046-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00047-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00048-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00049-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00050-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00051-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00052-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00053-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00054-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00055-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00056-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00057-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00058-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00059-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00060-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00061-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00062-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00063-of-00072.safetensors from bucket guanaco-vllm-models
2026-04-13T21:50:22.834118+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
2026-04-13T21:50:23.480244+00:00 monitor updated for chaiml-pony-v2-g46-lr1_80834_v33
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00064-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00065-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00066-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00067-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00068-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00069-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00070-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00071-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model-00072-of-00072.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/model.safetensors.index.json from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/recipe.yaml from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/special_tokens_map.json from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/tokenizer.json from bucket guanaco-vllm-models
Deleting key chaiml-pony-v2-g46-lr1-80834-v33/default/tokenizer_config.json from bucket guanaco-vllm-models
Pipeline stage VLLMModelDeleter completed in 78.65s
Shutdown handler de-registered
DeploymentError('Timeout to start the InferenceService chaiml-pony-v2-g46-lr1-80834-v33. The InferenceService is as following: {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'kind\': \'InferenceService\', \'metadata\': {\'annotations\': {\'autoscaling.knative.dev/class\': \'hpa.autoscaling.knative.dev\', \'autoscaling.knative.dev/container-concurrency-target-percentage\': \'70\', \'autoscaling.knative.dev/initial-scale\': \'5\', \'autoscaling.knative.dev/max-scale-down-rate\': \'1.1\', \'autoscaling.knative.dev/max-scale-up-rate\': \'2\', \'autoscaling.knative.dev/metric\': \'mean_pod_latency_ms_v2\', \'autoscaling.knative.dev/panic-threshold-percentage\': \'650\', \'autoscaling.knative.dev/panic-window-percentage\': \'35\', \'autoscaling.knative.dev/scale-down-delay\': \'30s\', \'autoscaling.knative.dev/scale-to-zero-grace-period\': \'10m\', \'autoscaling.knative.dev/stable-window\': \'180s\', \'autoscaling.knative.dev/target\': \'4000\', \'autoscaling.knative.dev/target-burst-capacity\': \'-1\', \'autoscaling.knative.dev/tick-interval\': \'15s\', \'features.knative.dev/http-full-duplex\': \'Enabled\', \'networking.knative.dev/ingress-class\': \'istio.ingress.networking.knative.dev\', \'serving.knative.dev/progress-deadline\': \'40m\'}, \'creationTimestamp\': \'2026-04-13T21:08:47Z\', \'finalizers\': [\'inferenceservice.finalizers\'], \'generation\': 1, \'labels\': {\'istio.io/rev\': \'prod-canary\', \'knative.coreweave.cloud/ingress\': \'istio.ingress.networking.knative.dev\', \'prometheus.k.chaiverse.com\': \'true\', \'qos.coreweave.cloud/latency\': \'low\'}, \'managedFields\': [{\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:metadata\': {\'f:annotations\': {\'.\': {}, \'f:autoscaling.knative.dev/class\': {}, \'f:autoscaling.knative.dev/container-concurrency-target-percentage\': {}, \'f:autoscaling.knative.dev/initial-scale\': {}, \'f:autoscaling.knative.dev/max-scale-down-rate\': {}, 
\'f:autoscaling.knative.dev/max-scale-up-rate\': {}, \'f:autoscaling.knative.dev/metric\': {}, \'f:autoscaling.knative.dev/panic-threshold-percentage\': {}, \'f:autoscaling.knative.dev/panic-window-percentage\': {}, \'f:autoscaling.knative.dev/scale-down-delay\': {}, \'f:autoscaling.knative.dev/scale-to-zero-grace-period\': {}, \'f:autoscaling.knative.dev/stable-window\': {}, \'f:autoscaling.knative.dev/target\': {}, \'f:autoscaling.knative.dev/target-burst-capacity\': {}, \'f:autoscaling.knative.dev/tick-interval\': {}, \'f:features.knative.dev/http-full-duplex\': {}, \'f:networking.knative.dev/ingress-class\': {}, \'f:serving.knative.dev/progress-deadline\': {}}, \'f:labels\': {\'.\': {}, \'f:istio.io/rev\': {}, \'f:knative.coreweave.cloud/ingress\': {}, \'f:prometheus.k.chaiverse.com\': {}, \'f:qos.coreweave.cloud/latency\': {}}}, \'f:spec\': {\'.\': {}, \'f:predictor\': {\'.\': {}, \'f:affinity\': {\'.\': {}, \'f:nodeAffinity\': {\'.\': {}, \'f:preferredDuringSchedulingIgnoredDuringExecution\': {}, \'f:requiredDuringSchedulingIgnoredDuringExecution\': {}}, \'f:podAffinity\': {\'.\': {}, \'f:preferredDuringSchedulingIgnoredDuringExecution\': {}}}, \'f:containerConcurrency\': {}, \'f:containers\': {}, \'f:imagePullSecrets\': {}, \'f:maxReplicas\': {}, \'f:minReplicas\': {}, \'f:priorityClassName\': {}, \'f:timeout\': {}, \'f:volumes\': {}}}}, \'manager\': \'OpenAPI-Generator\', \'operation\': \'Update\', \'time\': \'2026-04-13T21:08:47Z\'}, {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:metadata\': {\'f:finalizers\': {\'.\': {}, \'v:"inferenceservice.finalizers"\': {}}}}, \'manager\': \'manager\', \'operation\': \'Update\', \'time\': \'2026-04-13T21:08:47Z\'}, {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:status\': {\'.\': {}, \'f:components\': {\'.\': {}, \'f:predictor\': {\'.\': {}, \'f:latestCreatedRevision\': {}}}, \'f:conditions\': {}, \'f:modelStatus\': {\'.\': {}, \'f:states\': {\'.\': {}, \'f:activeModelState\': {}, \'f:targetModelState\': 
{}}, \'f:transitionStatus\': {}}, \'f:observedGeneration\': {}}}, \'manager\': \'manager\', \'operation\': \'Update\', \'subresource\': \'status\', \'time\': \'2026-04-13T21:09:49Z\'}], \'name\': \'chaiml-pony-v2-g46-lr1-80834-v33\', \'namespace\': \'tenant-chaiml-guanaco\', \'resourceVersion\': \'1351079795\', \'uid\': \'6e09e3ef-d32f-4ce4-9f36-4b785e0145b0\'}, \'spec\': {\'predictor\': {\'affinity\': {\'nodeAffinity\': {\'preferredDuringSchedulingIgnoredDuringExecution\': [{\'preference\': {\'matchExpressions\': [{\'key\': \'gpu.nvidia.com/class\', \'operator\': \'In\', \'values\': [\'A100_NVLINK_80GB\']}]}, \'weight\': 5}], \'requiredDuringSchedulingIgnoredDuringExecution\': {\'nodeSelectorTerms\': [{\'matchExpressions\': [{\'key\': \'gpu.nvidia.com/class\', \'operator\': \'In\', \'values\': [\'A100_NVLINK_80GB\']}]}]}}, \'podAffinity\': {\'preferredDuringSchedulingIgnoredDuringExecution\': [{\'podAffinityTerm\': {\'labelSelector\': {\'matchLabels\': {\'serving.kserve.io/inferenceservice\': \'chaiml-pony-v2-g46-lr1-80834-v33\'}}, \'topologyKey\': \'kubernetes.io/hostname\'}, \'weight\': 100}]}}, \'containerConcurrency\': 0, \'containers\': [{\'args\': [\'serve\', \'s3://guanaco-vllm-models/chaiml-pony-v2-g46-lr1-80834-v33/default\', \'--port\', \'8080\', \'--tensor-parallel-size\', \'8\', \'--max-model-len\', \'10240\', \'--max-num-batched-tokens\', \'10240\', \'--max-num-seqs\', \'64\', \'--gpu-memory-utilization\', \'0.9\', \'--trust-remote-code\', \'--load-format\', \'runai_streamer\', \'--served-model-name\', \'ChaiML/pony-v2-g46-lr1e4ep1r64b16\', \'--model-loader-extra-config\', \'{"distributed": true, "concurrency": 2}\'], \'env\': [{\'name\': \'RESERVE_MEMORY\', \'value\': \'2048\'}, {\'name\': \'DOWNLOAD_TO_LOCAL\', \'value\': \'/dev/shm/model_cache\'}, {\'name\': \'NUM_GPUS\', \'value\': \'8\'}, {\'name\': \'VLLM_ASSETS_CACHE\', \'value\': \'/code/vllm_assets_cache\'}, {\'name\': \'RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING\', \'value\': \'1\'}, {\'name\': \'RUNAI_STREAMER_CONCURRENCY\', \'value\': \'1\'}, {\'name\': 
\'AWS_EC2_METADATA_DISABLED\', \'value\': \'true\'}, {\'name\': \'AWS_ACCESS_KEY_ID\', \'value\': \'CWZAGMHZXKZRFGJK\'}, {\'name\': \'AWS_SECRET_ACCESS_KEY\', \'value\': \'cwoAeWzp46q4O0sTNXOEuZ1MvZzKEFlS9DtEhnTldKp\'}, {\'name\': \'AWS_ENDPOINT_URL\', \'value\': \'https://cwobject.com\'}, {\'name\': \'HF_TOKEN\', \'valueFrom\': {\'secretKeyRef\': {\'key\': \'token\', \'name\': \'hf-token\'}}}, {\'name\': \'RUNAI_STREAMER_S3_REQUEST_TIMEOUT_MS\', \'value\': \'30000\'}, {\'name\': \'VLLM_ROCM_USE_AITER\', \'value\': \'1\'}, {\'name\': \'VLLM_ROCM_USE_AITER_MOE\', \'value\': \'1\'}, {\'name\': \'VLLM_USE_TRITON_FLASH_ATTN\', \'value\': \'0\'}], \'image\': \'gcr.io/chai-959f8/vllm:v0.19.0\', \'imagePullPolicy\': \'IfNotPresent\', \'name\': \'kserve-container\', \'readinessProbe\': {\'failureThreshold\': 1, \'httpGet\': {\'path\': \'/v1/models\', \'port\': 8080}, \'initialDelaySeconds\': 60, \'periodSeconds\': 10, \'successThreshold\': 1, \'timeoutSeconds\': 5}, \'resources\': {\'limits\': {\'cpu\': \'16\', \'memory\': \'386Gi\', \'nvidia.com/gpu\': \'8\'}, \'requests\': {\'cpu\': \'16\', \'memory\': \'386Gi\', \'nvidia.com/gpu\': \'8\'}}, \'volumeMounts\': [{\'mountPath\': \'/dev/shm\', \'name\': \'shared-memory-cache\'}]}], \'imagePullSecrets\': [{\'name\': \'docker-creds\'}], \'maxReplicas\': 5, \'minReplicas\': 0, \'priorityClassName\': \'chaiverse\', \'timeout\': 20, \'volumes\': [{\'emptyDir\': {\'medium\': \'Memory\', \'sizeLimit\': \'386Gi\'}, \'name\': \'shared-memory-cache\'}]}}, \'status\': {\'components\': {\'predictor\': {\'latestCreatedRevision\': \'chaiml-pony-v2-g46-lr1-80834-v33-predictor-00001\'}}, \'conditions\': [{\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'reason\': \'PredictorConfigurationReady not ready\', \'severity\': \'Info\', \'status\': \'Unknown\', \'type\': \'LatestDeploymentReady\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'severity\': \'Info\', \'status\': \'Unknown\', \'type\': \'PredictorConfigurationReady\'}, 
{\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'message\': \'Configuration "chaiml-pony-v2-g46-lr1-80834-v33-predictor" is waiting for a Revision to become ready.\', \'reason\': \'RevisionMissing\', \'status\': \'Unknown\', \'type\': \'PredictorReady\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'message\': \'Configuration "chaiml-pony-v2-g46-lr1-80834-v33-predictor" is waiting for a Revision to become ready.\', \'reason\': \'RevisionMissing\', \'severity\': \'Info\', \'status\': \'Unknown\', \'type\': \'PredictorRouteReady\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'message\': \'Configuration "chaiml-pony-v2-g46-lr1-80834-v33-predictor" is waiting for a Revision to become ready.\', \'reason\': \'RevisionMissing\', \'status\': \'Unknown\', \'type\': \'Ready\'}, {\'lastTransitionTime\': \'2026-04-13T21:09:49Z\', \'reason\': \'PredictorRouteReady not ready\', \'severity\': \'Info\', \'status\': \'Unknown\', \'type\': \'RoutesReady\'}], \'modelStatus\': {\'states\': {\'activeModelState\': \'\', \'targetModelState\': \'Pending\'}, \'transitionStatus\': \'InProgress\'}, \'observedGeneration\': 1}}')
chaiml-pony-v2-g46-lr1_80834_v33 status is now failed due to DeploymentManager action
qwen-qwen3-235b-a22b-in-31992-v1-uploader: Downloaded in 72.270s
qwen-qwen3-235b-a22b-in-31992-v1-uploader: Processed model Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 in 72.411s
qwen-qwen3-235b-a22b-in-31992-v1-uploader: creating bucket guanaco-vllm-models
qwen-qwen3-235b-a22b-in-31992-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-235b-a22b-in-31992-v1-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
qwen-qwen3-235b-a22b-in-31992-v1-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
qwen-qwen3-235b-a22b-in-31992-v1-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
qwen-qwen3-235b-a22b-in-31992-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-235b-a22b-in-31992-v1-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
qwen-qwen3-235b-a22b-in-31992-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-235b-a22b-in-31992-v1-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
qwen-qwen3-235b-a22b-in-31992-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-235b-a22b-in-31992-v1-uploader: if re.search("-\.", bucket, re.UNICODE):
qwen-qwen3-235b-a22b-in-31992-v1-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
qwen-qwen3-235b-a22b-in-31992-v1-uploader: if re.search("\.\.", bucket, re.UNICODE):
qwen-qwen3-235b-a22b-in-31992-v1-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
qwen-qwen3-235b-a22b-in-31992-v1-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
qwen-qwen3-235b-a22b-in-31992-v1-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
qwen-qwen3-235b-a22b-in-31992-v1-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
qwen-qwen3-235b-a22b-in-31992-v1-uploader: Bucket 's3://guanaco-vllm-models/' created
qwen-qwen3-235b-a22b-in-31992-v1-uploader: uploading /tmp/model_output to s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default
2026-04-13T21:51:28.445608+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/.gitattributes s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/.gitattributes
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/generation_config.json s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/generation_config.json
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/LICENSE s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/LICENSE
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/config.json s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/config.json
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/tokenizer_config.json s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/tokenizer_config.json
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/README.md s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/README.md
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/merges.txt s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/merges.txt
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/tokenizer.json s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/tokenizer.json
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/vocab.json s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/vocab.json
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model.safetensors.index.json s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model.safetensors.index.json
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00024-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00024-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00022-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00022-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00020-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00020-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00011-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00011-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00016-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00016-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00010-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00010-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00009-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00009-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00017-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00017-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00021-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00021-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00008-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00008-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00007-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00007-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00004-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00004-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00002-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00002-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00006-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00006-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00001-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00001-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00023-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00023-of-00024.safetensors
qwen-qwen3-235b-a22b-in-31992-v1-uploader: cp /tmp/model_output/model-00005-of-00024.safetensors s3://guanaco-vllm-models/qwen-qwen3-235b-a22b-in-31992-v1/default/model-00005-of-00024.safetensors
2026-04-13T21:52:28.694514+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
Job qwen-qwen3-235b-a22b-in-31992-v1-uploader completed after 319.35s with status: succeeded
Stopping job with name qwen-qwen3-235b-a22b-in-31992-v1-uploader
Pipeline stage VLLMUploader completed in 321.29s
run pipeline stage %s
Running pipeline stage VLLMUploaderAMD
Pipeline stage vllm_upload_amd skipped, reason=not amd cluster
Pipeline stage VLLMUploaderAMD completed in 0.24s
run pipeline stage %s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 1.10s
run pipeline stage %s
Running pipeline stage VLLMDeployer
Creating inference service qwen-qwen3-235b-a22b-in-31992-v1
Waiting for inference service qwen-qwen3-235b-a22b-in-31992-v1 to be ready
2026-04-13T21:53:28.949819+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
2026-04-13T21:54:29.189464+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
2026-04-13T21:55:29.435135+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
2026-04-13T21:56:29.649186+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
2026-04-13T21:57:30.053367+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
Inference service qwen-qwen3-235b-a22b-in-31992-v1 ready after 334.5636477470398s
Pipeline stage VLLMDeployer completed in 336.06s
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 4.815791130065918s
Received healthy response to inference request in 1.8511981964111328s
2026-04-13T21:58:30.299506+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
Received healthy response to inference request in 2.017439842224121s
Received healthy response to inference request in 1.848726749420166s
Received healthy response to inference request in 1.9307293891906738s
Received healthy response to inference request in 1.7765295505523682s
Received healthy response to inference request in 2.1293962001800537s
Received healthy response to inference request in 1.8837401866912842s
Received healthy response to inference request in 2.2361738681793213s
Received healthy response to inference request in 1.8425219058990479s
Received healthy response to inference request in 2.0400586128234863s
Received healthy response to inference request in 1.8641645908355713s
Received healthy response to inference request in 1.976017713546753s
Received healthy response to inference request in 1.8855924606323242s
Received healthy response to inference request in 1.9045779705047607s
Received healthy response to inference request in 2.2351393699645996s
Received healthy response to inference request in 1.8906667232513428s
Received healthy response to inference request in 1.8708949089050293s
Received healthy response to inference request in 1.8702712059020996s
Received healthy response to inference request in 2.2052974700927734s
Received healthy response to inference request in 1.8616108894348145s
Received healthy response to inference request in 2.0024428367614746s
Received healthy response to inference request in 4.6631550788879395s
Received healthy response to inference request in 1.947417974472046s
Received healthy response to inference request in 4.968926191329956s
Received healthy response to inference request in 1.8441319465637207s
Received healthy response to inference request in 1.9489545822143555s
Received healthy response to inference request in 1.9348351955413818s
2026-04-13T21:59:30.541004+00:00 monitor updated for qwen-qwen3-235b-a22b-in_31992_v1
Received healthy response to inference request in 2.061389684677124s
Received healthy response to inference request in 1.927743673324585s
30 requests
0 failed requests
5th percentile: 1.8432464241981505
10th percentile: 1.8482672691345214
20th percentile: 1.8636538505554199
30th percentile: 1.8798866033554078
40th percentile: 1.8990134716033935
50th percentile: 1.9327822923660278
60th percentile: 1.9597798347473143
70th percentile: 2.0242254734039307
80th percentile: 2.144576454162598
90th percentile: 2.4788719892501865
95th percentile: 4.747104907035827
99th percentile: 4.924517023563385
mean time: 2.2411845366160077
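The percentile summary above can be reproduced from the per-request timings with the standard library. This is a hedged sketch: the pipeline's exact interpolation method is not shown in the log, so `method="inclusive"` (which interpolates linearly over the observed min–max range, like numpy's default) is assumed, and only the first five request timings from the log are used as sample data.

```python
import statistics

# First five request latencies from the StressChecker output, rounded (s).
latencies = [4.816, 1.851, 2.017, 1.849, 1.931]

# quantiles(..., n=100) returns the 99 cut points between percentile
# bins, so index p - 1 is the p-th percentile.
pcts = statistics.quantiles(latencies, n=100, method="inclusive")
for p in (5, 50, 95, 99):
    print(f"{p}th percentile: {pcts[p - 1]}")
print(f"mean time: {statistics.fmean(latencies)}")
```

With the full 30-sample set, the same loop yields the figures reported by the stage, up to the interpolation method chosen.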
Pipeline stage StressChecker completed in 74.27s
Shutdown handler de-registered
qwen-qwen3-235b-a22b-in_31992_v1 status is now deployed due to DeploymentManager action
qwen-qwen3-235b-a22b-in_31992_v1 status is now inactive due to auto deactivation removed underperforming models
qwen-qwen3-235b-a22b-in_31992_v1 status is now torndown due to DeploymentManager action