submission_id: chaiml-02f4-69d4-linear_76375_v7
developer_uid: chai_backend_admin
status: failed
model_repo: ChaiML/02f4-69d4-linear-w01-W4A16-G128-AutoRound
generation_params: {'temperature': 0.7, 'top_p': 0.95, 'min_p': 0.025, 'top_k': 80, 'presence_penalty': 0.4, 'frequency_penalty': 0.4, 'stopping_words': ['\n'], 'max_input_tokens': 1024, 'best_of': 8, 'max_output_tokens': 64}
formatter: {'memory_template': '<|im_start|>system\n{memory}<|im_end|>\n', 'prompt_template': '', 'bot_template': '<|im_start|>assistant\n{bot_name}: {message}<|im_end|>\n', 'user_template': '<|im_start|>user\n{user_name}: {message}<|im_end|>\n', 'response_template': '<|im_start|>assistant\n{bot_name}:', 'truncate_by_message': True}
timestamp: 2026-02-06T20:44:14+00:00
model_name: chaiml-02f4-69d4-linear_76375_v7
Resubmit model
Shutdown handler not registered because Python interpreter is not running in the main thread
Running pipeline stage VLLMUploader
Starting job with name chaiml-02f4-69d4-linear-76375-v7-uploader
Waiting for job on chaiml-02f4-69d4-linear-76375-v7-uploader to finish
%s, retrying in %s seconds...
HTTP Request: %s %s "%s %d %s"
chaiml-02f4-69d4-linear-76375-v7-uploader: Using quantization_mode: none
chaiml-02f4-69d4-linear-76375-v7-uploader: Downloading snapshot of ChaiML/02f4-69d4-linear-w01-W4A16-G128-AutoRound...
HTTP Request: %s %s "%s %d %s"
chaiml-02f4-69d4-linear-76375-v7-uploader: Fetching 12 files: 100%|██████████| 12/12 [00:08<00:00, 1.40it/s]
chaiml-02f4-69d4-linear-76375-v7-uploader: Downloaded in 8.720s
chaiml-02f4-69d4-linear-76375-v7-uploader: Processed model ChaiML/02f4-69d4-linear-w01-W4A16-G128-AutoRound in 13.999s
chaiml-02f4-69d4-linear-76375-v7-uploader: creating bucket guanaco-vllm-models
chaiml-02f4-69d4-linear-76375-v7-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:56: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-76375-v7-uploader: RE_S3_DATESTRING = re.compile('\.[0-9]*(?:[Z\\-\\+]*?)')
chaiml-02f4-69d4-linear-76375-v7-uploader: /usr/lib/python3/dist-packages/S3/BaseUtils.py:57: SyntaxWarning: invalid escape sequence '\s'
chaiml-02f4-69d4-linear-76375-v7-uploader: RE_XML_NAMESPACE = re.compile(b'^(<?[^>]+?>\s*|\s*)(<\w+) xmlns=[\'"](https?://[^\'"]+)[\'"]', re.MULTILINE)
chaiml-02f4-69d4-linear-76375-v7-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:240: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-76375-v7-uploader: invalid = re.search("([^a-z0-9\.-])", bucket, re.UNICODE)
chaiml-02f4-69d4-linear-76375-v7-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:244: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-76375-v7-uploader: invalid = re.search("([^A-Za-z0-9\._-])", bucket, re.UNICODE)
chaiml-02f4-69d4-linear-76375-v7-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:255: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-76375-v7-uploader: if re.search("-\.", bucket, re.UNICODE):
chaiml-02f4-69d4-linear-76375-v7-uploader: /usr/lib/python3/dist-packages/S3/Utils.py:257: SyntaxWarning: invalid escape sequence '\.'
chaiml-02f4-69d4-linear-76375-v7-uploader: if re.search("\.\.", bucket, re.UNICODE):
chaiml-02f4-69d4-linear-76375-v7-uploader: /usr/lib/python3/dist-packages/S3/S3Uri.py:155: SyntaxWarning: invalid escape sequence '\w'
chaiml-02f4-69d4-linear-76375-v7-uploader: _re = re.compile("^(\w+://)?(.*)", re.UNICODE)
chaiml-02f4-69d4-linear-76375-v7-uploader: /usr/lib/python3/dist-packages/S3/FileLists.py:480: SyntaxWarning: invalid escape sequence '\*'
chaiml-02f4-69d4-linear-76375-v7-uploader: wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)
chaiml-02f4-69d4-linear-76375-v7-uploader: Bucket 's3://guanaco-vllm-models/' created
chaiml-02f4-69d4-linear-76375-v7-uploader: uploading /dev/shm/model_output to s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/.gitattributes s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/.gitattributes
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/generation_config.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/generation_config.json
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/recipe.yaml s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/recipe.yaml
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/config.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/config.json
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/model.safetensors.index.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/model.safetensors.index.json
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/README.md s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/README.md
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/tokenizer.json s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/tokenizer.json
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/model-00003-of-00003.safetensors s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/model-00003-of-00003.safetensors
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/model-00001-of-00003.safetensors s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/model-00001-of-00003.safetensors
chaiml-02f4-69d4-linear-76375-v7-uploader: cp /dev/shm/model_output/model-00002-of-00003.safetensors s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7/model-00002-of-00003.safetensors
Job chaiml-02f4-69d4-linear-76375-v7-uploader completed after 309.6s with status: succeeded
Stopping job with name chaiml-02f4-69d4-linear-76375-v7-uploader
Pipeline stage VLLMUploader completed in 310.11s
Running pipeline stage VLLMTemplater
Pipeline stage VLLMTemplater completed in 0.13s
Running pipeline stage VLLMDeployer
Creating inference service chaiml-02f4-69d4-linear-76375-v7
Waiting for inference service chaiml-02f4-69d4-linear-76375-v7 to be ready
HTTP Request: %s %s "%s %d %s"
HTTP Request: %s %s "%s %d %s"
Unable to record family friendly update due to error: Invalid JSON input: Expecting value: line 1 column 1 (char 0)
Unable to obtain autoscaler chaiml-2fe5-c13f-linear-w01_v12 due to error: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')). Falling back to minimal defaults: logs=None autoscaler_id='fallback' status='deployed' metric=<function null_metric at 0x7efd8f6d6160> observations=[] max_observations_length=1000 target=0.0 actions=[SubmissionAutoscalerParameterAction(generation_parameter='max_input_tokens', current_scale=128, min_scale=512, max_scale=2048)] scale_up_policy=AutoscalerPolicy(max_percent_change=0.25, period=0.1, stabilisation_window=0, tolerance=0.1) scale_down_policy=AutoscalerPolicy(max_percent_change=0.25, period=0.1, stabilisation_window=0, tolerance=0.1) panic_policy=AutoscalerPanicPolicy(max_percent_change=1.0, period=0.1, stabilisation_window=0, tolerance=-1e-06, z_score=3, num_observations=5, num_historical_observations=100)
Tearing down inference service chaiml-02f4-69d4-linear-76375-v7
clean up pipeline due to error=DeploymentError('Timeout to start the InferenceService chaiml-02f4-69d4-linear-76375-v7. The InferenceService is as following: {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'kind\': \'InferenceService\', \'metadata\': {\'annotations\': {\'autoscaling.knative.dev/class\': \'hpa.autoscaling.knative.dev\', \'autoscaling.knative.dev/container-concurrency-target-percentage\': \'70\', \'autoscaling.knative.dev/initial-scale\': \'5\', \'autoscaling.knative.dev/max-scale-down-rate\': \'1.1\', \'autoscaling.knative.dev/max-scale-up-rate\': \'2\', \'autoscaling.knative.dev/metric\': \'mean_pod_latency_ms_v2\', \'autoscaling.knative.dev/panic-threshold-percentage\': \'650\', \'autoscaling.knative.dev/panic-window-percentage\': \'35\', \'autoscaling.knative.dev/scale-down-delay\': \'30s\', \'autoscaling.knative.dev/scale-to-zero-grace-period\': \'10m\', \'autoscaling.knative.dev/stable-window\': \'180s\', \'autoscaling.knative.dev/target\': \'4000\', \'autoscaling.knative.dev/target-burst-capacity\': \'-1\', \'autoscaling.knative.dev/tick-interval\': \'15s\', \'features.knative.dev/http-full-duplex\': \'Enabled\', \'networking.knative.dev/ingress-class\': \'istio.ingress.networking.knative.dev\'}, \'creationTimestamp\': \'2026-02-06T20:24:00Z\', \'finalizers\': [\'inferenceservice.finalizers\'], \'generation\': 1, \'labels\': {\'knative.coreweave.cloud/ingress\': \'istio.ingress.networking.knative.dev\', \'prometheus.k.chaiverse.com\': \'true\', \'qos.coreweave.cloud/latency\': \'low\'}, \'managedFields\': [{\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:metadata\': {\'f:annotations\': {\'.\': {}, \'f:autoscaling.knative.dev/class\': {}, \'f:autoscaling.knative.dev/container-concurrency-target-percentage\': {}, \'f:autoscaling.knative.dev/initial-scale\': {}, \'f:autoscaling.knative.dev/max-scale-down-rate\': {}, \'f:autoscaling.knative.dev/max-scale-up-rate\': {}, 
\'f:autoscaling.knative.dev/metric\': {}, \'f:autoscaling.knative.dev/panic-threshold-percentage\': {}, \'f:autoscaling.knative.dev/panic-window-percentage\': {}, \'f:autoscaling.knative.dev/scale-down-delay\': {}, \'f:autoscaling.knative.dev/scale-to-zero-grace-period\': {}, \'f:autoscaling.knative.dev/stable-window\': {}, \'f:autoscaling.knative.dev/target\': {}, \'f:autoscaling.knative.dev/target-burst-capacity\': {}, \'f:autoscaling.knative.dev/tick-interval\': {}, \'f:features.knative.dev/http-full-duplex\': {}, \'f:networking.knative.dev/ingress-class\': {}}, \'f:labels\': {\'.\': {}, \'f:knative.coreweave.cloud/ingress\': {}, \'f:prometheus.k.chaiverse.com\': {}, \'f:qos.coreweave.cloud/latency\': {}}}, \'f:spec\': {\'.\': {}, \'f:predictor\': {\'.\': {}, \'f:affinity\': {\'.\': {}, \'f:nodeAffinity\': {\'.\': {}, \'f:tion\': {}, \'f:requiredDuringSchedulingIgnoredDuringExecution\': {}}}, \'f:containerConcurrency\': {}, \'f:containers\': {}, \'f:imagePullSecrets\': {}, \'f:maxReplicas\': {}, \'f:minReplicas\': {}, \'f:priorityClassName\': {}, \'f:timeout\': {}, \'f:volumes\': {}}}}, \'manager\': \'OpenAPI-Generator\', \'operation\': \'Update\', \'time\': \'2026-02-06T20:24:00Z\'}, {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:metadata\': {\'f:finalizers\': {\'.\': {}, \'v:"inferenceservice.finalizers"\': {}}}}, \'manager\': \'manager\', \'operation\': \'Update\', \'time\': \'2026-02-06T20:24:00Z\'}, {\'apiVersion\': \'serving.kserve.io/v1beta1\', \'fieldsType\': \'FieldsV1\', \'fieldsV1\': {\'f:status\': {\'.\': {}, \'f:components\': {\'.\': {}, \'f:predictor\': {\'.\': {}, \'f:latestCreatedRevision\': {}}}, \'f:conditions\': {}, \'f:modelStatus\': {\'.\': {}, \'f:lastFailureInfo\': {\'.\': {}, \'f:exitCode\': {}, \'f:message\': {}, \'f:reason\': {}}, \'f:states\': {\'.\': {}, \'f:activeModelState\': {}, \'f:targetModelState\': {}}, \'f:transitionStatus\': {}}, \'f:observedGeneration\': {}}}, \'manager\': 
\'manager\', \'operation\': \'Update\', \'subresource\': \'status\', \'time\': \'2026-02-06T20:34:01Z\'}], \'name\': \'chaiml-02f4-69d4-linear-76375-v7\', \'namespace\': \'tenant-chaiml-guanaco\', \'resourceVersion\': \'421232809\', \'uid\': \'9067adf5-9906-4282-90bb-cbbb6a1b4239\'}, \'spec\': {\'predictor\': {\'affinity\': {\'nodeAffinity\': {\'tion\': [{\'preference\': {\'matchExpressions\': [{\'key\': \'gpu.nvidia.com/class\', \'operator\': \'In\', \'values\': [\'A100_NVLINK_80GB\']}]}, \'weight\': 5}], \'requiredDuringSchedulingIgnoredDuringExecution\': {\'nodeSelectorTerms\': [{\'matchExpressions\': [{\'key\': \'gpu.nvidia.com/class\', \'operator\': \'In\', \'values\': [\'A100_NVLINK_80GB\']}]}]}}}, \'containerConcurrency\': 0, \'containers\': [{\'args\': [\'serve\', \'s3://guanaco-vllm-models/chaiml-02f4-69d4-linear-76375-v7\', \'--port\', \'8080\', \'--tensor-parallel-size\', \'1\', \'--gpu-memory-utilization\', \'0.96\', \'--max-model-len\', \'10240\', \'--max-num-batched-tokens\', \'32768\', \'--max-num-seqs\', \'256\', \'--load-format\', \'runai_streamer\', \'--served-model-name\', \'ChaiML/02f4-69d4-linear-w01-W4A16-G128-AutoRound\'], \'env\': [{\'name\': \'RESERVE_MEMORY\', \'value\': \'2048\'}, {\'name\': \'DOWNLOAD_TO_LOCAL\', \'value\': \'/dev/shm/model_cache\'}, {\'name\': \'NUM_GPUS\', \'value\': \'1\'}, {\'name\': \'VLLM_ASSETS_CACHE\', \'value\': \'/code/vllm_assets_cache\'}, {\'name\': \'RUNAI_STREAMER_S3_USE_VIRTUAL_ADDRESSING\', \'value\': \'0\'}, {\'name\': \'RUNAI_STREAMER_CONCURRENCY\', \'value\': \'1\'}, {\'name\': \'AWS_EC2_METADATA_DISABLED\', \'value\': \'true\'}, {\'name\': \'AWS_ENDPOINT_URL_S3\', \'value\': \'http://s3-proxy.storage-system.svc.cluster.local:8080\'}, {\'name\': \'AWS_ACCESS_KEY_ID\', \'value\': \'LETMTTRMLFFAMTBK\'}, {\'name\': \'AWS_SECRET_ACCESS_KEY\', \'value\': \'VwwZaqefOOoaouNxUk03oUmK9pVEfruJhjBHPGdgycK\'}, {\'name\': \'AWS_ENDPOINT_URL\', \'value\': \'https://object.ord1.coreweave.com\'}, {\'name\': 
\'HF_TOKEN\', \'valueFrom\': {\'secretKeyRef\': {\'key\': \'token\', \'name\': \'hf-token\'}}}], \'image\': \'gcr.io/chai-959f8/vllm:v0.13.0\', \'imagePullPolicy\': \'IfNotPresent\', \'name\': \'kserve-container\', \'readinessProbe\': {\'failureThreshold\': 1, \'httpGet\': {\'path\': \'/v1/models\', \'port\': 8080}, \'initialDelaySeconds\': 60, \'periodSeconds\': 10, \'successThreshold\': 1, \'timeoutSeconds\': 5}, \'resources\': {\'limits\': {\'cpu\': \'2\', \'memory\': \'64Gi\', \'nvidia.com/gpu\': \'1\'}, \'requests\': {\'cpu\': \'2\', \'memory\': \'64Gi\', \'nvidia.com/gpu\': \'1\'}}, \'volumeMounts\': [{\'mountPath\': \'/dev/shm\', \'name\': \'shared-memory-cache\'}, {\'mountPath\': \'/root/.cache\', \'name\': \'cache-volume\'}]}], \'imagePullSecrets\': [{\'name\': \'docker-creds\'}], \'maxReplicas\': 40, \'minReplicas\': 0, \'priorityClassName\': \'creator-studio\', \'timeout\': 60, \'volumes\': [{\'emptyDir\': {\'medium\': \'Memory\', \'sizeLimit\': \'64Gi\'}, \'name\': \'shared-memory-cache\'}, {\'name\': \'cache-volume\', \'persistentVolumeClaim\': {\'claimName\': \'cache-pvc\'}}]}}, \'status\': {\'components\': {\'predictor\': {\'latestCreatedRevision\': \'chaiml-02f4-69d4-linear-76375-v7-predictor-00001\'}}, \'conditions\': [{\'lastTransitionTime\': \'2026-02-06T20:34:01Z\', \'reason\': \'PredictorConfigurationReady not ready\', \'severity\': \'Info\', \'status\': \'False\', \'type\': \'LatestDeploymentReady\'}, {\'lastTransitionTime\': \'2026-02-06T20:34:01Z\', \'message\': \'Revision "chaiml-02f4-69d4-linear-76375-v7-predictor-00001" failed with message: Container failed with: ^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 215, in from_vllm_config\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m return cls(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File 
"/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m self.engine_core = EngineCoreClient.make_async_mp_client(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m return AsyncMPClient(*client_args)\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 820, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m super().__init__(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 477, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m with launch_core_engines(vllm_config, executor_class, log_stats) as (\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m next(self.gen)\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 903, in launch_core_engines\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m wait_for_engine_startup(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 960, in wait_for_engine_startup\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m raise RuntimeError(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m RuntimeError: Engine core initialization failed. See root cause above. 
Failed core proc(s): {}\\n.\', \'reason\': \'RevisionFailed\', \'severity\': \'Info\', \'status\': \'False\', \'type\': \'PredictorConfigurationReady\'}, {\'lastTransitionTime\': \'2026-02-06T20:34:01Z\', \'message\': \'Configuration "chaiml-02f4-69d4-linear-76375-v7-predictor" does not have any ready Revision.\', \'reason\': \'RevisionMissing\', \'status\': \'False\', \'type\': \'PredictorReady\'}, {\'lastTransitionTime\': \'2026-02-06T20:34:01Z\', \'message\': \'Configuration "chaiml-02f4-69d4-linear-76375-v7-predictor" does not have any ready Revision.\', \'reason\': \'RevisionMissing\', \'severity\': \'Info\', \'status\': \'False\', \'type\': \'PredictorRouteReady\'}, {\'lastTransitionTime\': \'2026-02-06T20:34:01Z\', \'message\': \'Configuration "chaiml-02f4-69d4-linear-76375-v7-predictor" does not have any ready Revision.\', \'reason\': \'RevisionMissing\', \'status\': \'False\', \'type\': \'Ready\'}, {\'lastTransitionTime\': \'2026-02-06T20:34:01Z\', \'reason\': \'PredictorRouteReady not ready\', \'severity\': \'Info\', \'status\': \'False\', \'type\': \'RoutesReady\'}], \'modelStatus\': {\'lastFailureInfo\': {\'exitCode\': 1, \'message\': \'^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 215, in from_vllm_config\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m return cls(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m self.engine_core = EngineCoreClient.make_async_mp_client(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m return 
AsyncMPClient(*client_args)\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 820, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m super().__init__(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 477, in __init__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m with launch_core_engines(vllm_config, executor_class, log_stats) as (\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m next(self.gen)\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 903, in launch_core_engines\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m wait_for_engine_startup(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 960, in wait_for_engine_startup\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m raise RuntimeError(\\n\\x1b[0;36m(APIServer pid=1)\\x1b[0;0m RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}\\n\', \'reason\': \'ModelLoadFailed\'}, \'states\': {\'activeModelState\': \'\', \'targetModelState\': \'FailedToLoad\'}, \'transitionStatus\': \'BlockedByFailedLoad\'}, \'observedGeneration\': 1}}')
Running pipeline stage VLLMDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage VLLMDeleter completed in 0.19s
Running pipeline stage VLLMModelDeleter
Cleaning model data from S3
Cleaning model data from model cache
Deleting key chaiml-02f4-69d4-linear-76375-v7/.gitattributes from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/README.md from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/config.json from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/generation_config.json from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/model-00001-of-00003.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/model-00002-of-00003.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/model-00003-of-00003.safetensors from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/model.safetensors.index.json from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/recipe.yaml from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/special_tokens_map.json from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/tokenizer.json from bucket guanaco-vllm-models
Deleting key chaiml-02f4-69d4-linear-76375-v7/tokenizer_config.json from bucket guanaco-vllm-models
Pipeline stage VLLMModelDeleter completed in 2.41s
Shutdown handler de-registered
chaiml-02f4-69d4-linear_76375_v7 status is now failed due to DeploymentManager action