Shutdown handler not registered because the Python interpreter is not running in the main thread
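This line is typical of frameworks that try to install a SIGTERM handler from a worker thread: CPython only allows signal registration in the main thread. A minimal sketch (not this pipeline's code) that reproduces the condition:

    import signal
    import threading

    def register_handler():
        try:
            # signal.signal() raises ValueError anywhere but the main thread
            signal.signal(signal.SIGTERM, lambda signum, frame: None)
        except ValueError as err:
            print(f"Shutdown handler not registered: {err}")

    t = threading.Thread(target=register_handler)
    t.start()
    t.join()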
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name rica40325-mixdata-1w5step-v3-mkmlizer
Waiting for job on rica40325-mixdata-1w5step-v3-mkmlizer to finish
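A hedged sketch of what "waiting for job ... to finish" likely amounts to, assuming the official kubernetes Python client (the pipeline's real wrapper is not shown in this log; the namespace is taken from the error dump further down):

    import time
    from kubernetes import client, config

    config.load_incluster_config()  # assumes the runner itself lives in-cluster
    batch = client.BatchV1Api()

    def wait_for_job(name: str, namespace: str, poll_s: int = 5) -> str:
        # Poll the Job status until Kubernetes reports success or failure.
        while True:
            status = batch.read_namespaced_job_status(name, namespace).status
            if status.succeeded:
                return "succeeded"
            if status.failed:
                return "failed"
            time.sleep(poll_s)

    print(wait_for_job("rica40325-mixdata-1w5step-v3-mkmlizer",
                       "tenant-chaiml-guanaco"))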
rica40325-mixdata-1w5step-v3-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
rica40325-mixdata-1w5step-v3-mkmlizer: ║                    [flywheel wordmark ASCII art]                    ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                                                                     ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                          Version: 0.11.12                           ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║               Copyright 2023 MK ONE TECHNOLOGIES Inc.               ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                           https://mk1.ai                            ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                                                                     ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║    The license key for the current software has been verified as    ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                            belonging to:                            ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                                                                     ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                         Chai Research Corp.                         ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║          Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f           ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                  Expiration: 2025-01-15 23:59:59                    ║
rica40325-mixdata-1w5step-v3-mkmlizer: ║                                                                     ║
rica40325-mixdata-1w5step-v3-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
rica40325-mixdata-1w5step-v3-mkmlizer: Downloaded to shared memory in 46.820s
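A sketch of the download step under stated assumptions: the mkmlizer's actual downloader is not shown here, but fetching the repo named at "Processed model" below into RAM-backed tmpfs could look like this (huggingface_hub assumed; the local_dir path is hypothetical):

    import time
    from huggingface_hub import snapshot_download

    start = time.monotonic()
    # A local_dir under /dev/shm keeps the weights in shared memory
    snapshot_download("rica40325/mixdata-1w5step", local_dir="/dev/shm/model_src")
    print(f"Downloaded to shared memory in {time.monotonic() - start:.3f}s")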
rica40325-mixdata-1w5step-v3-mkmlizer: quantizing model to /dev/shm/model_cache, profile:s0, folder:/tmp/tmpn67_ueeb, device:0
rica40325-mixdata-1w5step-v3-mkmlizer: Saving flywheel model at /dev/shm/model_cache
rica40325-mixdata-1w5step-v3-mkmlizer: quantized model in 41.855s
rica40325-mixdata-1w5step-v3-mkmlizer: Processed model rica40325/mixdata-1w5step in 88.675s
rica40325-mixdata-1w5step-v3-mkmlizer: creating bucket guanaco-mkml-models
rica40325-mixdata-1w5step-v3-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
rica40325-mixdata-1w5step-v3-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/rica40325-mixdata-1w5step-v3
rica40325-mixdata-1w5step-v3-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/rica40325-mixdata-1w5step-v3/tokenizer_config.json
rica40325-mixdata-1w5step-v3-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/rica40325-mixdata-1w5step-v3/tokenizer.json
rica40325-mixdata-1w5step-v3-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/rica40325-mixdata-1w5step-v3/flywheel_model.0.safetensors
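A hypothetical sketch of the bucket-create and upload steps shown above, using boto3 (the pipeline's real client and helper names are not in this log; bucket, prefix, and file names are taken from the cp lines):

    from pathlib import Path
    import boto3

    def upload_model_cache(local_dir: str, bucket: str, prefix: str) -> None:
        s3 = boto3.client("s3")
        s3.create_bucket(Bucket=bucket)  # "creating bucket guanaco-mkml-models"
        for path in Path(local_dir).rglob("*"):
            if path.is_file():
                # e.g. rica40325-mixdata-1w5step-v3/tokenizer_config.json
                key = f"{prefix}/{path.relative_to(local_dir)}"
                s3.upload_file(str(path), bucket, key)

    upload_model_cache("/dev/shm/model_cache", "guanaco-mkml-models",
                       "rica40325-mixdata-1w5step-v3")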
rica40325-mixdata-1w5step-v3-mkmlizer:
Loading 0:   0%|          | 0/363 [00:00<?, ?it/s]
Loading 0:  31%|███▏      | 114/363 [00:04<00:10, 24.03it/s]
Loading 0:  62%|██████▏   | 226/363 [00:08<00:05, 23.65it/s]
Loading 0:  94%|█████████▎| 340/363 [00:12<00:00, 29.56it/s]
Loading 0:  95%|█████████▍| 344/363 [00:19<00:09,  2.03it/s]
Loading 0:  98%|█████████▊| 357/363 [00:20<00:01,  5.01it/s]
(incremental progress-bar updates condensed: 363 tensors loaded in ~20 s at a steady 20-30 it/s, apart from a ~7 s stall between 340/363 and 344/363)
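The bars above are ordinary tqdm output from the weight loader. A minimal sketch, assuming the loader simply wraps tensor deserialization in tqdm (the shard id and total of 363 come from the log; the function is hypothetical):

    from tqdm import tqdm

    def load_shard(named_tensors, shard_id: int = 0) -> dict:
        loaded = {}
        # Produces lines like "Loading 0:  31%|███▏ | 114/363 [...]"
        for name, tensor in tqdm(named_tensors, desc=f"Loading {shard_id}",
                                 total=363):
            loaded[name] = tensor  # deserialize / move to device here
        return loaded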
Job rica40325-mixdata-1w5step-v3-mkmlizer completed after 125.0s with status: succeeded
Stopping job with name rica40325-mixdata-1w5step-v3-mkmlizer
Pipeline stage MKMLizer completed in 127.22s
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.15s
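The stage pattern throughout this log ("run pipeline stage %s" / "Running pipeline stage X" / "Pipeline stage X completed in Ys") suggests a timed stage runner. A hedged reconstruction follows; note that the literal "%s" lines are consistent with logger.info being called on the bare format string without arguments, whereas the substituted lines pass the stage name:

    import logging
    import time

    logger = logging.getLogger("deployment")

    def run_stage(name: str, fn) -> None:
        # Passing `name` substitutes the %s; calling
        # logger.info("run pipeline stage %s") with no argument would
        # log the literal format string, exactly as seen above.
        logger.info("Running pipeline stage %s", name)
        start = time.monotonic()
        fn()
        logger.info("Pipeline stage %s completed in %.2fs",
                    name, time.monotonic() - start)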
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service rica40325-mixdata-1w5step-v3
Waiting for inference service rica40325-mixdata-1w5step-v3 to be ready
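A sketch of the readiness wait, assuming the official kserve Python client (the pipeline's actual wrapper is not shown; the namespace is taken from the error dump below):

    from kserve import KServeClient

    client = KServeClient()
    client.wait_isvc_ready("rica40325-mixdata-1w5step-v3",
                           namespace="tenant-chaiml-guanaco",
                           timeout_seconds=600)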
Connection pool is full, discarding connection: %s. Connection pool size: %s
(last message repeated 9 more times)
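This urllib3 warning is benign: the client opened more concurrent sockets to one host than the pool keeps alive, so surplus connections are discarded rather than reused. A common remedy, sketched here with a hypothetical endpoint, is simply a larger pool:

    import urllib3

    # A pool sized for the watch/poll fan-out stops the discard warnings.
    http = urllib3.PoolManager(num_pools=10, maxsize=50)
    resp = http.request("GET", "https://example.com/healthz")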
Tearing down inference service rica40325-mixdata-1w5step-v3
%s, retrying in %s seconds...
Creating inference service rica40325-mixdata-1w5step-v3
Waiting for inference service rica40325-mixdata-1w5step-v3 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
(last message repeated 15 more times)
Tearing down inference service rica40325-mixdata-1w5step-v3
%s, retrying in %s seconds...
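The log shows a create → wait → teardown → retry loop, three attempts in all. A hedged reconstruction (function names are assumptions, not the pipeline's real API):

    import time

    def deploy_with_retries(create, wait_ready, teardown,
                            attempts: int = 3, delay_s: float = 30.0) -> None:
        for attempt in range(1, attempts + 1):
            create()
            try:
                wait_ready()
                return
            except TimeoutError as err:
                teardown()
                if attempt == attempts:
                    raise  # surfaces as the DeploymentError below
                print(f"{err}, retrying in {delay_s} seconds...")
                time.sleep(delay_s)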
Creating inference service rica40325-mixdata-1w5step-v3
Waiting for inference service rica40325-mixdata-1w5step-v3 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
Tearing down inference service rica40325-mixdata-1w5step-v3
clean up pipeline due to error=DeploymentError('Timeout to start the InferenceService rica40325-mixdata-1w5step-v3. The InferenceService is as following:
{'apiVersion': 'serving.kserve.io/v1beta1',
 'kind': 'InferenceService',
 'metadata': {
     'annotations': {
         'autoscaling.knative.dev/class': 'hpa.autoscaling.knative.dev',
         'autoscaling.knative.dev/container-concurrency-target-percentage': '70',
         'autoscaling.knative.dev/initial-scale': '1',
         'autoscaling.knative.dev/max-scale-down-rate': '1.1',
         'autoscaling.knative.dev/max-scale-up-rate': '2',
         'autoscaling.knative.dev/metric': 'mean_pod_latency_ms_v2',
         'autoscaling.knative.dev/panic-threshold-percentage': '650',
         'autoscaling.knative.dev/panic-window-percentage': '35',
         'autoscaling.knative.dev/scale-down-delay': '30s',
         'autoscaling.knative.dev/scale-to-zero-grace-period': '10m',
         'autoscaling.knative.dev/stable-window': '180s',
         'autoscaling.knative.dev/target': '3700',
         'autoscaling.knative.dev/target-burst-capacity': '-1',
         'autoscaling.knative.dev/tick-interval': '15s',
         'features.knative.dev/http-full-duplex': 'Enabled',
         'networking.knative.dev/ingress-class': 'istio.ingress.networking.knative.dev'},
     'creationTimestamp': '2024-12-06T06:33:56Z',
     'finalizers': ['inferenceservice.finalizers'],
     'generation': 1,
     'labels': {
         'knative.coreweave.cloud/ingress': 'istio.ingress.networking.knative.dev',
         'prometheus.k.chaiverse.com': 'true',
         'qos.coreweave.cloud/latency': 'low'},
     'managedFields': [three server-side field-management bookkeeping entries elided (managers 'OpenAPI-Generator' and 'manager', 2024-12-06T06:33:55Z through 06:38:35Z)],
     'name': 'rica40325-mixdata-1w5step-v3',
     'namespace': 'tenant-chaiml-guanaco',
     'resourceVersion': '185487047',
     'uid': 'd6f4e35b-1ca0-48f7-9be5-09e33578fe64'},
 'spec': {
     'predictor': {
         'affinity': {
             'nodeAffinity': {
                 'preferredDuringSchedulingIgnoredDuringExecution': [
                     {'preference': {'matchExpressions': [{'key': 'topology.kubernetes.io/region', 'operator': 'In', 'values': ['ORD1']}]}, 'weight': 5}],
                 'requiredDuringSchedulingIgnoredDuringExecution': {
                     'nodeSelectorTerms': [{'matchExpressions': [{'key': 'gpu.nvidia.com/class', 'operator': 'In', 'values': ['RTX_A5000']}]}]}}},
         'containerConcurrency': 0,
         'containers': [{
             'env': [
                 {'name': 'MAX_TOKEN_INPUT', 'value': '1024'},
                 {'name': 'BEST_OF', 'value': '8'},
                 {'name': 'TEMPERATURE', 'value': '0.95'},
                 {'name': 'PRESENCE_PENALTY', 'value': '0.0'},
                 {'name': 'FREQUENCY_PENALTY', 'value': '0.0'},
                 {'name': 'TOP_P', 'value': '0.95'},
                 {'name': 'MIN_P', 'value': '0.05'},
                 {'name': 'TOP_K', 'value': '80'},
                 {'name': 'STOPPING_WORDS', 'value': '["\\n"]'},
                 {'name': 'MAX_TOKENS', 'value': '64'},
                 {'name': 'MAX_BATCH_SIZE', 'value': '128'},
                 {'name': 'URL_ROUTE', 'value': 'GPT-J-6B-lit-v2'},
                 {'name': 'OBJ_ACCESS_KEY_ID', 'value': 'LETMTTRMLFFAMTBK'},
                 {'name': 'OBJ_SECRET_ACCESS_KEY', 'value': 'VwwZaqefOOoaouNxUk03oUmK9pVEfruJhjBHPGdgycK'},
                 {'name': 'OBJ_ENDPOINT', 'value': 'https://accel-object.ord1.coreweave.com'},
                 {'name': 'TENSORIZER_URI', 'value': 's3://guanaco-mkml-models/rica40325-mixdata-1w5step-v3'},
                 {'name': 'RESERVE_MEMORY', 'value': '2048'},
                 {'name': 'DOWNLOAD_TO_LOCAL', 'value': '/dev/shm/model_cache'},
                 {'name': 'NUM_GPUS', 'value': '1'},
                 {'name': 'MK1_MKML_LICENSE_KEY', 'valueFrom': {'secretKeyRef': {'key': 'key', 'name': 'mkml-license-key'}}}],
             'image': 'gcr.io/chai-959f8/chai-guanaco/mkml:mkml_v0.11.12_dg',
             'imagePullPolicy': 'IfNotPresent',
             'name': 'kserve-container',
             'readinessProbe': {'exec': {'command': ['cat', '/tmp/ready']}, 'failureThreshold': 1, 'initialDelaySeconds': 10, 'periodSeconds': 10, 'successThreshold': 1, 'timeoutSeconds': 5},
             'resources': {'limits': {'cpu': '2', 'memory': '14Gi', 'nvidia.com/gpu': '1'}, 'requests': {'cpu': '2', 'memory': '14Gi', 'nvidia.com/gpu': '1'}},
             'volumeMounts': [{'mountPath': '/dev/shm', 'name': 'shared-memory-cache'}]}],
         'imagePullSecrets': [{'name': 'docker-creds'}],
         'maxReplicas': 500,
         'minReplicas': 0,
         'timeout': 60,
         'volumes': [{'emptyDir': {'medium': 'Memory'}, 'name': 'shared-memory-cache'}]}},
 'status': {
     'components': {'predictor': {'latestCreatedRevision': 'rica40325-mixdata-1w5step-v3-predictor-00001'}},
     'conditions': [
         {'lastTransitionTime': '2024-12-06T06:38:34Z', 'reason': 'PredictorConfigurationReady not ready', 'severity': 'Info', 'status': 'False', 'type': 'LatestDeploymentReady'},
         {'lastTransitionTime': '2024-12-06T06:38:34Z',
          'message': 'Revision "rica40325-mixdata-1w5step-v3-predictor-00001" failed with message: Container failed with:
              [mkml license banner, as printed at the top of this log]
              INFO:datasets:PyTorch version 2.3.0 available.
              Inference config: InferenceConfig(server_num_workers=1, server_port=8080, max_batch_size=128, log_level=0, reserve_memory=2048, num_gpus=1, quantization_profile=s0, all_reduce_profile=None, kv_cache_profile=None, calibration_samples=-1, sampling=SamplingParameters(temperature=0.95, top_p=0.95, min_p=0.05, top_k=80, max_input_tokens=1024, max_tokens=64, stop=['\\n'], eos_token_ids=[], frequency_penalty=0.0, presence_penalty=0.0, reward_enabled=True, num_samples=8, reward_max_token_input=256, drop_incomplete_sentences=True, profile=False), url_route=GPT-J-6B-lit-v2, tensorizer_uri=s3://guanaco-mkml-models/rica40325-mixdata-1w5step-v3, s3_creds=S3Credentials(s3_access_key_id='LETMTTRMLFFAMTBK', s3_secret_access_key='VwwZaqefOOoaouNxUk03oUmK9pVEfruJhjBHPGdgycK', s3_endpoint='https://accel-object.ord1.coreweave.com', s3_uncached_endpoint='https://object.ord1.coreweave.com'), local_folder=/dev/shm/model_cache)
              Traceback (most recent call last):
                File "/code/mkml_inference_service/main.py", line 95, in <module>
                  model.load()
                File "/code/mkml_inference_service/main.py", line 31, in load
                  self.engine = mkml_backend.AsyncInferenceService.from_folder(settings, settings.local_folder)
                File "/code/mkml_inference_service/mkml_backend.py", line 45, in from_folder
                  with open(model_config) as f:
              FileNotFoundError: [Errno 2] No such file or directory: /dev/shm/model_cache/config.json',
          'reason': 'RevisionFailed', 'severity': 'Info', 'status': 'False', 'type': 'PredictorConfigurationReady'},
         {'lastTransitionTime': '2024-12-06T06:38:34Z', 'message': 'Configuration "rica40325-mixdata-1w5step-v3-predictor" does not have any ready Revision.', 'reason': 'RevisionMissing', 'status': 'False', 'type': 'PredictorReady'},
         {'lastTransitionTime': '2024-12-06T06:38:34Z', 'message': 'Configuration "rica40325-mixdata-1w5step-v3-predictor" does not have any ready Revision.', 'reason': 'RevisionMissing', 'severity': 'Info', 'status': 'False', 'type': 'PredictorRouteReady'},
         {'lastTransitionTime': '2024-12-06T06:38:34Z', 'message': 'Configuration "rica40325-mixdata-1w5step-v3-predictor" does not have any ready Revision.', 'reason': 'RevisionMissing', 'status': 'False', 'type': 'Ready'},
         {'lastTransitionTime': '2024-12-06T06:38:34Z', 'reason': 'PredictorRouteReady not ready', 'severity': 'Info', 'status': 'False', 'type': 'RoutesReady'}],
     'modelStatus': {
         'lastFailureInfo': {'exitCode': 1, 'message': [identical container failure log to the PredictorConfigurationReady condition above], 'reason': 'ModelLoadFailed'},
         'states': {'activeModelState': '', 'targetModelState': 'FailedToLoad'},
         'transitionStatus': 'BlockedByFailedLoad'},
     'observedGeneration': 1}}')
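The root cause is visible in this log: the traceback bottoms out opening /dev/shm/model_cache/config.json, and the earlier upload step copied only tokenizer_config.json, tokenizer.json, and flywheel_model.0.safetensors, never a config.json. A defensive check (hypothetical helper; not the mkml_backend API) would fail fast before three deploy attempts are burned:

    import json
    import os

    def load_model_config(local_folder: str) -> dict:
        model_config = os.path.join(local_folder, "config.json")
        if not os.path.exists(model_config):
            # Fail fast with a pointed message instead of a bare open() error.
            raise FileNotFoundError(
                f"{model_config} is missing; was config.json uploaded "
                f"alongside the tokenizer and safetensors files?")
        with open(model_config) as f:
            return json.load(f)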
Shutdown handler de-registered
rica40325-mixdata-1w5step_v3 status is now failed due to DeploymentManager action