Running pipeline stage VLLMizer
Starting job with name nousresearch-meta-llama-4941-v10-vllmizer
Waiting for job on nousresearch-meta-llama-4941-v10-vllmizer to finish
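The "Starting job"/"Waiting for job" lines are consistent with the stage running as a Kubernetes Job that gets polled until completion. A minimal sketch of that pattern with the official kubernetes Python client (the polling interval and namespace are illustrative assumptions, not taken from the pipeline):

```python
import time
from kubernetes import client, config

def wait_for_job(name: str, namespace: str, poll_secs: int = 5) -> str:
    """Poll a Kubernetes Job until it reports succeeded or failed."""
    config.load_kube_config()  # or config.load_incluster_config() when running in-cluster
    batch = client.BatchV1Api()
    while True:
        status = batch.read_namespaced_job_status(name, namespace).status
        if status.succeeded:
            return "succeeded"
        if status.failed:
            return "failed"
        time.sleep(poll_secs)

# e.g. wait_for_job("nousresearch-meta-llama-4941-v10-vllmizer", namespace="<pipeline-namespace>")
```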
nousresearch-meta-llama-4941-v10-vllmizer: Downloading and saving tokenizer from NousResearch/Meta-Llama-3-8B-Instruct
nousresearch-meta-llama-4941-v10-vllmizer: /usr/local/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py:655: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
nousresearch-meta-llama-4941-v10-vllmizer: warnings.warn(
nousresearch-meta-llama-4941-v10-vllmizer: Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
nousresearch-meta-llama-4941-v10-vllmizer: Downloading and saving tokenizer from ChaiML/reward_gpt2_medium_preference_24m_e2
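The FutureWarning above comes from passing the deprecated `use_auth_token` argument; newer transformers releases accept `token` instead. A sketch of what downloading and saving a tokenizer typically looks like (the save path, and whether a token is needed at all, are assumptions):

```python
from transformers import AutoTokenizer

# `token` replaces the deprecated `use_auth_token` argument in newer transformers.
tokenizer = AutoTokenizer.from_pretrained(
    "NousResearch/Meta-Llama-3-8B-Instruct",
    token="<hf-access-token>",  # assumption: an access token is required
)
tokenizer.save_pretrained("/mnt/pvc/nousresearch-meta-llama-4941-v10")  # save path is an assumption
```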
nousresearch-meta-llama-4941-v10-vllmizer: Downloading and saving model from NousResearch/Meta-Llama-3-8B-Instruct
nousresearch-meta-llama-4941-v10-vllmizer: /usr/local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:472: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
nousresearch-meta-llama-4941-v10-vllmizer: warnings.warn(
Downloading shards: 100%|██████████| 4/4 [00:16<00:00, 4.00s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:02<00:00, 1.73it/s]
nousresearch-meta-llama-4941-v10-vllmizer: /usr/local/lib/python3.8/site-packages/transformers/utils/hub.py:374: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
nousresearch-meta-llama-4941-v10-vllmizer: warnings.warn(
nousresearch-meta-llama-4941-v10-vllmizer: /usr/local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py:472: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers.
nousresearch-meta-llama-4941-v10-vllmizer: warnings.warn(
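The shard download and checkpoint-shard loading above are what a standard sharded `from_pretrained` call produces. A sketch, assuming half precision and a Hugging Face token (both assumptions); the loaded module is what gets tensorized in the next step:

```python
import torch
from transformers import AutoModelForCausalLM

# Downloads the four weight shards, then loads them shard by shard into memory.
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.float16,   # assumption: half precision for the 8B model
    token="<hf-access-token>",
)
```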
nousresearch-meta-llama-4941-v10-vllmizer: Downloading and saving model from ChaiML/reward_gpt2_medium_preference_24m_e2
nousresearch-meta-llama-4941-v10-vllmizer: Tensorizing model to /mnt/pvc/nousresearch-meta-llama-4941-v10/nousresearch-meta-llama-4941-v10.tensors
nousresearch-meta-llama-4941-v10-vllmizer: Tensorizing model to /mnt/pvc/nousresearch-meta-llama-4941-v10/reward/nousresearch-meta-llama-4941-v10-reward.tensors
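"Tensorizing" writes the loaded weights into a single `.tensors` file for fast streaming loads, which is what CoreWeave's `tensorizer` package does; a sketch, assuming that library is what this stage uses:

```python
from tensorizer import TensorSerializer

def tensorize(module, path: str) -> None:
    """Serialize a loaded torch module into a single .tensors file."""
    serializer = TensorSerializer(path)
    serializer.write_module(module)
    serializer.close()

# For the base and reward models loaded earlier (paths taken from the log):
# tensorize(model, "/mnt/pvc/nousresearch-meta-llama-4941-v10/nousresearch-meta-llama-4941-v10.tensors")
# tensorize(reward_model, "/mnt/pvc/nousresearch-meta-llama-4941-v10/reward/nousresearch-meta-llama-4941-v10-reward.tensors")
```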
nousresearch-meta-llama-4941-v10-vllmizer: Uploading /mnt/pvc/nousresearch-meta-llama-4941-v10 to guanaco-model-bucket
nousresearch-meta-llama-4941-v10-vllmizer: creating bucket guanaco-model-bucket
nousresearch-meta-llama-4941-v10-vllmizer: Bucket 's3://guanaco-model-bucket/' created
nousresearch-meta-llama-4941-v10-vllmizer: fast uploading tensorized model tensors from /mnt/pvc/nousresearch-meta-llama-4941-v10
nousresearch-meta-llama-4941-v10-vllmizer: cp /mnt/pvc/nousresearch-meta-llama-4941-v10/nousresearch-meta-llama-4941-v10.tensors s3://guanaco-model-bucket/nousresearch-meta-llama-4941-v10.tensors
nousresearch-meta-llama-4941-v10-vllmizer: fast uploading tensorized reward tensors from /mnt/pvc/nousresearch-meta-llama-4941-v10
nousresearch-meta-llama-4941-v10-vllmizer: cp /mnt/pvc/nousresearch-meta-llama-4941-v10/reward/nousresearch-meta-llama-4941-v10-reward.tensors s3://guanaco-model-bucket/nousresearch-meta-llama-4941-v10-reward.tensors
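The `cp ... s3://...` lines look like an S3-style CLI upload; the equivalent with boto3 would be roughly the following (credentials and endpoint are assumed to come from the environment):

```python
import boto3

s3 = boto3.client("s3")  # endpoint/credentials assumed to be configured via the environment

def upload_tensors(local_path: str, bucket: str, key: str) -> None:
    """Upload a single tensorized file to the model bucket."""
    s3.upload_file(local_path, bucket, key)

upload_tensors(
    "/mnt/pvc/nousresearch-meta-llama-4941-v10/nousresearch-meta-llama-4941-v10.tensors",
    "guanaco-model-bucket",
    "nousresearch-meta-llama-4941-v10.tensors",
)
```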
Job nousresearch-meta-llama-4941-v10-vllmizer completed after 306.96s with status: succeeded
Stopping job with name nousresearch-meta-llama-4941-v10-vllmizer
Pipeline stage VLLMizer completed in 311.38s
Running pipeline stage VLLMKubeTemplater
Pipeline stage VLLMKubeTemplater completed in 0.11s
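A sub-second VLLMKubeTemplater stage is consistent with pure template rendering of a KServe InferenceService manifest. A purely hypothetical sketch with string.Template; the manifest fields, container image, and argument names are invented for illustration:

```python
from string import Template

# Hypothetical, heavily simplified InferenceService template.
ISVC_TEMPLATE = Template("""\
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: $name
spec:
  predictor:
    containers:
      - name: kserve-container
        image: $image
        args: ["--tensors-uri", "s3://$bucket/$key"]
""")

manifest = ISVC_TEMPLATE.substitute(
    name="nousresearch-meta-llama-4941-v10",
    image="<vllm-server-image>",             # assumption
    bucket="guanaco-model-bucket",
    key="nousresearch-meta-llama-4941-v10.tensors",
)
```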
Running pipeline stage ISVCDeployer
Creating inference service nousresearch-meta-llama-4941-v10
Waiting for inference service nousresearch-meta-llama-4941-v10 to be ready
Inference service nousresearch-meta-llama-4941-v10 ready after 60.47s
Pipeline stage ISVCDeployer completed in 67.76s
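Creating the inference service and waiting for it to become ready maps onto the kserve Python client; a sketch under that assumption (namespace and container image are placeholders):

```python
from kserve import (KServeClient, V1beta1InferenceService,
                    V1beta1InferenceServiceSpec, V1beta1PredictorSpec)
from kubernetes.client import V1Container, V1ObjectMeta

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=V1ObjectMeta(name="nousresearch-meta-llama-4941-v10",
                          namespace="<pipeline-namespace>"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            containers=[V1Container(name="kserve-container",
                                    image="<vllm-server-image>")]  # placeholder image
        )
    ),
)

kserve = KServeClient()
kserve.create(isvc)
kserve.wait_isvc_ready("nousresearch-meta-llama-4941-v10",
                       namespace="<pipeline-namespace>",
                       timeout_seconds=600)
```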
Running pipeline stage StressChecker
Received healthy response to inference request in 3.667s
Received healthy response to inference request in 3.479s
Received healthy response to inference request in 3.845s
Received healthy response to inference request in 3.430s
Received healthy response to inference request in 3.657s
5 requests
0 failed requests
5th percentile: 3.440s
10th percentile: 3.450s
20th percentile: 3.469s
30th percentile: 3.515s
40th percentile: 3.586s
50th percentile: 3.657s
60th percentile: 3.661s
70th percentile: 3.665s
80th percentile: 3.702s
90th percentile: 3.774s
95th percentile: 3.809s
99th percentile: 3.838s
mean time: 3.616s
Pipeline stage StressChecker completed in 18.73s
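The percentile figures are consistent with linearly interpolated percentiles over the five measured latencies; they can be reproduced approximately from the rounded values above:

```python
import numpy as np

latencies = [3.667, 3.479, 3.845, 3.430, 3.657]  # seconds, the five healthy responses above

for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
    print(f"{p}th percentile: {np.percentile(latencies, p):.3f}s")
print(f"mean time: {np.mean(latencies):.3f}s")
```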
Running pipeline stage DaemonicModelEvalScorer
Pipeline stage DaemonicModelEvalScorer completed in 0.03s
Running pipeline stage DaemonicSafetyScorer
Running M-Eval for topic stay_in_character
Pipeline stage DaemonicSafetyScorer completed in 0.04s
M-Eval Dataset for topic stay_in_character is loaded
nousresearch-meta-llama_4941_v10 status is now deployed due to DeploymentManager action
%s, retrying in %s seconds...
nousresearch-meta-llama_4941_v10 status is now inactive due to auto deactivation of underperforming models
admin requested tearing down of nousresearch-meta-llama_4941_v10
Running pipeline stage ISVCDeleter
Checking if service nousresearch-meta-llama-4941-v10 is running
Tearing down inference service nousresearch-meta-llama-4941-v10
Tore down service nousresearch-meta-llama-4941-v10
Pipeline stage ISVCDeleter completed in 8.00s
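Checking and tearing down the inference service is again a couple of kserve client calls, assuming the same client as in the deploy step (namespace remains a placeholder):

```python
from kserve import KServeClient

kserve = KServeClient()
try:
    # "Checking if service ... is running" before the teardown call.
    kserve.get("nousresearch-meta-llama-4941-v10", namespace="<pipeline-namespace>")
    kserve.delete("nousresearch-meta-llama-4941-v10", namespace="<pipeline-namespace>")
except RuntimeError:
    pass  # service already gone
```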
Running pipeline stage ModelDeleter
Cleaning model data from PVC
Starting job with name nousresearch-meta-llama-4941-v10-pvc-cleaner
Waiting for job on nousresearch-meta-llama-4941-v10-pvc-cleaner to finish
Job nousresearch-meta-llama-4941-v10-pvc-cleaner completed after 20.76s with status: succeeded
Stopping job with name nousresearch-meta-llama-4941-v10-pvc-cleaner
Cleaning model data from model cache
Deleting key nousresearch-meta-llama-4941-v10.tensors from bucket guanaco-model-bucket
Cleaning model data from model cache
Deleting key nousresearch-meta-llama-4941-v10-reward.tensors from bucket guanaco-model-bucket
Pipeline stage ModelDeleter completed in 24.77s
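Removing the cached tensors from the bucket amounts to two object deletions; a boto3 sketch under the same assumptions as the upload step:

```python
import boto3

s3 = boto3.client("s3")  # same environment-based configuration assumed as for the upload

for key in ("nousresearch-meta-llama-4941-v10.tensors",
            "nousresearch-meta-llama-4941-v10-reward.tensors"):
    s3.delete_object(Bucket="guanaco-model-bucket", Key=key)
```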
nousresearch-meta-llama_4941_v10 status is now torndown due to DeploymentManager action