Shutdown handler not registered because the Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLizer
Starting job with name alexdaoud-trainer-bagir-1353-v1-mkmlizer
Waiting for job on alexdaoud-trainer-bagir-1353-v1-mkmlizer to finish
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ╔═════════════════════════════════════════════════════════════════════╗
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ _____ __ __ ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ / _/ /_ ___ __/ / ___ ___ / / ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ / _/ / // / |/|/ / _ \/ -_) -_) / ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ /_//_/\_, /|__,__/_//_/\__/\__/_/ ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ /___/ ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ Version: 0.11.12 ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ Copyright 2023 MK ONE TECHNOLOGIES Inc. ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ https://mk1.ai ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ The license key for the current software has been verified as ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ belonging to: ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ Chai Research Corp. ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ Account ID: 7997a29f-0ceb-4cc7-9adf-840c57b4ae6f ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ Expiration: 2025-01-15 23:59:59 ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ║ ║
alexdaoud-trainer-bagir-1353-v1-mkmlizer: ╚═════════════════════════════════════════════════════════════════════╝
alexdaoud-trainer-bagir-1353-v1-mkmlizer: Downloaded to shared memory in 31.646s
alexdaoud-trainer-bagir-1353-v1-mkmlizer: quantizing model to /dev/shm/model_cache, profile:t0, folder:/tmp/tmpubh_zjpm, device:0
alexdaoud-trainer-bagir-1353-v1-mkmlizer: Saving flywheel model at /dev/shm/model_cache
alexdaoud-trainer-bagir-1353-v1-mkmlizer: quantized model in 84.915s
alexdaoud-trainer-bagir-1353-v1-mkmlizer: Processed model alexdaoud/trainer_bagir_2024-12-11-checkpoint-60 in 116.561s
alexdaoud-trainer-bagir-1353-v1-mkmlizer: creating bucket guanaco-mkml-models
alexdaoud-trainer-bagir-1353-v1-mkmlizer: Bucket 's3://guanaco-mkml-models/' created
alexdaoud-trainer-bagir-1353-v1-mkmlizer: uploading /dev/shm/model_cache to s3://guanaco-mkml-models/alexdaoud-trainer-bagir-1353-v1
alexdaoud-trainer-bagir-1353-v1-mkmlizer: cp /dev/shm/model_cache/config.json s3://guanaco-mkml-models/alexdaoud-trainer-bagir-1353-v1/config.json
alexdaoud-trainer-bagir-1353-v1-mkmlizer: cp /dev/shm/model_cache/special_tokens_map.json s3://guanaco-mkml-models/alexdaoud-trainer-bagir-1353-v1/special_tokens_map.json
alexdaoud-trainer-bagir-1353-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer_config.json s3://guanaco-mkml-models/alexdaoud-trainer-bagir-1353-v1/tokenizer_config.json
alexdaoud-trainer-bagir-1353-v1-mkmlizer: cp /dev/shm/model_cache/tokenizer.json s3://guanaco-mkml-models/alexdaoud-trainer-bagir-1353-v1/tokenizer.json
alexdaoud-trainer-bagir-1353-v1-mkmlizer: cp /dev/shm/model_cache/flywheel_model.0.safetensors s3://guanaco-mkml-models/alexdaoud-trainer-bagir-1353-v1/flywheel_model.0.safetensors
Job alexdaoud-trainer-bagir-1353-v1-mkmlizer completed after 145.13s with status: succeeded
Stopping job with name alexdaoud-trainer-bagir-1353-v1-mkmlizer
Pipeline stage MKMLizer completed in 145.62s
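For reference, the cp lines above show the quantized artifacts in /dev/shm/model_cache being copied into the guanaco-mkml-models bucket. The pipeline's own uploader is not part of this log; a minimal boto3 sketch of an equivalent upload (bucket, prefix, and file names taken from the log, everything else assumed) could look like this:

    import os
    import boto3

    # Illustrative only: the actual MKMLizer uploader is not shown in this log.
    s3 = boto3.client("s3")
    bucket = "guanaco-mkml-models"
    prefix = "alexdaoud-trainer-bagir-1353-v1"
    local_dir = "/dev/shm/model_cache"

    for name in ("config.json", "special_tokens_map.json", "tokenizer_config.json",
                 "tokenizer.json", "flywheel_model.0.safetensors"):
        # Mirrors the logged "cp <local> s3://<bucket>/<prefix>/<name>" steps.
        s3.upload_file(os.path.join(local_dir, name), bucket, f"{prefix}/{name}")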
run pipeline stage %s
Running pipeline stage MKMLTemplater
Pipeline stage MKMLTemplater completed in 0.15s
run pipeline stage %s
Running pipeline stage MKMLDeployer
Creating inference service alexdaoud-trainer-bagir-1353-v1
Waiting for inference service alexdaoud-trainer-bagir-1353-v1 to be ready
Connection pool is full, discarding connection: %s. Connection pool size: %s
(the above message was repeated 14 times while waiting for the inference service to become ready)
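This is urllib3's standard notice that more concurrent connections were opened than the client's pool keeps alive; the extra connections are simply discarded rather than reused, so it is harmless here. If the readiness polling used a requests session, the pool could be enlarged roughly as follows (illustrative only; the pipeline's HTTP client is not shown in the log):

    import requests
    from requests.adapters import HTTPAdapter

    # Illustrative only: enlarge the pool so concurrent readiness checks are
    # reused instead of discarded. The real client is not part of this log.
    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=10, pool_maxsize=50)
    session.mount("https://", adapter)
    session.mount("http://", adapter)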
Inference service alexdaoud-trainer-bagir-1353-v1 ready after 210.82571482658386s
Pipeline stage MKMLDeployer completed in 211.37s
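Creating an "inference service" and waiting roughly 210 s for readiness is consistent with a KServe-style InferenceService rollout, although the log does not reveal the mechanism. A hypothetical readiness poll with the official kubernetes Python client, assuming KServe's serving.kserve.io/v1beta1 resource and the tenant namespace seen later in this log, might look like:

    import time
    from kubernetes import client, config

    # Hypothetical readiness poll; assumes a KServe InferenceService.
    config.load_kube_config()
    api = client.CustomObjectsApi()

    def is_ready(name, namespace="tenant-chaiml-guanaco"):
        obj = api.get_namespaced_custom_object(
            group="serving.kserve.io", version="v1beta1",
            namespace=namespace, plural="inferenceservices", name=name)
        conditions = obj.get("status", {}).get("conditions", [])
        return any(c.get("type") == "Ready" and c.get("status") == "True"
                   for c in conditions)

    while not is_ready("alexdaoud-trainer-bagir-1353-v1"):
        time.sleep(5)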
run pipeline stage %s
Running pipeline stage StressChecker
Received healthy response to inference request in 5.478979110717773s
Received healthy response to inference request in 4.783249616622925s
Received healthy response to inference request in 3.6250033378601074s
Received healthy response to inference request in 1.9718976020812988s
Received healthy response to inference request in 2.964050531387329s
5 requests
0 failed requests
5th percentile: 2.1703281879425047
10th percentile: 2.368758773803711
20th percentile: 2.7656199455261232
30th percentile: 3.096241092681885
40th percentile: 3.360622215270996
50th percentile: 3.6250033378601074
60th percentile: 4.0883018493652346
70th percentile: 4.551600360870361
80th percentile: 4.922395515441894
90th percentile: 5.200687313079834
95th percentile: 5.339833211898804
99th percentile: 5.45114993095398
mean time: 3.7646360397338867
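The percentile and mean figures reported by the StressChecker are consistent with numpy-style linear interpolation over the five logged latencies. A quick check (illustrative; the checker's actual implementation is not shown in the log):

    import numpy as np

    # The five healthy-response latencies (seconds) logged above.
    latencies = [5.478979110717773, 4.783249616622925, 3.6250033378601074,
                 1.9718976020812988, 2.964050531387329]

    print("mean time:", np.mean(latencies))          # 3.7646360397338867
    for p in (5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99):
        # Default linear interpolation reproduces the logged percentiles,
        # e.g. 5th -> 2.1703281879425047, 60th -> 4.0883018493652346.
        print(f"{p}th percentile:", np.percentile(latencies, p))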
%s, retrying in %s seconds...
Received healthy response to inference request in 3.458415985107422s
Received healthy response to inference request in 4.61032772064209s
Received healthy response to inference request in 3.647390604019165s
Received healthy response to inference request in 2.487088918685913s
Received healthy response to inference request in 5.386816740036011s
5 requests
0 failed requests
5th percentile: 2.6813543319702147
10th percentile: 2.8756197452545167
20th percentile: 3.2641505718231203
30th percentile: 3.4962109088897706
40th percentile: 3.5718007564544676
50th percentile: 3.647390604019165
60th percentile: 4.032565450668335
70th percentile: 4.417740297317505
80th percentile: 4.765625524520874
90th percentile: 5.076221132278443
95th percentile: 5.231518936157227
99th percentile: 5.355757179260254
mean time: 3.91800799369812
%s, retrying in %s seconds...
Received healthy response to inference request in 4.842776298522949s
Received healthy response to inference request in 3.63948392868042s
Received healthy response to inference request in 3.642622947692871s
Received healthy response to inference request in 3.671832323074341s
Received healthy response to inference request in 2.8814661502838135s
5 requests
0 failed requests
5th percentile: 3.033069705963135
10th percentile: 3.1846732616424562
20th percentile: 3.4878803730010985
30th percentile: 3.64011173248291
40th percentile: 3.6413673400878905
50th percentile: 3.642622947692871
60th percentile: 3.654306697845459
70th percentile: 3.665990447998047
80th percentile: 3.9060211181640625
90th percentile: 4.374398708343506
95th percentile: 4.6085875034332275
99th percentile: 4.795938539505005
mean time: 3.735636329650879
Pipeline stage StressChecker completed in 61.07s
run pipeline stage %s
Running pipeline stage OfflineFamilyFriendlyTriggerPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage OfflineFamilyFriendlyTriggerPipeline completed in 2.23s
run pipeline stage %s
Running pipeline stage TriggerMKMLProfilingPipeline
run_pipeline:run_in_cloud %s
starting trigger_guanaco_pipeline args=%s
triggered trigger_guanaco_pipeline args=%s
Pipeline stage TriggerMKMLProfilingPipeline completed in 2.07s
Shutdown handler de-registered
alexdaoud-trainer-bagir-_1353_v1 status is now deployed due to DeploymentManager action
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage MKMLProfilerDeleter completed in 0.13s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.11s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service alexdaoud-trainer-bagir-1353-v1-profiler
Waiting for inference service alexdaoud-trainer-bagir-1353-v1-profiler to be ready
Inference service alexdaoud-trainer-bagir-1353-v1-profiler ready after 210.47037720680237s
Pipeline stage MKMLProfilerDeployer completed in 210.84s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplozvf7k:/code/chaiverse_profiler_1734356296 --namespace tenant-chaiml-guanaco
kubectl exec -it alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplozvf7k --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1734356296 && python profiles.py profile --best_of_n 1 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 256 --output_tokens 1 --summary /code/chaiverse_profiler_1734356296/summary.json'
%s, retrying in %s seconds...
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplozvf7k:/code/chaiverse_profiler_1734359073 --namespace tenant-chaiml-guanaco
%s, retrying in %s seconds...
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplozvf7k:/code/chaiverse_profiler_1734359073 --namespace tenant-chaiml-guanaco
kubectl exec -it alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplozvf7k --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1734359073 && python profiles.py profile --best_of_n 1 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 256 --output_tokens 1 --summary /code/chaiverse_profiler_1734359073/summary.json'
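Each profiler run writes its results to the summary.json path passed via --summary inside the pod. The log does not show how that file is collected afterwards; a hypothetical retrieval via kubectl cp, driven from Python, could be:

    import json
    import subprocess

    # Hypothetical step, not shown in the log: copy the profiler summary
    # back out of the pod once profiles.py has finished.
    pod = "alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplozvf7k"
    namespace = "tenant-chaiml-guanaco"
    remote = "/code/chaiverse_profiler_1734359073/summary.json"

    subprocess.run(
        ["kubectl", "cp", f"{namespace}/{pod}:{remote}", "summary.json",
         "--namespace", namespace],
        check=True)

    with open("summary.json") as f:
        print(json.load(f))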
Received signal 2, running shutdown handler
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service alexdaoud-trainer-bagir-1353-v1-profiler is running
Tearing down inference service alexdaoud-trainer-bagir-1353-v1-profiler
Service alexdaoud-trainer-bagir-1353-v1-profiler has been torn down
Pipeline stage MKMLProfilerDeleter completed in 2.24s
Shutdown handler de-registered
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service alexdaoud-trainer-bagir-1353-v1-profiler is running
Skipping teardown as no inference service was found
Pipeline stage MKMLProfilerDeleter completed in 2.35s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.12s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service alexdaoud-trainer-bagir-1353-v1-profiler
Waiting for inference service alexdaoud-trainer-bagir-1353-v1-profiler to be ready
Inference service alexdaoud-trainer-bagir-1353-v1-profiler ready after 110.25988626480103s
Pipeline stage MKMLProfilerDeployer completed in 110.58s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo48ht2:/code/chaiverse_profiler_1734359827 --namespace tenant-chaiml-guanaco
kubectl exec -it alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo48ht2 --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1734359827 && python profiles.py profile --best_of_n 1 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 256 --output_tokens 1 --summary /code/chaiverse_profiler_1734359827/summary.json'
%s, retrying in %s seconds...
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo48ht2:/code/chaiverse_profiler_1734362626 --namespace tenant-chaiml-guanaco
%s, retrying in %s seconds...
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo48ht2:/code/chaiverse_profiler_1734362627 --namespace tenant-chaiml-guanaco
kubectl exec -it alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo48ht2 --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1734362627 && python profiles.py profile --best_of_n 1 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 256 --output_tokens 1 --summary /code/chaiverse_profiler_1734362627/summary.json'
Received signal 2, running shutdown handler
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service alexdaoud-trainer-bagir-1353-v1-profiler is running
Tearing down inference service alexdaoud-trainer-bagir-1353-v1-profiler
Service alexdaoud-trainer-bagir-1353-v1-profiler has been torn down
Pipeline stage MKMLProfilerDeleter completed in 2.54s
Shutdown handler de-registered
Shutdown handler registered
run pipeline %s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service alexdaoud-trainer-bagir-1353-v1-profiler is running
Skipping teardown as no inference service was found
Pipeline stage MKMLProfilerDeleter completed in 2.57s
run pipeline stage %s
Running pipeline stage MKMLProfilerTemplater
Pipeline stage MKMLProfilerTemplater completed in 0.14s
run pipeline stage %s
Running pipeline stage MKMLProfilerDeployer
Creating inference service alexdaoud-trainer-bagir-1353-v1-profiler
Waiting for inference service alexdaoud-trainer-bagir-1353-v1-profiler to be ready
Inference service alexdaoud-trainer-bagir-1353-v1-profiler ready after 40.11048150062561s
Pipeline stage MKMLProfilerDeployer completed in 40.44s
run pipeline stage %s
Running pipeline stage MKMLProfilerRunner
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo2vxtc:/code/chaiverse_profiler_1734363387 --namespace tenant-chaiml-guanaco
kubectl exec -it alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo2vxtc --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1734363387 && python profiles.py profile --best_of_n 1 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 256 --output_tokens 1 --summary /code/chaiverse_profiler_1734363387/summary.json'
%s, retrying in %s seconds...
kubectl cp /code/guanaco/guanaco_inference_services/src/inference_scripts tenant-chaiml-guanaco/alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo2vxtc:/code/chaiverse_profiler_1734366163 --namespace tenant-chaiml-guanaco
kubectl exec -it alexdaoud-trainer-ba1f6138763cf21d7852dcfaa5cf6dc156-deplo2vxtc --namespace tenant-chaiml-guanaco -- sh -c 'cd /code/chaiverse_profiler_1734366163 && python profiles.py profile --best_of_n 1 --auto_batch 5 --batches 1,5,10,15,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100,105,110,115,120,125,130,135,140,145,150,155,160,165,170,175,180,185,190,195 --samples 200 --input_tokens 256 --output_tokens 1 --summary /code/chaiverse_profiler_1734366163/summary.json'
Received signal 2, running shutdown handler
run pipeline stage %s
Running pipeline stage MKMLProfilerDeleter
Checking if service alexdaoud-trainer-bagir-1353-v1-profiler is running
Tearing down inference service alexdaoud-trainer-bagir-1353-v1-profiler
Service alexdaoud-trainer-bagir-1353-v1-profiler has been torn down
Pipeline stage MKMLProfilerDeleter completed in 2.46s
Shutdown handler de-registered
alexdaoud-trainer-bagir-_1353_v1 status is now inactive due to auto deactivation of underperforming models