Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMUploader
Starting job with name chaiml-mega-d1-pv3-q27b-1650-v3-uploader
Waiting for job on chaiml-mega-d1-pv3-q27b-1650-v3-uploader to finish
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Using quantization_mode: fp8
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Checking if ChaiML/mega-d1-pv3-q27b-lr5e6ep1g4-FP8 already exists in ChaiML
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Downloading snapshot of ChaiML/mega-d1-pv3-q27b-lr5e6ep1g4...
2026-03-28T16:46:58.195872+00:00 monitor updated for chaiml-mega-d1-pv3-q27b-_1650_v3
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Downloaded in 25.208s
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Loading /tmp/model_input...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: The fast path is not available because one of the required libraries is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Applying quantization...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:09.746753+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
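The tokenizer-parallelism warning above states its own remedy: set `TOKENIZERS_PARALLELISM=false`. A minimal sketch of how a job could suppress it, assuming the environment variable is set before the tokenizers library is first imported:

```python
import os

# Disable tokenizer parallelism before tokenizers/transformers are imported,
# as the warning in the log suggests; this silences the FastTokenizer/Datasets
# threading-conflict warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```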
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:12.259794+0000 | reset | INFO - Compression lifecycle reset
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:12.263373+0000 | norm_calibration_context | INFO - Found 161 offset-norm modules to convert
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:12.272007+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:12.318879+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:12.319125+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:12.331593+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:19.388775+0000 | norm_calibration_context | INFO - Restoring 161 norm modules to offset convention
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:20.151353+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:47:20.151568+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide `output_dir` as an input arg. Ex. `oneshot(..., output_dir=...)`
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Saving to /dev/shm/model_output...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: warnings.warn(
2026-03-28T16:47:58.289643+00:00 monitor updated for chaiml-mega-d1-pv3-q27b-_1650_v3
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Updating config in /dev/shm/model_output
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Traceback (most recent call last):
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 344, in <module>
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: cli()
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1485, in __call__
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return self.main(*args, **kwargs)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1406, in main
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: rv = self.invoke(ctx)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1873, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return _process_result(sub_ctx.command.invoke(sub_ctx))
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1269, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return ctx.invoke(self.callback, **ctx.params)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 824, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return callback(*args, **kwargs)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 54, in process
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: quantize_fp8(repo_id, download_path, output_path, revision, hf_auth_token)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 223, in quantize_fp8
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: update_fp8_config(model_arch, output_path, ignore_layers)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: TypeError: update_fp8_config() takes 2 positional arguments but 3 were given
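The failure is a plain arity mismatch: `quantize_fp8` calls `update_fp8_config` with three positional arguments while the function only accepts two. The real signatures in `/code/uploading/compress.py` are not visible in this log; the sketch below uses hypothetical parameter names purely to illustrate the mismatch the traceback reports:

```python
def update_fp8_config(model_arch, output_path):
    """Hypothetical two-argument signature, as implied by the TypeError."""
    return {"arch": model_arch, "path": output_path}

def quantize_fp8(model_arch, output_path, ignore_layers):
    # The caller passes a third argument (ignore_layers), so Python raises
    # the exact TypeError seen in the traceback above.
    try:
        update_fp8_config(model_arch, output_path, ignore_layers)
    except TypeError as exc:
        return str(exc)

# quantize_fp8("gemma2", "/dev/shm/model_output", ["lm_head"]) reproduces the
# error message; the fix is to align the call site with the signature (or
# add the ignore_layers parameter to update_fp8_config).
```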
Job chaiml-mega-d1-pv3-q27b-1650-v3-uploader completed after 133.89s with status: failed
Job failed chaiml-mega-d1-pv3-q27b-1650-v3-uploader:
Stopping job with name chaiml-mega-d1-pv3-q27b-1650-v3-uploader
%s, retrying in %s seconds...
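The "retrying in %s seconds" line shows the pipeline wraps the uploader job in a retry loop; since the TypeError is deterministic, every retry fails identically. A minimal sketch of such a wrapper, assuming a fixed attempt count and delay (the real pipeline's retry policy, function names, and parameters are not shown in this log):

```python
import time

def run_with_retries(job, max_attempts=3, delay_seconds=60):
    """Run `job` up to max_attempts times, sleeping between failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # all attempts exhausted: propagate to pipeline cleanup
            print(f"{exc}, retrying in {delay_seconds} seconds...")
            time.sleep(delay_seconds)
```

A deterministic bug like the TypeError above exhausts all attempts, after which the exception propagates and the pipeline proceeds to its cleanup stages, as seen later in this log.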
Starting job with name chaiml-mega-d1-pv3-q27b-1650-v3-uploader
Waiting for job on chaiml-mega-d1-pv3-q27b-1650-v3-uploader to finish
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Using quantization_mode: fp8
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Checking if ChaiML/mega-d1-pv3-q27b-lr5e6ep1g4-FP8 already exists in ChaiML
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Downloading snapshot of ChaiML/mega-d1-pv3-q27b-lr5e6ep1g4...
2026-03-28T16:48:58.392431+00:00 monitor updated for chaiml-mega-d1-pv3-q27b-_1650_v3
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Downloaded in 21.180s
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Loading /tmp/model_input...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: The fast path is not available because one of the required libraries is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Applying quantization...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:03.654288+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:05.715397+0000 | reset | INFO - Compression lifecycle reset
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:05.719099+0000 | norm_calibration_context | INFO - Found 161 offset-norm modules to convert
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:05.728106+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:05.776078+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:05.776324+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:05.788804+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:12.724509+0000 | norm_calibration_context | INFO - Restoring 161 norm modules to offset convention
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:13.384325+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:49:13.384491+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide `output_dir` as an input arg. Ex. `oneshot(..., output_dir=...)`
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Saving to /dev/shm/model_output...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: warnings.warn(
2026-03-28T16:49:58.491642+00:00 monitor updated for chaiml-mega-d1-pv3-q27b-_1650_v3
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Updating config in /dev/shm/model_output
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Traceback (most recent call last):
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 344, in <module>
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: cli()
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1485, in __call__
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return self.main(*args, **kwargs)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1406, in main
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: rv = self.invoke(ctx)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1873, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return _process_result(sub_ctx.command.invoke(sub_ctx))
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1269, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return ctx.invoke(self.callback, **ctx.params)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 824, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return callback(*args, **kwargs)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 54, in process
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: quantize_fp8(repo_id, download_path, output_path, revision, hf_auth_token)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 223, in quantize_fp8
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: update_fp8_config(model_arch, output_path, ignore_layers)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: TypeError: update_fp8_config() takes 2 positional arguments but 3 were given
Job chaiml-mega-d1-pv3-q27b-1650-v3-uploader completed after 115.12s with status: failed
Job failed chaiml-mega-d1-pv3-q27b-1650-v3-uploader:
Stopping job with name chaiml-mega-d1-pv3-q27b-1650-v3-uploader
%s, retrying in %s seconds...
Starting job with name chaiml-mega-d1-pv3-q27b-1650-v3-uploader
Waiting for job on chaiml-mega-d1-pv3-q27b-1650-v3-uploader to finish
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Using quantization_mode: fp8
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Checking if ChaiML/mega-d1-pv3-q27b-lr5e6ep1g4-FP8 already exists in ChaiML
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Downloading snapshot of ChaiML/mega-d1-pv3-q27b-lr5e6ep1g4...
Failed to get response for submission chaiml-pony-d3-g46-pv2-l_7830_v2: ('http://chaiml-pony-d3-g46-pv2-l-7830-v2-predictor.tenant-chaiml-guanaco.k2.chaiverse.com/v1/completions', 'activator request timeout')
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Downloaded in 23.789s
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Loading /tmp/model_input...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: The fast path is not available because one of the required libraries is not installed. Falling back to torch implementation. To install follow https://github.com/fla-org/flash-linear-attention#installation and https://github.com/Dao-AILab/causal-conv1d
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Applying quantization...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:47.549909+0000 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
2026-03-28T16:50:58.609194+00:00 monitor updated for chaiml-mega-d1-pv3-q27b-_1650_v3
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:49.784840+0000 | reset | INFO - Compression lifecycle reset
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:49.788914+0000 | norm_calibration_context | INFO - Found 161 offset-norm modules to convert
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:49.798161+0000 | from_modifiers | INFO - Creating recipe from modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:49.970372+0000 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:49.971148+0000 | IndependentPipeline | INFO - Inferred `DataFreePipeline` for `QuantizationModifier`
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:50.009053+0000 | dispatch_model | WARNING - Forced to offload modules due to insufficient gpu resources
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:57.029586+0000 | norm_calibration_context | INFO - Restoring 161 norm modules to offset convention
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:57.690680+0000 | finalize | INFO - Compression lifecycle finalized for 1 modifiers
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: 2026-03-28T16:50:57.690856+0000 | post_process | WARNING - Optimized model is not saved. To save, please provide `output_dir` as an input arg. Ex. `oneshot(..., output_dir=...)`
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Saving to /dev/shm/model_output...
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: /usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py:3344: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (50GB default)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: warnings.warn(
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Updating config in /dev/shm/model_output
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: Traceback (most recent call last):
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 344, in <module>
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: cli()
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1485, in __call__
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return self.main(*args, **kwargs)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1406, in main
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: rv = self.invoke(ctx)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1873, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return _process_result(sub_ctx.command.invoke(sub_ctx))
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1269, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return ctx.invoke(self.callback, **ctx.params)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 824, in invoke
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: return callback(*args, **kwargs)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: ^^^^^^^^^^^^^^^^^^^^^^^^^
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 54, in process
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: quantize_fp8(repo_id, download_path, output_path, revision, hf_auth_token)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: File "/code/uploading/compress.py", line 223, in quantize_fp8
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: update_fp8_config(model_arch, output_path, ignore_layers)
chaiml-mega-d1-pv3-q27b-1650-v3-uploader: TypeError: update_fp8_config() takes 2 positional arguments but 3 were given
Job chaiml-mega-d1-pv3-q27b-1650-v3-uploader completed after 104.05s with status: failed
Job failed chaiml-mega-d1-pv3-q27b-1650-v3-uploader:
Stopping job with name chaiml-mega-d1-pv3-q27b-1650-v3-uploader
clean up pipeline due to error=VLLMUploaderError('')
run pipeline stage %s
Running pipeline stage VLLMDeleter
Skipping teardown as no inference service was successfully deployed
Pipeline stage VLLMDeleter completed in 0.21s
run pipeline stage %s
Running pipeline stage VLLMModelDeleter
Cleaning model data from S3
Pipeline stage VLLMModelDeleter completed in 0.23s
Shutdown handler de-registered
chaiml-mega-d1-pv3-q27b-_1650_v3 status is now failed due to DeploymentManager action
admin requested tearing down of chaiml-mega-d1-pv3-q27b-_1650_v3
Shutdown handler not registered because Python interpreter is not running in the main thread
run pipeline %s
run pipeline stage %s
Running pipeline stage VLLMDeleter
Skipping teardown as no inference service was successfully deployed
chaiml-mega-d1-pv3-q27b-_1650_v3 status is now torndown due to DeploymentManager action