device Issues on Macbook (M1) (mps) on Transformers code #8

@Luismbpr

I've encountered several issues while trying to run the code on a Macbook M1 Pro.

############### Issues and Trying to solve it - Start ###############

Issue 01 - Error while trying to train the model (Partially Solved)

Error shown:

ValueError: fp16 mixed precision requires a GPU (not 'mps').

Partial solution:

The workaround was the following:

  • Change the fp16 from True to False on VisionClassifierTrainer (full code displayed at the end)

fp16 = False

  • Add the following piece of code to ViTForImageClassification.from_pretrained (full code displayed at the end)

ignore_mismatched_sizes=True

After that change, training ran without raising errors, but other issues remained (see Issue 02).

A warning was shown, and then training began:
"Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference."
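Since the error in Issue 01 says fp16 mixed precision is not available on 'mps', one option is to derive the flag from the hardware instead of hard-coding it. A minimal sketch, assuming the value is later passed as `fp16=use_fp16` to the trainer (the name `use_fp16` is illustrative):

```python
import torch

# Assumption: fp16 mixed precision should only be requested when a CUDA GPU
# is present. On Apple Silicon (mps) and on CPU this evaluates to False.
use_fp16 = torch.cuda.is_available()

print(f"fp16 = {use_fp16}")
```

This keeps the same script usable on Colab (CUDA) and on the Mac without editing the flag by hand.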

Issue 02 - Error while trying to evaluate using F1 Score (Unsolved)

After training the model, the following call raised an error:

#%% Model Evaluation
y_true, y_pred = trainer.evaluate_f1_score()

Error shown:
RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device=mps:0') must be on the same device

I tried setting the device to 'mps' in various ways before training the model, and I also tried setting it to 'cpu'. The error persisted in both cases.

I tried each of the following before training the model, and none of them worked:

device = torch.device('mps')
torch.set_default_device("mps")
torch.device('cpu')
device = torch.device('cuda' if torch.cuda.is_available() else 'mps')

I also tried this before training the model, and it did not work either:

if torch.cuda.is_available():
    device = torch.device("cuda")
elif (
    hasattr(torch.backends, "mps")
    and torch.backends.mps.is_available()
    and torch.backends.mps.is_built()
):
    device = torch.device("mps")
else:
    device = torch.device("cpu")

I also tried moving the model with trainer.to(device), and that did not work either.
############### Issues and Trying to solve it - End ###############

More information about my installation

!transformers-cli env
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `transformers` version: 4.45.2
- Platform: macOS-15.0.1-arm64-arm-64bit
- Python version: 3.10.14
- Huggingface_hub version: 0.26.1
- Safetensors version: 0.4.5
- Accelerate version: 1.0.1
- Accelerate config: 	not found
- PyTorch version (GPU?): 2.4.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>
import platform
platform.platform()
'macOS-15.0.1-arm64-arm-64bit'

I am not sure how to put both the input and the weights on the same device so that the computation can be performed:

input(device='cpu') and weight(device=mps:0')  must be on the same device
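The error message asks for the input tensor and the convolution weights to share a device. A minimal sketch of that requirement in plain PyTorch, assuming a toy `nn.Conv2d` as a stand-in for the model (the `pick_device` helper and the toy layer are illustrative and not part of hugsvision). It does not show how to control where `evaluate_f1_score()` places its inputs, which I have not been able to determine:

```python
import torch
from torch import nn


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Hypothetical toy layer standing in for the model's first convolution.
model = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=16, stride=16).to(device)
model.eval()

# A batch created on the CPU, as a data loader or image processor would return it.
pixel_values = torch.rand(1, 3, 224, 224)

# Move the input to wherever the weights live before the forward pass.
weight_device = next(model.parameters()).device
pixel_values = pixel_values.to(weight_device)

with torch.no_grad():
    output = model(pixel_values)

print(output.shape, output.device)
```

Skipping the `pixel_values.to(weight_device)` line on a machine where `device` is 'mps' would leave the input on the CPU and the weights on 'mps', which is the mismatch the error describes.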

Here is the code that I used, with my modifications:

#%%
# source: https://medium.com/@yanis.labrak/how-to-train-a-custom-vision-transformer-vit-image-classifier-to-help-endoscopists-in-under-5-min-2e7e4110a353
import pandas
import torch

#%% Packages
from hugsvision.dataio.VisionDataset import VisionDataset
from hugsvision.nnet.VisionClassifierTrainer import VisionClassifierTrainer
from transformers import ViTFeatureExtractor, ViTForImageClassification
from transformers import ViTImageProcessor  # new: used instead of the deprecated ViTFeatureExtractor
from hugsvision.inference.VisionClassifierInference import VisionClassifierInference

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix


if torch.cuda.is_available():
    device = torch.device("cuda")
elif (
    hasattr(torch.backends, "mps")
    and torch.backends.mps.is_available()
    and torch.backends.mps.is_built()
):
    device = torch.device("mps")
else:
    device = torch.device("cpu")


print(torch.backends.mps.is_available())
print(torch.backends.mps.is_built())

## Data Preparation
##id2label
## label2id

## Using 10% of the train data for validation

train, val, id2label, label2id = VisionDataset.fromImageFolder(
    "/Users/.../300_Transformers/data/train/",
    test_ratio   = 0.1,
    balanced     = True,
    augmentation = True, 
    torch_vision = False,
)



huggingface_model = 'google/vit-base-patch16-224-in21k'


trainer = VisionClassifierTrainer(
    model_name = "MyDogClassifier",
    train      = train,
    test       = val,
    # output_dir = "./out/",
    output_dir = "/Users/.../300_Transformers/out/",
    max_epochs = 2,
    batch_size = 4,
    lr         = 2e-5,
    fp16       = False,
    model = ViTForImageClassification.from_pretrained(
        huggingface_model,
        num_labels = len(label2id),
        label2id   = label2id,
        id2label   = id2label,
        # ignore_mismatched_sizes=True
    ).to(device),
    # feature_extractor = ViTFeatureExtractor.from_pretrained(
    #     huggingface_model,
    # ),
    feature_extractor = ViTImageProcessor.from_pretrained(huggingface_model),
)

## Error
#%% Model Evaluation
y_true, y_pred = trainer.evaluate_f1_score()
#RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device=mps:0')  must be on the same device

At this point I am not sure what needs to be done to run this code on a MacBook M1 Pro.
Note: I ran the same code with 2 epochs on Colab and it worked. Does anyone have an idea of how to solve this issue on an M1 (Apple Silicon) Mac?

  • Also, how can I save the trained model's weights with state_dict(), instantiate a new model instance, load the weights into it, and then call to(device)? That might help, but I could not save the state_dict().
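The save-and-reload idea above can be sketched in plain PyTorch. This is a minimal sketch under stated assumptions: `build_model` is a hypothetical stand-in for rebuilding the real architecture, and the file name `weights.pt` is illustrative. How to reach the trained model inside the hugsvision trainer is not shown, because I have not verified which attribute exposes it:

```python
import os
import tempfile

import torch
from torch import nn


def build_model() -> nn.Module:
    # Hypothetical stand-in; the real model would be rebuilt with the same
    # architecture that was trained.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))


# Pretend this instance has just been trained.
trained = build_model()

# Save only the weights (the state_dict), not the whole Python object.
path = os.path.join(tempfile.mkdtemp(), "weights.pt")
torch.save(trained.state_dict(), path)

# Pick the target device, falling back to the CPU when MPS is unavailable.
mps = getattr(torch.backends, "mps", None)
use_mps = mps is not None and mps.is_available()
device = torch.device("mps" if use_mps else "cpu")

# Load onto the CPU first, then move the fresh instance to the target device.
restored = build_model()
state = torch.load(path, map_location="cpu")
restored.load_state_dict(state)
restored = restored.to(device).eval()

# Inputs must be moved to the same device as the restored model.
batch = torch.rand(4, 3, 8, 8).to(device)
with torch.no_grad():
    logits = restored(batch)

print(logits.shape)
```

For a transformers model, `save_pretrained(...)` followed by `from_pretrained(...)` on the saved directory is the usual alternative to handling the state_dict manually.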
