Device issues on MacBook (M1) (mps) with Transformers code #8

I've encountered several issues while trying to run the code on a MacBook M1 Pro.

############### Issues and attempted fixes - Start ###############

### Issue 01 - Error while trying to train the model (Partially Solved)

Error shown: 

`ValueError: fp16 mixed precision requires a GPU (not 'mps').`

Partial solution (workaround):

* Change `fp16` from `True` to `False` in `VisionClassifierTrainer` (full code at the end):

`fp16 = False`

* Add the following argument to `ViTForImageClassification.from_pretrained` (full code at the end):

`ignore_mismatched_sizes=True`

After these changes, training ran without errors, but the remaining issues persisted.

The following warning was shown when training began:

`Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']`

`You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.`
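
Regarding the `fp16` workaround: instead of hard-coding `fp16 = False`, a minimal sketch is to derive the flag from the available hardware. It assumes, as the error message above states, that fp16 mixed precision should only be enabled on a CUDA GPU. The helper name `select_device_and_fp16` is hypothetical and not part of hugsvision or Transformers.

```python
import torch


def select_device_and_fp16():
    """Hypothetical helper: pick a device and decide whether fp16 can be used.

    fp16 is only enabled on CUDA, because the trainer rejects it on 'mps'.
    """
    if torch.cuda.is_available():
        return torch.device("cuda"), True
    # torch.backends.mps.is_available() simply returns False on non-Apple machines.
    if torch.backends.mps.is_available():
        return torch.device("mps"), False
    return torch.device("cpu"), False


device, use_fp16 = select_device_and_fp16()
print(f"device={device}, fp16={use_fp16}")
# Then pass fp16=use_fp16 to VisionClassifierTrainer(...) instead of a hard-coded value.
```

This keeps the same script usable on Colab (CUDA, fp16 on) and on an M1 (MPS, fp16 off).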

### Issue 02 - Error while trying to evaluate using F1 Score (Unsolved)

After training the model, running the following code raised an error:

```Python
#%% Model Evaluation
y_true, y_pred = trainer.evaluate_f1_score()
```

Error shown: 
`RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device=mps:0')  must be on the same device`

Before training the model, I tried setting the device to 'mps' in several ways, and I also tried setting it to 'cpu'. The error persisted in both cases.

I tried each of the following before training the model, and none of them worked:

```Python
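import torch
device = torch.device('mps')           # attempt 1: explicit MPS device
torch.set_default_device("mps")        # attempt 2: MPS as the default tensor device
device = torch.device('cpu')           # attempt 3: CPU instead of MPS
device = torch.device('cuda' if torch.cuda.is_available() else 'mps')  # attempt 4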
```

I also tried the following before training the model, and it did not work either:

```Python
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
elif (
    hasattr(torch.backends, "mps")
    and torch.backends.mps.is_available()
    and torch.backends.mps.is_built()
):
    device = torch.device("mps")
else:
    device = torch.device("cpu")
```
I also tried moving the model with `trainer.to(device)`, and that did not work either.
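
Choosing a `torch.device` only names a device; it does not move any tensor or model. A quick diagnostic sketch is to print where the model's parameters and an input tensor actually live. It uses a small `nn.Conv2d` as a stand-in for the ViT model, on the assumption that the error comes from the ViT patch embedding (a 16x16 convolution), which would match the `slow_conv2d_forward_mps` name in the message. `describe_devices` is a hypothetical helper.

```python
import torch
from torch import nn


def describe_devices(model: nn.Module, inputs: torch.Tensor):
    """Return (parameter device, input device) so a mismatch is easy to spot."""
    return next(model.parameters()).device, inputs.device


# Use MPS when available, otherwise fall back to the CPU so the sketch runs anywhere.
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

# Stand-in for the ViT patch embedding, which is also an nn.Conv2d (16x16 patches).
model = nn.Conv2d(3, 8, kernel_size=16, stride=16).to(device)
# Created on the CPU unless torch.set_default_device() was called earlier.
pixel_values = torch.randn(1, 3, 224, 224)

# On an M1 (fresh session) this shows two different devices: weights on MPS, input on CPU.
print(describe_devices(model, pixel_values))

# Move the input to wherever the weights are, then run the forward pass.
pixel_values = pixel_values.to(next(model.parameters()).device)
param_device, input_device = describe_devices(model, pixel_values)
output = model(pixel_values)
print(param_device == input_device, tuple(output.shape))  # True (1, 8, 14, 14)
```

The same check, applied to the real model and to one batch produced by the image processor, shows which side of the mismatch needs to move.
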

############### Issues and attempted fixes - End ###############

More information about my installation:

```Python
!transformers-cli env
```

```Markdown
Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `transformers` version: 4.45.2
- Platform: macOS-15.0.1-arm64-arm-64bit
- Python version: 3.10.14
- Huggingface_hub version: 0.26.1
- Safetensors version: 0.4.5
- Accelerate version: 1.0.1
- Accelerate config: 	not found
- PyTorch version (GPU?): 2.4.1 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: <fill in>
```

```Python
import platform
platform.platform()
```
```Markdown
'macOS-15.0.1-arm64-arm-64bit'
```

I am not sure how to put both the input and the weights onto the same device so the computation can be performed:
```Markdown
input(device='cpu') and weight(device=mps:0')  must be on the same device
```
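
The message says the weights are on `mps:0` while the input tensor is still on the CPU. I could not confirm how `evaluate_f1_score()` in hugsvision places its inputs, so one workaround sketch is to write the evaluation loop yourself and move every input to the model's device before the forward pass. Assumptions: `evaluate_on_model_device` is a hypothetical replacement, not hugsvision's implementation; the samples are `(pixel_values, label)` pairs, where `pixel_values` would be the `"pixel_values"` entry returned by `ViTImageProcessor(..., return_tensors="pt")`; and `ToyClassifier` stands in for `ViTForImageClassification` so the sketch runs without downloading weights.

```python
import torch
from torch import nn


@torch.no_grad()
def evaluate_on_model_device(model: nn.Module, samples):
    """Hypothetical evaluation loop returning (y_true, y_pred) as Python lists."""
    model.eval()
    device = next(model.parameters()).device  # wherever training left the weights
    y_true, y_pred = [], []
    for pixel_values, label in samples:
        pixel_values = pixel_values.to(device)  # key step: the input follows the weights
        outputs = model(pixel_values=pixel_values)
        # Transformers models return an object with .logits; plain modules return a tensor.
        logits = outputs.logits if hasattr(outputs, "logits") else outputs
        y_pred.extend(logits.argmax(dim=-1).cpu().tolist())
        y_true.extend(torch.as_tensor(label).reshape(-1).tolist())
    return y_true, y_pred


class ToyClassifier(nn.Module):
    """Stand-in for ViTForImageClassification: conv patch embedding + linear head."""

    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.patch = nn.Conv2d(3, 4, kernel_size=16, stride=16)
        self.head = nn.Linear(4, num_labels)

    def forward(self, pixel_values):
        features = self.patch(pixel_values).mean(dim=(2, 3))  # global average pooling
        return self.head(features)


device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
model = ToyClassifier().to(device)
# Inputs deliberately created on the CPU, as in the failing evaluation.
samples = [(torch.randn(1, 3, 224, 224), 0), (torch.randn(1, 3, 224, 224), 1)]
y_true, y_pred = evaluate_on_model_device(model, samples)
print(y_true, y_pred)
```

With real data, `y_true` and `y_pred` can be passed to `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` or to `confusion_matrix`, which the script already imports. Another option, assuming the trainer keeps the model in an attribute such as `trainer.model` (an assumption about hugsvision internals), is to move that model to the CPU before calling `evaluate_f1_score()`; this may avoid the mismatch at the cost of slower inference.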


Here is the code I used, with my modifications:
```Python
#%%
# source: https://medium.com/@yanis.labrak/how-to-train-a-custom-vision-transformer-vit-image-classifier-to-help-endoscopists-in-under-5-min-2e7e4110a353
import torch

#%% Packages
from hugsvision.dataio.VisionDataset import VisionDataset
from hugsvision.nnet.VisionClassifierTrainer import VisionClassifierTrainer
from transformers import ViTFeatureExtractor, ViTForImageClassification
from transformers import ViTImageProcessor  # new
from hugsvision.inference.VisionClassifierInference import VisionClassifierInference

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix


if torch.cuda.is_available():
    device = torch.device("cuda")
elif (
    hasattr(torch.backends, "mps")
    and torch.backends.mps.is_available()
    and torch.backends.mps.is_built()
):
    device = torch.device("mps")
else:
    device = torch.device("cpu")


print(torch.backends.mps.is_available())
print(torch.backends.mps.is_built())

## Data Preparation
##id2label
## label2id

## Using 10% of the train data for validation

train, val, id2label, label2id = VisionDataset.fromImageFolder(
    "/Users/.../300_Transformers/data/train/",
    test_ratio   = 0.1,
    balanced     = True,
    augmentation = True, 
    torch_vision = False,
)



huggingface_model = 'google/vit-base-patch16-224-in21k'


trainer = VisionClassifierTrainer(
    model_name = "MyDogClassifier",
    train      = train,
    test       = val,
    #output_dir = "./out/",
    output_dir = "/Users/.../300_Transformers/out/",
    max_epochs = 2,
    batch_size = 4,
    lr         = 2e-5,
    fp16       = False,
    model      = ViTForImageClassification.from_pretrained(
        huggingface_model,
        num_labels = len(label2id),
        label2id   = label2id,
        id2label   = id2label,
        #ignore_mismatched_sizes=True
    ).to(device),
    #feature_extractor = ViTFeatureExtractor.from_pretrained(
    #    huggingface_model,
    #),
    feature_extractor = ViTImageProcessor.from_pretrained(huggingface_model),
)

## Error
#%% Model Evaluation
y_true, y_pred = trainer.evaluate_f1_score()
#RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device=mps:0')  must be on the same device

```

At this point I am not sure what needs to be done to run this code on a MacBook M1 Pro.
Note: I ran the same code for 2 epochs on Colab and it worked. Does anyone have an idea how to solve this issue on an Apple Silicon (M1) Mac?

- Also, how can I save the trained model's weights with `state_dict()`, instantiate a new model instance, load the weights into it, and move it with `.to(device)`? That might help, but I could not save the `state_dict()`.
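
For the last question, the usual PyTorch pattern is `torch.save(model.state_dict(), path)` followed by `load_state_dict(torch.load(path, map_location=...))` on a freshly constructed model with the same architecture. Below is a minimal sketch using a small stand-in module so it runs anywhere; for the ViT model, the fresh instance would presumably come from `ViTForImageClassification.from_pretrained(huggingface_model, num_labels=..., label2id=..., id2label=...)` as in the script (Transformers models also provide `save_pretrained`/`from_pretrained` for the same purpose). `build_model` and the file name `my_dog_classifier.pt` are hypothetical placeholders.

```python
import os
import tempfile

import torch
from torch import nn


def build_model(num_labels: int = 3) -> nn.Module:
    """Hypothetical stand-in architecture; it must match between saving and loading."""
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, num_labels))


device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")
trained = build_model().to(device)

# 1) Save only the weights; copying them to the CPU first makes the file device-independent.
path = os.path.join(tempfile.mkdtemp(), "my_dog_classifier.pt")  # placeholder file name
torch.save({k: v.cpu() for k, v in trained.state_dict().items()}, path)

# 2) Build a new instance, load the weights on the CPU, then move it to the target device.
restored = build_model()
restored.load_state_dict(torch.load(path, map_location="cpu", weights_only=True))
restored = restored.to(device).eval()

# Inputs must live on the same device as the restored weights.
x = torch.randn(2, 3, 8, 8, device=device)
with torch.no_grad():
    assert torch.allclose(trained(x), restored(x))
print("restored on", next(restored.parameters()).device)
```

Loading with `map_location="cpu"` and then calling `.to(device)` keeps the file usable on Colab (CUDA) and on the M1 (MPS) alike.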


