# scheduled, multi-phase fine-tuning of foundation models. Gradual unfreezing (i.e., thawing) can help maximize
# foundation model knowledge retention while allowing (typically upper layers of) the model to
# optimally adapt to new tasks during transfer learning [1, 2, 3]
#
# </div>
#
#
# ## Basic Usage
#
# If no fine-tuning schedule is provided by the user, [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) will generate a
# [default schedule](#The-Default-Fine-Tuning-Schedule) and proceed to fine-tune according to the generated schedule,
# using default [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping) and [FTSCheckpoint](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSCheckpoint) callbacks with ``monitor=val_loss``.
#
# </div>
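#
# As a minimal sketch of this default (implicit mode) usage, one might simply attach the callback
# with no arguments (``MyLightningModule`` and ``MyDataModule`` are hypothetical placeholders, and
# ``pl`` is assumed to be the pytorch-lightning import used elsewhere in this notebook):
#
# ```python
# from finetuning_scheduler.fts import FinetuningScheduler
#
# # With no ft_schedule provided, a default schedule is generated and default
# # FTSEarlyStopping/FTSCheckpoint callbacks (monitoring ``val_loss``) are added.
# trainer = pl.Trainer(callbacks=[FinetuningScheduler()])
# trainer.fit(MyLightningModule(), datamodule=MyDataModule())
# ```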
#
#
#
# The end-to-end example in this notebook ([Scheduled Fine-Tuning For SuperGLUE](#Scheduled-Fine-Tuning-For-SuperGLUE)) uses [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) in explicit mode to fine-tune a small foundation model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/).
# Please see the [official Fine-Tuning Scheduler documentation](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) if you are interested in a similar [CLI-based example](https://finetuning-scheduler.readthedocs.io/en/stable/index.html#example-scheduled-fine-tuning-for-superglue) using the LightningCLI.

# %% [markdown]
# </div>


# %% [markdown]
# ## Scheduled Fine-Tuning For SuperGLUE
#
# The following example demonstrates the use of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) to fine-tune a small foundation model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). Iterative early-stopping will be applied according to a user-specified schedule.
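#
# As a rough sketch (not this example's actual schedule), a user-specified explicit schedule is a
# YAML file keyed by integer phase, each phase listing the ``params`` (parameter-name patterns) to
# thaw, with optional per-phase keys such as ``max_transition_epoch``. The parameter patterns below
# are hypothetical placeholders rather than this model's real layer names:
#
# ```python
# # Write a toy explicit schedule to disk; the parameter patterns are illustrative only.
# example_schedule = """
# 0:
#   params:
#   - model.classifier.bias
#   - model.classifier.weight
# 1:
#   params:
#   - model.pooler.*
# 2:
#   params:
#   - model.encoder.*
#   max_transition_epoch: 9
# """
# with open("example_ft_schedule.yaml", "w") as f:
#     f.write(example_schedule)
# # The file can then be passed to the callback:
# # FinetuningScheduler(ft_schedule="example_ft_schedule.yaml")
# ```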
# Though other optimizers can arguably yield some marginal advantage contingent on the context,
# the Adam optimizer (and the [AdamW version](https://pytorch.org/docs/stable/_modules/torch/optim/adamw.html#AdamW) which
# implements decoupled weight decay) remains robust to hyperparameter choices and is commonly used for fine-tuning
# foundation language models. See (Sivaprasad et al., 2020) and (Mosbach, Andriushchenko & Klakow, 2020) for theoretical and systematic empirical justifications of Adam and its use in fine-tuning
# large transformer-based language models. The values used here have some justification
# in the referenced literature but have been largely empirically determined and while a good
# The [CosineAnnealingWarmRestarts scheduler](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.html?highlight=cosineannealingwarm#torch.optim.lr_scheduler.CosineAnnealingWarmRestarts) nicely fits with our iterative fine-tuning since it does not depend upon a global max_epoch
# value. The importance of initial warmup is reduced due to the innate warmup effect of Adam bias correction [5]
# and the gradual thawing we are performing. Note that commonly used LR schedulers that depend on providing
# used in other pytorch-lightning tutorials) also work with FinetuningScheduler. Though the LR scheduler is theoretically
# justified (Loshchilov & Hutter, 2016), the particular values provided here are primarily empirically driven.
#
# [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) also supports both optimizer and LR scheduler
# reinitialization in explicit and implicit finetuning schedule modes. See the advanced usage documentation ([LR scheduler reinitialization](https://finetuning-scheduler.readthedocs.io/en/stable/advanced/lr_scheduler_reinitialization.html), [optimizer reinitialization](https://finetuning-scheduler.readthedocs.io/en/stable/advanced/optimizer_reinitialization.html)) for explanations and demonstration of the extension's support for more complex requirements.
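#
# To ground the optimizer and LR scheduler discussion above, here is a minimal sketch of a
# ``configure_optimizers`` pairing AdamW with CosineAnnealingWarmRestarts; the hyperparameter
# values are illustrative placeholders, not the tuned settings used in this example:
#
# ```python
# import torch
#
# class ExampleModule(pl.LightningModule):  # hypothetical module for illustration
#     def configure_optimizers(self):
#         # AdamW applies decoupled weight decay, a robust default for fine-tuning
#         optimizer = torch.optim.AdamW(self.parameters(), lr=1e-5, weight_decay=1e-5)
#         # CosineAnnealingWarmRestarts needs no global max_epoch value; T_0 sets the
#         # length (in epochs) of the first restart cycle, T_mult scales later cycles
#         scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
#             optimizer, T_0=1, T_mult=2, eta_min=1e-7
#         )
#         return [optimizer], [{"scheduler": scheduler, "interval": "epoch"}]
# ```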
# The only callback required to use the [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) extension is the [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) callback itself.
# Default versions of [FTSCheckpoint](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSCheckpoint) and [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping)
# (if not specifying ``epoch_only_transitions``) will be included ([as discussed above](#Basic-Usage)) if not provided
# in the callbacks list. For demonstration purposes I'm including example configurations of all three callbacks below.

# %%
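# A rough sketch of explicit-mode configurations for the three callbacks; the schedule path and
# threshold values are illustrative placeholders rather than this example's tuned settings
# (module paths follow the Fine-Tuning Scheduler documentation linked above)
from finetuning_scheduler.fts import FinetuningScheduler
from finetuning_scheduler.fts_supporters import FTSCheckpoint, FTSEarlyStopping

example_callbacks = [
    # explicit mode: pass a user-defined schedule instead of the generated default
    FinetuningScheduler(ft_schedule="example_ft_schedule.yaml"),
    # FTSEarlyStopping extends EarlyStopping with fine-tuning phase awareness
    FTSEarlyStopping(monitor="val_loss", min_delta=0.001, patience=2),
    # FTSCheckpoint extends ModelCheckpoint to track fine-tuning schedule state
    FTSCheckpoint(monitor="val_loss", save_top_k=1),
]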
# %% [markdown]
# ## Footnotes
#
# - [Howard, J., & Ruder, S. (2018)](https://arxiv.org/pdf/1801.06146.pdf). Fine-tuned Language
# Models for Text Classification. ArXiv, abs/1801.06146. [↩](#Scheduled-Fine-Tuning-with-the-Fine-Tuning-Scheduler-Extension)
# - [Chronopoulou, A., Baziotis, C., & Potamianos, A. (2019)](https://arxiv.org/pdf/1902.10547.pdf).
# An embarrassingly simple approach for transfer learning from pretrained language models. arXiv
# preprint arXiv:1902.10547. [↩](#Scheduled-Fine-Tuning-with-the-Fine-Tuning-Scheduler-Extension)
# - [Peters, M. E., Ruder, S., & Smith, N. A. (2019)](https://arxiv.org/pdf/1903.05987.pdf). To tune or not to
# tune? adapting pretrained representations to diverse tasks. arXiv preprint arXiv:1903.05987. [↩](#Scheduled-Fine-Tuning-with-the-Fine-Tuning-Scheduler-Extension)
# - [Sivaprasad, P. T., Mai, F., Vogels, T., Jaggi, M., & Fleuret, F. (2020)](https://arxiv.org/pdf/1910.11758.pdf).