
Commit dfc4f04: bump nbsphinx & fix labels & resolve sudo (#280)
Parent commit: 8460b48

5 files changed
Lines changed: 24 additions & 75 deletions
.actions/assistant.py
Lines changed: 1 addition & 1 deletion

@@ -158,7 +158,7 @@ class AssistantCLI:
     _EXT_ARCHIVE_TAR = (".tar", ".gz")
     _EXT_ARCHIVE = _EXT_ARCHIVE_ZIP + _EXT_ARCHIVE_TAR
     _AZURE_POOL = "lit-rtx-3090"
-    _AZURE_DOCKER = "pytorchlightning/pytorch_lightning:base-cuda-py3.9-torch1.12-cuda11.6.1"
+    _AZURE_DOCKER = "pytorchlightning/tutorials:latest"
 
     @staticmethod
     def _find_meta(folder: str) -> str:

.azure/ipynb-publish.yml
Lines changed: 1 addition & 9 deletions

@@ -111,11 +111,6 @@ jobs:
       pip list
     displayName: "Image info & NVIDIA"
 
-  - script: |
-      /tmp/docker exec -t -u 0 $CONTAINER_ID \
-        sh -c "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -o Dpkg::Options::="--force-confold" -y install sudo"
-    displayName: "Install Sudo in container (thanks Microsoft!)"
-
   - bash: |
       git config --global user.email "pipelines@azure.com"
       git config --global user.name "Azure Pipelines"
@@ -135,10 +130,7 @@ jobs:
 
   - bash: |
       set -e
-      sudo apt-get update -q --fix-missing
-      sudo apt install -y tree ffmpeg
-      #pip install --upgrade pip
-      #pip --version
+      pip --version
       pip install -r requirements.txt -r _requirements/data.txt
       pip list
     displayName: "Install dependencies"

.azure/ipynb-tests.yml
Lines changed: 1 addition & 7 deletions

@@ -72,15 +72,9 @@ jobs:
       pip list | grep torch
     displayName: "Image info & NVIDIA"
 
-  - script: |
-      /tmp/docker exec -t -u 0 $CONTAINER_ID \
-        sh -c "apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -o Dpkg::Options::="--force-confold" -y install sudo"
-    displayName: "Install Sudo in container (thanks Microsoft!)"
-
   - bash: |
       set -e
-      sudo apt-get update -q --fix-missing
-      sudo apt install -y tree ffmpeg
+      pip --version
       pip install -r requirements.txt -r _requirements/data.txt
       pip list
     displayName: "Install dependencies"

_requirements/docs.txt
Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 sphinx >5.0, <6.0
 myst-parser >=0.18.1, <3.0.0
-nbsphinx >=0.8.5, <=0.8.9
+nbsphinx >=0.8.5, <0.10
 pandoc >=1.0, <=2.3
 #docutils >=0.16
 sphinx-paramlinks >=0.5.1, <=0.5.4

lightning_examples/finetuning-scheduler/finetuning-scheduler.py
Lines changed: 20 additions & 57 deletions
@@ -18,12 +18,12 @@
 #
 #
 #
-# <div style="display:inline" id="a1">
+# <div style="display:inline">
 #
 # Fundamentally, [Fine-Tuning Scheduler](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) enables
 # scheduled, multi-phase, fine-tuning of foundation models. Gradual unfreezing (i.e. thawing) can help maximize
 # foundation model knowledge retention while allowing (typically upper layers of) the model to
-# optimally adapt to new tasks during transfer learning [1, 2, 3](#f1)
+# optimally adapt to new tasks during transfer learning [1, 2, 3]
 #
 # </div>
 #
@@ -42,10 +42,8 @@
 #
 # ## Basic Usage
 #
-# <div id="basic_usage">
-#
 # If no fine-tuning schedule is provided by the user, [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) will generate a
-# [default schedule](#The-Default-Finetuning-Schedule) and proceed to fine-tune according to the generated schedule,
+# [default schedule](#The-Default-Fine-Tuning-Schedule) and proceed to fine-tune according to the generated schedule,
 # using default [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping) and [FTSCheckpoint](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSCheckpoint) callbacks with ``monitor=val_loss``.
 #
 # </div>
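For orientation, the Basic Usage passage in the hunk above amounts to a zero-configuration pattern. A minimal sketch, not part of this commit, assuming the `finetuning-scheduler` package is installed; `MyModule` and `MyDataModule` are hypothetical placeholders for a user's own classes:

```python
# Minimal sketch of the default-schedule path described above (assumption:
# finetuning-scheduler is installed; MyModule/MyDataModule are hypothetical
# placeholders for a user's own LightningModule and LightningDataModule).
import pytorch_lightning as pl
from finetuning_scheduler import FinetuningScheduler

# With no ft_schedule provided, FinetuningScheduler generates a default
# schedule and adds default FTSEarlyStopping/FTSCheckpoint callbacks
# monitoring "val_loss".
trainer = pl.Trainer(callbacks=[FinetuningScheduler()])
trainer.fit(MyModule(), datamodule=MyDataModule())
```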
@@ -111,7 +109,7 @@
 #
 #
 #
-# The end-to-end example in this notebook ([Scheduled Fine-Tuning For SuperGLUE](#superglue)) uses [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) in explicit mode to fine-tune a small foundation model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/).
+# The end-to-end example in this notebook ([Scheduled Fine-Tuning For SuperGLUE](#Scheduled-Fine-Tuning-For-SuperGLUE)) uses [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) in explicit mode to fine-tune a small foundation model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/).
 # Please see the [official Fine-Tuning Scheduler documentation](https://finetuning-scheduler.readthedocs.io/en/stable/index.html) if you are interested in a similar [CLI-based example](https://finetuning-scheduler.readthedocs.io/en/stable/index.html#example-scheduled-fine-tuning-for-superglue) using the LightningCLI.
 
 # %% [markdown]
@@ -158,8 +156,6 @@
 # </div>
 
 # %% [markdown]
-# <div id="superglue"></div>
-#
 # ## Scheduled Fine-Tuning For SuperGLUE
 #
 # The following example demonstrates the use of [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) to fine-tune a small foundation model on the [RTE](https://huggingface.co/datasets/viewer/?dataset=super_glue&config=rte) task of [SuperGLUE](https://super.gluebenchmark.com/). Iterative early-stopping will be applied according to a user-specified schedule.
@@ -452,12 +448,10 @@ def configure_optimizers(self):
 # %% [markdown]
 # ### Optimizer Configuration
 #
-# <div id="a2">
-#
 # Though other optimizers can arguably yield some marginal advantage contingent on the context,
 # the Adam optimizer (and the [AdamW version](https://pytorch.org/docs/stable/_modules/torch/optim/adamw.html#AdamW) which
 # implements decoupled weight decay) remains robust to hyperparameter choices and is commonly used for fine-tuning
-# foundation language models. See [(Sivaprasad et al., 2020)](#f2) and [(Mosbach, Andriushchenko & Klakow, 2020)](#f3) for theoretical and systematic empirical justifications of Adam and its use in fine-tuning
+# foundation language models. See (Sivaprasad et al., 2020) and (Mosbach, Andriushchenko & Klakow, 2020) for theoretical and systematic empirical justifications of Adam and its use in fine-tuning
 # large transformer-based language models. The values used here have some justification
 # in the referenced literature but have been largely empirically determined and while a good
 # starting point could be could be further tuned.
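To make the optimizer discussion in the hunk above concrete, a hedged sketch of an AdamW-based `configure_optimizers` hook; the hyperparameter values are illustrative assumptions, not the tutorial's settings:

```python
# Sketch of a LightningModule configure_optimizers hook using AdamW, which
# implements decoupled weight decay; the lr/eps/weight_decay values are
# illustrative assumptions only.
import torch

def configure_optimizers(self):
    return torch.optim.AdamW(self.parameters(), lr=1e-05, eps=1e-07, weight_decay=1e-05)
```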
@@ -470,15 +464,13 @@ def configure_optimizers(self):
 # %% [markdown]
 # ### LR Scheduler Configuration
 #
-# <div id="a3">
-#
 # The [CosineAnnealingWarmRestarts scheduler](https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.html?highlight=cosineannealingwarm#torch.optim.lr_scheduler.CosineAnnealingWarmRestarts) nicely fits with our iterative fine-tuning since it does not depend upon a global max_epoch
-# value. The importance of initial warmup is reduced due to the innate warmup effect of Adam bias correction [[5]](#f3)
+# value. The importance of initial warmup is reduced due to the innate warmup effect of Adam bias correction [5]
 # and the gradual thawing we are performing. Note that commonly used LR schedulers that depend on providing
 # max_iterations/epochs (e.g. the
 # [CosineWarmupScheduler](https://github.com/Lightning-AI/tutorials/blob/0c325829101d5a6ebf32ed99bbf5b09badf04a59/course_UvA-DL/05-transformers-and-MH-attention/Transformers_MHAttention.py#L688)
 # used in other pytorch-lightning tutorials) also work with FinetuningScheduler. Though the LR scheduler is theoretically
-# justified [(Loshchilov & Hutter, 2016)](#f4), the particular values provided here are primarily empircally driven.
+# justified (Loshchilov & Hutter, 2016), the particular values provided here are primarily empircally driven.
 #
 # [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) also supports both optimizer and LR scheduler
 # reinitialization in explicit and implicit finetuning schedule modes. See the advanced usage documentation ([LR scheduler reinitialization](https://finetuning-scheduler.readthedocs.io/en/stable/advanced/lr_scheduler_reinitialization.html), [optimizer reinitialization](https://finetuning-scheduler.readthedocs.io/en/stable/advanced/optimizer_reinitialization.html)) for explanations and demonstration of the extension's support for more complex requirements.
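A sketch of the scheduler pairing described above: CosineAnnealingWarmRestarts restarts on a `T_0`/`T_mult` cycle, so no global max_epochs value is needed. The specific `T_0=1, T_mult=2` values are assumptions for illustration:

```python
# Sketch pairing AdamW with CosineAnnealingWarmRestarts (no global
# max_epochs required); the T_0/T_mult values are illustrative assumptions.
import torch

def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=1e-05, weight_decay=1e-05)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=1, T_mult=2)
    return {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler}}
```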
@@ -502,7 +494,7 @@ def configure_optimizers(self):
 #
 # The only callback required to invoke the [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) is the [FinetuningScheduler](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts.html#finetuning_scheduler.fts.FinetuningScheduler) callback itself.
 # Default versions of [FTSCheckpoint](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSCheckpoint) and [FTSEarlyStopping](https://finetuning-scheduler.readthedocs.io/en/stable/api/finetuning_scheduler.fts_supporters.html#finetuning_scheduler.fts_supporters.FTSEarlyStopping)
-# (if not specifying ``epoch_only_transitions``) will be included ([as discussed above](#basic_usage)) if not provided
+# (if not specifying ``epoch_only_transitions``) will be included ([as discussed above](#Basic-Usage)) if not provided
 # in the callbacks list. For demonstration purposes I'm including example configurations of all three callbacks below.
 
 # %%
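The explicit callback configuration the hunk above refers to follows this shape; a hedged sketch in which the schedule filename and threshold values are assumptions rather than the tutorial's exact settings:

```python
# Sketch of explicitly configuring all three callbacks named above; the
# schedule file name, patience, and min_delta values are hypothetical.
from finetuning_scheduler import FinetuningScheduler, FTSCheckpoint, FTSEarlyStopping

callbacks = [
    FinetuningScheduler(ft_schedule="my_ft_schedule.yaml"),  # hypothetical explicit schedule
    FTSEarlyStopping(monitor="val_loss", min_delta=0.001, patience=2),
    FTSCheckpoint(monitor="val_loss", save_top_k=1),
]
```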
@@ -622,47 +614,18 @@ def train() -> None:
 # %% [markdown]
 # ## Footnotes
 #
-# <ol>
-# <li id="f1">
-#
-# [Howard, J., & Ruder, S. (2018)](https://arxiv.org/pdf/1801.06146.pdf). Fine-tuned Language
-# Models for Text Classification. ArXiv, abs/1801.06146. [↩](#a1)
-#
-# </li>
-# <li>
-#
-# [Chronopoulou, A., Baziotis, C., & Potamianos, A. (2019)](https://arxiv.org/pdf/1902.10547.pdf).
+# - [Howard, J., & Ruder, S. (2018)](https://arxiv.org/pdf/1801.06146.pdf). Fine-tuned Language
+# Models for Text Classification. ArXiv, abs/1801.06146. [↩](#Scheduled-Fine-Tuning-with-the-Fine-Tuning-Scheduler-Extension)
+# - [Chronopoulou, A., Baziotis, C., & Potamianos, A. (2019)](https://arxiv.org/pdf/1902.10547.pdf).
 # An embarrassingly simple approach for transfer learning from pretrained language models. arXiv
-# preprint arXiv:1902.10547. [↩](#a1)
-#
-# </li>
-# <li>
-#
-# [Peters, M. E., Ruder, S., & Smith, N. A. (2019)](https://arxiv.org/pdf/1903.05987.pdf). To tune or not to
-# tune? adapting pretrained representations to diverse tasks. arXiv preprint arXiv:1903.05987. [↩](#a1)
-#
-# </li>
-# <li id="f2">
-#
-# [Sivaprasad, P. T., Mai, F., Vogels, T., Jaggi, M., & Fleuret, F. (2020)](https://arxiv.org/pdf/1910.11758.pdf).
+# preprint arXiv:1902.10547. [↩](#Scheduled-Fine-Tuning-with-the-Fine-Tuning-Scheduler-Extension)
+# - [Peters, M. E., Ruder, S., & Smith, N. A. (2019)](https://arxiv.org/pdf/1903.05987.pdf). To tune or not to
+# tune? adapting pretrained representations to diverse tasks. arXiv preprint arXiv:1903.05987. [↩](#Scheduled-Fine-Tuning-with-the-Fine-Tuning-Scheduler-Extension)
+# - [Sivaprasad, P. T., Mai, F., Vogels, T., Jaggi, M., & Fleuret, F. (2020)](https://arxiv.org/pdf/1910.11758.pdf).
 # Optimizer benchmarking needs to account for hyperparameter tuning. In International Conference on Machine Learning
-# (pp. 9036-9045). PMLR. [↩](#a2)
-#
-# </li>
-# <li id="f3">
-#
-# [Mosbach, M., Andriushchenko, M., & Klakow, D. (2020)](https://arxiv.org/pdf/2006.04884.pdf). On the stability of
-# fine-tuning bert: Misconceptions, explanations, and strong baselines. arXiv preprint arXiv:2006.04884. [↩](#a2)
-#
-# </li>
-# <li id="f4">
-#
-# [Loshchilov, I., & Hutter, F. (2016)](https://arxiv.org/pdf/1608.03983.pdf). Sgdr: Stochastic gradient descent with
-# warm restarts. arXiv preprint arXiv:1608.03983. [↩](#a3)
-#
-# </li>
-#
-# </ol>
-
-# %% [markdown]
+# (pp. 9036-9045). PMLR. [↩](#Optimizer-Configuration)
+# - [Mosbach, M., Andriushchenko, M., & Klakow, D. (2020)](https://arxiv.org/pdf/2006.04884.pdf). On the stability of
+# fine-tuning bert: Misconceptions, explanations, and strong baselines. arXiv preprint arXiv:2006.04884. [↩](#Optimizer-Configuration)
+# - [Loshchilov, I., & Hutter, F. (2016)](https://arxiv.org/pdf/1608.03983.pdf). Sgdr: Stochastic gradient descent with
+# warm restarts. arXiv preprint arXiv:1608.03983. [↩](#LR-Scheduler-Configuration)
 #
