Merged
12 changes: 0 additions & 12 deletions _typos.toml
@@ -26,9 +26,6 @@ Nervana = "Nervana"

# These words need to be fixed
Creenshot = "Creenshot"
Embeddding = "Embeddding"
Embeding = "Embeding"
Engish = "Engish"
Learing = "Learing"
Moible = "Moible"
Operaton = "Operaton"
@@ -57,15 +54,6 @@ dimention = "dimention"
dimentions = "dimentions"
dirrectories = "dirrectories"
disucssion = "disucssion"
egde = "egde"
enviornment = "enviornment"
erros = "erros"
evalute = "evalute"
exampels = "exampels"
exection = "exection"
exlusive = "exlusive"
exmaple = "exmaple"
exsits = "exsits"
feeded = "feeded"
flaot = "flaot"
fliters = "fliters"
2 changes: 1 addition & 1 deletion ci_scripts/check_api_docs_en.py
@@ -124,6 +124,6 @@ def check_system_message_in_doc(doc_file):
if error_files:
print("error files: ", error_files)
print(
"ERROR: these docs exsits System Message: WARNING/ERROR, please check and fix them"
"ERROR: these docs exists System Message: WARNING/ERROR, please check and fix them"
)
sys.exit(1)
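The intent of this check — fail CI when Sphinx leaves "System Message: WARNING/ERROR" blocks in the built English docs — can be condensed into a few lines (names such as `check_docs` are hypothetical sketches, not the actual `ci_scripts` implementation):

```python
import re

def find_system_messages(doc_text):
    # Sphinx embeds build problems into rendered output as "System Message: ..."
    return re.findall(r"System Message: (?:WARNING|ERROR)", doc_text)

def check_docs(docs):
    # docs: mapping of file name -> rendered text; returns a shell-style exit code
    error_files = [name for name, text in docs.items() if find_system_messages(text)]
    if error_files:
        print("error files: ", error_files)
        return 1
    return 0

status = check_docs({
    "ok.html": "<p>fine</p>",
    "bad.html": "System Message: ERROR\nUnexpected indentation.",
})
print(status)  # 1 -- bad.html contains a System Message
```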
2 changes: 1 addition & 1 deletion ci_scripts/check_api_docs_en.sh
@@ -13,7 +13,7 @@ function check_system_message(){
fi
}

echo "RUN Engish API Docs Checks"
echo "RUN English API Docs Checks"
jsonfn=$1
output_path=$2
need_check_api_py_files="${3}"
2 changes: 1 addition & 1 deletion docs/design/dist_train/README.md
@@ -48,7 +48,7 @@ The training process of asynchronous training can be:
2. Trainer gets all parameters back from pserver.

### Note:
There are also some conditions that need to consider. For exmaple:
There are also some conditions that need to consider. For example:

1. If trainer needs to wait for the pserver to apply it's gradient and then get back the parameters back.
1. If we need a lock between parameter update and parameter fetch.
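The asynchronous flow above (the pserver applies gradients as they arrive; trainers fetch parameters back) can be sketched as a toy single-process simulation — `pserver`, `apply_gradient`, and `fetch_parameters` are illustrative names, not the real Paddle API:

```python
import numpy as np

# Toy parameter server: parameter name -> value.
pserver = {"w": np.zeros(4)}

def apply_gradient(grads, lr=0.1):
    # The pserver applies each trainer's gradient as soon as it arrives.
    for name, g in grads.items():
        pserver[name] -= lr * g

def fetch_parameters():
    # A trainer gets all parameters back from the pserver.
    return {name: value.copy() for name, value in pserver.items()}

# Two trainers push gradients asynchronously (no lock, no barrier),
# then one of them fetches the result.
apply_gradient({"w": np.ones(4)})
apply_gradient({"w": 2 * np.ones(4)})
params = fetch_parameters()
print(params["w"])  # [-0.3 -0.3 -0.3 -0.3]
```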
2 changes: 1 addition & 1 deletion docs/design/memory/memory_optimization.md
@@ -60,7 +60,7 @@ We can leran these techniques from compilers. There are mainly two stages to mak


#### Control Flow Graph
To perform analysis on a program, it is often useful to make a control flow graph. A [control flow graph](https://en.wikipedia.org/wiki/Control_flow_graph) (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. Each statement in the program is a node in the flow graph; if statemment x can be followed by statement y, there is an egde from x to y.
To perform analysis on a program, it is often useful to make a control flow graph. A [control flow graph](https://en.wikipedia.org/wiki/Control_flow_graph) (CFG) in computer science is a representation, using graph notation, of all paths that might be traversed through a program during its execution. Each statement in the program is a node in the flow graph; if statemment x can be followed by statement y, there is an edge from x to y.
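Concretely, the statement-to-edge relation described above can be sketched as a plain adjacency map (a minimal illustration, not Paddle's actual graph representation):

```python
# CFG for:  s1; while (cond) { s2; }  s3
# Each statement is a node; an edge x -> y means y can execute right after x.
cfg = {
    "s1": ["cond"],
    "cond": ["s2", "s3"],   # branch: enter the loop body, or exit
    "s2": ["cond"],         # back edge closing the loop
    "s3": [],
}

def successors(stmt):
    return cfg[stmt]

def predecessors(stmt):
    return [n for n, succs in cfg.items() if stmt in succs]

print(successors("cond"))    # ['s2', 's3']
print(predecessors("cond"))  # ['s1', 's2']
```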

Following is the flow graph for a simple loop.

2 changes: 1 addition & 1 deletion docs/design/mkldnn/inplace/inplace.md
@@ -94,4 +94,4 @@ replace this original name in all of next op instances.

\* oneDNN gelu kernel is able to perform in-place execution, but currently gelu op does not support in-place execution.

\*\* sum kernel is using oneDNN sum primitive that does not provide in-place exection, so in-place computation is done faked through external buffer. So it was not added into oneDNN inplace pass.
\*\* sum kernel is using oneDNN sum primitive that does not provide in-place execution, so in-place computation is done faked through external buffer. So it was not added into oneDNN inplace pass.
2 changes: 1 addition & 1 deletion docs/design/phi/kernel_migrate_cn.md
@@ -159,7 +159,7 @@ void LogSoftmaxKernel(const Context& dev_ctx,
| `auto* ptr = out->mutbale_data()` | `auto* ptr = out->data()` |
| `out->mutbale_data(dims, place)` | `out->Resize(dims); dev_ctx.template Alloc(out)` |
| `out->mutbale_data(place, dtype)` | `dev_ctx.Alloc(out, dtype)` |
| `platform::erros::XXX` | `phi::erros::XXX` |
| `platform::errors::XXX` | `phi::errors::XXX` |
| `platform::float16/bfloat16/complex64/complex128` | `dtype::float16/bfloat16/complex64/complex128` |
| `framework::Eigen***` | `Eigen***` |
| `platform::XXXPlace` | `phi::XXXPlace` |
2 changes: 1 addition & 1 deletion docs/design/phi/kernel_migrate_en.md
@@ -159,7 +159,7 @@ Secondly, it is necessary to replace some of the types or functions that were on
| `auto* ptr = out->mutbale_data()` | `auto* ptr = out->data()` |
| `out->mutbale_data(dims, place)` | `out->Resize(dims); dev_ctx.template Alloc(out)` |
| `out->mutbale_data(place, dtype)` | `dev_ctx.Alloc(out, dtype)` |
| `platform::erros::XXX` | `phi::erros::XXX` |
| `platform::errors::XXX` | `phi::errors::XXX` |
| `platform::float16/bfloat16/complex64/complex128` | `dtype::float16/bfloat16/complex64/complex128` |
| `framework::Eigen***` | `Eigen***` |
| `platform::XXXPlace` | `phi::XXXPlace` |
@@ -10,7 +10,7 @@ In this section we will walk through the steps required to extend a fake hardwar

**InitPlugin**

As a custom runtime entry function, InitPlugin is required to be implemented by the plug-in. The parameter in InitPlugin should also be checked, device information should be filled in, and the runtime API should be registered. In the initialization, PaddlePaddle loads the plug-in and invokes InitPlugin to initialize it, and register runtime (The whole process can be done automatically by the framework, only if the dynamic-link library is in site-packages/paddle-plugins/ or the designated directory of the enviornment variable of CUSTOM_DEVICE_ROOT).
As a custom runtime entry function, InitPlugin is required to be implemented by the plug-in. The parameter in InitPlugin should also be checked, device information should be filled in, and the runtime API should be registered. In the initialization, PaddlePaddle loads the plug-in and invokes InitPlugin to initialize it, and register runtime (The whole process can be done automatically by the framework, only if the dynamic-link library is in site-packages/paddle-plugins/ or the designated directory of the environment variable of CUSTOM_DEVICE_ROOT).

Example:

2 changes: 1 addition & 1 deletion docs/guides/06_distributed_training/model_parallel_cn.rst
@@ -29,7 +29,7 @@

对于 Embedding 操作,可以将其理解为一种查找表操作。即,将输入看做索引,将 Embedding 参数看做查找表,根据该索引查表得到相应的输出,如下图(a)所示。当采用模型并行时,Embedding 的参数被均匀切分到多个卡上。假设 Embedding 参数的维度为 N*D,并采用 K 张卡执行模型并行,那么模型并行模式下每张卡上的 Embedding 参数的维度为 N//K*D。当参数的维度 N 不能被卡数 K 整除时,最后一张卡的参数维度值为(N//K+N%K)*D。以下图(b)为例,Embedding 参数的维度为 8*D,采用 2 张卡执行模型并行,那么每张卡上 Embedding 参数的维度为 4*D。

为了便于说明,以下我们均假设 Embedding 的参数维度值 D 可以被模型并行的卡数 D 整除。此时,每张卡上 Embedding 参数的索引值为[0, N/K),逻辑索引值为[k*N/K, (k+1)*N/K),其中 k 表示卡序号,0<=k<K。对于输入索引 I,如果该索引在该卡表示的逻辑索引范围内,则返回该索引所表示的表项(索引值为 I-k*N/K;否则,返回值为全 0 的虚拟表项。随后,通过 AllReduce 操作获取所有输出表项的和,即对应该 Embeding 操作的输出;整个查表过程如下图(b)所示。
为了便于说明,以下我们均假设 Embedding 的参数维度值 D 可以被模型并行的卡数 D 整除。此时,每张卡上 Embedding 参数的索引值为[0, N/K),逻辑索引值为[k*N/K, (k+1)*N/K),其中 k 表示卡序号,0<=k<K。对于输入索引 I,如果该索引在该卡表示的逻辑索引范围内,则返回该索引所表示的表项(索引值为 I-k*N/K;否则,返回值为全 0 的虚拟表项。随后,通过 AllReduce 操作获取所有输出表项的和,即对应该 Embedding 操作的输出;整个查表过程如下图(b)所示。
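The sharded lookup described above (a local hit returns the table row, an out-of-range index returns an all-zero entry, and AllReduce sums across cards) can be simulated in a few lines of NumPy; the plain sum over per-card results stands in for AllReduce:

```python
import numpy as np

N, K, D = 8, 2, 3  # 8 rows, 2 cards, embedding width 3 (toy sizes)
table = np.arange(N * D, dtype=np.float64).reshape(N, D)  # full embedding table
shards = np.split(table, K)  # card k holds rows [k*N/K, (k+1)*N/K)

def parallel_lookup(idx):
    per_card = []
    for k in range(K):
        lo, hi = k * N // K, (k + 1) * N // K
        if lo <= idx < hi:
            per_card.append(shards[k][idx - lo])  # index in this card's range
        else:
            per_card.append(np.zeros(D))          # otherwise: all-zero dummy entry
    return np.sum(per_card, axis=0)               # "AllReduce" recovers the row

print(parallel_lookup(5))  # [15. 16. 17.] == table[5]
```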

.. image:: ./images/parallel_embedding.png
:width: 600
4 changes: 2 additions & 2 deletions docs/guides/advanced/customize_cn.ipynb
@@ -318,8 +318,8 @@
" def on_epoch_end(self, epoch, logs=None) 每轮训练结束后,`Model.fit`接口中调用\n",
" def on_train_batch_begin(self, step, logs=None) 单个Batch训练开始前,`Model.fit`和`Model.train_batch`接口中调用\n",
" def on_train_batch_end(self, step, logs=None) 单个Batch训练结束后,`Model.fit`和`Model.train_batch`接口中调用\n",
" def on_eval_batch_begin(self, step, logs=None) 单个Batch评估开始前,`Model.evalute`和`Model.eval_batch`接口中调用\n",
" def on_eval_batch_end(self, step, logs=None) 单个Batch评估结束后,`Model.evalute`和`Model.eval_batch`接口中调用\n",
" def on_eval_batch_begin(self, step, logs=None) 单个Batch评估开始前,`Model.evaluate`和`Model.eval_batch`接口中调用\n",
" def on_eval_batch_end(self, step, logs=None) 单个Batch评估结束后,`Model.evaluate`和`Model.eval_batch`接口中调用\n",
" def on_predict_batch_begin(self, step, logs=None) 单个Batch推理开始前,`Model.predict`和`Model.test_batch`接口中调用\n",
" def on_predict_batch_end(self, step, logs=None) 单个Batch推理结束后,`Model.predict`和`Model.test_batch`接口中调用\n",
" \"\"\"\n",
2 changes: 1 addition & 1 deletion docs/guides/advanced/layer_and_model_en.md
@@ -264,7 +264,7 @@ Tensor(shape=[10, 1], dtype=float32, place=CPUPlace, stop_gradient=True,
...
```

Here we first set the execution mode to **eval**, and soon after to **train**. The two execution modes are exlusive therefore the latter mode will override the former.
Here we first set the execution mode to **eval**, and soon after to **train**. The two execution modes are exclusive therefore the latter mode will override the former.

### Perform an execution

@@ -30,5 +30,5 @@ paddle.nn.functional.avg_pool1d(x, kernel_size, stride=None, padding=0, exclusiv
torch.nn.functional.avg_pool1d(input=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, count_include_pad=False)

# Paddle 写法
paddle.nn.functional.avg_pool1d(x=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, exlusive=True)
paddle.nn.functional.avg_pool1d(x=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, exclusive=True)
```
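The `count_include_pad=False` ↔ `exclusive=True` correspondence behind this fix can be checked with a pure-NumPy sketch (a toy 1-D implementation with `ceil_mode` omitted, not the framework kernels):

```python
import numpy as np

def avg_pool1d(x, kernel_size=2, stride=2, padding=1, exclusive=True):
    padded = np.pad(x, padding)              # zero-pad both ends
    mask = np.pad(np.ones_like(x), padding)  # 1 on real values, 0 on padding
    out = []
    for start in range(0, len(padded) - kernel_size + 1, stride):
        window = padded[start:start + kernel_size]
        # exclusive=True (count_include_pad=False): divide by real elements only
        divisor = mask[start:start + kernel_size].sum() if exclusive else kernel_size
        out.append(window.sum() / divisor)
    return np.array(out)

x = np.array([2.0, 4.0, 6.0, 8.0])
print(avg_pool1d(x, exclusive=True))   # [2. 5. 8.]  edge windows ignore the padding
print(avg_pool1d(x, exclusive=False))  # [1. 5. 4.]  padding zeros dilute the edges
```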
@@ -31,5 +31,5 @@ paddle.nn.functional.avg_pool2d(x, kernel_size, stride=None, padding=0, ceil_mod
torch.nn.functional.avg_pool2d(input=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, count_include_pad=False)

# Paddle 写法
paddle.nn.AvgPool2D(x=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, exlusive=True)
paddle.nn.AvgPool2D(x=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, exclusive=True)
```
@@ -31,5 +31,5 @@ paddle.nn.functional.avg_pool3d(x, kernel_size, stride=None, padding=0, ceil_mod
torch.nn.functional.avg_pool3d(input=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, count_include_pad=False)

# Paddle 写法
paddle.nn.functional.avg_pool3d(x=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, exlusive=True)
paddle.nn.functional.avg_pool3d(x=input, kernel_size=2, stride=2, padding=1, ceil_mode=True, exclusive=True)
```
@@ -387,7 +387,7 @@ def _init_weights(self, module):
| [torch.nn.Dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html?highlight=dropout#torch.nn.Dropout) | [paddle.nn.Dropout](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/Dropout_cn.html#dropout) | PyTorch 有 inplace 参数,表示在不更改变量的内存地址的情况下,直接修改变量的值,飞桨无此参数。 |
| [torch.nn.Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html?highlight=linear#torch.nn.Linear) | [paddle.nn.Linear](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/Linear_cn.html#linear) | PyTorch `bias`默认为 True,表示使用可更新的偏置参数。飞桨 `weight_attr`/`bias_attr`默认使用默认的权重/偏置参数属性,否则为指定的权重/偏置参数属性,具体用法参见[ParamAttr](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/ParamAttr_cn.html#paramattr);当`bias_attr`设置为 bool 类型与 PyTorch 的作用一致。 |
| [torch.nn.LayerNorm](https://pytorch.org/docs/stable/generated/torch.nn.LayerNorm.html?highlight=layernorm#torch.nn.LayerNorm) | [paddle.nn.LayerNorm](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api/paddle/nn/LayerNorm_cn.html#layernorm) | 注意参数 epsilon 不同模型参数值,可能不同,对模型精度影响大。 |
| [torch.nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html?highlight=embedding#torch.nn.Embedding) | [paddle.nn.Embedding](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/Embedding_cn.html#embedding) | PyTorch:当 max_norm 不为`None`时,如果 Embeddding 向量的范数(范数的计算方式由 norm_type 决定)超过了 max_norm 这个界限,就要再进行归一化。PaddlePaddle:PaddlePaddle 无此要求,因此不需要归一化。PyTorch:若 scale_grad_by_freq 设置为`True`,会根据单词在 mini-batch 中出现的频率,对梯度进行放缩。 PaddlePaddle:PaddlePaddle 无此功能。 |
| [torch.nn.Embedding](https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html?highlight=embedding#torch.nn.Embedding) | [paddle.nn.Embedding](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/Embedding_cn.html#embedding) | PyTorch:当 max_norm 不为`None`时,如果 Embedding 向量的范数(范数的计算方式由 norm_type 决定)超过了 max_norm 这个界限,就要再进行归一化。PaddlePaddle:PaddlePaddle 无此要求,因此不需要归一化。PyTorch:若 scale_grad_by_freq 设置为`True`,会根据单词在 mini-batch 中出现的频率,对梯度进行放缩。 PaddlePaddle:PaddlePaddle 无此功能。 |

### 3.2 权重转换

12 changes: 6 additions & 6 deletions docs/practices/nlp/seq2seq_with_attention.ipynb
@@ -670,20 +670,20 @@
"encoder.eval()\n",
"atten_decoder.eval()\n",
"\n",
"num_of_exampels_to_evaluate = 10\n",
"num_of_examples_to_evaluate = 10\n",
"\n",
"indices = np.random.choice(\n",
" len(train_en_sents), num_of_exampels_to_evaluate, replace=False\n",
" len(train_en_sents), num_of_examples_to_evaluate, replace=False\n",
")\n",
"x_data = train_en_sents[indices]\n",
"sent = paddle.to_tensor(x_data)\n",
"en_repr = encoder(sent)\n",
"\n",
"word = np.array([[cn_vocab[\"<bos>\"]]] * num_of_exampels_to_evaluate)\n",
"word = np.array([[cn_vocab[\"<bos>\"]]] * num_of_examples_to_evaluate)\n",
"word = paddle.to_tensor(word)\n",
"\n",
"hidden = paddle.zeros([num_of_exampels_to_evaluate, 1, hidden_size])\n",
"cell = paddle.zeros([num_of_exampels_to_evaluate, 1, hidden_size])\n",
"hidden = paddle.zeros([num_of_examples_to_evaluate, 1, hidden_size])\n",
"cell = paddle.zeros([num_of_examples_to_evaluate, 1, hidden_size])\n",
"\n",
"decoded_sent = []\n",
"for i in range(MAX_LEN + 2):\n",
@@ -693,7 +693,7 @@
" word = paddle.unsqueeze(word, axis=-1)\n",
"\n",
"results = np.stack(decoded_sent, axis=1)\n",
"for i in range(num_of_exampels_to_evaluate):\n",
"for i in range(num_of_examples_to_evaluate):\n",
" en_input = \" \".join(filtered_pairs[indices[i]][0])\n",
" ground_truth_translate = \"\".join(filtered_pairs[indices[i]][1])\n",
" model_translate = \"\"\n",
4 changes: 2 additions & 2 deletions docs/practices/quick_start/high_level_api.ipynb
@@ -941,8 +941,8 @@
" def on_epoch_end(self, epoch, logs=None) 每轮训练结束后,`Model.fit`接口中调用 \n",
" def on_train_batch_begin(self, step, logs=None) 单个Batch训练开始前,`Model.fit`和`Model.train_batch`接口中调用\n",
" def on_train_batch_end(self, step, logs=None) 单个Batch训练结束后,`Model.fit`和`Model.train_batch`接口中调用\n",
" def on_eval_batch_begin(self, step, logs=None) 单个Batch评估开始前,`Model.evalute`和`Model.eval_batch`接口中调用\n",
" def on_eval_batch_end(self, step, logs=None) 单个Batch评估结束后,`Model.evalute`和`Model.eval_batch`接口中调用\n",
" def on_eval_batch_begin(self, step, logs=None) 单个Batch评估开始前,`Model.evaluate`和`Model.eval_batch`接口中调用\n",
" def on_eval_batch_end(self, step, logs=None) 单个Batch评估结束后,`Model.evaluate`和`Model.eval_batch`接口中调用\n",
" def on_test_batch_begin(self, step, logs=None) 单个Batch预测测试开始前,`Model.predict`和`Model.test_batch`接口中调用\n",
" def on_test_batch_end(self, step, logs=None) 单个Batch预测测试结束后,`Model.predict`和`Model.test_batch`接口中调用\n",
" \"\"\"\n",