
Commit a5b3abd

Maleware and adwk67 authored
Add azure adls to troubleshooting guide (#745)
* Add azure adls to troubleshooting guide
* Removing trailing whitespace
* Fixing typos
* Applying suggestions (Co-authored-by: Andrew Kenworthy <1712947+adwk67@users.noreply.github.com>)
* fixing wording

Co-authored-by: Andrew Kenworthy <1712947+adwk67@users.noreply.github.com>
1 parent 5abad5b commit a5b3abd

1 file changed

Lines changed: 48 additions & 0 deletions


docs/modules/airflow/pages/troubleshooting/index.adoc

@@ -1,5 +1,53 @@
= Troubleshooting

== Azure Blob Storage Logging

Azure Data Lake Storage (`ADLS`) can be used to store Airflow task logs.

A regular storage container in Azure's ADLS backend can be accessed with either the `adls[s]` or the `wasb` connector, using the https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/connections/adls_v2.html[Azure Data Lake Storage Gen2 Connection] or the https://airflow.apache.org/docs/apache-airflow-providers-microsoft-azure/stable/connections/wasb.html[Microsoft Azure Blob Storage Connection], respectively.

If `ADLS` is used as the task log backend, it must be accessed via `wasb`, so the configuration in the environment should look like this:
[source,yaml]
----
webservers:
  envOverrides: &logging_overrides
    AIRFLOW__AZURE_REMOTE_LOGGING__REMOTE_WASB_LOG_CONTAINER: "<container-name>" #<1>
    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "wasb-<folder-name>" #<2>
    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "<connection-name>" #<3>
triggerers:
  envOverrides: *logging_overrides
kubernetesExecutors:
  envOverrides: *logging_overrides
schedulers:
  envOverrides: *logging_overrides
----
<1> This environment variable is only used for `wasb` connections.
<2> The `<container-name>` is *not* part of this value.
<3> This connection can be defined in the Airflow UI or declared as an environment variable.

Due to this open https://github.com/apache/airflow/issues/58946[issue] in Airflow, it is recommended to use `wasb-<folder-name>` rather than `wasb://<folder-name>`, because the latter option results in a target location like this (shown for a folder named `tasklogs`):
[source,text]
----
<container-name>
└── wasb:/
    └── tasklogs/
        └── dag_id=...
----
The workaround, however, results in:
[source,text]
----
<container-name>
└── wasb-tasklogs/
    └── dag_id=...
----
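The minimal Python sketch below makes the two layouts above concrete. It does *not* reproduce Airflow's logging code. It only assumes, for illustration, that the blob name is built by joining the base log folder and the relative log path as a plain POSIX path and then normalizing it. Under that assumption, `//` collapses to `/` and the `wasb:` scheme degrades into a literal directory name. The helper `blob_name` and the sample log path are hypothetical. Running it reports `wasb:` as the top-level entry for the URL-style value and `wasb-tasklogs` for the workaround, which matches the two trees.

[source,python]
----
import posixpath


def blob_name(base_log_folder: str, relative_log_path: str) -> str:
    """Hypothetical helper: treat the base folder as a plain path, not a URL.

    posixpath.normpath collapses repeated slashes, so a "wasb://" prefix
    turns into a literal "wasb:" directory inside the container.
    """
    return posixpath.normpath(posixpath.join(base_log_folder, relative_log_path))


# Illustrative relative log path (hypothetical DAG, run and task names).
relative = "dag_id=example/run_id=manual/task_id=hello/attempt=1.log"

for base in ("wasb://tasklogs", "wasb-tasklogs"):
    name = blob_name(base, relative)
    top_level = name.split("/", 1)[0]  # first "directory" inside the container
    print(f"{base!r:20} -> top-level entry {top_level!r}")
----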

The `Azure Blob Storage Connection` offers the optional field `Host`, which should be set to a value like this:
[source,text]
----
https://<storage-account-name>.blob.core.windows.net
----
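Callout <3> notes that the connection can also be declared as an environment variable. Airflow reads connections from variables named `AIRFLOW_CONN_<CONN_ID>` (connection ID upper-cased), and Airflow 2.3+ accepts a JSON value. The sketch below builds such a variable for a `wasb` connection using the `Host` format shown above. The connection ID `my_wasb_logs`, the account name, and the use of `login`/`password` for the storage account name and key are assumptions; check the Microsoft Azure Blob Storage Connection documentation for the authentication fields your setup needs. With this ID, `AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID` would be set to `my_wasb_logs`. In practice, the account key should come from a Kubernetes Secret rather than plain configuration.

[source,python]
----
import json


def wasb_connection_env(conn_id: str, account_name: str, account_key: str) -> tuple[str, str]:
    """Build a hypothetical AIRFLOW_CONN_<CONN_ID> variable for a wasb connection.

    Assumption: login/password carry the storage account name and key.
    """
    name = f"AIRFLOW_CONN_{conn_id.upper()}"
    value = json.dumps(
        {
            "conn_type": "wasb",
            # Same shape as the Host value shown above.
            "host": f"https://{account_name}.blob.core.windows.net",
            "login": account_name,
            "password": account_key,
        }
    )
    return name, value


# Placeholder values; never commit a real account key.
name, value = wasb_connection_env("my_wasb_logs", "mystorageaccount", "<account-key>")
print(f"{name}={value}")
----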

== S3 Logging: An error occurred (411) when calling the PutObject operation: Length Required

If Airflow is trying to access S3 (e.g. for remote task logging) and throws the following error
