Commit b1e2c93

fix connection timeouts and ssl errors
1 parent 8c3c9fd commit b1e2c93

13 files changed

Lines changed: 144 additions & 54 deletions

README.md

Lines changed: 17 additions & 8 deletions
@@ -19,11 +19,13 @@ java -cp 'lib/*:bin/*.jar' nl.melp.linkchecker.LinkChecker
 [--redis-host=HOST]
 [--redis-port=PORT]
 [--threads=N]
-[--reset|--resume]
-[--report|--report-ok]
+[--reset|--resume|--recheck]
+[--report|--report-all]
 [--follow-local|--follow-from-local|--no-follow]
-[--recheck|--recheck-only-errors|--no-recheck]
+[--recheck-only-errors|--no-recheck]
 [--ignore=PATTERN1[,PATTERN2...] [--ignore=PATTERN3...]]
+[--include=PATTERN1[,PATTERN2...] [--include=PATTERN3...]]
+[--ignore-ssl-errors]
 http://localhost/
 https://localhost/
 ```
@@ -35,19 +37,26 @@ java -cp 'lib/*:bin/*.jar' nl.melp.linkchecker.LinkChecker
 | `--threads=N` | Configure number of threads to use. There will be running 1 master thread, 1 logger thread and N worker threads. |
 | `--redis-host=HOST` | Configure HOST as the Redis host. |
 | `--redis-port=PORT` | Configure PORT as the Redis port |
-| `--follow-local` | Only local links to that local domain are followed* |
-| `--follow-from-local` | Only follow links that are mentioned on the local domain. This means that the link checker only spans over multiple hosts *once*. |
+| `--follow-local` | Only local links to that local* domain are followed |
+| `--follow-from-local` | Only follow links that are mentioned on the local* domain. This means that the link checker only spans over multiple hosts *once*. |
 | `--no-follow` | No links are followed. This is typically useful in combination with the `--recheck` flag |
 | `--recheck` | Reset the status for each of the previously failed URLs, and try them again. |
-| `--recheck-only-errors` | Only recheck links that had an internal error state, i.e. all urls that (which usually are out of your control anyway |
-| `--no-recheck` | Don't do recheck, even if url's are marked as "processing". This happens if a linkchecker process was interrupted without finishing cleanly. |
+| `--recheck-only-errors` | Only recheck links that had an internal error state, i.e. all urls that had connection errors, timeouts, etc. |
+| `--no-recheck` | Don't do recheck, even if url's are marked as "processing". |
 | `--reset` | Start with a clean slate |
 | `--resume` | Resume a previously stopped session. |
 | `--report` | When done, write a report to stdout and to reporting keys in Redis. |
 | `--report-all` | Also report working links. By default, only error statuses are reported |
 
 *) The start URLs passed in the command line will be considered "local
-domains". This means that with
+domains". This means that with the flags `--follow-from-local`, pages
+read from domains that are part of the arguments list are considered
+"local" pages and every link mentioned on that page will be followed.
+Similarly, if the `--follow-local` flag is passed, only links on the
+same domain as the domains mentioned in these start urls are followed.
+
+Note that this way you can actually allow multiple domains to be checked,
+by specifying multiple start urls on different domains.
 
 ## Resuming state
 All status data is stored in Maps and Sets which are persisted in
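
Reviewer note on the follow flags: the README footnote added in this commit says that the hosts of the start URLs count as "local", that `--follow-local` follows only links pointing at a local host, and that `--follow-from-local` follows every link found on a local page. The rule can be sketched roughly as below. This is only an illustration under stated assumptions: the class, enum, and method names are hypothetical and not taken from the LinkChecker source, hosts are compared by exact name, and links are assumed to be resolved to absolute URLs already.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the follow rule described in the README footnote.
// None of these names come from the LinkChecker code base.
final class FollowPolicySketch {
    enum Mode { FOLLOW_LOCAL, FOLLOW_FROM_LOCAL, NO_FOLLOW }

    // The hosts of the start URLs form the set of "local" domains.
    static Set<String> localHosts(List<String> startUrls) {
        Set<String> hosts = new HashSet<>();
        for (String url : startUrls) {
            String host = URI.create(url).getHost();
            if (host != null) {
                hosts.add(host.toLowerCase(Locale.ROOT));
            }
        }
        return hosts;
    }

    // A URL is local when its host is one of the start-URL hosts.
    // URLs without a host (for example mailto: links) are never local.
    static boolean isLocal(Set<String> localHosts, String url) {
        String host = URI.create(url).getHost();
        return host != null && localHosts.contains(host.toLowerCase(Locale.ROOT));
    }

    // Decide whether a link found on `page` should be crawled further.
    static boolean shouldFollow(Mode mode, Set<String> localHosts, String page, String link) {
        switch (mode) {
            case FOLLOW_LOCAL:
                return isLocal(localHosts, link);   // the target must be local
            case FOLLOW_FROM_LOCAL:
                return isLocal(localHosts, page);   // the source page must be local
            default:
                return false;                       // NO_FOLLOW
        }
    }

    public static void main(String[] args) {
        Set<String> local = localHosts(List.of("http://localhost/", "https://example.org/"));
        String page = "http://localhost/index.html";
        // Two start URLs on different hosts make both hosts local.
        System.out.println(shouldFollow(Mode.FOLLOW_LOCAL, local, page, "https://example.org/about"));
        // With FOLLOW_FROM_LOCAL the crawl leaves the local hosts only once.
        System.out.println(shouldFollow(Mode.FOLLOW_FROM_LOCAL, local, page, "https://other.test/"));
        System.out.println(shouldFollow(Mode.FOLLOW_FROM_LOCAL, local,
                "https://other.test/", "https://other.test/deeper"));
    }
}
```

Under these assumptions, passing two start URLs on different hosts makes both hosts local, which matches the README remark that multiple domains can be checked by listing multiple start URLs.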

lib/commons-codec-1.11.jar

327 KB
Binary file not shown.

lib/commons-codec-1.9.jar

-258 KB
Binary file not shown.

lib/fetch.sh

Lines changed: 2 additions & 2 deletions
@@ -39,9 +39,9 @@ fetch() {
 )
 rm -rf "$dir";
 }
-
+
 fetch_maven_deps \
-'org.apache.httpcomponents:httpclient:4.5.3' \
+'org.apache.httpcomponents:httpclient:4.5.12' \
 'junit:junit:4.12' \
 'org.slf4j:slf4j-log4j12:1.7.7' \
 'org.jsoup:jsoup:1.11.3'

lib/httpclient-4.5.12.jar

760 KB
Binary file not shown.

lib/httpclient-4.5.3.jar

-730 KB
Binary file not shown.

lib/httpcore-4.4.13.jar

321 KB
Binary file not shown.

lib/httpcore-4.4.6.jar

-316 KB
Binary file not shown.

resources/sample/index.html

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+<html>
+<body>
+<ul>
+<li><a href="index2.html">index 2</a></li>
+</ul>
+
+<p>Author: <a href="mailto:test@example.org">Mr. Foo Bar</a>, more info <a href="subdir/foo.html">subdir/foo.html</a></p>
+</body>
+</html>

resources/sample/index2.html

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+<html>
+<body>
+<h1 id="chapter1">Chapter 1</h1>
+
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>See also <a href="#chapter2">chapter 2</a></p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+
+<h1 id="chapter2">Chapter 2</h1>
+
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>See also <a href="#chapter1">chapter 1</a></p>
+<p>interesting words</p>
+<p>interesting words</p>
+<p>interesting words</p>
+</body>
+</html>
