Hello guys, recently I was using the crawler to crawl some pages and it was taking quite a lot of time, so I decided to use async mode. While using async mode I noticed a lot of duplicates in my results; in particular, the number of duplicates matched the number of threads I was launching the crawler with.
Here is a quick example, taken from the official docs - https://github.com/gocolly/colly/blob/master/_examples/rate_limit/rate_limit.go
If we run this code, we can see the results:
A lot of text here with http body response
{ "args": { "n": "3" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-0ce769125429588340e95d6c" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=3" }
{ "args": { "n": "3" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-0ce769125429588340e95d6c" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=3" }
{ "args": { "n": "3" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-0ce769125429588340e95d6c" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=3" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }
{ "args": { "n": "1" }, "data": "", "files": {}, "form": {}, "headers": { "Accept": "*/*", "Accept-Encoding": "gzip", "Host": "httpbin.org", "User-Agent": "colly - https://github.com/gocolly/colly/v2", "X-Amzn-Trace-Id": "Root=1-659818d1-41c8deb73f2c9a702e3a9fcd" }, "origin": "83.139.137.160", "url": "https://httpbin.org/delay/2?n=1" }

As you can see, there are duplicates in the results. Maybe I'm doing something wrong or not setting up the crawler properly, but I highly doubt this is intended behaviour. Anyway, I would appreciate any help.