As long as a crawler can download pages, it will certainly be able to handle them.
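For reference, a minimal Python sketch of why downloading a page does not by itself mean the crawler can handle it: the response's Content-Type has to be checked before the body is handed to an HTML parser. The URL and the helper name fetch_if_html are illustrative, not taken from the question.

```python
from typing import Optional
from urllib.request import urlopen

def fetch_if_html(url: str) -> Optional[str]:
    """Return the page body only when the server labels it as HTML."""
    with urlopen(url) as response:
        if response.headers.get_content_type() != "text/html":
            return None  # downloaded successfully, but not something an HTML parser handles
        return response.read().decode("utf-8", errors="replace")

if __name__ == "__main__":
    body = fetch_if_html("https://example.com/")
    print("handled as HTML" if body is not None else "downloaded but skipped")
```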
The robots exclusion protocol, also known as the robots.txt protocol, can solve the problem of the costs of using Web crawlers.
A. True
B. False
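For reference, a minimal Python sketch of how a crawler consults the robots exclusion protocol using the standard library's urllib.robotparser; the site https://example.com and the user-agent name MyCrawler are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # download and parse the site's robots.txt

# can_fetch() answers: may this user agent request this URL?
print(rp.can_fetch("MyCrawler", "https://example.com/private/page.html"))
```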
The “Crawl-delay:” parameter in the robots.txt file asks crawlers to wait a number of seconds between successive requests.
A. True
B. False
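A minimal sketch of honouring Crawl-delay, again via urllib.robotparser; the site, the user-agent name, and the 1-second fallback are illustrative assumptions, not part of the protocol.

```python
import time
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

delay = rp.crawl_delay("MyCrawler") or 1.0  # fall back to 1 s if no rule is given
for url in ["https://example.com/a", "https://example.com/b"]:
    # ... fetch url here ...
    time.sleep(delay)  # wait the requested number of seconds between requests
```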
A parallel crawler can run multiple processes in parallel; as a result, the download rate is maximized, the parallelization overhead is minimized, and repeated downloads of the same page are avoided.
A. True
B. False
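A minimal sketch, with hypothetical seed URLs, of a thread-based parallel crawler. Note that it is the shared, lock-protected "seen" set and frontier queue, i.e. explicit coordination, that keep the same page from being downloaded twice.

```python
import queue
import threading

frontier = queue.Queue()
seen = set()
seen_lock = threading.Lock()

def worker():
    while True:
        try:
            url = frontier.get(timeout=1)
        except queue.Empty:
            return
        print(f"downloading {url}")   # real code would fetch and parse the page here
        for link in []:               # ...and push newly discovered links
            with seen_lock:
                if link not in seen:
                    seen.add(link)
                    frontier.put(link)
        frontier.task_done()

for url in ["https://example.com/", "https://example.org/"]:  # hypothetical seeds
    seen.add(url)
    frontier.put(url)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```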
The new URLs discovered during the crawling process are assigned to crawling processes so that the same page is not downloaded more than once.
A. True
B. False
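A minimal sketch of one common assignment policy, hash partitioning by host name, under which each newly discovered URL maps to exactly one crawling process. The process count and the URLs are assumptions for illustration.

```python
import hashlib
from urllib.parse import urlparse

NUM_PROCESSES = 4

def assign(url: str) -> int:
    """Return the index of the crawling process responsible for this URL."""
    host = urlparse(url).netloc
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PROCESSES

for u in ["https://example.com/a", "https://example.com/b", "https://example.org/"]:
    print(u, "-> process", assign(u))
```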