public class Fetcher extends Object
Modifier and Type | Class and Description |
---|---|
static class | Fetcher.FetchItem |
static class | Fetcher.FetchQueue |
static class | Fetcher.QueueFeeder |
Modifier and Type | Field and Description |
---|---|
DbUpdater | dbUpdater |
static int | FETCH_FAILED |
static int | FETCH_SUCCESS |
Handler | handler |
ParserFactory | parserFactory |
RequestFactory | requestFactory |
Constructor and Description |
---|
Fetcher() |
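Modifier and Type | Method and Description |
---|---|
void | fetchAll(Generator generator): Fetches all current tasks; blocks until crawling completes. |
DbUpdater | getDbUpdater(): Returns the CrawlDB updater. |
Handler | getHandler(): Returns the Handler that processes fetch messages. |
boolean | getNeedUpdateDb(): Returns whether crawl information is stored. |
ParserFactory | getParserFactory(): Returns the parser factory. |
RequestFactory | getRequestFactory(): Returns the request factory. |
int | getRetry(): Returns the number of retries after a failed HTTP request. |
int | getThreads(): Returns the number of crawler threads. |
boolean | isIsContentStored(): Returns whether the content of web pages/files is stored. |
boolean | isParsing(): Returns whether web pages are parsed (links and text extracted). |
void | setDbUpdater(DbUpdater dbUpdater): Sets the CrawlDB updater. |
void | setHandler(Handler handler): Sets the Handler that processes fetch messages. |
void | setIsContentStored(boolean isContentStored): Sets whether the content of web pages/files is stored. |
void | setNeedUpdateDb(boolean needUpdateDb): Sets whether crawl information is stored. |
void | setParserFactory(ParserFactory parserFactory): Sets the parser factory. |
void | setParsing(boolean parsing): Sets whether web pages are parsed (links and text extracted). |
void | setRequestFactory(RequestFactory requestFactory): Sets the request factory. |
void | setRetry(int retry): Sets the number of retries after a failed HTTP request. |
void | setThreads(int threads): Sets the number of crawler threads. |
void | stop(): Stops crawling. |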
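The boolean accessors in the method summary do not all follow the usual `isX`/`setX` pattern: `isIsContentStored()` pairs with `setIsContentStored(boolean)`, and `getNeedUpdateDb()` uses a `get` prefix. The sketch below shows which JavaBeans property names such signatures resolve to. `Settings` is a hypothetical stand-in that copies only those accessor signatures; it is not the Fetcher class, and its field layout is an assumption.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.Set;
import java.util.TreeSet;

public class AccessorNamingSketch {

    /** Hypothetical stand-in: only the accessor signatures are taken from the method summary. */
    public static class Settings {
        private boolean isContentStored;
        private boolean needUpdateDb;
        private boolean parsing;

        public boolean isIsContentStored() { return isContentStored; }
        public void setIsContentStored(boolean isContentStored) { this.isContentStored = isContentStored; }

        public boolean getNeedUpdateDb() { return needUpdateDb; }
        public void setNeedUpdateDb(boolean needUpdateDb) { this.needUpdateDb = needUpdateDb; }

        public boolean isParsing() { return parsing; }
        public void setParsing(boolean parsing) { this.parsing = parsing; }
    }

    /** Returns the JavaBeans property names of a type, sorted alphabetically. */
    public static Set<String> propertyNames(Class<?> type) {
        Set<String> names = new TreeSet<>();
        try {
            // Object.class as the stop class keeps the inherited "class" property out of the result.
            BeanInfo info = Introspector.getBeanInfo(type, Object.class);
            for (PropertyDescriptor descriptor : info.getPropertyDescriptors()) {
                names.add(descriptor.getName());
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // prints [isContentStored, needUpdateDb, parsing]
        System.out.println(propertyNames(Settings.class));
    }
}
```

Under these assumptions, a bean-based configuration tool would address the content flag as `isContentStored` rather than `contentStored`.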
public DbUpdater dbUpdater
public Handler handler
public RequestFactory requestFactory
public ParserFactory parserFactory
public static final int FETCH_SUCCESS
public static final int FETCH_FAILED
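The two status constants and the retry setting suggest a fetch loop of roughly the following shape. This is a minimal sketch, not the library's implementation: the constant values are assumed (this page does not list them), `fetchWithRetry` is a hypothetical helper, and `retry` is assumed to count extra attempts after the first one.

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Assumed values: the documentation names these constants but does not give their values.
    public static final int FETCH_SUCCESS = 1;
    public static final int FETCH_FAILED = 2;

    /**
     * Runs one request and retries it after each failure.
     * Assumption: the request is tried at most retry + 1 times in total.
     */
    public static int fetchWithRetry(Callable<String> request, int retry) {
        for (int attempt = 0; attempt <= retry; attempt++) {
            try {
                request.call();
                return FETCH_SUCCESS;
            } catch (Exception e) {
                // Treat any exception as a failed attempt and move on to the next one.
            }
        }
        return FETCH_FAILED;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical request that fails twice and succeeds on the third call.
        Callable<String> flaky = () -> {
            if (++calls[0] < 3) {
                throw new java.io.IOException("simulated failure");
            }
            return "<html></html>";
        };
        int status = fetchWithRetry(flaky, 3);
        // prints "FETCH_SUCCESS after 3 attempts"
        System.out.println((status == FETCH_SUCCESS ? "FETCH_SUCCESS" : "FETCH_FAILED")
                + " after " + calls[0] + " attempts");
    }
}
```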
public void fetchAll(Generator generator) throws Exception
Parameters:
generator - the Generator (fetch task generator) that supplies tasks to the fetcher
Throws:
IOException
Exception
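The blocking behaviour of fetchAll, together with the thread count and stop(), can be pictured with a fixed thread pool. The sketch below is an illustration under stated assumptions, not the Fetcher source: `BlockingFetchSketch` is a hypothetical stand-in, an `Iterable<String>` of URLs replaces the Generator, and the "fetch" only increments a counter.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class BlockingFetchSketch {
    private int threads = 4;
    private final AtomicBoolean running = new AtomicBoolean(false);
    private final AtomicInteger fetched = new AtomicInteger();

    public void setThreads(int threads) { this.threads = threads; }

    public int getFetchedCount() { return fetched.get(); }

    /** Asks the workers to skip every task that has not started yet. */
    public void stop() { running.set(false); }

    /**
     * Hands every URL to a fixed-size pool, then blocks until all workers
     * have finished or a generous one-hour timeout elapses.
     */
    public void fetchAll(Iterable<String> urls) {
        running.set(true);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (String url : urls) {
            pool.execute(() -> {
                if (!running.get()) {
                    return; // stop() was called; skip this task
                }
                // Placeholder for a real HTTP request to `url`: the sketch only counts the task.
                fetched.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks; queued tasks still run
        try {
            pool.awaitTermination(1, TimeUnit.HOURS); // this call is what makes fetchAll block
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        running.set(false);
    }

    public static void main(String[] args) {
        BlockingFetchSketch fetcher = new BlockingFetchSketch();
        fetcher.setThreads(2);
        fetcher.fetchAll(Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/c"));
        // prints "fetched: 3"
        System.out.println("fetched: " + fetcher.getFetchedCount());
    }
}
```

Because the caller's thread waits inside fetchAll, a stop() call in this sketch has to come from another thread, for example a timer or a shutdown hook.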
public void stop()
public int getThreads()
public void setThreads(int threads)
Parameters:
threads - the number of crawler threads
public Handler getHandler()
public void setHandler(Handler handler)
Parameters:
handler - the Handler that processes fetch messages
public boolean getNeedUpdateDb()
public void setNeedUpdateDb(boolean needUpdateDb)
Parameters:
needUpdateDb - whether crawl information is stored
public int getRetry()
public void setRetry(int retry)
Parameters:
retry - the number of retries after a failed HTTP request
public boolean isIsContentStored()
public void setIsContentStored(boolean isContentStored)
Parameters:
isContentStored - whether the content of web pages/files is stored
public boolean isParsing()
public void setParsing(boolean parsing)
Parameters:
parsing - whether web pages are parsed (links and text extracted)
public DbUpdater getDbUpdater()
public void setDbUpdater(DbUpdater dbUpdater)
Parameters:
dbUpdater - the CrawlDB updater
public RequestFactory getRequestFactory()
public void setRequestFactory(RequestFactory requestFactory)
Parameters:
requestFactory - the request factory
public ParserFactory getParserFactory()
public void setParserFactory(ParserFactory parserFactory)
Parameters:
parserFactory - the parser factory
Copyright © 2014. All Rights Reserved.