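1. Introduction
HttpClient began as a subproject of Apache Jakarta Commons and is now part of Apache HttpComponents. It is an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol, and it supports the latest versions and recommendations of that protocol. This post is based on version 4.5.2. Maven dependency:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>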
2. HelloWorld implementation
package com.xsjt.chap01;

import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HelloWorld {

    /**
     * Fetches a web page with a GET request.
     *
     * @param args unused
     * @throws IOException on connection or protocol errors
     *         (ClientProtocolException is a subclass of IOException)
     */
    public static void main(String[] args) throws IOException {
        // Create the HttpGet instance
        HttpGet httpGet = new HttpGet("http://www.cnblogs.com"); // http://www.tuicool.com/

        // try-with-resources closes the response and the client even if an exception is thrown
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Read the page content; "UTF-8" is used only when the response declares no charset
                String result = EntityUtils.toString(entity, "UTF-8");
                System.out.println("Page content: " + result);
            }
        }
    }
}
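The code above fetches the page content directly. For some pages the Chinese text comes back garbled. This happens when the page is not actually encoded in the charset used to decode it, so the charset has to be set to match the page's encoding, for example gb2312.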
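To make the encoding issue concrete, here is a minimal sketch that uses only the JDK, not HttpClient. The class `CharsetSketch` and the helpers `charsetOf` and `decode` are hypothetical names introduced for illustration. It assumes the raw body bytes are already in hand (with HttpClient they could come from `EntityUtils.toByteArray(entity)`) and that the JDK ships the GB2312 charset, which standard builds normally do. It picks the charset from a `Content-Type` value when one is declared and otherwise falls back to a caller-supplied default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetSketch {

    /**
     * Hypothetical helper: returns the charset named in a Content-Type value
     * such as "text/html; charset=gb2312", or the fallback if none is usable.
     */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Malformed or unsupported charset name: use the fallback
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Hypothetical helper: decodes the body with the declared charset, else the fallback. */
    static String decode(byte[] body, String contentType, Charset fallback) {
        return new String(body, charsetOf(contentType, fallback));
    }

    public static void main(String[] args) {
        Charset gb2312 = Charset.forName("GB2312");
        String original = "\u4e2d\u6587"; // the two characters for "Chinese text"
        byte[] body = original.getBytes(gb2312);

        // Wrong charset: the GB2312 bytes are not valid UTF-8, so the text is garbled
        String garbled = decode(body, "text/html", StandardCharsets.UTF_8);
        // Declared charset: the text round-trips correctly
        String fixed = decode(body, "text/html; charset=gb2312", StandardCharsets.UTF_8);

        System.out.println("garbled equals original: " + garbled.equals(original));
        System.out.println("fixed equals original: " + fixed.equals(original));
    }
}
```

When the server declares no charset at all, the fallback is what matters, so for a gb2312 page the fallback itself would need to be gb2312.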
3. Crawler tutorial
4. HttpClient learning resources