Grabbing specified data with a Java crawler
Using the network programming classes that ship with the JDK (for example java.net.URL and java.net.URLConnection), a Java program can fetch the HTML source of the web page at a given URL.
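A minimal sketch of this step might look like the following. The class name PageFetcher and the method fetchHtml are made up for illustration, and the code assumes the page is encoded as UTF-8; a real crawler would read the charset from the response headers or the page's meta tag.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDK.
public class PageFetcher {

    /** Reads the resource at the given URL and returns it as one string, one '\n' per line. */
    public static String fetchHtml(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);

        StringBuilder html = new StringBuilder();
        // Assumption: the page is UTF-8 encoded.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.print(fetchHtml(args[0]));
    }
}
```

Some sites reject requests that do not send a browser-like User-Agent header; if that happens, conn.setRequestProperty("User-Agent", ...) can be called before the stream is opened.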

Once we have the HTML source, we can pull out the parts we want with regular expressions.

For example, to get all the text on a page that contains the keyword "java", we can match the page source line by line against a regular expression, then strip the HTML tags and other irrelevant content so that only the text containing "java" remains.
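The keyword example could be sketched roughly as below. KeywordExtractor and linesContaining are hypothetical names, and the tag-stripping pattern is a deliberate simplification: it assumes every tag opens and closes on the same line and does not handle comments or script blocks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the JDK.
public class KeywordExtractor {

    // Simplified pattern: anything between '<' and the next '>' counts as a tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns the tag-free text of every line whose visible text contains the keyword. */
    public static List<String> linesContaining(String html, String keyword) {
        // Quote the keyword so it is matched literally, ignoring case.
        Pattern wanted = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (String line : html.split("\\R")) {
            // Strip tags first so a keyword inside an attribute value is not counted.
            String text = TAG.matcher(line).replaceAll("").trim();
            if (wanted.matcher(text).find()) {
                hits.add(text);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "<h1>Learning Java</h1>\n<p>Nothing here</p>\n<p>java regex <b>tips</b></p>";
        for (String hit : linesContaining(html, "java")) {
            System.out.println(hit);
        }
    }
}
```

Regular expressions are enough for a quick filter like this, but they break easily on nested or multi-line markup; for anything more involved, an HTML parser is usually the safer choice.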

Grabbing images from a web page works much the same way as grabbing text, with one extra step: the image file itself has to be downloaded.

First, use a regular expression to match each img tag. Then use a second regular expression on that tag to get the image URL from its src attribute. Finally, read the image data from that URL through a buffered input stream (BufferedInputStream) and write it to a local file with a FileOutputStream.
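The image steps above can be sketched as follows. ImageGrabber, extractImageUrls and download are hypothetical names, and both patterns are simplified assumptions: they expect a quoted src value and a tag with no '>' inside its attribute values.

```java
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the JDK.
public class ImageGrabber {

    // Step 1: match whole img tags.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Step 2: match a quoted src attribute inside one tag (the leading \s skips data-src).
    private static final Pattern SRC_ATTR =
            Pattern.compile("\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the raw src value of every img tag that has one, in document order. */
    public static List<String> extractImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher img = IMG_TAG.matcher(html);
        while (img.find()) {
            Matcher src = SRC_ATTR.matcher(img.group());
            if (src.find()) {
                urls.add(src.group(1));
            }
        }
        return urls;
    }

    /** Step 3: copies the bytes at imageUrl into the local file targetPath. */
    public static void download(String imageUrl, String targetPath) throws IOException {
        try (InputStream in = new BufferedInputStream(URI.create(imageUrl).toURL().openStream());
             OutputStream out = new FileOutputStream(targetPath)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) {
        String html = "<p>logo</p><img src=\"https://example.com/logo.png\" alt=\"logo\">";
        for (String url : extractImageUrls(html)) {
            System.out.println(url);
        }
    }
}
```

The src values are returned exactly as they appear in the page. A relative value such as images/a.png has to be resolved against the page address (for instance with URI.resolve) before it is passed to download.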