Extracting specific data with a Java crawler

Using the network programming classes that ship with the JDK (for example, those in java.net), you can fetch the HTML source of the page at a given URL.
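As a minimal sketch of this step, the class below opens a connection with java.net and reads the response into a string. The class name PageFetcher, the method names, the 5-second timeouts, the example URL, and the assumption that the page is UTF-8 encoded are all illustrative and not taken from the original text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    // Reads the whole stream as text, line by line, using the given charset.
    // Every line is terminated with '\n' in the returned string.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Opens a connection to the URL and returns the response body as text.
    // Assumption: the page is UTF-8 encoded; a real crawler would check the
    // Content-Type header or the page's meta tags instead.
    public static String fetchHtml(String pageUrl) throws IOException {
        URLConnection conn = URI.create(pageUrl).toURL().openConnection();
        conn.setConnectTimeout(5000); // illustrative timeout values
        conn.setReadTimeout(5000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URL, used only for illustration.
        String pageUrl = args.length > 0 ? args[0] : "https://example.com/";
        System.out.println(fetchHtml(pageUrl));
    }
}
```

Keeping the stream-reading logic in its own method means it can be exercised without a network connection, for example against an in-memory stream.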

Once you have the HTML source, you can extract the parts you need with regular expressions.

For example, to collect all the text on a page that contains the keyword "java", match the page source line by line with regular expressions: strip the HTML tags and other irrelevant content, and keep only the lines whose remaining text contains "java".
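The line-by-line filtering described above could look roughly like the following. This is a sketch under stated assumptions: the class name KeywordExtractor, the tag pattern, and the case-insensitive keyword match are illustrative choices, and a tag that spans several lines will not be removed cleanly by this simple pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class KeywordExtractor {

    // Matches a single HTML tag such as <p> or <a href="...">.
    // Assumption: each tag starts and ends on the same line.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns the tag-free text of every line that contains the keyword.
    public static List<String> linesContaining(String html, String keyword) {
        // Pattern.quote treats the keyword as literal text, not as a regex.
        Pattern kw = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        List<String> result = new ArrayList<>();
        for (String line : html.split("\\R")) {
            // Remove the tags first so that attribute values are not matched.
            String text = TAG.matcher(line).replaceAll("").trim();
            if (!text.isEmpty() && kw.matcher(text).find()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample page, used only for illustration.
        String html = "<h1>Learning Java</h1>\n"
                + "<p>Nothing relevant here</p>\n"
                + "<p>A small java crawler</p>";
        for (String line : linesContaining(html, "java")) {
            System.out.println(line);
        }
    }
}
```

Stripping the tags before testing for the keyword keeps matches inside attributes (such as a link to java.html) out of the result.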

Grabbing pictures from a web page works much like grabbing text, with one extra step: downloading the image files themselves.

First match the img tags with one regular expression, then use a second regular expression on each img tag to pull the image URL out of its src attribute. Finally, read the image data from that URL through a BufferedInputStream and write it to a local file with a FileOutputStream.
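The steps above can be sketched as follows. The class name ImageGrabber, both regular expressions, the 8 KB buffer, and the command-line arguments are assumptions made for illustration. The src pattern only handles quoted attribute values, and a relative src would first have to be resolved against the page URL (for example with URI.resolve).

```java
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageGrabber {

    // Step 1: match whole img tags.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Step 2: capture the quoted value of the src attribute inside a tag.
    // The leading \s keeps attributes such as data-src from matching.
    private static final Pattern SRC_ATTR =
            Pattern.compile("\\ssrc\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns the src value of every img tag found in the HTML.
    public static List<String> extractImageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher img = IMG_TAG.matcher(html);
        while (img.find()) {
            Matcher src = SRC_ATTR.matcher(img.group());
            if (src.find()) {
                urls.add(src.group(1));
            }
        }
        return urls;
    }

    // Copies all bytes from in to out through a fixed-size buffer.
    public static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
    }

    // Step 3: read the image through a BufferedInputStream and write it
    // to a local file with a FileOutputStream.
    public static void download(String imageUrl, String targetFile) throws IOException {
        try (InputStream in = new BufferedInputStream(URI.create(imageUrl).toURL().openStream());
             OutputStream out = new FileOutputStream(targetFile)) {
            copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample markup, used only for illustration.
        String html = "<p><img class=\"logo\" src=\"https://example.com/logo.png\"></p>";
        System.out.println(extractImageUrls(html));
        // Optional: java ImageGrabber <absolute image URL> <local file>
        if (args.length == 2) {
            download(args[0], args[1]);
        }
    }
}
```

Because image data is binary, it is copied as raw bytes rather than read through a character reader, which would corrupt the file.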
