When you log in to Sina Weibo from a PC, the username and password are pre-encrypted by JavaScript on the client, and a set of parameters is fetched with a GET request before the POST; these parameters also become part of the POST data. As a result, the usual simple approach to simulating a POST login (which works for Renren, for example) cannot be used.
To obtain Sina Weibo data through crawlers, simulated login is essential.
1. Before submitting the POST request, you need to obtain four parameters (servertime, nonce, pubkey and rsakv) through a GET request, rather than just servertime and nonce as in the earlier method. This is mainly because the way the JS encrypts the username and password has changed.
1.1 Because the encryption method has changed, we use the rsa module here. For an introduction to the RSA public-key encryption algorithm, please refer to the relevant material online. Download and install the rsa module:
Download:
1.2 View the source code of the login page (/signup/signin.php), where you can find the JS address /js/sso/ssologin.js. The content of that file is obfuscated when you open it; you can use an online decoding site to decode it and check how the username and password are finally encrypted.
1.3 Login
The first step of logging in is to insert the username into prelogin_url and request that address:
prelogin_url = '/sso/prelogin.php?entry=sso&callback=sinaSSOController.preloginCallBack&su=%s&rsakt=mod&client=ssologin.js(v1.4.4)' % username
A GET request to this address returns content similar to the following:
sinaSSOController.preloginCallBack({"retcode":0,"servertime":1362041092,"pcid":"gz-6664c3dea2bfdaa3c94e8734c9ec2c9e6a1f","nonce":"IRYP4N","pubkey":"EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245A87AC253062882729293E5506350508E7F9AA3BB77F4333231490F915F6D63C55FE2F08A49B353F444AD3993CACC02DB784ABBB8E42A9B1BBFFFB38BE18D78E87A0E41B9B8F73A928EE0CCEE1F6739884B9777E4FE9E88A1BBE495927AC4A799B3181D6442443","rsakv":"1330428213","exectime":1})
Then extract the servertime, nonce, pubkey and rsakv values that we need. The pubkey and rsakv values can also be hard-coded, since they are fixed.
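One way to pull the four values out of the JSONP-style response is sketched below in Python 3 with the standard library only. The function name parse_prelogin and the shortened pubkey in the sample are assumptions made for illustration, not part of the original code.

```python
import json
import re


def parse_prelogin(body):
    """Extract servertime, nonce, pubkey and rsakv from a prelogin response."""
    # The JSON payload is wrapped in a callback: name({...})
    match = re.search(r'\((\{.*\})\)', body, re.S)
    if match is None:
        raise ValueError('no JSON payload found in prelogin response')
    data = json.loads(match.group(1))
    return data['servertime'], data['nonce'], data['pubkey'], data['rsakv']


# Sample response with a shortened, illustrative pubkey.
sample = ('sinaSSOController.preloginCallBack({"retcode":0,'
          '"servertime":1362041092,"nonce":"IRYP4N",'
          '"pubkey":"EB2A38","rsakv":"1330428213","exectime":1})')
servertime, nonce, pubkey, rsakv = parse_prelogin(sample)
```

Note that servertime comes back as an integer while rsakv is a string, so convert with str() where a string is required.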
2. As before, the username is URL-quoted and then encoded with BASE64:
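import base64
import urllib
username_ = urllib.quote(username)  # URL-quote the username first
username = base64.encodestring(username_)[:-1]  # BASE64-encode the quoted name; [:-1] strips the trailing newline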
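The snippet above is Python 2. A rough Python 3 equivalent of the same two steps might look like this; the function name encode_username and the sample address are hypothetical.

```python
import base64
from urllib.parse import quote


def encode_username(username):
    """URL-quote the username, then BASE64-encode the quoted text."""
    quoted = quote(username)  # e.g. '@' becomes '%40'
    # b64encode does not append a newline, so no [:-1] is needed here.
    return base64.b64encode(quoted.encode('ascii')).decode('ascii')


su = encode_username('user@example.com')
```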
Previously, the password was SHA1-hashed three times, with the servertime and nonce values mixed in as salt. That is, after two rounds of SHA1, the servertime and nonce values are appended to the result, and SHA1 is computed once more.
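For reference, the older three-round SHA1 scheme described above can be sketched as follows in Python 3. It assumes that each round works on the hexadecimal digest string of the previous round, which the text does not state explicitly; the function names are hypothetical.

```python
import hashlib


def sha1_hex(text):
    """Return the SHA1 digest of a string as lowercase hexadecimal."""
    return hashlib.sha1(text.encode('utf-8')).hexdigest()


def legacy_password_hash(password, servertime, nonce):
    """SHA1 twice, append servertime and nonce, then SHA1 once more."""
    twice = sha1_hex(sha1_hex(password))
    return sha1_hex(twice + str(servertime) + str(nonce))
```

Because servertime and nonce change with every prelogin request, the resulting value differs on each login attempt even for the same password.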
In the latest RSA encryption method, the username is handled the same way as before;
the password is encrypted differently from the original method:
2.1 First create an RSA public key. Its two parameters are fixed values on Sina Weibo, both given as hexadecimal strings: the first is the pubkey obtained in the first login step, and the second is '10001', found in the JS encryption file.
Both values must first be converted from hexadecimal to decimal, although they can also be hard-coded. Here, hexadecimal 10001 is written directly as 65537.
The code is as follows:
import binascii
import rsa
rsaPublickey = int(pubkey, 16)  # convert the hexadecimal modulus to an integer
key = rsa.PublicKey(rsaPublickey, 65537)  # create the public key
message = str(servertime) + '\t' + str(nonce) + '\n' + str(password)  # plaintext layout taken from the JS encryption file
passwd = rsa.encrypt(message, key)  # encrypt
passwd = binascii.b2a_hex(passwd)  # convert the ciphertext to a hexadecimal string
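The listing above targets Python 2, where str is a byte string. As a rough illustration of what the encryption step does, here is a dependency-free Python 3 sketch of PKCS#1 v1.5 encryption padding, the scheme that rsa.encrypt applies. The function names are hypothetical, and this is a sketch for understanding only; in practice the rsa module should do this work.

```python
import binascii
import os


def pkcs1_v15_encrypt(message, n, e):
    """Encrypt bytes with PKCS#1 v1.5 type-2 padding (illustrative only)."""
    k = (n.bit_length() + 7) // 8  # modulus length in bytes
    if len(message) > k - 11:
        raise OverflowError('message too long for this key size')
    # The padding must consist of non-zero random bytes.
    needed = k - len(message) - 3
    padding = b''
    while len(padding) < needed:
        chunk = os.urandom(needed - len(padding))
        padding += chunk.replace(b'\x00', b'')
    block = b'\x00\x02' + padding + b'\x00' + message
    cipher_int = pow(int.from_bytes(block, 'big'), e, n)
    return cipher_int.to_bytes(k, 'big')


def encrypt_password(password, servertime, nonce, pubkey_hex):
    """Build the plaintext in the layout used above and return hex ciphertext."""
    message = (str(servertime) + '\t' + str(nonce) + '\n' + password).encode('utf-8')
    cipher = pkcs1_v15_encrypt(message, int(pubkey_hex, 16), 65537)
    return binascii.b2a_hex(cipher).decode('ascii')
```

With a 1024-bit pubkey (256 hexadecimal digits), the returned string is also 256 hexadecimal digits long. The random padding means the output differs on every call.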
2.2 Request the passport URL: login_url = '/sso/login.php?client=ssologin.js(v1.4.4)'
The POST data that needs to be sent:
postPara = {
    'entry': 'weibo',
    'gateway': '1',
    'from': '',
    'savestate': '7',
    'userticket': '1',
    'ssosimplelogin': '1',
    'vsnf': '1',
    'vsnval': '',
    'su': encodedUserName,
    'service': 'miniblog',
    'servertime': serverTime,
    'nonce': nonce,
    'pwencode': 'rsa2',
    'sp': encodedPassWord,
    'encoding': 'UTF-8',
    'prelt': '115',
    'rsakv': rsakv,
    'url': '/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
    'returntype': 'META'
}
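A sketch of turning these parameters into a request body in Python 3 follows. The function name build_post_body and its argument names are assumptions; the field names and fixed values are copied from the dictionary above.

```python
from urllib.parse import urlencode


def build_post_body(su, sp, servertime, nonce, rsakv):
    """Return the URL-encoded POST body as bytes."""
    post_para = {
        'entry': 'weibo',
        'gateway': '1',
        'from': '',
        'savestate': '7',
        'userticket': '1',
        'ssosimplelogin': '1',
        'vsnf': '1',
        'vsnval': '',
        'su': su,                  # encoded username
        'service': 'miniblog',
        'servertime': servertime,
        'nonce': nonce,
        'pwencode': 'rsa2',
        'sp': sp,                  # RSA-encrypted password, hexadecimal
        'encoding': 'UTF-8',
        'prelt': '115',
        'rsakv': rsakv,
        'url': '/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack',
        'returntype': 'META',
    }
    # urlencode percent-escapes reserved characters such as '=' and '&'.
    return urlencode(post_para).encode('ascii')
```

The returned bytes can be passed as the data argument of urllib.request.Request or opener.open to issue the POST.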
Compared with the earlier method, rsakv has been added to the request and the value of pwencode has changed to rsa2; everything else is the same as before.
Assemble the parameters and make the POST request. To check whether the login succeeded, look for a statement like the following in the content returned by the POST:
location.replace("/ajaxlogin.php?framelogin=1&callback=parent.sinaSSOController.feedBackUrlCallBack&retcode=101&reason=%B5%C7%C2%BC%C3%FB%BB%F2%C3%DC%C2%EB%B4%ED%CE%F3");
If retcode=101, the login failed. After a successful login the result is similar, but the value of retcode is 0.
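The retcode check can be sketched in Python 3 as below. The function name parse_login_result is hypothetical, and the regular expression assumes the URL is passed to location.replace as a single quoted string, as in the example above.

```python
import re
from urllib.parse import parse_qs, urlparse


def parse_login_result(body):
    """Return (redirect_url, retcode) from the body of the login response."""
    match = re.search(r'location\.replace\([\'"](.*?)[\'"]\)', body)
    if match is None:
        raise ValueError('no location.replace statement found')
    redirect_url = match.group(1)
    query = parse_qs(urlparse(redirect_url).query)
    retcode = query.get('retcode', [None])[0]  # a string such as '0' or '101'
    return redirect_url, retcode
```

A caller would treat retcode == '0' as success and anything else as a failed login.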
3. After a successful login, the URL inside the location.replace statement in the response body is the one to use in the next step. Send a GET request to that URL and save the cookies from that request; they are the login cookies we need.
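Saving the cookies can be sketched with the Python 3 standard library as follows. The function names are hypothetical, and redirect_url must be a complete absolute URL, which the shortened paths in this article are not.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Create an opener that stores every cookie it receives in a jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_login_cookies(opener, jar, redirect_url):
    """GET the redirect URL; cookies set by the server end up in the jar."""
    with opener.open(redirect_url) as response:
        response.read()
    return list(jar)
```

Using the same opener for the prelogin GET, the login POST and this final GET keeps all cookies in one jar, which can then be reused for later crawling requests.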