June 21, 2021, 06:32 by wst
Python tips
In day-to-day work we often need to parse URLs. This note lists every part that can be extracted from one, for future reference.
Example URL:
url="https://report.baidu.com/malacca/report.do?uid=866369032262653&prov=1.0&gtm=1624242291507&appId=1133&channel_id=kuaiyi&version=6.5.0&sip=175.153.138.214&nodeId=1&adSpaceId=4617&adSpaceType=4&deviceModel=BND-AL10&imei=866369032262653&configId=1&operators=70123&network=4&adPlatform=1&eid=null&devisionId=488548120&from=3&etype=7&reqId=f6778a938a7543a19183041c05995ea4&mateId=d41a4b88b7a03892865c0d9102288e6d&adId=1159254694804136798&tasid=a11a5n&taid=0f3fe053de7e669a074992b1fcaeee72&adSrc=42&tk=450be53669615d29f3046a710d08e0ef"
Parsing steps:
1. Split the URL into its components;
2. Parse the query parameters.
The code is as follows:
from urllib.parse import urlparse, parse_qsl
import html
url = "https://report.baidu.com/malacca/report.do?uid=866369032262653&prov=1.0&gtm=1624242291507&appId=1133&channel_id=kuaiyi&version=6.5.0&sip=175.153.138.214&nodeId=1&adSpaceId=4617&adSpaceType=4&deviceModel=BND-AL10&imei=866369032262653&configId=1&operators=70123&network=4&adPlatform=1&eid=null&devisionId=488548120&from=3&etype=7&reqId=f6778a938a7543a19183041c05995ea4&mateId=d41a4b88b7a03892865c0d9102288e6d&adId=1159254694804136798&tasid=a11a5n&taid=0f3fe053de7e669a074992b1fcaeee72&adSrc=42&tk=450be53669615d29f3046a710d08e0ef"
# Step 1: split the URL into its components
parse_obj = urlparse(url)
print("parse_obj:", parse_obj)
print("scheme:", parse_obj.scheme)
print("domain:", parse_obj.netloc)
print("path:", parse_obj.path)
print("query string (raw):", parse_obj.query)
# Decode HTML entities only when the URL was copied from HTML source, where "&" is written as "&amp;".
# On a raw URL like this one, html.unescape would read "&gtm=" as the entity "&gt" and yield ">m=".
raw_query = parse_obj.query
query = html.unescape(raw_query) if "&amp;" in raw_query else raw_query
print("query string (decoded):", query)
# Step 2: parse the query parameters into (name, value) pairs
parse_query = parse_qsl(query)
print("query parameters:", dict(parse_query))
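One thing to watch with the final `dict(parse_qsl(...))` step: a dict keeps only the last value when a key appears more than once, and `parse_qsl` drops parameters with empty values by default. The sketch below shows one possible alternative using `parse_qs` with `keep_blank_values=True`. The helper name `query_to_dict` and the `example.com` URL are made up for illustration and are not part of the original note.

```python
from urllib.parse import urlparse, parse_qs, parse_qsl


def query_to_dict(url):
    """Return the query parameters of `url` as {name: [value, ...]}."""
    query = urlparse(url).query
    # keep_blank_values=True keeps parameters such as "eid=" that have no value
    return parse_qs(query, keep_blank_values=True)


# Hypothetical URL, chosen only to show a repeated key and an empty value
demo_url = "https://example.com/report.do?tag=a&tag=b&eid=&from=3"

params = query_to_dict(demo_url)
print(params)  # every value is a list, so both "tag" values survive

flat = dict(parse_qsl(urlparse(demo_url).query))
print(flat)  # "tag" collapses to its last value and the blank "eid" is dropped
```

If every key is known to appear exactly once and to carry a value, as in the report URL above, the flat `dict(parse_qsl(...))` form is simpler to work with.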