
How a Python Crawler Fetches Specific Document Types: Filtering File Extensions with Regular Expressions

2026-05-04 23:55:10

Golang学习网 compiles notes like this one to help programming beginners get started. This article covers "How a Python Crawler Fetches Specific Document Types: Filtering File Extensions with Regular Expressions".

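`re.search(r'\.pdf$', ...)` is the more reliable suffix check, on one condition: first strip everything after `#` and `?` from the URL, then match only the suffix of the path. Run against the raw URL, `str.endswith('.pdf')` returns False as soon as a query string or fragment is present. The regex additionally supports case-insensitive matching and several extensions in a single pattern.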

When matching URL suffixes, why is `re.search(r'\.pdf$', url)` more reliable than `url.endswith('.pdf')`?

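Because real-world links often carry query parameters or fragments, for example https://example.com/report.pdf?version=2#page1. On that URL `str.endswith('.pdf')` returns False, and `r'\.pdf$'` applied to the raw string fails as well: the `$` anchor only lines up with ".pdf" once the `?` query and the `#` fragment have been removed. In practice, extract the `path` field with `urllib.parse.urlparse()` and match against that.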

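A common mistake is running the suffix test directly on the raw URL string and letting the parameters get in the way. A subtler one is ignoring case: .PDF and .Pdf should be accepted too, so the recommended pattern is `r'\.(pdf|docx|xlsx)$'` with the `re.IGNORECASE` flag.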
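A minimal sketch of the "take the path, then match the suffix" approach described above. The helper name `is_target_document` and the three-extension set are illustrative assumptions; `urlparse()` does the work of separating the query and fragment from the path.

```python
import re
from urllib.parse import urlparse

# Wanted extensions (example set); re.IGNORECASE also accepts .PDF, .Pdf, ...
DOC_SUFFIX_RE = re.compile(r'\.(pdf|docx|xlsx)$', re.IGNORECASE)


def is_target_document(url):
    """Return True if the URL *path* ends with one of the wanted extensions."""
    # urlparse() splits off the query (?...) and fragment (#...),
    # so parameters and anchors can no longer hide or fake the suffix
    path = urlparse(url).path
    return DOC_SUFFIX_RE.search(path) is not None


if __name__ == '__main__':
    for u in ('https://example.com/report.pdf?version=2#page1',   # True
              'https://example.com/files/Budget.XLSX',            # True
              'https://example.com/view.html?file=report.pdf'):   # False
        print(u, is_target_document(u))
```

The third URL shows the trade-off: a download endpoint that names the file only in its query string is rejected, so such links need a content-based check on the response instead.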

Before downloading with `requests`, how can you safely tell whether the response body really is a document?

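The URL suffix alone is not trustworthy: the server may answer with status 200 but actually send an HTML login page or a page reached through a redirect (such as a 404 page), or it may declare `Content-Type: text/html` while sending a PDF binary anyway. Check three things: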

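response.status_code == 200, and response.history is empty (requests follows redirects such as 302 by default, so a bounce to a login page still ends with status 200)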
response.headers.get('Content-Type', '').lower().startswith(('application/pdf', 'application/vnd.openxmlformats-officedocument'))
len(response.content) > 1024 (rules out tiny error bodies)
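A minimal sketch that bundles the three checks into one function, assuming a `requests.Response`-like object (anything with `status_code`, `history`, `headers` and `content`). The function name and the `min_size` parameter are illustrative. One addition is not in the list above: when the Content-Type does not match, it falls back to the body's leading "magic bytes" (`%PDF-` for PDF, `PK\x03\x04` for the ZIP container behind docx/xlsx), which covers the mislabeled text/html case. That fallback also accepts any other ZIP file, so drop it if you prefer the strict header check.

```python
from types import SimpleNamespace  # used only by the demo at the bottom

# Content-Type prefixes from the checklist: PDF and Office Open XML (docx/xlsx/pptx)
DOC_TYPES = ('application/pdf', 'application/vnd.openxmlformats-officedocument')
# Assumption (not in the checklist): leading "magic bytes" of the target formats
MAGIC_BYTES = (b'%PDF-', b'PK\x03\x04')  # PDF header; ZIP header used by docx/xlsx


def looks_like_document(response, min_size=1024):
    """Return True if a requests.Response-like object plausibly holds a document."""
    # Check 1: final status is 200 and no redirect happened on the way
    if response.status_code != 200 or response.history:
        return False
    # Check 3: tiny bodies are almost always error pages
    if len(response.content) <= min_size:
        return False
    # Check 2: the declared Content-Type is a document type
    content_type = response.headers.get('Content-Type', '').lower()
    if content_type.startswith(DOC_TYPES):
        return True
    # Fallback: sniff the body for servers that mislabel it (e.g. text/html + PDF bytes)
    return response.content.startswith(MAGIC_BYTES)


if __name__ == '__main__':
    fake = SimpleNamespace(status_code=200, history=[],
                           headers={'Content-Type': 'application/pdf'},
                           content=b'%PDF-1.7' + b'\0' * 2048)
    print(looks_like_document(fake))  # True
```

With a real download you would pass the object returned by `requests.get(...)` directly; its `headers` mapping is case-insensitive, so `'Content-Type'` matches regardless of how the server spells it.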

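When the checks above fail because the response is an HTML wrapper page (a PDF embedded in an iframe or loaded by JavaScript), the real document URL has to be dug out of that page. The article suggests BeautifulSoup; this sketch swaps in the standard library's `html.parser` so it runs without extra installs. The class and function names are hypothetical, and the `fetch(...)` regex is a rough heuristic that only catches literal string arguments.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Heuristic: fetch("...pdf...") / fetch('...pdf...') calls with a literal URL
FETCH_PDF_RE = re.compile(r"""fetch\(\s*['"]([^'"]*\.pdf[^'"]*)['"]""", re.IGNORECASE)


class IframeSrcCollector(HTMLParser):
    """Collect the src attribute of every <iframe> in the page."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'iframe':
            src = dict(attrs).get('src')
            if src:
                self.srcs.append(src)


def find_embedded_docs(html, base_url):
    """Return absolute candidate URLs from <iframe src> and fetch(...pdf) calls."""
    collector = IframeSrcCollector()
    collector.feed(html)
    collector.close()
    candidates = collector.srcs + FETCH_PDF_RE.findall(html)
    # Resolve relative paths against the wrapper page's own URL
    return [urljoin(base_url, c) for c in candidates]
```

Each candidate can then go back through the suffix filter and the response checks before anything is saved.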
Note in particular: some sites embed the PDF in an iframe or load it with JavaScript. The URL then looks valid, but a direct `requests` GET returns only the outer HTML page. In that case, parse the page with BeautifulSoup and look for the `src` of `<iframe>` tags or for `fetch(...pdf)` calls.

When batch downloading, how do you safely extract a file name from the URL and keep the original extension?

Don't call `os.path.basename(url)` directly: the URL may have no path at all, or it may contain percent-encoding (such as `%2F`), query parameters (`?t=123`) or fragments (`#section`). A sounder process is:

Parse the URL with `urllib.parse.urlparse(url)` and take its `path`
Decode the `path` with `urllib.parse.unquote()`
Take the last segment with `os.path.basename()`, then extract a file name with its extension using the regex `r'[^/\\?#]+\.([a-zA-Z0-9]{2,})$'` (if nothing matches, fall back to `hashlib.md5(url.encode()).hexdigest()[:8] + '.pdf'`)

On Windows you also need to filter out characters that are illegal in file names (`<>:"/\|?*`); replacing them all with underscores is the simplest rule. On Mac/Linux, mind the file-name length limit: truncate over-long names, but keep the extension and a hash prefix.

Under anti-scraping measures `requests` fails to fetch the document, but a browser can open it. What can you do?

This usually means the server checks `User-Agent` or `Referer`, or requires JavaScript to run. First open the browser's developer tools and, in the Network panel, inspect the full headers and method of the PDF request (GET or POST? Are cookies sent?).

A simple fix is to add a few basic headers:

```python
import requests

# Browser-like headers; copy the real values from the Network panel if these are not enough
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Accept': 'application/pdf,*/*;q=0.8',
    'Referer': 'https://example.com/list/',
}
resp = requests.get('https://example.com/report.pdf', headers=headers, timeout=15)  # placeholder URL
```

If that still fails, the document URL is probably assembled by front-end JavaScript or carries a dynamically generated token (for example `/download?id=123&token=abc`). Then you need `playwright` or `selenium` to drive a real browser and extract the final URL after the JavaScript has run; otherwise the regex filter and the `requests` calls are wasted effort.

The really hard cases are documents behind a login or a slider CAPTCHA. There, filtering suffixes with a regex is only the first step; the rest of the pipeline depends entirely on maintaining the session and simulating user behavior, so URL rules alone are not enough.
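The file-name steps from the batch-download section (parse, decode, take the last segment, match the regex, fall back to a hash), plus the Windows character cleanup and the length cap, can be sketched as follows. The function name `safe_filename`, the 200-byte cap and the `<hash>_` prefix format are assumptions for illustration; the regex and the MD5 fallback come from the text.

```python
import hashlib
import os
import re
from urllib.parse import urlparse, unquote

# From the article: last path segment ending in ".<ext>" (ext = 2+ letters/digits)
NAME_WITH_EXT_RE = re.compile(r'[^/\\?#]+\.([a-zA-Z0-9]{2,})$')
# Characters that are illegal in Windows file names
ILLEGAL_CHARS_RE = re.compile(r'[<>:"/\\|?*]')
# Assumption: 200 bytes leaves headroom under the common 255-byte per-name limit
MAX_NAME_BYTES = 200


def safe_filename(url, default_ext='.pdf'):
    """Derive a file name from a URL, keeping its extension when possible."""
    path = unquote(urlparse(url).path)   # steps 1-2: take the path, percent-decode it
    base = os.path.basename(path)        # step 3: keep the last segment
    short_hash = hashlib.md5(url.encode()).hexdigest()[:8]
    match = NAME_WITH_EXT_RE.search(base)
    if match is None:
        return short_hash + default_ext  # fallback described in the article
    name = ILLEGAL_CHARS_RE.sub('_', match.group(0))
    if len(name.encode('utf-8')) > MAX_NAME_BYTES:
        stem, ext = os.path.splitext(name)
        prefix = short_hash + '_'
        budget = max(MAX_NAME_BYTES - len(prefix) - len(ext.encode('utf-8')), 0)
        # Cut on a byte budget; errors='ignore' drops a split multi-byte character
        stem = stem.encode('utf-8')[:budget].decode('utf-8', errors='ignore')
        name = prefix + stem + ext
    return name
```

For example, `safe_filename('https://example.com/files/report.pdf?t=123#section')` returns `'report.pdf'`, while a URL with no usable name such as `https://example.com/download?id=1` gets an 8-character hash plus `.pdf`.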
data-rule="required;length(4)" /> </div> <span class="input-group-btn" style="padding:0;border:none;"> <img src="/captcha.html" width="100" height="45" onclick="this.src = '/captcha.html?r=' + Math.random();"/> </span> </div> <div class="loginBtn"> <button type="submit">注册</button> </div> </form> </div> </div> </div> <div class="beforeLoginTip">登录即同意 <a href="https://www.17golang.com/about/3.html" target="_blank" class="aBlue" title="用户协议">用户协议</a> 和 <a href="https://www.17golang.com/about/4.html" target="_blank" class="aBlue" title="隐私政策">隐私政策</a></div> </div>  <div class="loginInfo passwordForget"> <div class="closeIcon" onclick="$('.popupBg').hide();"></div> <div class="returnLogin cursorPointer" onclick="$('.passwordForget').hide();$('.passwordLogin').show();">返回登录</div> <div class="passwordInfo"> <ul class="logintabs selfTabMenu"> <li class="selfTabItem">重置密码</li> </ul> <div class="selfTabContBox"> <div class="selfTabCont"> <form id="resetpwd-form" class="form-horizontal form-layer nice-validator n-default n-bootstrap form" method="POST" action="/api/user/resetpwd.html" novalidate="novalidate"> <div style="height:70px;"> <input type="text" class="form-control" id="email" name="email" value="" placeholder="输入邮箱" aria-invalid="true"> </div> <div class="codeBox" style="height:70px;"> <div class="form-group" style="height:70px; width:205px; float: left;"> <input type="text" name="captcha" class="form-control" placeholder="验证码" /> </div> <span class="input-group-btn" style="padding:0;border:none;"> <a href="javascript:;" class="btn btn-primary btn-captcha cursorPointer" style="background: #2080F8; border-radius: 4px; color: #fff; padding: 12px; position: absolute;" data-url="/api/ems/send.html" data-type="email" data-event="resetpwd">发送验证码</a> </span> </div> <input type="password" class="form-control" id="newpassword" name="newpassword" value="" placeholder="请输入6-18位密码"> <div class="loginBtn mt25"> <button type="submit">重置密码</button> </div> </form> </div> </div> 
</div> </div> </div> </div> <script src="/assets/js/juejin-theme.js?v=20260613b" defer></script> <script> var _hmt = _hmt || []; (function() { var hm = document.createElement("script"); hm.src = "https://hm.baidu.com/hm.js?3dc5666f6478c7bf39cd5c91e597423d"; hm.async = true; var s = document.getElementsByTagName("script")[0]; s.parentNode.insertBefore(hm, s); })(); </script> <script src="/assets/js/frontend/common.js"></script> </body> </html>

Fetching specific document types with a Python crawler: filtering by file extension with regular expressions

When matching URL suffixes, why is `re.search(r'\.pdf$', url)` more reliable than `url.endswith('.pdf')`?
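The regex is only more reliable once the query string and fragment are out of the way. A minimal sketch, assuming the target formats are PDF, DOCX and XLSX (an illustrative list) and using a hypothetical helper name, `is_target_document`: `urllib.parse.urlparse()` separates `?query` and `#fragment` from the path, and a case-insensitive regex is anchored on the path alone.

```python
import re
from urllib.parse import urlparse

# Illustrative list of target extensions; adjust to the formats you crawl.
DOC_SUFFIX_RE = re.compile(r"\.(pdf|docx|xlsx)$", re.IGNORECASE)


def is_target_document(url: str) -> bool:
    """True if the URL path ends in a target extension, ignoring ?query and #fragment."""
    path = urlparse(url).path  # query string and fragment are parsed into separate fields
    return DOC_SUFFIX_RE.search(path) is not None


if __name__ == "__main__":
    url = "https://example.com/report.PDF?version=2#page1"
    print(url.endswith(".pdf"))     # → False: the raw string ends with "#page1"
    print(is_target_document(url))  # → True: path "/report.PDF" matches case-insensitively
```

A URL that carries the file name only in its query string (for example `/download?file=report.pdf`) is rejected by this check, because its path is just `/download`; such links have to be judged from the response headers instead.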

Before downloading with `requests`, how do you safely check that the response body is really a document?
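The article's three checks (status 200 with no redirect, a document `Content-Type`, a body larger than 1 KB) can be bundled into one predicate. A minimal sketch, assuming only PDF and Office Open XML formats are wanted; the names `is_probably_document` and `fetch_document` and the 15-second timeout are illustrative choices, not part of any library. Because `requests` follows redirects by default, a redirect is detected through `response.history` rather than the final status code, and `requests` is imported inside `fetch_document` so the predicate itself has no third-party dependency.

```python
from typing import Optional

# Content-Type prefixes accepted as documents (PDF and Office Open XML).
ALLOWED_CONTENT_TYPES = (
    "application/pdf",
    "application/vnd.openxmlformats-officedocument",
)
MIN_BODY_BYTES = 1024  # bodies this small are usually error or login pages


def is_probably_document(status_code: int, content_type: str, body: bytes,
                         redirected: bool = False) -> bool:
    """Apply the three checks: plain 200, document Content-Type, non-trivial size."""
    if status_code != 200 or redirected:
        return False
    # str.startswith accepts a tuple; parameters such as "; charset=..." are tolerated.
    if not content_type.lower().startswith(ALLOWED_CONTENT_TYPES):
        return False
    return len(body) > MIN_BODY_BYTES


def fetch_document(url: str, timeout: float = 15.0) -> Optional[bytes]:
    """Download url and return its bytes only if the response passes the checks."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=timeout)
    ok = is_probably_document(
        response.status_code,
        response.headers.get("Content-Type", ""),
        response.content,
        redirected=bool(response.history),  # non-empty when any redirect was followed
    )
    return response.content if ok else None
```

`fetch_document(url)` returns the file bytes or `None`. Treating every redirect as a failure follows the rule stated for this check, but it may also reject sites that hand downloads off to a CDN; in that case, stop passing `redirected`.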
