<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sitoi</title>
  
  
  <link href="https://sitoi.cn/atom.xml" rel="self"/>
  
  <link href="https://sitoi.cn/"/>
  <updated>2025-11-12T05:28:30.735Z</updated>
  <id>https://sitoi.cn/</id>
  
  <author>
    <name>Sitoi</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>How to Update an Index Mapping in Elasticsearch</title>
    <link href="https://sitoi.cn/posts/39218.html"/>
    <id>https://sitoi.cn/posts/39218.html</id>
    <published>2023-10-14T06:26:21.000Z</published>
    <updated>2025-11-12T05:28:30.735Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="Preface"></a>Preface</h2><p>Updating an index's mapping is a common task in Elasticsearch, especially when you need to change a field's type. The type of an existing field cannot be changed in place, so this article does it in five steps: create a new index, migrate the data with reindex, test the new index, switch the alias so that search requests go to the new index, and finally delete the old index.</p>
<h2 id="一：创建新的索引"><a href="#一：创建新的索引" class="headerlink" title="Step 1: Create a new index"></a>Step 1: Create a new index</h2><ol><li><p>Analyze the current mapping: inspect the mapping of your current index (for example with <code>GET /old_index/_mapping</code>) to work out which fields need to change and what their new types should be.</p></li><li><p>Create the new index: use the create index API to create a new index with the new mapping, and make sure the mapping matches your requirements.</p></li></ol><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">PUT /new_index</span><br><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;mappings&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">    <span class="attr">&quot;properties&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">      <span class="attr">&quot;field_1&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">        <span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&lt;new_type&gt;&quot;</span></span><br><span class="line">      <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">      <span class="attr">&quot;field_2&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">        <span class="attr">&quot;type&quot;</span><span class="punctuation">:</span> <span class="string">&quot;&lt;new_type&gt;&quot;</span></span><br><span class="line">      <span class="punctuation">&#125;</span></span><br><span class="line">    <span class="punctuation">&#125;</span></span><br><span class="line">  <span class="punctuation">&#125;</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>Add mappings for the remaining fields in the same way.</p>
<h2 id="二：数据迁移（Reindex）"><a href="#二：数据迁移（Reindex）" class="headerlink" title="Step 2: Migrate the data (reindex)"></a>Step 2: Migrate the data (reindex)</h2><ol><li>Use the _reindex API: Elasticsearch's <code>_reindex</code> API copies the documents from the old index into the new one. It is non-destructive: the old index is only read, never modified.</li></ol><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">POST _reindex</span><br><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;source&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">    <span class="attr">&quot;index&quot;</span><span class="punctuation">:</span> <span class="string">&quot;old_index&quot;</span></span><br><span class="line">  <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;dest&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">    <span class="attr">&quot;index&quot;</span><span class="punctuation">:</span> <span class="string">&quot;new_index&quot;</span></span><br><span class="line">  <span class="punctuation">&#125;</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>Because the old index is left untouched, no data is lost during the migration, and every copied document is indexed under the new mapping. Note that <code>_reindex</code> works from a snapshot of the source taken when it starts, so documents written to the old index afterwards are not copied; pause writes to the old index while the migration runs.</p>
<h2 id="三：测试"><a href="#三：测试" class="headerlink" title="Step 3: Test"></a>Step 3: Test</h2><ol><li><p>Verify the data: make sure the data was migrated correctly and conforms to the new mapping. Run a few simple queries and checks, such as comparing document counts between the two indices, to confirm data quality.</p></li><li><p>Test performance: run performance tests to make sure the new index does not degrade performance.</p></li></ol>
<h2 id="四：切换别名"><a href="#四：切换别名" class="headerlink" title="Step 4: Switch the alias"></a>Step 4: Switch the alias</h2><ol><li>Once you have verified the new index, switch the alias to it so that it becomes the primary index. Both actions in a single <code>_aliases</code> request are applied atomically, so searches never run against a missing alias. This assumes your application already queries through an alias; if it queries the old index by name, point it at the alias first.</li></ol><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">POST /_aliases</span><br><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;actions&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line">    <span class="punctuation">&#123;</span></span><br><span class="line">      <span class="attr">&quot;remove&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">        <span class="attr">&quot;index&quot;</span><span class="punctuation">:</span> <span class="string">&quot;old_index&quot;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;alias&quot;</span><span class="punctuation">:</span> <span class="string">&quot;alias_name&quot;</span></span><br><span class="line">      <span class="punctuation">&#125;</span></span><br><span class="line">    <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="punctuation">&#123;</span></span><br><span class="line">      <span class="attr">&quot;add&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">        <span class="attr">&quot;index&quot;</span><span class="punctuation">:</span> <span class="string">&quot;new_index&quot;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;alias&quot;</span><span class="punctuation">:</span> <span class="string">&quot;alias_name&quot;</span></span><br><span class="line">      <span class="punctuation">&#125;</span></span><br><span class="line">    <span class="punctuation">&#125;</span></span><br><span class="line">  <span class="punctuation">]</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure>
<h2 id="五：删除旧索引"><a href="#五：删除旧索引" class="headerlink" title="Step 5: Delete the old index"></a>Step 5: Delete the old index</h2><ol><li><p>Back up the old index: before deleting it, make sure you have the backups you need, for example by using the snapshot and restore feature.</p></li><li><p>Delete the old index: once the new index is running correctly and the old one is no longer needed, remove it with the delete index API (<code>DELETE /old_index</code>).</p></li></ol>
<h2 id="结论"><a href="#结论" class="headerlink" title="Conclusion"></a>Conclusion</h2><p>Following these steps to update an Elasticsearch index mapping keeps your data consistent while your search application stays available. Back up your data before you start, proceed carefully, and rehearse the procedure in a test environment before running it in production. Updating mappings is valuable for maintaining and improving Elasticsearch indices, but it requires care to avoid problems.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;前言&lt;/h2&gt;&lt;p&gt;在 Elasticsearch 中更新索引的 Mapping 是一个常见的需求，特别是当您需要对字段类型进行修改时。本文将介绍如何通过创建新</summary>
      
    
    
    
    <category term="数据库" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/"/>
    
    <category term="Elasticsearch" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/Elasticsearch/"/>
    
    
    <category term="Elasticsearch" scheme="https://sitoi.cn/tags/Elasticsearch/"/>
    
  </entry>
  
  <entry>
    <title>Clean Logins: Removing the &#39;Last login&#39; Message from the Linux Terminal</title>
    <link href="https://sitoi.cn/posts/10054.html"/>
    <id>https://sitoi.cn/posts/10054.html</id>
    <published>2023-07-28T02:25:20.000Z</published>
    <updated>2025-11-12T05:28:30.735Z</updated>
    
    <content type="html"><![CDATA[<h2 id="引言"><a href="#引言" class="headerlink" title="Introduction"></a>Introduction</h2><p>When you log in to a Linux terminal, you usually see a line such as “Last login: xxx xxx xxx”. This is the system's default login notice, showing the date and time of your previous login. Some users find it useful, but others find it redundant. If you would rather not see it every time you open a new terminal window, this article shows how to remove the ‘Last login’ message.</p>
<h2 id="去除-‘Last-login’-信息的方法"><a href="#去除-‘Last-login’-信息的方法" class="headerlink" title="How to remove the ‘Last login’ message"></a>How to remove the ‘Last login’ message</h2><h3 id="创建-hushlogin-的文件"><a href="#创建-hushlogin-的文件" class="headerlink" title="Create a .hushlogin file"></a>Create a .hushlogin file</h3><p>Run the following command to create an empty <code>.hushlogin</code> file in your home directory:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">touch</span> ~/.hushlogin</span><br></pre></td></tr></table></figure><p>New terminal windows will no longer show the “Last login” message, so the login screen stays clean. This only affects the current user; other users still see “Last login” when they log in. On most systems the same file also suppresses the message of the day (MOTD).</p><p>To bring the “Last login” message back, simply delete the <code>.hushlogin</code> file:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">rm</span> ~/.hushlogin</span><br></pre></td></tr></table></figure><p>The “Last login” message will then appear again in new terminal windows.</p>
<h2 id="结论"><a href="#结论" class="headerlink" title="Conclusion"></a>Conclusion</h2><p>You now know how to get a cleaner Linux terminal login by removing the annoying ‘Last login’ message. Use it to tailor the terminal to your own preferences and habits, so your terminal window stays clean and you can focus on more important tasks.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;引言&quot;&gt;&lt;a href=&quot;#引言&quot; class=&quot;headerlink&quot; title=&quot;引言&quot;&gt;&lt;/a&gt;引言&lt;/h2&gt;&lt;p&gt;当我们登录到 Linux 系统的终端时，通常会看到一条形如 “Last login: xxx xxx xxx” 的信息。这是系统默认的登录提</summary>
      
    
    
    
    <category term="开发环境" scheme="https://sitoi.cn/categories/%E5%BC%80%E5%8F%91%E7%8E%AF%E5%A2%83/"/>
    
    <category term="Linux" scheme="https://sitoi.cn/categories/%E5%BC%80%E5%8F%91%E7%8E%AF%E5%A2%83/Linux/"/>
    
    
    <category term="zsh" scheme="https://sitoi.cn/tags/zsh/"/>
    
  </entry>
  
  <entry>
    <title>Efficiently Downloading an NGINX Static Site: Recursive Downloads with wget</title>
    <link href="https://sitoi.cn/posts/46937.html"/>
    <id>https://sitoi.cn/posts/46937.html</id>
    <published>2023-07-16T01:59:04.000Z</published>
    <updated>2025-11-12T05:28:30.735Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="Preface"></a>Preface</h2><p>NGINX is a popular web server used to host static websites and handle HTTP requests. When you need to download every file of a static site served by NGINX, a recursive download with wget is an efficient approach.</p><p>In a recursive download, wget fetches a page, parses it, and then follows the links it contains to download the files they reference, such as CSS files, JavaScript files, and images, so you end up with the complete content of the site.</p>
<h2 id="一、安装-wget"><a href="#一、安装-wget" class="headerlink" title="1. Install wget"></a>1. Install wget</h2><h3 id="在-Windows-上安装-wget"><a href="#在-Windows-上安装-wget" class="headerlink" title="Installing wget on Windows"></a>Installing wget on Windows</h3><p>On Windows, you can install wget with the Chocolatey (choco) package manager. Open Command Prompt or PowerShell as an administrator and run:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">choco install wget</span><br></pre></td></tr></table></figure><p>The command downloads and installs wget automatically. Once it finishes, you can use the wget command in Command Prompt or PowerShell.</p><h3 id="在-Mac-上安装-wget"><a href="#在-Mac-上安装-wget" class="headerlink" title="Installing wget on macOS"></a>Installing wget on macOS</h3><p>On macOS, you can install wget with the Homebrew (brew) package manager. Open Terminal and run:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">brew install wget</span><br></pre></td></tr></table></figure><p>The command downloads and installs wget automatically. Once it finishes, you can use the wget command in Terminal.</p><h3 id="在-ubuntu-上安装-wget"><a href="#在-ubuntu-上安装-wget" class="headerlink" title="Installing wget on Ubuntu"></a>Installing wget on Ubuntu</h3><p>Check whether wget is already installed. If it is not, install it with the following command (Debian/Ubuntu):</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> apt-get install wget</span><br></pre></td></tr></table></figure><p>On other Linux distributions, install wget with the corresponding package manager.</p>
<h2 id="二、使用-wget-命令下载-NGINX-静态网站"><a href="#二、使用-wget-命令下载-NGINX-静态网站" class="headerlink" title="2. Download the NGINX static site with wget"></a>2. Download the NGINX static site with wget</h2><p>Open a terminal and start the recursive download with:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">wget -r -np -nH --cut-dirs=1 --reject <span class="string">&quot;index.html*&quot;</span> -P <span class="string">&quot;/path/to/directory&quot;</span> <span class="string">&quot;https://example.com/&quot;</span></span><br></pre></td></tr></table></figure><p>Here is what each option does:</p><ul><li><code>-r</code> or <code>--recursive</code>: download recursively, fetching all files and subdirectories under the given URL.</li><li><code>-np</code> or <code>--no-parent</code>: never ascend to the parent directory while recursing.</li><li><code>-nH</code> or <code>--no-host-directories</code>: do not create a local directory named after the host.</li><li><code>--cut-dirs=1</code>: drop the first directory level of the remote path when saving files locally.</li><li><code>--reject &quot;index.html*&quot;</code>: reject files that match the pattern. Here it discards the <code>index.html</code> pages that wget saves for directory listings; wget still downloads and parses them to find links, then deletes them.</li><li><code>-P /path/to/directory</code> or <code>--directory-prefix=/path/to/directory</code>: save the downloaded files under the given directory.</li></ul><p>With these options, wget recursively downloads the files under the given URL and saves them in the directory you specify. Adjust the options to suit your needs.</p>
<h2 id="三、跳过已下载文件"><a href="#三、跳过已下载文件" class="headerlink" title="3. Skip files that were already downloaded"></a>3. Skip files that were already downloaded</h2><p>To avoid downloading the same files again, add the <code>-N</code> (<code>--timestamping</code>) option. wget then downloads a file only if the copy on the server is newer than the local one (or the two sizes differ).</p><p>For example:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">wget -r -np -nH --cut-dirs=1 --reject <span class="string">&quot;index.html*&quot;</span> -P <span class="string">&quot;/path/to/directory&quot;</span> -N <span class="string">&quot;https://example.com/&quot;</span></span><br></pre></td></tr></table></figure>
<h2 id="四、使用代理进行下载"><a href="#四、使用代理进行下载" class="headerlink" title="4. Download through a proxy"></a>4. Download through a proxy</h2><p>wget has no dedicated option for a proxy address. To download through a proxy server, set it with <code>-e</code>, which executes wgetrc-style commands, or through the <code>http_proxy</code>/<code>https_proxy</code> environment variables. For example:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">wget -r -np -nH --cut-dirs=1 -e use_proxy=yes -e http_proxy=http://127.0.0.1:7890 -e https_proxy=http://127.0.0.1:7890 --reject <span class="string">&quot;index.html*&quot;</span> -P <span class="string">&quot;/path/to/directory&quot;</span> -N <span class="string">&quot;https://example.com/&quot;</span></span><br></pre></td></tr></table></figure><p>Replace <code>127.0.0.1</code> with the address of your proxy server and <code>7890</code> with its port.</p>
<h2 id="总结"><a href="#总结" class="headerlink" title="Summary"></a>Summary</h2><p>You now know how to use wget to recursively download all the files of a static site served by NGINX, how to skip files that were already downloaded, and how to download through a proxy.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;前言&lt;/h2&gt;&lt;p&gt;NGINX 是一款流行的 Web 服务器软件，用于托管静态网站和处理 HTTP 请求。当您需要下载整个 NGINX 静态网站的所有文件时，使</summary>
      
    
    
    
    <category term="网络" scheme="https://sitoi.cn/categories/%E7%BD%91%E7%BB%9C/"/>
    
    
    <category term="wget" scheme="https://sitoi.cn/tags/wget/"/>
    
    <category term="nginx" scheme="https://sitoi.cn/tags/nginx/"/>
    
  </entry>
  
  <entry>
    <title>mitmproxy: Using an Upstream Proxy to Reach External Sites</title>
    <link href="https://sitoi.cn/posts/29556.html"/>
    <id>https://sitoi.cn/posts/29556.html</id>
    <published>2022-02-18T05:46:45.000Z</published>
    <updated>2025-11-12T05:28:30.734Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="Preface"></a>Preface</h2><p>When writing crawlers, you often need to reach sites outside your network through a proxy. This post uses Google as an example.</p>
<h2 id="编写-mitmdump-脚本"><a href="#编写-mitmdump-脚本" class="headerlink" title="Write the mitmdump script"></a>Write the mitmdump script</h2><p>The script below intercepts responses from the google.com domain and prints their URLs:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># -*- coding: utf-8 -*-</span></span><br><span class="line"><span class="keyword">from</span> mitmproxy <span class="keyword">import</span> http</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">response</span>(<span class="params">flow: http.HTTPFlow</span>):</span><br><span class="line">    <span class="comment"># mitmdump calls this hook once for every completed HTTP response</span></span><br><span class="line">    host = flow.request.pretty_host</span><br><span class="line">    <span class="keyword">if</span> host == <span class="string">&quot;google.com&quot;</span> <span class="keyword">or</span> host.endswith(<span class="string">&quot;.google.com&quot;</span>):</span><br><span class="line">        <span class="built_in">print</span>(<span class="string">f&quot;mitm intercepted: URL = <span class="subst">&#123;flow.request.url&#125;</span>&quot;</span>)</span><br></pre></td></tr></table></figure>
<h2 id="命令行启用-upstream-模式"><a href="#命令行启用-upstream-模式" class="headerlink" title="Enable upstream mode from the command line"></a>Enable upstream mode from the command line</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mitmdump --mode upstream:http://127.0.0.1:7890 -p 8000 -q -s mitm_google.py</span><br></pre></td></tr></table></figure><ul><li><a href="http://127.0.0.1:7890/">http://127.0.0.1:7890</a>: the upstream (second-level) proxy, which can reach external sites</li><li><code>-p 8000</code>: the port mitmdump listens on</li><li><code>-q</code>: quiet mode, which suppresses mitmdump's default output</li><li>mitm_google.py: the file name of the script above</li></ul><p>Then configure your crawler to use <code>http://127.0.0.1:8000</code> as its proxy. For HTTPS traffic, the client must also trust the mitmproxy CA certificate.</p>
<h2 id="开启-mode-前后对比"><a href="#开启-mode-前后对比" class="headerlink" title="Before and after enabling upstream mode"></a>Before and after enabling upstream mode</h2><blockquote><p>Before</p></blockquote><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/mitmproxy/mitmproxy-without-mode.png" alt="Before"></p><blockquote><p>After</p></blockquote><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/mitmproxy/mitmproxy-with-mode.png" alt="After"></p>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;前言&lt;/h2&gt;&lt;p&gt;做爬虫时常遇到需要使用外网代理的情况，本文以 Google 为例。&lt;/p&gt;
&lt;h2 id=&quot;编写-mitmdump-脚本&quot;&gt;&lt;a href=</summary>
      
    
    
    
    <category term="爬虫" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/"/>
    
    
    <category term="mitmproxy" scheme="https://sitoi.cn/tags/mitmproxy/"/>
    
  </entry>
  
  <entry>
    <title>Dailycheckin - Daily Check-in Scripts for Docker / Qinglong Panel / Synology (Multi-account Support)</title>
    <link href="https://sitoi.cn/posts/18115.html"/>
    <id>https://sitoi.cn/posts/18115.html</id>
    <published>2021-03-03T08:48:50.000Z</published>
    <updated>2025-11-12T05:28:30.735Z</updated>
    
    <content type="html"><![CDATA[<div align="center"><img src="https://socialify.git.ci/Sitoi/dailycheckin/image?font=Rokkitt&forks=1&issues=1&language=1&name=1&owner=1&pattern=Circuit%20Board&pulls=1&stargazers=1&theme=Dark"><h1>DailyCheckIn</h1><p>Daily check-in scripts for Docker / Qinglong Panel / Synology / local deployment</p></div>
<h2 id="✨-特性"><a href="#✨-特性" class="headerlink" title="✨ Features"></a>✨ Features</h2><ul><li>📦 Installable as a PyPI package</li><li>💻 Deployable on multiple platforms</li><li>⚙️ Check-ins on multiple sites</li><li>📢 Notifications through multiple channels</li><li>♾️ Check-ins for multiple accounts</li><li>🕙 Configurable scheduled tasks</li><li>🆙 Automatic project updates</li></ul>
<h2 id="🦄-教程"><a href="#🦄-教程" class="headerlink" title="🦄 Tutorial"></a>🦄 Tutorial</h2><p><a href="https://sitoi.github.io/dailycheckin/">https://sitoi.github.io/dailycheckin/</a></p>
<h2 id="🧾-列表"><a href="#🧾-列表" class="headerlink" title="🧾 Supported sites"></a>🧾 Supported sites</h2><p>🟢: working 🔴: script currently unavailable 🔵: runs (needs update) 🟡: untested 🟤: depends on luck</p><table><thead><tr><th>Status</th><th>Task name</th><th>Website</th><th>Last checked</th><th>Notes</th></tr></thead><tbody><tr><td>🟢️</td><td>KGQQ</td><td><a href="https://kg.qq.com/index-pc.html">WeSing</a></td><td>24.02.20</td><td>Daily check-in for flowers (about 120 flowers per day)</td></tr><tr><td>🟢️</td><td>YOUDAO</td><td><a href="https://note.youdao.com/web/">Youdao Note</a></td><td>24.02.20</td><td>Daily check-in for extra storage space</td></tr><tr><td>🟢️</td><td>TIEBA</td><td><a href="https://tieba.baidu.com/index.html">Baidu Tieba</a></td><td>24.02.20</td><td>Daily check-in on Tieba forums</td></tr><tr><td>🟢️</td><td>BILIBILI</td><td><a href="https://www.bilibili.com/">BiliBili</a></td><td>24.02.20</td><td>Live-stream check-in, manga check-in, daily XP tasks, automatic coin tipping, converting silver seeds to coins, and more</td></tr><tr><td>🟢️</td><td>V2EX</td><td><a href="https://www.v2ex.com/">V2EX</a></td><td>24.02.20</td><td>Copper coin rewards</td></tr><tr><td>🟢️</td><td>ACFUN</td><td><a href="https://www.acfun.cn/">AcFun</a></td><td>24.02.20</td><td>Daily check-in for bananas</td></tr><tr><td>🟢️</td><td>IQIYI</td><td><a href="https://www.iqiyi.com/">iQIYI</a></td><td>24.02.20</td><td>① 7 days of VIP for a full check-in streak; ② daily tasks, 4 growth points; ③ watch-time task, 10 growth points; ④ random growth points from the daily check-in; ⑤ 5 draws for Platinum VIP; ⑥ 3 shake-to-win draws; ⑦ 3 lucky draws</td></tr><tr><td>🟢️</td><td>SMZDM</td><td><a href="https://www.smzdm.com/">SMZDM</a></td><td>24.02.20</td><td>Check-in and lucky draw</td></tr><tr><td>🟢️</td><td>ALIYUN</td><td><a href="https://www.aliyundrive.com/drive/">Aliyun Drive</a></td><td>24.02.20</td><td>Check in for free membership and storage</td></tr><tr><td>🟢️</td><td>ENSHAN</td><td><a href="https://www.right.com.cn/forum/">Enshan Wireless Forum</a></td><td>24.02.20</td><td>Check in for coins and points</td></tr><tr><td>🟢️</td><td>AOLAXING</td><td><a href="http://www.100bt.com/m/creditMall/?gameId=2#task">Aolaxing</a></td><td>24.02.20</td><td>Check in for points</td></tr><tr><td>🟢️</td><td>IMAOTAI</td><td>i Moutai</td><td>24.02.20</td><td>Apply to purchase Chinese zodiac Moutai</td></tr><tr><td>🟤</td><td>MIMOTION</td><td>Mi Fit</td><td>24.02.20</td><td>Daily step-count update for Mi Fit</td></tr><tr><td>🟢️</td><td>BAIDU</td><td><a href="https://ziyuan.baidu.com/site/index#/">Baidu site submission</a></td><td>24.02.20</td><td>Submit site pages to Baidu for indexing</td></tr></tbody></table>
<h2 id="💬-通知列表"><a href="#💬-通知列表" class="headerlink" title="💬 Notification channels"></a>💬 Notification channels</h2><ul><li>DingTalk</li><li>WeCom group bot (WeCom)</li><li>WeCom app messages (WeCom)</li><li>Telegram (TG)</li><li>Bark (iOS)</li><li>ServerChan (WeChat)</li><li>ServerChan Turbo (WeChat)</li><li>PushPlus (WeChat)</li><li>Cool Push (QQ, WeChat, email)</li><li>Qmsg (QQ)</li><li>Feishu (Lark)</li></ul>
<h2 id="🤝-参与贡献"><a href="#🤝-参与贡献" class="headerlink" title="🤝 Contributing"></a>🤝 Contributing</h2><p>We welcome contributions of every kind. If you are interested in contributing code, check out our GitHub <a href="https://github.com/sitoi/dailycheckin/issues">Issues</a> and show us your ideas.</p><p><a href="https://github.com/sitoi/dailycheckin/pulls"><img src="https://img.shields.io/badge/%F0%9F%A4%AF_pr_welcome-%E2%86%92-ffcb47?labelColor=black&style=for-the-badge"></a></p><h3 id="💗-感谢我们的贡献者"><a href="#💗-感谢我们的贡献者" class="headerlink" title="💗 Thanks to our contributors"></a>💗 Thanks to our contributors</h3><p><a href="https://github.com/sitoi/dailycheckin/graphs/contributors"><img src="https://contrib.rocks/image?repo=sitoi/dailycheckin"></a></p>
<h2 id="✨-Star-数"><a href="#✨-Star-数" class="headerlink" title="✨ Stars"></a>✨ Stars</h2><p><a href="https://starchart.cc/Sitoi/dailycheckin"><img src="https://starchart.cc/Sitoi/dailycheckin.svg"></a></p><hr>
<h2 id="📝-License"><a href="#📝-License" class="headerlink" title="📝 License"></a>📝 License</h2><p>This project is <a href="./LICENSE">MIT</a> licensed.</p>]]></content>
    
    
      
      
    <summary type="html">&lt;div align=&quot;center&quot;&gt;

&lt;img src=&quot;https://socialify.git.ci/Sitoi/dailycheckin/image?font=Rokkitt&amp;forks=1&amp;issues=1&amp;language=1&amp;name=1&amp;owner=1&amp;pa</summary>
      
    
    
    
    <category term="Git" scheme="https://sitoi.cn/categories/Git/"/>
    
    
    <category term="Python" scheme="https://sitoi.cn/tags/Python/"/>
    
    <category term="签到" scheme="https://sitoi.cn/tags/%E7%AD%BE%E5%88%B0/"/>
    
    <category term="脚本" scheme="https://sitoi.cn/tags/%E8%84%9A%E6%9C%AC/"/>
    
  </entry>
  
  <entry>
    <title>Mac &amp; Windows Software Recommendations</title>
    <link href="https://sitoi.cn/posts/13953.html"/>
    <id>https://sitoi.cn/posts/13953.html</id>
    <published>2020-11-13T04:24:51.000Z</published>
    <updated>2025-11-12T05:28:30.732Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="Preface"></a>Preface</h2><p>This post records the software I use. I use Windows at work and a Mac at home, so I have picked tools that suit me on both platforms.</p><blockquote><p>I'll add download links when I have time.</p></blockquote>
<h2 id="软件列表"><a href="#软件列表" class="headerlink" title="Software list"></a>Software list</h2><div class="tabs"><div class="nav-tabs"><button type="button" class="tab active">Windows</button><button type="button" class="tab">Mac</button></div><div class="tab-contents"><div class="tab-item-content active"><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/software/Windows.png" alt="Windows"></p><p><strong>Terminal</strong></p><ul><li>Windows Terminal</li><li>Git Bash</li></ul><p><strong>Image hosting</strong></p><ul><li>PicGo</li></ul><p><strong>Downloaders</strong></p><ul><li>Thunder (Xunlei)</li><li>IDM</li></ul><p><strong>Archive tools</strong></p><ul><li>360 Zip (international edition)</li></ul><p><strong>Desktop organization</strong></p><ul><li>Tencent Desktop Organizer</li></ul><p><strong>Cloud storage</strong></p><ul><li>Baidu Netdisk</li><li>Nextcloud</li></ul><p><strong>Screen recording</strong></p><ul><li>ScreenToGif</li></ul><p><strong>Screenshots</strong></p><ul><li>Snipaste</li></ul><p><strong>Remote access</strong></p><ul><li>Remote desktop<ul><li>Sunlogin</li><li>Microsoft Remote Desktop Beta</li></ul></li><li>SSH / SFTP<ul><li>Termius</li><li>Xshell</li><li>Xftp</li></ul></li></ul><p><strong>Mind mapping</strong></p><ul><li>XMind ZEN</li></ul><p><strong>Office</strong></p><ul><li>Office suite<ul><li>Microsoft Word</li><li>Microsoft Excel</li><li>Microsoft PowerPoint</li></ul></li></ul><p><strong>Notes</strong></p><ul><li>Mubu</li><li>Typora</li><li>Notion</li></ul><p><strong>Proxy / VPN</strong></p><ul><li>Clash</li><li>OpenVPN</li></ul><p><strong>Music players</strong></p><ul><li>QQ Music</li></ul><p><strong>Video players</strong></p><ul><li>Streaming<ul><li>Tencent Video</li><li>iQIYI</li><li>Youku</li></ul></li><li>Local files<ul><li>PotPlayer</li></ul></li></ul><p><strong>Messaging</strong></p><ul><li>QQ</li><li>WeChat</li><li>DingTalk</li><li>Telegram</li></ul><p><strong>Development</strong></p><ul><li>Visual Studio Code</li><li>JetBrains tools<ul><li>PyCharm</li><li>WebStorm</li></ul></li><li>Postman</li><li>Database GUIs<ul><li>RDM</li><li>Navicat Premium</li></ul></li><li>Traffic capture<ul><li>Fiddler Everywhere</li><li>mitmproxy</li><li>Charles</li></ul></li></ul><p><strong>Input method</strong></p><ul><li>QQ Pinyin</li></ul><p><strong>Virtual machines / emulators</strong></p><ul><li>VMware Workstation</li><li>NoxPlayer</li><li>Docker</li></ul><p><strong>Utilities</strong></p><ul><li>PowerToys</li><li>QuickLook</li></ul><p><strong>Browsers</strong></p><ul><li>Google Chrome</li><li>Microsoft Edge</li></ul><p><strong>Email</strong></p><ul><li>NetEase Mail Master</li></ul><p><strong>GTD</strong></p><ul><li>Microsoft To Do</li></ul><p><strong>Dictionary</strong></p><ul><li>Eudic</li></ul><p><strong>Eye care</strong></p><ul><li>f.lux</li></ul></div>
<div class="tab-item-content"><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/software/Mac.png" alt="Mac"></p><p><strong>PDF editor</strong></p><ul><li>PDF Expert</li></ul><p><strong>Video editing</strong></p><ul><li>iMovie</li><li>Final Cut Pro</li><li>Arctime Pro</li></ul><p><strong>Graphic design</strong></p><ul><li>Photoshop</li><li>Imagine</li></ul><p><strong>Image hosting</strong></p><ul><li>PicGo</li></ul><p><strong>Terminal</strong></p><ul><li>iTerm2</li></ul><p><strong>Window management</strong></p><ul><li>Magnet</li></ul><p><strong>Downloaders</strong></p><ul><li>Thunder (Xunlei)</li><li>Downie 4</li></ul><p><strong>System cleanup</strong></p><ul><li>Tencent Lemon</li></ul><p><strong>Screen recording</strong></p><ul><li>ScreenFlow</li></ul><p><strong>Screenshots</strong></p><ul><li>iShot</li></ul><p><strong>Calendar</strong></p><ul><li>Itsycal</li></ul><p><strong>Remote access</strong></p><ul><li>Remote desktop<ul><li>Sunlogin</li><li>Microsoft Remote Desktop Beta</li></ul></li><li>SSH<ul><li>Termius</li></ul></li></ul><p><strong>Mind mapping</strong></p><ul><li>XMind ZEN</li></ul><p><strong>Office</strong></p><ul><li>Office suite<ul><li>Microsoft Word</li><li>Microsoft Excel</li><li>Microsoft PowerPoint</li></ul></li></ul><p><strong>Notes</strong></p><ul><li>Mubu</li><li>Typora</li><li>Notion</li></ul><p><strong>Proxy / VPN</strong></p><ul><li>ClashX</li><li>OpenVPN</li></ul><p><strong>Music players</strong></p><ul><li>Music</li><li>QQ Music</li></ul><p><strong>Video players</strong></p><ul><li>Streaming<ul><li>Tencent Video</li><li>iQIYI</li><li>Youku</li></ul></li><li>Local files<ul><li>IINA</li></ul></li></ul><p><strong>Messaging</strong></p><ul><li>QQ</li><li>WeChat</li><li>DingTalk</li><li>Telegram</li></ul><p><strong>Development</strong></p><ul><li>Visual Studio Code</li><li>JetBrains tools<ul><li>PyCharm</li><li>WebStorm</li></ul></li><li>Xcode</li><li>Postman</li><li>Traffic capture<ul><li>Charles</li><li>Fiddler Everywhere</li><li>mitmproxy</li></ul></li><li>Database GUIs<ul><li>Navicat Premium</li><li>RDM</li></ul></li></ul><p><strong>Menu bar management</strong></p><ul><li>Bartender 3</li></ul><p><strong>Virtual machines / emulators</strong></p><ul><li>Parallels Desktop</li><li>NoxPlayer</li><li>Docker</li></ul><p><strong>Other</strong></p><ul><li>AdGuard for Safari</li><li>CheatSheet</li></ul><p><strong>Archive tools</strong></p><ul><li>Keka</li></ul><p><strong>Browsers</strong></p><ul><li>Safari</li><li>Google Chrome</li><li>Edge</li></ul><p><strong>Email</strong></p><ul><li>NetEase Mail Master</li></ul><p><strong>Color picker</strong></p><ul><li>Pika</li></ul><p><strong>GTD</strong></p><ul><li>Microsoft To Do</li></ul><p><strong>Games</strong></p><ul><li>Steam</li></ul><p><strong>Dictionary</strong></p><ul><li>Eudic</li></ul><p><strong>Keep screen awake</strong></p><ul><li>Amphetamine</li></ul></div></div><div class="tab-to-top"><button type="button" aria-label="scroll to top"><i class="fas fa-arrow-up"></i></button></div></div>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;前言&lt;/h2&gt;&lt;p&gt;为了记录自己使用的软件，因为公司用的是 Windows 家里用的是 Mac 所以基本上 两个平台的软件都会挑选适合自己的软件。&lt;/p&gt;
&lt;b</summary>
      
    
    
    
    <category term="网站与应用" scheme="https://sitoi.cn/categories/%E7%BD%91%E7%AB%99%E4%B8%8E%E5%BA%94%E7%94%A8/"/>
    
    
    <category term="Mac" scheme="https://sitoi.cn/tags/Mac/"/>
    
    <category term="Windows" scheme="https://sitoi.cn/tags/Windows/"/>
    
  </entry>
  
  <entry>
    <title>Enabling Sharding in a MongoDB Cluster</title>
    <link href="https://sitoi.cn/posts/1040.html"/>
    <id>https://sitoi.cn/posts/1040.html</id>
    <published>2020-09-25T05:18:29.000Z</published>
    <updated>2025-11-12T05:28:30.733Z</updated>
    
    <content type="html"><![CDATA[<h2 id="开启数据库分片能力"><a href="#开启数据库分片能力" class="headerlink" title="Enable sharding for a database"></a>Enable sharding for a database</h2><ol><li><p>Connect to <code>mongos</code> from the command line (on MongoDB 6.0 and later, use <code>mongosh</code> instead of the legacy <code>mongo</code> shell)</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mongo --host=&lt;host&gt; -u &lt;user&gt;</span><br></pre></td></tr></table></figure></li><li><p>Switch to the admin database</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">use admin</span><br></pre></td></tr></table></figure></li><li><p>Enable sharding for the database</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">db.adminCommand( &#123;</span><br><span class="line">   enableSharding: <span class="string">&quot;&lt;database_name&gt;&quot;</span></span><br><span class="line">&#125; )</span><br></pre></td></tr></table></figure><blockquote><p>This step only makes the database eligible for sharding; it does not distribute any data by itself. Collections that are not sharded stay on the database's primary shard, and a collection's data is spread across shards only after you shard that collection, as described below.</p></blockquote></li></ol>
<h2 id="开启集合分片"><a href="#开启集合分片" class="headerlink" title="Shard a collection"></a>Shard a collection</h2><div class="note info flat"><p>Create the index that supports the shard key before sharding the collection. If the collection is empty, MongoDB creates the missing index automatically when you shard it (for a non-empty collection the index must already exist), but creating it yourself beforehand is still recommended.</p></div><div class="note info flat"><p>Ideally, create the index while the collection is still empty. Be careful when building an index on a large collection: first, do it during off-peak hours; second, on versions before MongoDB 4.2, don't forget the <code>background</code> option (MongoDB 4.2 and later ignore it and always use an optimized build process).</p></div><p>Run this in the <code>admin</code> database as well:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">db.adminCommand( &#123;</span><br><span class="line">   shardCollection: <span class="string">&quot;&lt;database_name&gt;.&lt;collection_name&gt;&quot;</span>,</span><br><span class="line">   key: &#123; &lt;shardkey&gt;: &lt;shardtype&gt; &#125;</span><br><span class="line">&#125; )</span><br></pre></td></tr></table></figure><blockquote><p>Values of shardtype</p></blockquote><ul><li><code>1</code>: ranged shard key</li><li><code>&quot;hashed&quot;</code>: hashed shard key</li></ul><p><strong>Ranged shard key example</strong></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sh.shardCollection(<span class="string">&quot;blog.sitoi&quot;</span>, &#123; sitoi: 1 &#125; )</span><br></pre></td></tr></table></figure><p><strong>Hashed shard key example</strong></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sh.shardCollection(<span class="string">&quot;blog.sitoi&quot;</span>, &#123; sitoi: <span class="string">&quot;hashed&quot;</span> &#125; )</span><br></pre></td></tr></table></figure>
<h2 id="参考链接"><a href="#参考链接" class="headerlink" title="References"></a>References</h2><ul><li><a href="https://docs.mongodb.com/manual/reference/command/enableSharding/">https://docs.mongodb.com/manual/reference/command/enableSharding/</a></li><li><a href="https://docs.mongodb.com/manual/reference/method/sh.shardCollection/">https://docs.mongodb.com/manual/reference/method/sh.shardCollection/</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;开启数据库分片能力&quot;&gt;&lt;a href=&quot;#开启数据库分片能力&quot; class=&quot;headerlink&quot; title=&quot;开启数据库分片能力&quot;&gt;&lt;/a&gt;开启数据库分片能力&lt;/h2&gt;&lt;ol&gt;
&lt;li&gt;&lt;p&gt;命令行 进入 &lt;code&gt;mongos&lt;/code&gt;&lt;/p&gt;
&lt;f</summary>
      
    
    
    
    <category term="数据库" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/"/>
    
    <category term="MongoDB" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/MongoDB/"/>
    
    
    <category term="MongoDB" scheme="https://sitoi.cn/tags/MongoDB/"/>
    
    <category term="集群" scheme="https://sitoi.cn/tags/%E9%9B%86%E7%BE%A4/"/>
    
    <category term="分片" scheme="https://sitoi.cn/tags/%E5%88%86%E7%89%87/"/>
    
  </entry>
  
  <entry>
    <title>Update all packages in package.json to their latest versions in one step</title>
    <link href="https://sitoi.cn/posts/9059.html"/>
    <id>https://sitoi.cn/posts/9059.html</id>
    <published>2020-09-20T14:15:04.000Z</published>
    <updated>2025-11-12T05:28:30.734Z</updated>
    
    <content type="html"><![CDATA[<h2 id="安装-npm-check-updates"><a href="#安装-npm-check-updates" class="headerlink" title="安装 npm-check-updates"></a>Install npm-check-updates</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">npm i -g npm-check-updates</span><br></pre></td></tr></table></figure><h2 id="检测更新"><a href="#检测更新" class="headerlink" title="检测更新"></a>Check for and apply updates</h2><blockquote><p><code>ncu</code> is the short command for npm-check-updates. Running <code>ncu</code> alone only lists the available updates; adding <code>-u</code> also writes the new versions into package.json.</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ncu -u</span><br></pre></td></tr></table></figure><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">Upgrading /Users/shitao/WebstormProjects/blog/package.json</span><br><span class="line">[====================] 38/38 100%</span><br><span class="line"></span><br><span class="line"> @babel/preset-env      ^7.11.0  →  ^7.11.5</span><br><span class="line"> eslint                  ^7.6.0  →   ^7.9.0</span><br><span class="line"> hexo-generator-search   ^2.4.0  →   ^2.4.1</span><br><span class="line"> hexo-renderer-marked    ^3.0.0  →   ^3.2.0</span><br><span class="line"> hexo-renderer-stylus    ^1.1.0  →   ^2.0.1</span><br><span class="line"> terser                  ^5.0.0  →   ^5.3.2</span><br><span class="line"> workbox-build           ^5.1.3  →   ^5.1.4</span><br><span class="line"></span><br><span class="line">Run npm install to install new versions.</span><br></pre></td></tr></table></figure><p>With <code>-u</code>, ncu rewrites the version numbers in package.json, but it neither updates package-lock.json or yarn.lock nor installs anything, so run <code>npm install</code> or <code>yarn install</code> again afterwards.</p><h2 id="升级-npm-包"><a href="#升级-npm-包" class="headerlink" title="升级 npm 包"></a>Install the upgraded npm packages</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">npm install</span><br></pre></td></tr></table></figure><h2 id="查看-ncu-帮助"><a href="#查看-ncu-帮助" class="headerlink" title="查看 ncu 帮助"></a>View the ncu help</h2><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span
class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br></pre></td><td class="code"><pre><span class="line">❯ ncu --help</span><br><span class="line">Usage: ncu [options] [filter]</span><br><span class="line"></span><br><span class="line">[filter] is a list or regex of package names to check (all others will be ignored).</span><br><span class="line"></span><br><span class="line">Options:</span><br><span class="line">  --concurrency &lt;n&gt;            Max number of concurrent HTTP requests to</span><br><span class="line">                               registry. (default: 8)</span><br><span class="line">  --configFilePath &lt;path&gt;      Directory of .ncurc config file (default:</span><br><span class="line">                               directory of `packageFile`).</span><br><span class="line">  --configFileName &lt;filename&gt;  Config file name (default: .ncurc.&#123;json,yml,js&#125;)</span><br><span class="line">  --cwd &lt;path&gt;                 Working directory in which npm will be executed.</span><br><span class="line">  --dep &lt;dep&gt;                  Check one or more sections of dependencies only:</span><br><span class="line">                               prod, dev, peer, optional, bundle</span><br><span class="line">                               (comma-delimited).</span><br><span class="line">  --deprecated                 Include deprecated packages.</span><br><span class="line">  --doctor                     Iteratively installs upgrades and runs tests to</span><br><span class="line">                               identify breaking upgrades. Run &quot;ncu --doctor&quot;</span><br><span class="line">                               for detailed help. Add &quot;-u&quot; to execute.</span><br><span class="line">  --enginesNode                Include only packages that satisfy engines.node</span><br><span class="line">                               as specified in the package file.</span><br><span class="line">  -e, --errorLevel &lt;n&gt;         Set the error level. 1: exits with error code 0</span><br><span class="line">                               if no errors occur. 2: exits with error code 0</span><br><span class="line">                               if no packages need updating (useful for</span><br><span class="line">                               continuous integration). (default: 1)</span><br><span class="line">  -f, --filter &lt;matches&gt;       Include only package names matching the given</span><br><span class="line">                               string, comma-or-space-delimited list, or</span><br><span class="line">                               /regex/.</span><br><span class="line">  -g, --global                 Check global packages instead of in the current</span><br><span class="line">                               project.</span><br><span class="line">  --greatest                   DEPRECATED. Renamed to &quot;--target greatest&quot;.</span><br><span class="line">  -i, --interactive            Enable interactive prompts for each dependency;</span><br><span class="line">                               implies -u unless one of the json options are</span><br><span class="line">                               set,</span><br><span class="line">  -j, --jsonAll                Output new package file instead of</span><br><span class="line">                               human-readable message.</span><br><span class="line">  --jsonDeps                   Like `jsonAll` but only lists `dependencies`,</span><br><span class="line">                               `devDependencies`, `optionalDependencies`, etc</span><br><span class="line">                               of the new package data.</span><br><span class="line">  --jsonUpgraded               Output upgraded dependencies in json.</span><br><span class="line">  -l, --loglevel &lt;n&gt;           Amount to log: silent, error, minimal, warn,</span><br><span class="line">                               info, verbose, silly. (default: &quot;warn&quot;)</span><br><span class="line">  -m, --minimal                Do not upgrade newer versions that are already</span><br><span class="line">                               satisfied by the version range according to</span><br><span class="line">                               semver.</span><br><span class="line">  -n, --newest                 DEPRECATED. Renamed to &quot;--target newest&quot;.</span><br><span class="line">  -p, --packageManager &lt;name&gt;  npm, yarn (default: &quot;npm&quot;)</span><br><span class="line">  -o, --ownerChanged           Check if the package owner changed between</span><br><span class="line">                               current and upgraded version.</span><br><span class="line">  --packageData &lt;string&gt;       Package file data (you can also use stdin).</span><br><span class="line">  --packageFile &lt;path&gt;         Package file location (default: ./package.json).</span><br><span class="line">  --pre &lt;n&gt;                    Include -alpha, -beta, -rc. (default: 0; default</span><br><span class="line">                               with --newest and --greatest: 1).</span><br><span class="line">  --prefix &lt;path&gt;              Current working directory of npm.</span><br><span class="line">  -r, --registry &lt;url&gt;         Third-party npm registry.</span><br><span class="line">  --removeRange                Remove version ranges from the final package</span><br><span class="line">                               version.</span><br><span class="line">  --semverLevel &lt;value&gt;        DEPRECATED. Renamed to --target.</span><br><span class="line">  -s, --silent                 Don&#x27;t output anything (--loglevel silent).</span><br><span class="line">  -t, --target &lt;value&gt;         Target version to upgrade to: latest, newest,</span><br><span class="line">                               greatest, minor, patch.</span><br><span class="line">  --timeout &lt;ms&gt;               Global timeout in milliseconds. (default: no</span><br><span class="line">                               global timeout and 30 seconds per</span><br><span class="line">                               npm-registery-fetch).</span><br><span class="line">  -u, --upgrade                Overwrite package file with upgraded versions</span><br><span class="line">                               instead of just outputting to console.</span><br><span class="line">  -x, --reject &lt;matches&gt;       Exclude packages matching the given string,</span><br><span class="line">                               comma-or-space-delimited list, or /regex/.</span><br><span class="line">  -V, --version                output the version number</span><br><span class="line">  -h, --help                   display help for command</span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;安装-npm-check-updates&quot;&gt;&lt;a href=&quot;#安装-npm-check-updates&quot; class=&quot;headerlink&quot; title=&quot;安装 npm-check-updates&quot;&gt;&lt;/a&gt;安装 npm-check-updates&lt;/h2&gt;&lt;</summary>
      
    
    
    
    <category term="前端" scheme="https://sitoi.cn/categories/%E5%89%8D%E7%AB%AF/"/>
    
    
    <category term="npm" scheme="https://sitoi.cn/tags/npm/"/>
    
  </entry>
  
  <entry>
    <title>Linux: count the files and directories in a folder</title>
    <link href="https://sitoi.cn/posts/12507.html"/>
    <id>https://sitoi.cn/posts/12507.html</id>
    <published>2020-09-10T08:29:50.000Z</published>
    <updated>2025-11-12T05:28:30.732Z</updated>
    
    <content type="html"><![CDATA[<h2 id="列出当前文件夹（显示不隐藏的文件与文件夹的详细信息）"><a href="#列出当前文件夹（显示不隐藏的文件与文件夹的详细信息）" class="headerlink" title="列出当前文件夹（显示不隐藏的文件与文件夹的详细信息）"></a>List the current folder (details of non-hidden files and folders)</h2><p>Command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">ls</span> -l</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">drwxrwsr-x    2 sitoi sitoi  4096 Sep 10 16:32 10271113/</span><br><span class="line">drwxrwsr-x    2 sitoi sitoi  4096 Sep 10 16:32 10271114/</span><br><span class="line">drwxrwsr-x    2 sitoi sitoi  4096 Sep 10 16:32 10271115/</span><br><span class="line">drwxrwsr-x    2 sitoi sitoi  4096 Sep 10 16:32 10271116/</span><br><span class="line">drwxrwsr-x    2 sitoi sitoi  4096 Sep 10 16:32 10271117/</span><br><span class="line">drwxrwsr-x    2 sitoi sitoi  4096 Sep 10 16:32 10271118/</span><br></pre></td></tr></table></figure><ul><li><p>The terminal prints one line per entry, and each line corresponds to either a directory or a file.</p></li><li><p>For a regular file, the first character of the line is <code>-</code> (symbolic links show <code>l</code> instead).</p></li><li><p>For a directory, the first character is <code>d</code>, short for <code>directory</code>. This first character is what lets us tell files and directories apart.</p></li></ul><h2 id="显示目录中的文件"><a href="#显示目录中的文件" class="headerlink" title="显示目录中的文件"></a>Show only the files in a directory</h2><p>Command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">ls</span> -l | grep <span class="string">&quot;^-&quot;</span></span><br></pre></td></tr></table></figure><p>Here the pattern <code>&quot;^-&quot;</code> matches lines whose first character is <code>-</code>.</p><p>Output:</p><figure class="highlight text"><table><tr><td
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">-rw-rw-r-- 1 sitoi sitoi  52983 Sep 10 16:31 10272015_1.png</span><br><span class="line">-rw-rw-r-- 1 sitoi sitoi 109263 Sep 10 16:31 10272015_2.png</span><br><span class="line">-rw-rw-r-- 1 sitoi sitoi 121148 Sep 10 16:31 10272015_3.png</span><br><span class="line">-rw-rw-r-- 1 sitoi sitoi 127864 Sep 10 16:31 10272015_4.png</span><br><span class="line">-rw-rw-r-- 1 sitoi sitoi 114144 Sep 10 16:31 10272015_5.png</span><br><span class="line">-rw-rw-r-- 1 sitoi sitoi  99405 Sep 10 16:31 10272015_6.png</span><br></pre></td></tr></table></figure><blockquote><p>We can count the lines with the <code>wc</code> command:</p></blockquote><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">wc</span> [-lwm]</span><br></pre></td></tr></table></figure><p>Options:</p><ul><li><p><code>-l</code> print only the line count</p></li><li><p><code>-w</code> print only the word count (English words)</p></li><li><p><code>-m</code> print only the character count</p></li></ul><h2 id="统计文件夹中文件个数"><a href="#统计文件夹中文件个数" class="headerlink" title="统计文件夹中文件个数"></a>Count the files in a folder</h2><p>Command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">ls</span> -l ./|grep <span class="string">&quot;^-&quot;</span>|<span class="built_in">wc</span> -l</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">6</span><br></pre></td></tr></table></figure><h2 id="统计文件夹中目录个数"><a href="#统计文件夹中目录个数" class="headerlink" title="统计文件夹中目录个数"></a>Count the directories in a folder</h2><p>Command:</p><figure class="highlight
bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">ls</span> -l ./|grep <span class="string">&quot;^d&quot;</span>|<span class="built_in">wc</span> -l</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">6</span><br></pre></td></tr></table></figure><h2 id="统计文件夹下文件个数，包括子文件"><a href="#统计文件夹下文件个数，包括子文件" class="headerlink" title="统计文件夹下文件个数，包括子文件"></a>Count the files in a folder, including those in subfolders</h2><p>Command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">ls</span> -lR | grep <span class="string">&quot;^-&quot;</span>| <span class="built_in">wc</span> -l</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">34699</span><br></pre></td></tr></table></figure><h2 id="统计文件夹下目录个数，包括子目录"><a href="#统计文件夹下目录个数，包括子目录" class="headerlink" title="统计文件夹下目录个数，包括子目录"></a>Count the directories in a folder, including subdirectories</h2><p>Command:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">ls</span> -lR | grep <span class="string">&quot;^d&quot;</span>| <span class="built_in">wc</span> -l</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">2034</span><br></pre></td></tr></table></figure><h2 id="参考链接"><a href="#参考链接" class="headerlink" title="参考链接"></a>References</h2><ul><li><a
href="https://blog.csdn.net/sganchang/article/details/91432435">https://blog.csdn.net/sganchang/article/details/91432435</a></li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;列出当前文件夹（显示不隐藏的文件与文件夹的详细信息）&quot;&gt;&lt;a href=&quot;#列出当前文件夹（显示不隐藏的文件与文件夹的详细信息）&quot; class=&quot;headerlink&quot; title=&quot;列出当前文件夹（显示不隐藏的文件与文件夹的详细信息）&quot;&gt;&lt;/a&gt;列出当前文件夹（</summary>
      
    
    
    
    <category term="开发环境" scheme="https://sitoi.cn/categories/%E5%BC%80%E5%8F%91%E7%8E%AF%E5%A2%83/"/>
    
    <category term="Linux" scheme="https://sitoi.cn/categories/%E5%BC%80%E5%8F%91%E7%8E%AF%E5%A2%83/Linux/"/>
    
    
    <category term="shell" scheme="https://sitoi.cn/tags/shell/"/>
    
  </entry>
  
  <entry>
    <title>最新 Navicat Premium 15 破解方法详细教程（Windows）</title>
    <link href="https://sitoi.cn/posts/40411.html"/>
    <id>https://sitoi.cn/posts/40411.html</id>
    <published>2020-09-01T02:34:24.000Z</published>
    <updated>2025-11-12T05:28:30.735Z</updated>
    
    <content type="html"><![CDATA[<div class="note danger flat"><p>在破解安装之前，请先卸载电脑中旧版本的所有 <code>Navicat Premium</code> 并重新安装！</p></div><div class="note danger flat"><p>在破解安装之前，请先卸载电脑中旧版本的所有 <code>Navicat Premium</code> 并重新安装！</p></div><div class="note danger flat"><p>在破解安装之前，请先卸载电脑中旧版本的所有 <code>Navicat Premium</code> 并重新安装！</p></div><h2 id="安装-Navicat-Premium-15"><a href="#安装-Navicat-Premium-15" class="headerlink" title="安装 Navicat Premium 15"></a>安装 Navicat Premium 15</h2><p>官网下载地址：<a href="https://www.navicat.com.cn/download/navicat-premium">https://www.navicat.com.cn/download/navicat-premium</a></p><blockquote><p>安装完成后请打开一次软件并关闭</p></blockquote><h2 id="激活-Navicat-Premium-15"><a href="#激活-Navicat-Premium-15" class="headerlink" title="激活 Navicat Premium 15"></a>激活 Navicat Premium 15</h2><h3 id="下载激活软件-Navicat-Keygen-Patch"><a href="#下载激活软件-Navicat-Keygen-Patch" class="headerlink" title="下载激活软件 Navicat Keygen Patch"></a>下载激活软件 Navicat Keygen Patch</h3><p>下载地址：<a href="https://sitoi.lanzous.com/iDXhOg935ng">https://sitoi.lanzous.com/iDXhOg935ng</a> 密码：<code>4wfy</code></p><blockquote><p>无需断网运行激活软件 Navicat_Keygen_Patch（以管理员身份运行）</p></blockquote><h3 id="点击-Patch-替换-navicat-exe"><a href="#点击-Patch-替换-navicat-exe" class="headerlink" title="点击 Patch 替换 navicat.exe"></a>点击 Patch 替换 navicat.exe</h3><p>点击 <code>Patch</code> 选择 <code>Navicat Premium 15</code> 安装路劲下的 <code>navicat.exe</code></p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Navicat_Keygen_Patch.png" alt="Navicat Keygen Patch"></p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/navicat.exe.png" alt="navicat.exe"></p><p>显示下图，表示成功</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/navicat.exe_Cracked.png" alt="navicat.exe Cracked"></p><h3 id="生成注册码"><a href="#生成注册码" class="headerlink" title="生成注册码"></a>生成注册码</h3><ol><li><p>关闭 <code>Navicat Premium 15</code> 并重新打开，点击注册</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/registered.png" 
alt="注册"></p></li><li><p>修改信息（可选）-&gt; 生成注册码 -&gt; 复制注册码</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Generate_registration_code.png" alt="生成注册码"></p></li><li><p>填入注册码后点击激活，这里会提示注册失败</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Activation_code.png" alt="激活"></p></li><li><p>点击手动激活</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Manual_activation.png" alt="手动激活"></p></li><li><p>复制请求码</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Request_code.png" alt="请求码"></p></li><li><p>将请求码粘贴到激活软件 -&gt; 点击生成激活码 -&gt; 复制激活码</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Generate_activation_code.png" alt="生成激活码"></p></li><li><p>粘贴激活码到 <code>Navicat Premium 15</code> 的激活码，点击激活</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Activate_activation_code.png" alt="激活"></p></li><li><p>提示激活成功</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Activated_successfully.png" alt="激活成功"></p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/navicat/Perpetual_license.png" alt="永久许可证"></p></li></ol>]]></content>
    
    
      
      
    <summary type="html">&lt;div class=&quot;note danger flat&quot;&gt;&lt;p&gt;在破解安装之前，请先卸载电脑中旧版本的所有 &lt;code&gt;Navicat Premium&lt;/code&gt; 并重新安装！&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;note danger flat&quot;&gt;&lt;p&gt;在破解安装</summary>
      
    
    
    
    <category term="网站与应用" scheme="https://sitoi.cn/categories/%E7%BD%91%E7%AB%99%E4%B8%8E%E5%BA%94%E7%94%A8/"/>
    
    
    <category term="Navicat Premium 15" scheme="https://sitoi.cn/tags/Navicat-Premium-15/"/>
    
    <category term="Navicat Keygen Patch" scheme="https://sitoi.cn/tags/Navicat-Keygen-Patch/"/>
    
    <category term="破解" scheme="https://sitoi.cn/tags/%E7%A0%B4%E8%A7%A3/"/>
    
  </entry>
  
  <entry>
    <title>Selenium ChromeDriver: setting a proxy and an authenticated proxy</title>
    <link href="https://sitoi.cn/posts/34819.html"/>
    <id>https://sitoi.cn/posts/34819.html</id>
    <published>2020-08-19T05:27:38.000Z</published>
    <updated>2025-11-12T05:28:30.733Z</updated>
    
    <content type="html"><![CDATA[<div class="note info flat"><p><a href="/posts/14489.html">Selenium &amp; ChromeDriver installation guide for all platforms (Mac, Windows, Linux)</a></p></div><h2 id="Selenium-ChromeDriver-代理使用，无密码或已设置白名单-IP"><a href="#Selenium-ChromeDriver-代理使用，无密码或已设置白名单-IP" class="headerlink" title="Selenium + ChromeDriver 代理使用，无密码或已设置白名单 IP"></a>Selenium + ChromeDriver with a proxy (no password, or your IP is already whitelisted)</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> selenium <span class="keyword">import</span> webdriver</span><br><span class="line"></span><br><span class="line">chromeOptions = webdriver.ChromeOptions()</span><br><span class="line"><span class="comment"># Send all browser traffic through the proxy</span></span><br><span class="line">chromeOptions.add_argument(<span class="string">&quot;--proxy-server=http://127.0.0.1:7890&quot;</span>)</span><br><span class="line">browser = webdriver.Chrome(options=chromeOptions)</span><br><span class="line"><span class="comment"># httpbin echoes the request back, so the screenshot shows the IP the site saw</span></span><br><span class="line">browser.get(<span class="string">&quot;https://httpbin.org/get?show_env=1&quot;</span>)</span><br><span class="line">browser.get_screenshot_as_file(<span class="string">&quot;httpbin.png&quot;</span>)</span><br><span class="line"><span class="comment"># quit() ends the whole WebDriver session, not just the current window</span></span><br><span class="line">browser.quit()</span><br></pre></td></tr></table></figure><blockquote><p>Note: in <code>--proxy-server=http://host:port</code> there must be no spaces around the <code>=</code>.</p></blockquote><h2 id="Selenium-ChromeDriver-代理使用，支持-Http、Https-账号密码"><a href="#Selenium-ChromeDriver-代理使用，支持-Http、Https-账号密码" class="headerlink" title="Selenium + ChromeDriver 代理使用，支持 Http、Https 账号密码"></a>Selenium + ChromeDriver with an HTTP/HTTPS proxy that requires a username and password</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span 
class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> selenium <span class="keyword">import</span> webdriver</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">create_proxyauth_extension</span>(<span class="params">proxy_host, proxy_port, proxy_username, proxy_password, scheme=<span class="string">&quot;http&quot;</span>, plugin_path=<span class="literal">None</span></span>):</span><br><span class="line">    <span class="string">&quot;&quot;&quot;Proxy Auth Extension</span></span><br><span class="line"><span class="string">    args:</span></span><br><span class="line"><span class="string">        proxy_host (str): domain or ip address, ie proxy.domain.com</span></span><br><span class="line"><span class="string">        proxy_port (int): port</span></span><br><span class="line"><span class="string">        proxy_username (str): auth username</span></span><br><span class="line"><span class="string">        proxy_password (str): auth password</span></span><br><span class="line"><span class="string">    kwargs:</span></span><br><span class="line"><span class="string">        scheme (str): proxy scheme, default 
http</span></span><br><span class="line"><span class="string">        plugin_path (str): absolute path of the extension</span></span><br><span class="line"><span class="string">    return str -&gt; plugin_path</span></span><br><span class="line"><span class="string">    &quot;&quot;&quot;</span></span><br><span class="line">    <span class="keyword">import</span> string</span><br><span class="line">    <span class="keyword">import</span> zipfile</span><br><span class="line"></span><br><span class="line">    <span class="keyword">if</span> plugin_path <span class="keyword">is</span> <span class="literal">None</span>:</span><br><span class="line">        plugin_path = <span class="string">r&quot;./chrome_proxyauth_plugin.zip&quot;</span></span><br><span class="line">    manifest_json = <span class="string">&quot;&quot;&quot;</span></span><br><span class="line"><span class="string">    &#123;</span></span><br><span class="line"><span class="string">        &quot;version&quot;: &quot;1.0.0&quot;,</span></span><br><span class="line"><span class="string">        &quot;manifest_version&quot;: 2,</span></span><br><span class="line"><span class="string">        &quot;name&quot;: &quot;Chrome Proxy&quot;,</span></span><br><span class="line"><span class="string">        &quot;permissions&quot;: [</span></span><br><span class="line"><span class="string">            &quot;proxy&quot;,</span></span><br><span class="line"><span class="string">            &quot;tabs&quot;,</span></span><br><span class="line"><span class="string">            &quot;unlimitedStorage&quot;,</span></span><br><span class="line"><span class="string">            &quot;storage&quot;,</span></span><br><span class="line"><span class="string">            &quot;&lt;all_urls&gt;&quot;,</span></span><br><span class="line"><span class="string">            &quot;webRequest&quot;,</span></span><br><span class="line"><span class="string">            &quot;webRequestBlocking&quot;</span></span><br><span 
class="line"><span class="string">        ],</span></span><br><span class="line"><span class="string">        &quot;background&quot;: &#123;</span></span><br><span class="line"><span class="string">            &quot;scripts&quot;: [&quot;background.js&quot;]</span></span><br><span class="line"><span class="string">        &#125;,</span></span><br><span class="line"><span class="string">        &quot;minimum_chrome_version&quot;:&quot;22.0.0&quot;</span></span><br><span class="line"><span class="string">    &#125;</span></span><br><span class="line"><span class="string">    &quot;&quot;&quot;</span></span><br><span class="line">    background_js = string.Template(</span><br><span class="line">        <span class="string">&quot;&quot;&quot;</span></span><br><span class="line"><span class="string">        var config = &#123;</span></span><br><span class="line"><span class="string">                mode: &quot;fixed_servers&quot;,</span></span><br><span class="line"><span class="string">                rules: &#123;</span></span><br><span class="line"><span class="string">                  singleProxy: &#123;</span></span><br><span class="line"><span class="string">                    scheme: &quot;$&#123;scheme&#125;&quot;,</span></span><br><span class="line"><span class="string">                    host: &quot;$&#123;host&#125;&quot;,</span></span><br><span class="line"><span class="string">                    port: parseInt($&#123;port&#125;)</span></span><br><span class="line"><span class="string">                  &#125;,</span></span><br><span class="line"><span class="string">                  bypassList: [&quot;foobar.com&quot;]</span></span><br><span class="line"><span class="string">                &#125;</span></span><br><span class="line"><span class="string">              &#125;;</span></span><br><span class="line"><span class="string">        chrome.proxy.settings.set(&#123;value: config, scope: &quot;regular&quot;&#125;, function() 
&#123;&#125;);</span></span><br><span class="line"><span class="string">        function callbackFn(details) &#123;</span></span><br><span class="line"><span class="string">            return &#123;</span></span><br><span class="line"><span class="string">                authCredentials: &#123;</span></span><br><span class="line"><span class="string">                    username: &quot;$&#123;username&#125;&quot;,</span></span><br><span class="line"><span class="string">                    password: &quot;$&#123;password&#125;&quot;</span></span><br><span class="line"><span class="string">                &#125;</span></span><br><span class="line"><span class="string">            &#125;;</span></span><br><span class="line"><span class="string">        &#125;</span></span><br><span class="line"><span class="string">        chrome.webRequest.onAuthRequired.addListener(</span></span><br><span class="line"><span class="string">                    callbackFn,</span></span><br><span class="line"><span class="string">                    &#123;urls: [&quot;&lt;all_urls&gt;&quot;]&#125;,</span></span><br><span class="line"><span class="string">                    [&#x27;blocking&#x27;]</span></span><br><span class="line"><span class="string">        );</span></span><br><span class="line"><span class="string">        &quot;&quot;&quot;</span></span><br><span class="line">    ).substitute(host=proxy_host, port=proxy_port, username=proxy_username, password=proxy_password, scheme=scheme)</span><br><span class="line">    <span class="keyword">with</span> zipfile.ZipFile(plugin_path, <span class="string">&quot;w&quot;</span>) <span class="keyword">as</span> zp:</span><br><span class="line">        zp.writestr(<span class="string">&quot;manifest.json&quot;</span>, manifest_json)</span><br><span class="line">        zp.writestr(<span class="string">&quot;background.js&quot;</span>, background_js)</span><br><span class="line">    <span class="keyword">return</span> 
plugin_path</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">proxyauth_plugin_path = create_proxyauth_extension(</span><br><span class="line">    proxy_host=<span class="string">&quot;127.0.0.1&quot;</span>,</span><br><span class="line">    proxy_port=<span class="number">7890</span>,</span><br><span class="line">    proxy_username=<span class="literal">None</span>,</span><br><span class="line">    proxy_password=<span class="literal">None</span></span><br><span class="line">)</span><br><span class="line">options = webdriver.ChromeOptions()</span><br><span class="line">options.add_argument(<span class="string">&quot;--start-maximized&quot;</span>)</span><br><span class="line">options.add_extension(proxyauth_plugin_path)</span><br><span class="line">browser = webdriver.Chrome(options=options)</span><br><span class="line">browser.get(<span class="string">&quot;https://httpbin.org/get?show_env=1&quot;</span>)</span><br><span class="line">browser.get_screenshot_as_file(<span class="string">&quot;httpbin.png&quot;</span>)</span><br><span class="line">browser.quit()  <span class="comment"># quit() ends the WebDriver session and stops the driver process</span></span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;div class=&quot;note info flat&quot;&gt;&lt;p&gt;&lt;a href=&quot;/posts/14489.html&quot;&gt;Selenium &amp;amp; ChromeDriver installation guide for all platforms (Mac, Windows, Linux)&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;

</summary>
      
    
    
    
    <category term="爬虫" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/"/>
    
    <category term="Selenium" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/Selenium/"/>
    
    
    <category term="Selenium" scheme="https://sitoi.cn/tags/Selenium/"/>
    
    <category term="ChromeDriver" scheme="https://sitoi.cn/tags/ChromeDriver/"/>
    
    <category term="Proxy" scheme="https://sitoi.cn/tags/Proxy/"/>
    
  </entry>
  
  <entry>
    <title>[Python3 Scraping, JS Reverse Engineering] Toutiao as, cp, and _signature Parameters</title>
    <link href="https://sitoi.cn/posts/11194.html"/>
    <id>https://sitoi.cn/posts/11194.html</id>
    <published>2020-07-17T02:58:50.000Z</published>
    <updated>2025-11-12T05:28:30.734Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前情提要"><a href="#前情提要" class="headerlink" title="前情提要"></a>Background</h2><div class="note danger flat"><p>Scrapers are time-sensitive: the code in this article may stop working at some point, but the approach itself is general.</p></div><div class="note info flat"><p>Version: 2020-07-17</p></div><p>The main request parameters of the Toutiao (今日头条) <code>web</code> version are <code>as</code>, <code>cp</code>, and <code>_signature</code>.</p><ul><li><code>as</code> and <code>cp</code> are fairly simple: you can run the original <code>js</code> source directly or port it to <code>python</code>.</li><li><code>_signature</code> is considerably more complex.</li></ul><h2 id="URL-分析"><a href="#URL-分析" class="headerlink" title="URL 分析"></a>URL analysis</h2><p>Open any page of the Toutiao web version. This example uses the <a href="https://www.toutiao.com/ch/news_hot/">Hot News channel</a> at <a href="https://www.toutiao.com/ch/news_hot/">https://www.toutiao.com/ch/news_hot&#x2F;</a>.</p><p>Scroll down the page and new content keeps loading.</p><p>Press <code>F12</code> to open the developer tools, select the <code>XHR</code> filter in the <code>Network</code> panel, keep scrolling the Toutiao page, and watch the request URLs.</p><p>Here are three sample request URLs to analyze:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=0&amp;max_behot_time_tmp=0&amp;tadrequire=true&amp;as=A1E51F21B0A055D&amp;cp=5F10201525DD4E1&amp;_signature=_02B4Z6wo00f01jcKhsgAAIBAdPSMZ6-fGcI3D4JAANLfaIBd69iVqrqwt-Kzkp68yjCiTBebZn4bKtxcot5cz26TAvNJxqWymSmizGkrEL3-TkzTvjaW14sJJpUdGO-qtIjt.n.qWnE26C8g79</span><br><span class="line">https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=1594880609&amp;max_behot_time_tmp=1594880609&amp;tadrequire=true&amp;as=A1F58F71E04057E&amp;cp=5F1030A557BEAE1&amp;_signature=_02B4Z6wo00901tH42wgAAIBAkgbRpdhpFFbR.d-AAOt8c3CZDocehB19PuHUmDrMDvCRZp9PXbVULneN4NWmDbAaPPGPWLtRA9--LfxHyF7itVXaG6r5K8bMdDlZeFZqFmVD3ExhcFH9u52b84</span><br><span 
class="line">https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=1594869246&amp;max_behot_time_tmp=1594869246&amp;tadrequire=true&amp;as=A1B5EF51300180F&amp;cp=5F10A138508FDE1&amp;_signature=_02B4Z6wo00501-pBU5QAAIBBqb9ZOOz-JLfqRFcAAKWKddCx4Y7Ps7qRC.B89m1IPx7kVtIM9Dy4i2lN8gSXryJypKZG7gVFrub3gVeiJxy8SjWeeg8O1c4-OQN2YJLbXyVanlfiHvufxjHi59</span><br></pre></td></tr></table></figure><p>Comparing them, the parameters that change are <code>max_behot_time</code>, <code>as</code>, <code>cp</code>, and <code>_signature</code>. Let's analyze these four one by one.</p><h2 id="max-behot-time-分析"><a href="#max-behot-time-分析" class="headerlink" title="max_behot_time 分析"></a>max_behot_time analysis</h2><p>The value of <code>max_behot_time</code> looks like a timestamp, but on comparison it is not the actual timestamp at the moment the URL is requested.</p><p>Presumably it is generated by some specific function.</p><p>Look at the <code>json</code> data returned by the request: besides the news items, it contains a <code>next</code> field that holds a <code>max_behot_time</code> value.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/max_behot_time.png" alt="max_behot_time"></p><p>Comparing them shows that the <code>max_behot_time</code> in <code>next</code> is exactly the <code>max_behot_time</code> used in the next request <code>url</code> when the page scrolls down.</p><p>Since Toutiao has no explicit page numbers, <code>max_behot_time</code> evidently acts as the <code>page number</code>. And because the <code>next</code> value can be read directly from the response, we don't need to analyze the function that generates it.</p><h2 id="as、cp-分析"><a href="#as、cp-分析" class="headerlink" title="as、cp 分析"></a>as and cp analysis</h2><p>Press <code>F12</code> to open the developer tools, select <code>Network</code>, press <code>Ctrl + F</code> to open the global search, and search for <code>as</code>.</p><p>Because the term is so short, this returns hundreds of matches; finding the function that generates <code>as</code> this way is like looking for a needle in a haystack.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/as.png" alt="as"></p><p>Instead, search for <code>max_behot_time</code> and look around that key function for the code that generates <code>as</code> and <code>cp</code>.</p><p>Press <code>F12</code> to open the developer tools, select <code>Network</code>, press <code>Ctrl + F</code> to open the global search, and search for <code>max_behot_time</code>. There is only one matching function; pretty-print the code and inspect it:</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/findas.png" alt="as"></p><p>We can ignore <code>max_behot_time</code> itself: right below it is the code for <code>as</code> and <code>cp</code>. 
To confirm these are the values we want, set a breakpoint at the end of the function, refresh the page, and inspect the values of <code>as</code> and <code>cp</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/getHoney.png" alt="getHoney"></p><p>These are exactly the <code>as</code> and <code>cp</code> values we need. Looking at the function again, they are produced by function <code>e</code>, circled in red in the screenshot above. The key function is <code>ascp.getHoney</code>; hover over <code>ascp.getHoney</code> to jump to its definition.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/ascpmd5.png" alt="ascpmd5"></p><p>This is the function that computes <code>as</code> and <code>cp</code>. <code>i = md5(t)</code> computes an MD5 hash of the timestamp <code>t</code>; interested readers can dig into it further. We can port the JS code directly to <code>python</code> so it is easy to call.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> hashlib</span><br><span class="line"><span class="keyword">import</span> time</span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">get_honey</span>():</span><br><span class="line">    t = <span class="built_in">int</span>(time.time())</span><br><span class="line">    e = <span class="built_in">hex</span>(t).upper()[<span class="number">2</span>:]</span><br><span class="line">    md = hashlib.md5()</span><br><span class="line">    md.update(<span class="built_in">str</span>(t).encode(<span class="string">&#x27;utf-8&#x27;</span>))</span><br><span class="line">    i = <span class="built_in">str</span>(md.hexdigest()).upper()</span><br><span class="line">    <span class="keyword">if</span> <span 
class="built_in">len</span>(e) != <span class="number">8</span>:</span><br><span class="line">        <span class="keyword">return</span> &#123;<span class="string">&#x27;as&#x27;</span>: <span class="string">&quot;479BB4B7254C150&quot;</span>, <span class="string">&#x27;cp&#x27;</span>: <span class="string">&quot;7E0AC8874BB0985&quot;</span>&#125;</span><br><span class="line">    s = r = <span class="string">&#x27;&#x27;</span></span><br><span class="line">    <span class="keyword">for</span> k <span class="keyword">in</span> <span class="built_in">range</span>(<span class="number">0</span>, <span class="number">5</span>):</span><br><span class="line">        s = s + i[:<span class="number">5</span>][k] + e[k]</span><br><span class="line">        r = r + e[k+<span class="number">3</span>] + i[-<span class="number">5</span>:][k]</span><br><span class="line">    <span class="keyword">return</span> &#123;<span class="string">&#x27;as&#x27;</span>: <span class="string">&quot;A1&quot;</span> + s + e[-<span class="number">3</span>:], <span class="string">&#x27;cp&#x27;</span>: e[:<span class="number">3</span>] + r + <span class="string">&quot;E1&quot;</span>&#125;</span><br></pre></td></tr></table></figure><p>At this point we have the <code>as</code> and <code>cp</code> values.</p><h2 id="signature-分析"><a href="#signature-分析" class="headerlink" title="_signature 分析"></a>_signature analysis</h2><p>Press <code>F12</code> to open the developer tools, select <code>Network</code>, press <code>Ctrl + F</code> to open the global search, and search for <code>_signature</code>.</p><p>There are two results. Checking both, the first is where the value is built and the second only uses it, so we analyze the first.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/signature.png" alt="signature"></p><p>Set a breakpoint on the last line of the key function and refresh the page. Once the page has finished loading, hover over <code>_signature</code> to see the value we want. Looking closely, the value of <code>_signature</code> is generated by the <code>tacSign</code> function.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/tacsign.png" alt="tacsign"></p><p>Hover over <code>tacSign</code> and click <code>f tacSign(e,t)</code> at the top of the popup to jump to its definition, as shown below:</p><p><img 
src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/tacsign_func.png" alt="tacsign_func"></p><p>Remove the breakpoint you set in the previous function first! Then set a breakpoint on the last line of <code>tacSign</code>, click the blue resume arrow shown below (<code>F8</code>), and refresh the page.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/f8.png" alt="f8"></p><p>You can see that <code>i</code> holds the value we want. It is generated by <code>window.byted_acrawler.sign(o)</code>, where the argument <code>o</code> carries the request URL.</p><div class="note info flat"><p>The normal flow is: compute <code>as</code> and <code>cp</code> first, then build the request URL and pass it as the argument <code>o</code> to <code>window.byted_acrawler.sign</code> to get <code>_signature</code>.</p></div><p>Hover over <code>window.byted_acrawler.sign</code> and click the <code>f e()</code> popup to jump to the target function.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/arguments.png" alt="arguments"></p><p>Don't be put off by what you land on here. It is just one very large function, roughly 500 lines long.</p><p>We don't need to understand all of it; simply copy out the entire JS file.</p><p>Copy it yourself. I won't paste the full ~500 lines here, only the beginning and the end:</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">var</span> _typeof = <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; <span class="string">&quot;symbol&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span>.<span class="property">iterator</span> ? 
<span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">: <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> f &amp;&amp; <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; f.<span class="property">constructor</span> === <span class="title class_">Symbol</span> &amp;&amp; f !== <span class="title class_">Symbol</span>.<span class="property"><span class="keyword">prototype</span></span> ? <span class="string">&quot;symbol&quot;</span> : <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">;</span><br><span class="line"><span class="variable constant_">TAC</span> = <span class="keyword">function</span>(<span class="params"></span>) &#123;</span><br><span class="line">    <span class="keyword">function</span> <span class="title function_">f</span>(<span class="params">f, a, b, d, c, r</span>) &#123;</span><br><span class="line">        ...</span><br><span class="line">    &#125;</span><br><span class="line">&#125;(),</span><br><span class="line"><span class="title function_">TAC</span>(<span class="string">&quot;484e4f4a4......&quot;</span>, []);</span><br></pre></td></tr></table></figure><p>Save the code above as a separate file, e.g. <code>sign.js</code>.</p><p>Append the following lines at the end to test the output:</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">sign = <span class="variable language_">window</span>.<span class="property">byted_acrawler</span>.<span class="title 
function_">sign</span>(&#123;</span><br><span class="line">  <span class="attr">url</span>: <span class="string">&quot;https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=1594869246&amp;max_behot_time_tmp=1594869246&amp;tadrequire=true&amp;as=A1B5EF51300180F&amp;cp=5F10A138508FDE1&quot;</span>,</span><br><span class="line">&#125;);</span><br><span class="line"><span class="variable language_">console</span>.<span class="title function_">log</span>(sign);</span><br></pre></td></tr></table></figure><p>I installed the <code>node.js</code> plugin in <code>PyCharm</code>, so I can run the file directly from <code>PyCharm</code>; running <code>node sign.js</code> from the command line works as well.</p><h2 id="js-运行报错"><a href="#js-运行报错" class="headerlink" title="js 运行报错"></a>JS runtime errors</h2><h3 id="window-is-not-defined"><a href="#window-is-not-defined" class="headerlink" title="window is not defined"></a>window is not defined</h3><p><strong>Error message</strong></p><p>Running the JS produces the following error:</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/windowisnotdefined.png" alt="window is not defined"></p><p><strong>Solution</strong></p><p>Define <code>window</code> at the top of the file:</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="variable language_">window</span> = <span class="variable language_">global</span>;</span><br></pre></td></tr></table></figure><h3 id="Cannot-read-property-‘href’-of-undefined"><a href="#Cannot-read-property-‘href’-of-undefined" class="headerlink" title="Cannot read property ‘href’ of undefined"></a>Cannot read property ‘href’ of undefined</h3><p><strong>Error message</strong></p><p>Running the JS produces the following error:</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/hrefofundefined.png" alt="href of undefined"></p><p><strong>Solution</strong></p><p>Use <code>jsdom</code> to emulate a browser environment.</p><p>With <code>node.js</code> installed, install it from the command line with <code>npm install jsdom</code>.</p><p>Then create a minimal page and add Toutiao's <code>href</code>.</p><p>Where does Toutiao's <code>href</code> come from? 
Open a Toutiao page, press <code>F12</code> to open the developer tools, select <code>Console</code>, type <code>window.location</code>, and press Enter:</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/window.location.png" alt="window.location"></p><p>Setting <code>href</code> on <code>window.location</code> is enough, but to be safe we also add the other <code>location</code> properties:</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">const</span> jsdom = <span class="built_in">require</span>(<span class="string">&quot;jsdom&quot;</span>);</span><br><span class="line"><span class="keyword">const</span> &#123; <span class="variable constant_">JSDOM</span> &#125; = jsdom;</span><br><span class="line"><span class="keyword">const</span> dom = <span 
class="keyword">new</span> <span class="title function_">JSDOM</span>(<span class="string">`&lt;!DOCTYPE html&gt;&lt;p&gt;Hello world&lt;/p&gt;`</span>);</span><br><span class="line"><span class="variable language_">window</span> = <span class="variable language_">global</span>;</span><br><span class="line"><span class="keyword">var</span> <span class="variable language_">document</span> = dom.<span class="property">window</span>.<span class="property">document</span>;</span><br><span class="line"><span class="keyword">var</span> params = &#123;</span><br><span class="line">    <span class="attr">location</span>:&#123;</span><br><span class="line">        <span class="attr">hash</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">host</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">hostname</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">href</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">origin</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">pathname</span>: <span class="string">&quot;/&quot;</span>,</span><br><span class="line">        <span class="attr">port</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">protocol</span>: <span class="string">&quot;https:&quot;</span>,</span><br><span class="line">        <span class="attr">search</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">&#125;;</span><br><span class="line"><span class="title class_">Object</span>.<span class="title function_">assign</span>(<span class="variable language_">window</span>,params);</span><br><span 
class="line"><span class="variable language_">window</span>.<span class="property">document</span> = <span class="variable language_">document</span>;</span><br><span class="line"><span class="comment">// ---------- the ~500 lines of copied code go here ----------</span></span><br><span class="line"><span class="keyword">var</span> _typeof = <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; <span class="string">&quot;symbol&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span>.<span class="property">iterator</span> ? <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">: <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> f &amp;&amp; <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; f.<span class="property">constructor</span> === <span class="title class_">Symbol</span> &amp;&amp; f !== <span class="title class_">Symbol</span>.<span class="property"><span class="keyword">prototype</span></span> ? 
<span class="string">&quot;symbol&quot;</span> : <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">;</span><br><span class="line"><span class="variable constant_">TAC</span> = <span class="keyword">function</span>(<span class="params"></span>) &#123;</span><br><span class="line">    <span class="keyword">function</span> <span class="title function_">f</span>(<span class="params">f, a, b, d, c, r</span>) &#123;</span><br><span class="line">        ...</span><br><span class="line">    &#125;</span><br><span class="line">&#125;(),</span><br><span class="line"><span class="title function_">TAC</span>(<span class="string">&quot;484e4f4a4......&quot;</span>, []);</span><br><span class="line"><span class="comment">// ----------------------------------------</span></span><br><span class="line"></span><br><span class="line">sign = <span class="variable language_">window</span>.<span class="property">byted_acrawler</span>.<span class="title function_">sign</span>(&#123;<span class="attr">url</span>:<span class="string">&quot;https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=1594869246&amp;max_behot_time_tmp=1594869246&amp;tadrequire=true&amp;as=A1B5EF51300180F&amp;cp=5F10A138508FDE1&quot;</span>&#125;);</span><br><span class="line"><span class="variable language_">console</span>.<span class="title function_">log</span>(sign);</span><br></pre></td></tr></table></figure><h3 id="Cannot-read-property-‘userAgent’-of-undefined"><a href="#Cannot-read-property-‘userAgent’-of-undefined" class="headerlink" title="Cannot read property ‘userAgent’ of undefined"></a>Cannot read property ‘userAgent’ of undefined</h3><p><strong>Error message</strong></p><p>Running the JS produces the following error:</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/userAgentofundefined.png" alt="userAgent of undefined"></p><p><strong>Solution</strong></p><p>Open a Toutiao page, press <code>F12</code> to open the developer tools, select <code>Console</code>, type 
<code>window.navigator</code>, and press Enter:</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/window.navigator.png" alt="window.navigator"></p><p>Adding <code>userAgent</code> to <code>window.navigator</code> is enough, but to be safe we also add the other <code>navigator</code> properties:</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span 
class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">const</span> jsdom = <span class="built_in">require</span>(<span class="string">&quot;jsdom&quot;</span>);</span><br><span class="line"><span class="keyword">const</span> &#123; <span class="variable constant_">JSDOM</span> &#125; = jsdom;</span><br><span class="line"><span class="keyword">const</span> dom = <span class="keyword">new</span> <span class="title function_">JSDOM</span>(<span class="string">`&lt;!DOCTYPE html&gt;&lt;p&gt;Hello world&lt;/p&gt;`</span>);</span><br><span class="line"><span class="variable language_">window</span> = <span class="variable language_">global</span>;</span><br><span class="line"><span class="keyword">var</span> <span class="variable language_">document</span> = dom.<span class="property">window</span>.<span class="property">document</span>;</span><br><span class="line"><span class="keyword">var</span> params = &#123;</span><br><span class="line">    <span class="attr">location</span>:&#123;</span><br><span class="line">        <span class="attr">hash</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">host</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">hostname</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">href</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">origin</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">pathname</span>: <span class="string">&quot;/&quot;</span>,</span><br><span class="line">        <span class="attr">port</span>: <span 
class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">protocol</span>: <span class="string">&quot;https:&quot;</span>,</span><br><span class="line">        <span class="attr">search</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">    <span class="attr">navigator</span>:&#123;</span><br><span class="line">        <span class="attr">appCodeName</span>: <span class="string">&quot;Mozilla&quot;</span>,</span><br><span class="line">        <span class="attr">appName</span>: <span class="string">&quot;Netscape&quot;</span>,</span><br><span class="line">        <span class="attr">appVersion</span>: <span class="string">&quot;5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&quot;</span>,</span><br><span class="line">        <span class="attr">cookieEnabled</span>: <span class="literal">true</span>,</span><br><span class="line">        <span class="attr">deviceMemory</span>: <span class="number">8</span>,</span><br><span class="line">        <span class="attr">doNotTrack</span>: <span class="literal">null</span>,</span><br><span class="line">        <span class="attr">hardwareConcurrency</span>: <span class="number">4</span>,</span><br><span class="line">        <span class="attr">language</span>: <span class="string">&quot;zh-CN&quot;</span>,</span><br><span class="line">        <span class="attr">languages</span>: [<span class="string">&quot;zh-CN&quot;</span>, <span class="string">&quot;zh&quot;</span>],</span><br><span class="line">        <span class="attr">maxTouchPoints</span>: <span class="number">0</span>,</span><br><span class="line">        <span class="attr">onLine</span>: <span class="literal">true</span>,</span><br><span class="line">        <span class="attr">platform</span>: <span class="string">&quot;Win32&quot;</span>,</span><br><span class="line">        <span class="attr">product</span>: 
<span class="string">&quot;Gecko&quot;</span>,</span><br><span class="line">        <span class="attr">productSub</span>: <span class="string">&quot;20030107&quot;</span>,</span><br><span class="line">        <span class="attr">userAgent</span>: <span class="string">&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&quot;</span>,</span><br><span class="line">        <span class="attr">vendor</span>: <span class="string">&quot;Google Inc.&quot;</span>,</span><br><span class="line">        <span class="attr">vendorSub</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">&#125;;</span><br><span class="line"><span class="title class_">Object</span>.<span class="title function_">assign</span>(<span class="variable language_">window</span>,params);</span><br><span class="line"><span class="variable language_">window</span>.<span class="property">document</span> = <span class="variable language_">document</span>;</span><br><span class="line"><span class="comment">// ---------- the ~500 lines of copied code go here ----------</span></span><br><span class="line"><span class="keyword">var</span> _typeof = <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; <span class="string">&quot;symbol&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span>.<span class="property">iterator</span> ?
<span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">: <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> f &amp;&amp; <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; f.<span class="property">constructor</span> === <span class="title class_">Symbol</span> &amp;&amp; f !== <span class="title class_">Symbol</span>.<span class="property"><span class="keyword">prototype</span></span> ? <span class="string">&quot;symbol&quot;</span> : <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">;</span><br><span class="line"><span class="variable constant_">TAC</span> = <span class="keyword">function</span>(<span class="params"></span>) &#123;</span><br><span class="line">    <span class="keyword">function</span> <span class="title function_">f</span>(<span class="params">f, a, b, d, c, r</span>) &#123;</span><br><span class="line">        ...</span><br><span class="line">    &#125;</span><br><span class="line">&#125;(),</span><br><span class="line"><span class="title function_">TAC</span>(<span class="string">&quot;484e4f4a4......&quot;</span>, []);</span><br><span class="line"><span class="comment">// ----------------------------------------</span></span><br><span class="line"></span><br><span class="line">sign = <span class="variable language_">window</span>.<span class="property">byted_acrawler</span>.<span class="title function_">sign</span>(&#123;<span class="attr">url</span>:<span 
class="string">&quot;https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=1594869246&amp;max_behot_time_tmp=1594869246&amp;tadrequire=true&amp;as=A1B5EF51300180F&amp;cp=5F10A138508FDE1&quot;</span>&#125;);</span><br><span class="line"><span class="variable language_">console</span>.<span class="title function_">log</span>(sign);</span><br></pre></td></tr></table></figure><p>Running it prints the following:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">_02B4Z6wo00f0122Q2eAAAIBAkOB99iCRDv9tkt1AAIR91a</span><br></pre></td></tr></table></figure><p>However, this is only part of the real <code>_signature</code>. Did we miss something?</p><p>A global search for <code>window.byted_acrawler</code> turns up a piece of <code>js</code> in the page source that initializes it:</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="variable language_">window</span>.<span class="property">byted_acrawler</span> &amp;&amp;</span><br><span class="line">  <span class="variable language_">window</span>.<span class="property">byted_acrawler</span>.<span class="title function_">init</span>(&#123;</span><br><span class="line">    <span class="attr">aid</span>: <span class="number">24</span>,</span><br><span class="line">    <span class="attr">dfp</span>: <span class="literal">true</span>,</span><br><span class="line">    <span class="attr">intercept</span>: <span class="literal">true</span>, <span class="comment">// With the interceptor on, every URL matching the list below automatically gets a _signature parameter</span></span><br><span class="line">    <span class="comment">// The SDK intercepts every request sent via XMLHttpRequest, including those from third-party libraries, so configure enablePathList strictly</span></span><br><span class="line">    <span class="attr">enablePathList</span>: [<span class="string">&quot;/c/ugc/video/publish/&quot;</span>],</span><br><span class="line">    <span class="attr">urlRewriteRules</span>: [</span><br><span class="line">      [<span class="string">&quot;/c/ugc/video/publish/&quot;</span>, <span class="string">&quot;https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/c/ugc/video/publish/&quot;</span>],</span><br><span class="line">    ],</span><br><span class="line">  &#125;);</span><br></pre></td></tr></table></figure>
<p>Remove the SDK interception settings (<code>intercept</code>, <code>enablePathList</code>, <code>urlRewriteRules</code>) from the code above, then insert what remains into <code>sign.js</code> and run it again:</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span
class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">const</span> jsdom = <span class="built_in">require</span>(<span class="string">&quot;jsdom&quot;</span>);</span><br><span class="line"><span class="keyword">const</span> &#123; <span class="variable constant_">JSDOM</span> &#125; = jsdom;</span><br><span class="line"><span class="keyword">const</span> dom = <span class="keyword">new</span> <span class="title function_">JSDOM</span>(<span class="string">`&lt;!DOCTYPE html&gt;&lt;p&gt;Hello world&lt;/p&gt;`</span>);</span><br><span class="line"><span class="variable language_">window</span> = <span class="variable language_">global</span>;</span><br><span class="line"><span class="keyword">var</span> <span class="variable language_">document</span> = dom.<span class="property">window</span>.<span class="property">document</span>;</span><br><span class="line"><span class="keyword">var</span> params = &#123;</span><br><span class="line">    <span class="attr">location</span>:&#123;</span><br><span class="line">        <span class="attr">hash</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">host</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span 
class="line">        <span class="attr">hostname</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">href</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">origin</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">pathname</span>: <span class="string">&quot;/&quot;</span>,</span><br><span class="line">        <span class="attr">port</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">protocol</span>: <span class="string">&quot;https:&quot;</span>,</span><br><span class="line">        <span class="attr">search</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">    <span class="attr">navigator</span>:&#123;</span><br><span class="line">        <span class="attr">appCodeName</span>: <span class="string">&quot;Mozilla&quot;</span>,</span><br><span class="line">        <span class="attr">appName</span>: <span class="string">&quot;Netscape&quot;</span>,</span><br><span class="line">        <span class="attr">appVersion</span>: <span class="string">&quot;5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&quot;</span>,</span><br><span class="line">        <span class="attr">cookieEnabled</span>: <span class="literal">true</span>,</span><br><span class="line">        <span class="attr">deviceMemory</span>: <span class="number">8</span>,</span><br><span class="line">        <span class="attr">doNotTrack</span>: <span class="literal">null</span>,</span><br><span class="line">        <span class="attr">hardwareConcurrency</span>: <span class="number">4</span>,</span><br><span class="line">        <span class="attr">language</span>: <span 
class="string">&quot;zh-CN&quot;</span>,</span><br><span class="line">        <span class="attr">languages</span>: [<span class="string">&quot;zh-CN&quot;</span>, <span class="string">&quot;zh&quot;</span>],</span><br><span class="line">        <span class="attr">maxTouchPoints</span>: <span class="number">0</span>,</span><br><span class="line">        <span class="attr">onLine</span>: <span class="literal">true</span>,</span><br><span class="line">        <span class="attr">platform</span>: <span class="string">&quot;Win32&quot;</span>,</span><br><span class="line">        <span class="attr">product</span>: <span class="string">&quot;Gecko&quot;</span>,</span><br><span class="line">        <span class="attr">productSub</span>: <span class="string">&quot;20030107&quot;</span>,</span><br><span class="line">        <span class="attr">userAgent</span>: <span class="string">&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&quot;</span>,</span><br><span class="line">        <span class="attr">vendor</span>: <span class="string">&quot;Google Inc.&quot;</span>,</span><br><span class="line">        <span class="attr">vendorSub</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">&#125;;</span><br><span class="line"><span class="title class_">Object</span>.<span class="title function_">assign</span>(<span class="variable language_">window</span>,params);</span><br><span class="line"><span class="variable language_">window</span>.<span class="property">document</span> = <span class="variable language_">document</span>;</span><br><span class="line"><span class="comment">// ---------- the ~500 lines of copied code go here ----------</span></span><br><span class="line"><span class="keyword">var</span> _typeof = <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; <span
class="string">&quot;symbol&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span>.<span class="property">iterator</span> ? <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">: <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> f &amp;&amp; <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; f.<span class="property">constructor</span> === <span class="title class_">Symbol</span> &amp;&amp; f !== <span class="title class_">Symbol</span>.<span class="property"><span class="keyword">prototype</span></span> ? <span class="string">&quot;symbol&quot;</span> : <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">;</span><br><span class="line"><span class="variable constant_">TAC</span> = <span class="keyword">function</span>(<span class="params"></span>) &#123;</span><br><span class="line">    <span class="keyword">function</span> <span class="title function_">f</span>(<span class="params">f, a, b, d, c, r</span>) &#123;</span><br><span class="line">        ...</span><br><span class="line">    &#125;</span><br><span class="line">&#125;(),</span><br><span class="line"><span class="title function_">TAC</span>(<span class="string">&quot;484e4f4a4......&quot;</span>, []);</span><br><span class="line"><span class="comment">// ----------------------------------------</span></span><br><span class="line"></span><br><span class="line"><span class="variable language_">window</span>.<span class="property">byted_acrawler</span> &amp;&amp; <span class="variable language_">window</span>.<span 
class="property">byted_acrawler</span>.<span class="title function_">init</span>(&#123;</span><br><span class="line">    <span class="attr">aid</span>: <span class="number">24</span>,</span><br><span class="line">    <span class="attr">dfp</span>: <span class="literal">true</span>,</span><br><span class="line">&#125;)</span><br><span class="line"></span><br><span class="line">sign = <span class="variable language_">window</span>.<span class="property">byted_acrawler</span>.<span class="title function_">sign</span>(&#123;<span class="attr">url</span>:<span class="string">&quot;https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=1594869246&amp;max_behot_time_tmp=1594869246&amp;tadrequire=true&amp;as=A1B5EF51300180F&amp;cp=5F10A138508FDE1&quot;</span>&#125;);</span><br><span class="line"><span class="variable language_">console</span>.<span class="title function_">log</span>(sign);</span><br></pre></td></tr></table></figure><h3 id="Cannot-read-property-‘width’-of-undefined"><a href="#Cannot-read-property-‘width’-of-undefined" class="headerlink" title="Cannot read property ‘width’ of undefined"></a>Cannot read property 'width' of undefined</h3><p><strong>Error message</strong></p><p>Running the script fails with the following error:</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/widthofundefined.png" alt="width of undefined"></p><p><strong>Solution</strong></p><p>The script reads <code>window.screen.width</code>, but our mock <code>window</code> has no <code>screen</code> object. To see what a real browser provides, open the Toutiao page, press <code>F12</code> to open the developer tools, switch to the <code>Console</code> tab, type <code>window.screen</code> and press Enter. The result looks like this:</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/toutiao/window.screen.png" alt="window.screen"></p><p>Adding <code>width</code> to <code>window.screen</code> is enough to fix the error; to be safe, copy over the other <code>screen</code> properties as well.</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span
class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span 
class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">const</span> jsdom = <span class="built_in">require</span>(<span class="string">&quot;jsdom&quot;</span>);</span><br><span class="line"><span class="keyword">const</span> &#123; <span class="variable constant_">JSDOM</span> &#125; = jsdom;</span><br><span class="line"><span class="keyword">const</span> dom = <span class="keyword">new</span> <span class="title function_">JSDOM</span>(<span class="string">`&lt;!DOCTYPE html&gt;&lt;p&gt;Hello world&lt;/p&gt;`</span>);</span><br><span class="line"><span class="variable language_">window</span> = <span class="variable language_">global</span>;</span><br><span class="line"><span class="keyword">var</span> <span class="variable language_">document</span> = dom.<span class="property">window</span>.<span class="property">document</span>;</span><br><span class="line"><span class="keyword">var</span> params = &#123;</span><br><span class="line">    <span class="attr">location</span>:&#123;</span><br><span class="line">        <span class="attr">hash</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">host</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">hostname</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">href</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">origin</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">pathname</span>: <span class="string">&quot;/&quot;</span>,</span><br><span class="line">        <span class="attr">port</span>: <span 
class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">protocol</span>: <span class="string">&quot;https:&quot;</span>,</span><br><span class="line">        <span class="attr">search</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">    <span class="attr">navigator</span>:&#123;</span><br><span class="line">        <span class="attr">appCodeName</span>: <span class="string">&quot;Mozilla&quot;</span>,</span><br><span class="line">        <span class="attr">appName</span>: <span class="string">&quot;Netscape&quot;</span>,</span><br><span class="line">        <span class="attr">appVersion</span>: <span class="string">&quot;5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&quot;</span>,</span><br><span class="line">        <span class="attr">cookieEnabled</span>: <span class="literal">true</span>,</span><br><span class="line">        <span class="attr">deviceMemory</span>: <span class="number">8</span>,</span><br><span class="line">        <span class="attr">doNotTrack</span>: <span class="literal">null</span>,</span><br><span class="line">        <span class="attr">hardwareConcurrency</span>: <span class="number">4</span>,</span><br><span class="line">        <span class="attr">language</span>: <span class="string">&quot;zh-CN&quot;</span>,</span><br><span class="line">        <span class="attr">languages</span>: [<span class="string">&quot;zh-CN&quot;</span>, <span class="string">&quot;zh&quot;</span>],</span><br><span class="line">        <span class="attr">maxTouchPoints</span>: <span class="number">0</span>,</span><br><span class="line">        <span class="attr">onLine</span>: <span class="literal">true</span>,</span><br><span class="line">        <span class="attr">platform</span>: <span class="string">&quot;Win32&quot;</span>,</span><br><span class="line">        <span class="attr">product</span>: 
<span class="string">&quot;Gecko&quot;</span>,</span><br><span class="line">        <span class="attr">productSub</span>: <span class="string">&quot;20030107&quot;</span>,</span><br><span class="line">        <span class="attr">userAgent</span>: <span class="string">&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&quot;</span>,</span><br><span class="line">        <span class="attr">vendor</span>: <span class="string">&quot;Google Inc.&quot;</span>,</span><br><span class="line">        <span class="attr">vendorSub</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">    <span class="string">&quot;screen&quot;</span>:&#123;</span><br><span class="line">        <span class="attr">availHeight</span>: <span class="number">1040</span>,</span><br><span class="line">        <span class="attr">availLeft</span>: <span class="number">0</span>,</span><br><span class="line">        <span class="attr">availTop</span>: <span class="number">0</span>,</span><br><span class="line">        <span class="attr">availWidth</span>: <span class="number">1920</span>,</span><br><span class="line">        <span class="attr">colorDepth</span>: <span class="number">24</span>,</span><br><span class="line">        <span class="attr">height</span>: <span class="number">1080</span>,</span><br><span class="line">        <span class="attr">pixelDepth</span>: <span class="number">24</span>,</span><br><span class="line">        <span class="attr">width</span>: <span class="number">1920</span>,</span><br><span class="line">    &#125;</span><br><span class="line">&#125;;</span><br><span class="line"><span class="title class_">Object</span>.<span class="title function_">assign</span>(<span class="variable language_">window</span>,params);</span><br><span class="line"><span class="variable language_">window</span>.<span class="property">document</span> = <span 
class="variable language_">document</span>;</span><br><span class="line"><span class="comment">// ---------- the ~500 lines of copied code go here ----------</span></span><br><span class="line"><span class="keyword">var</span> _typeof = <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; <span class="string">&quot;symbol&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span>.<span class="property">iterator</span> ? <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">: <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> f &amp;&amp; <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; f.<span class="property">constructor</span> === <span class="title class_">Symbol</span> &amp;&amp; f !== <span class="title class_">Symbol</span>.<span class="property"><span class="keyword">prototype</span></span> ?
<span class="string">&quot;symbol&quot;</span> : <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">;</span><br><span class="line"><span class="variable constant_">TAC</span> = <span class="keyword">function</span>(<span class="params"></span>) &#123;</span><br><span class="line">    <span class="keyword">function</span> <span class="title function_">f</span>(<span class="params">f, a, b, d, c, r</span>) &#123;</span><br><span class="line">        ...</span><br><span class="line">    &#125;</span><br><span class="line">&#125;(),</span><br><span class="line"><span class="title function_">TAC</span>(<span class="string">&quot;484e4f4a4......&quot;</span>, []);</span><br><span class="line"><span class="comment">// ----------------------------------------</span></span><br><span class="line"></span><br><span class="line"><span class="variable language_">window</span>.<span class="property">byted_acrawler</span> &amp;&amp; <span class="variable language_">window</span>.<span class="property">byted_acrawler</span>.<span class="title function_">init</span>(&#123;</span><br><span class="line">    <span class="attr">aid</span>: <span class="number">24</span>,</span><br><span class="line">    <span class="attr">dfp</span>: <span class="literal">true</span>,</span><br><span class="line">&#125;)</span><br><span class="line"></span><br><span class="line">sign = <span class="variable language_">window</span>.<span class="property">byted_acrawler</span>.<span class="title function_">sign</span>(&#123;<span class="attr">url</span>:<span class="string">&quot;https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=1594869246&amp;max_behot_time_tmp=1594869246&amp;tadrequire=true&amp;as=A1B5EF51300180F&amp;cp=5F10A138508FDE1&quot;</span>&#125;);</span><br><span class="line"><span class="variable language_">console</span>.<span class="title 
function_">log</span>(sign);</span><br></pre></td></tr></table></figure><p>Run it again: the error is gone, and the return value is:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">_02B4Z6wo00f01erPHLwAAIBCF7-4qJivPAXqzRgAACWl0b</span><br></pre></td></tr></table></figure><h3 id="signature-长度不一致"><a href="#signature-长度不一致" class="headerlink" title="_signature 长度不一致"></a>_signature length mismatch</h3><p>The result still differs in length from the <code>_signature</code> in real requests. After some investigation, the cause turned out to be that the real page sends its requests with a <code>cookie</code>, while our simulated environment has none.</p><p><strong>Solution</strong></p><p>Add a <code>cookie</code> to the simulated environment:</p><figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span
class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">const</span> jsdom = <span class="built_in">require</span>(<span class="string">&quot;jsdom&quot;</span>);</span><br><span class="line"><span class="keyword">const</span> &#123; <span class="variable constant_">JSDOM</span> &#125; = jsdom;</span><br><span class="line"><span class="keyword">const</span> dom = <span class="keyword">new</span> <span class="title 
function_">JSDOM</span>(<span class="string">`&lt;!DOCTYPE html&gt;&lt;p&gt;Hello world&lt;/p&gt;`</span>);</span><br><span class="line"><span class="variable language_">window</span> = <span class="variable language_">global</span>;</span><br><span class="line"><span class="keyword">var</span> <span class="variable language_">document</span> = dom.<span class="property">window</span>.<span class="property">document</span>;</span><br><span class="line"><span class="keyword">var</span> params = &#123;</span><br><span class="line">    <span class="attr">location</span>:&#123;</span><br><span class="line">        <span class="attr">hash</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">host</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">hostname</span>: <span class="string">&quot;www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">href</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">origin</span>: <span class="string">&quot;https://www.toutiao.com&quot;</span>,</span><br><span class="line">        <span class="attr">pathname</span>: <span class="string">&quot;/&quot;</span>,</span><br><span class="line">        <span class="attr">port</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">        <span class="attr">protocol</span>: <span class="string">&quot;https:&quot;</span>,</span><br><span class="line">        <span class="attr">search</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">    <span class="attr">navigator</span>:&#123;</span><br><span class="line">        <span class="attr">appCodeName</span>: <span class="string">&quot;Mozilla&quot;</span>,</span><br><span class="line">        <span class="attr">appName</span>: <span 
class="string">&quot;Netscape&quot;</span>,</span><br><span class="line">        <span class="attr">appVersion</span>: <span class="string">&quot;5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&quot;</span>,</span><br><span class="line">        <span class="attr">cookieEnabled</span>: <span class="literal">true</span>,</span><br><span class="line">        <span class="attr">deviceMemory</span>: <span class="number">8</span>,</span><br><span class="line">        <span class="attr">doNotTrack</span>: <span class="literal">null</span>,</span><br><span class="line">        <span class="attr">hardwareConcurrency</span>: <span class="number">4</span>,</span><br><span class="line">        <span class="attr">language</span>: <span class="string">&quot;zh-CN&quot;</span>,</span><br><span class="line">        <span class="attr">languages</span>: [<span class="string">&quot;zh-CN&quot;</span>, <span class="string">&quot;zh&quot;</span>],</span><br><span class="line">        <span class="attr">maxTouchPoints</span>: <span class="number">0</span>,</span><br><span class="line">        <span class="attr">onLine</span>: <span class="literal">true</span>,</span><br><span class="line">        <span class="attr">platform</span>: <span class="string">&quot;Win32&quot;</span>,</span><br><span class="line">        <span class="attr">product</span>: <span class="string">&quot;Gecko&quot;</span>,</span><br><span class="line">        <span class="attr">productSub</span>: <span class="string">&quot;20030107&quot;</span>,</span><br><span class="line">        <span class="attr">userAgent</span>: <span class="string">&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36&quot;</span>,</span><br><span class="line">        <span class="attr">vendor</span>: <span class="string">&quot;Google Inc.&quot;</span>,</span><br><span class="line">        <span 
class="attr">vendorSub</span>: <span class="string">&quot;&quot;</span>,</span><br><span class="line">    &#125;,</span><br><span class="line">    <span class="string">&quot;screen&quot;</span>:&#123;</span><br><span class="line">        <span class="attr">availHeight</span>: <span class="number">1040</span>,</span><br><span class="line">        <span class="attr">availLeft</span>: <span class="number">0</span>,</span><br><span class="line">        <span class="attr">availTop</span>: <span class="number">0</span>,</span><br><span class="line">        <span class="attr">availWidth</span>: <span class="number">1920</span>,</span><br><span class="line">        <span class="attr">colorDepth</span>: <span class="number">24</span>,</span><br><span class="line">        <span class="attr">height</span>: <span class="number">1080</span>,</span><br><span class="line">        <span class="attr">pixelDepth</span>: <span class="number">24</span>,</span><br><span class="line">        <span class="attr">width</span>: <span class="number">1920</span>,</span><br><span class="line">    &#125;</span><br><span class="line">&#125;;</span><br><span class="line"><span class="title class_">Object</span>.<span class="title function_">assign</span>(<span class="variable language_">window</span>,params);</span><br><span class="line"><span class="variable language_">window</span>.<span class="property">document</span> = <span class="variable language_">document</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">function</span> <span class="title function_">setCookie</span>(<span class="params">name, value, seconds</span>) &#123;</span><br><span class="line">    seconds = seconds || <span class="number">0</span>;</span><br><span class="line">    <span class="keyword">var</span> expires = <span class="string">&quot;&quot;</span>;</span><br><span class="line">    <span class="keyword">if</span> (seconds != <span class="number">0</span> ) 
&#123;</span><br><span class="line">    <span class="keyword">var</span> date = <span class="keyword">new</span> <span class="title class_">Date</span>();</span><br><span class="line">    date.<span class="title function_">setTime</span>(date.<span class="title function_">getTime</span>()+(seconds*<span class="number">1000</span>));</span><br><span class="line">    expires = <span class="string">&quot;; expires=&quot;</span>+date.<span class="title function_">toUTCString</span>();</span><br><span class="line">    &#125;</span><br><span class="line">    <span class="variable language_">document</span>.<span class="property">cookie</span> = name+<span class="string">&quot;=&quot;</span>+value+expires+<span class="string">&quot;; path=/&quot;</span>;</span><br><span class="line">&#125;</span><br><span class="line"><span class="comment">// Paste the real cookie string from your own browser here (e.g. &quot;a=1; b=2&quot;); values are set verbatim</span></span><br><span class="line">cookies = <span class="string">&quot;s_v_web_id=xxxxxxxxxxxxxxxxxxxxxxxxxx&quot;</span>;</span><br><span class="line"><span class="keyword">for</span>(<span class="keyword">let</span> cookie <span class="keyword">of</span> cookies.<span class="title function_">split</span>(<span class="string">&quot;;&quot;</span>))&#123;</span><br><span class="line">    tmp = cookie.<span class="title function_">trim</span>().<span class="title function_">split</span>(<span class="string">&quot;=&quot;</span>);</span><br><span class="line">    <span class="title function_">setCookie</span>(tmp[<span class="number">0</span>], tmp.<span class="title function_">slice</span>(<span class="number">1</span>).<span class="title function_">join</span>(<span class="string">&quot;=&quot;</span>), <span class="number">1800</span>);</span><br><span class="line">&#125;</span><br><span class="line"></span><br><span class="line"><span class="comment">// ---------- nearly 500 lines of copied code go here ----------</span></span><br><span class="line"><span class="keyword">var</span> _typeof = <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; <span 
class="string">&quot;symbol&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span>.<span class="property">iterator</span> ? <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">: <span class="keyword">function</span>(<span class="params">f</span>) &#123;</span><br><span class="line">    <span class="keyword">return</span> f &amp;&amp; <span class="string">&quot;function&quot;</span> == <span class="keyword">typeof</span> <span class="title class_">Symbol</span> &amp;&amp; f.<span class="property">constructor</span> === <span class="title class_">Symbol</span> &amp;&amp; f !== <span class="title class_">Symbol</span>.<span class="property"><span class="keyword">prototype</span></span> ? <span class="string">&quot;symbol&quot;</span> : <span class="keyword">typeof</span> f</span><br><span class="line">&#125;</span><br><span class="line">;</span><br><span class="line"><span class="variable constant_">TAC</span> = <span class="keyword">function</span>(<span class="params"></span>) &#123;</span><br><span class="line">    <span class="keyword">function</span> <span class="title function_">f</span>(<span class="params">f, a, b, d, c, r</span>) &#123;</span><br><span class="line">        ...</span><br><span class="line">    &#125;</span><br><span class="line">&#125;(),</span><br><span class="line"><span class="title function_">TAC</span>(<span class="string">&quot;484e4f4a4......&quot;</span>, []);</span><br><span class="line"><span class="comment">// ----------------------------------------</span></span><br><span class="line"></span><br><span class="line"><span class="variable language_">window</span>.<span class="property">byted_acrawler</span> &amp;&amp; <span class="variable language_">window</span>.<span 
class="property">byted_acrawler</span>.<span class="title function_">init</span>(&#123;</span><br><span class="line">    <span class="attr">aid</span>: <span class="number">24</span>,</span><br><span class="line">    <span class="attr">dfp</span>: <span class="literal">true</span>,</span><br><span class="line">&#125;)</span><br><span class="line"></span><br><span class="line">sign = <span class="variable language_">window</span>.<span class="property">byted_acrawler</span>.<span class="title function_">sign</span>(&#123;<span class="attr">url</span>:<span class="string">&quot;https://www.toutiao.com/api/pc/feed/?category=news_hot&amp;utm_source=toutiao&amp;widen=1&amp;max_behot_time=1594869246&amp;max_behot_time_tmp=1594869246&amp;tadrequire=true&amp;as=A1B5EF51300180F&amp;cp=5F10A138508FDE1&quot;</span>&#125;);</span><br><span class="line"><span class="variable language_">console</span>.<span class="title function_">log</span>(sign);</span><br></pre></td></tr></table></figure><p>Run it once more and, finally, we get the complete <code>_signature</code> value:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">_02B4Z6wo00f01jc-omQAAIBByk4GctL5ZYo3PKbAANLr40OHsHg8RRe1BK03uca1smyI5DA3wElBPDGI.KcAotMiY1IOIhstbtN3bZIM9xRX0NzP.PoYAaq0JjXmU5cIgLSE03L.57r1BQkJe6</span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;前情提要&quot;&gt;&lt;a href=&quot;#前情提要&quot; class=&quot;headerlink&quot; title=&quot;前情提要&quot;&gt;&lt;/a&gt;前情提要&lt;/h2&gt;&lt;div class=&quot;note danger flat&quot;&gt;&lt;p&gt;爬虫具有时效性，此篇文章代码不一定长期有效，但是解决方案通用。&lt;</summary>
      
    
    
    
    <category term="爬虫" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/"/>
    
    <category term="js 逆向" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/js-%E9%80%86%E5%90%91/"/>
    
    
    <category term="js 逆向" scheme="https://sitoi.cn/tags/js-%E9%80%86%E5%90%91/"/>
    
    <category term="今日头条" scheme="https://sitoi.cn/tags/%E4%BB%8A%E6%97%A5%E5%A4%B4%E6%9D%A1/"/>
    
  </entry>
  
  <entry>
    <title>MongoEngine Common Syntax Cheat Sheet</title>
    <link href="https://sitoi.cn/posts/24744.html"/>
    <id>https://sitoi.cn/posts/24744.html</id>
    <published>2020-06-13T14:54:17.000Z</published>
    <updated>2025-11-12T05:28:30.733Z</updated>
    
    <content type="html"><![CDATA[<h2 id="MongoEngine-查询"><a href="#MongoEngine-查询" class="headerlink" title="MongoEngine 查询"></a>MongoEngine queries</h2><h3 id="过滤查询"><a href="#过滤查询" class="headerlink" title="过滤查询"></a>Filtering queries</h3><p>A query can be filtered by calling the <code>QuerySet</code> object with keyword arguments, where each keyword is the name of a field on the <code>Document</code> being queried:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">users = User.objects(name=<span class="string">&#x27;sitoi&#x27;</span>)</span><br></pre></td></tr></table></figure><p>Fields of an embedded document are accessed with a double underscore <code>__</code> in place of the <code>.</code> used in attribute-access syntax:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pages = Page.objects(author__country=<span class="string">&#x27;china&#x27;</span>)</span><br></pre></td></tr></table></figure><h3 id="查询操作符"><a href="#查询操作符" class="headerlink" title="查询操作符"></a>Query operators</h3><p>Operators can also be used in queries: append the operator to the field name, separated by a double underscore <code>__</code>:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">young_users = User.objects(age__lte=<span class="number">18</span>)</span><br></pre></td></tr></table></figure><p>The available operators are:</p><table><thead><tr><th align="center">Operator</th><th align="center">Meaning</th></tr></thead><tbody><tr><td align="center">ne</td><td align="center">Not equal to <code>≠</code></td></tr><tr><td align="center">lt</td><td align="center">Less than <code>&lt;</code></td></tr><tr><td align="center">lte</td><td align="center">Less than or equal to <code>≤</code></td></tr><tr><td align="center">gt</td><td align="center">Greater than <code>&gt;</code></td></tr><tr><td align="center">gte</td><td align="center">Greater than or equal to <code>≥</code></td></tr><tr><td align="center">not</td><td align="center">Negates a standard check; must be placed before another operator (e.g. 
<code>Q(age__not__mod=5)</code>)</td></tr><tr><td align="center">in</td><td align="center">Value is in the <code>list</code></td></tr><tr><td align="center">nin</td><td align="center">Value is not in the <code>list</code></td></tr><tr><td align="center">mod</td><td align="center"><code>value % x == y</code>, where <code>x</code> and <code>y</code> are given values</td></tr><tr><td align="center">all</td><td align="center">Array field contains every value in the given <code>list</code></td></tr><tr><td align="center">size</td><td align="center">Array has the given size</td></tr><tr><td align="center">exists</td><td align="center">Field exists</td></tr></tbody></table><h3 id="字符串查询"><a href="#字符串查询" class="headerlink" title="字符串查询"></a>String queries</h3><p>The following operators are shortcuts for regular-expression queries:</p><table><thead><tr><th align="center">Operator</th><th align="center">Meaning</th></tr></thead><tbody><tr><td align="center">exact</td><td align="center">String field exactly matches the value</td></tr><tr><td align="center">iexact</td><td align="center">String field exactly matches the value (case-insensitive)</td></tr><tr><td align="center">contains</td><td align="center">String field contains the value</td></tr><tr><td align="center">icontains</td><td align="center">String field contains the value (case-insensitive)</td></tr><tr><td align="center">startswith</td><td align="center">String field starts with the value</td></tr><tr><td align="center">istartswith</td><td align="center">String field starts with the value (case-insensitive)</td></tr><tr><td align="center">endswith</td><td align="center">String field ends with the value</td></tr><tr><td align="center">iendswith</td><td align="center">String field ends with the value (case-insensitive)</td></tr><tr><td align="center">match</td><td align="center">Performs an <code>$elemMatch</code>, so you can match an entire <code>document</code> within an array</td></tr></tbody></table><h3 id="列表查询"><a href="#列表查询" class="headerlink" title="列表查询"></a>List queries</h3><p>On most fields, this syntax matches documents whose field equals the given value. When the field is a <code>ListField</code>, however, you can pass a single item, and every document whose list contains that item will be matched:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span 
class="line"><span class="keyword">class</span> <span class="title class_">Page</span>(<span class="title class_ inherited__">Document</span>):</span><br><span class="line">    tags = ListField(StringField())</span><br><span class="line"></span><br><span class="line">Page.objects(tags=<span class="string">&#x27;coding&#x27;</span>)</span><br></pre></td></tr></table></figure><h3 id="原始查询"><a href="#原始查询" class="headerlink" title="原始查询"></a>Raw queries</h3><p>The <code>__raw__</code> argument lets you pass a raw <code>PyMongo</code> query, giving you full control over the query that is sent:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Page.objects(__raw__=&#123;<span class="string">&#x27;tags&#x27;</span>: <span class="string">&#x27;coding&#x27;</span>&#125;)</span><br></pre></td></tr></table></figure><h3 id="限制和跳过结果"><a href="#限制和跳过结果" class="headerlink" title="限制和跳过结果"></a>Limiting and skipping results</h3><p>As with a traditional <code>ORM</code>, you sometimes need to limit the number of results returned or skip a certain number of them. <code>QuerySet</code> provides the <code>limit()</code> and <code>skip()</code> methods for this, but array-slicing syntax is the preferred way:</p><h4 id="限制前-5-个"><a href="#限制前-5-个" class="headerlink" title="限制前 5 个"></a>Only the first 5</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">users = User.objects[:<span class="number">5</span>]</span><br></pre></td></tr></table></figure><h4 id="跳过前-5-个"><a href="#跳过前-5-个" class="headerlink" title="跳过前 5 个"></a>Skip the first 5</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">users = User.objects[<span class="number">5</span>:]</span><br></pre></td></tr></table></figure><h4 id="取-10-到-15-个"><a href="#取-10-到-15-个" class="headerlink" title="取 10 到 15 个"></a>The 11th to 15th results</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td 
class="code"><pre><span class="line">users = User.objects[<span class="number">10</span>:<span class="number">15</span>]</span><br></pre></td></tr></table></figure><p>You can also index the query, e.g. <code>User.objects[0]</code>, to retrieve a single result. If no document exists at that index, an <code>IndexError</code> is raised. <code>User.objects.first()</code>, by contrast, returns <code>None</code> when there is no matching document.</p><h3 id="默认-Document-查询"><a href="#默认-Document-查询" class="headerlink" title="默认 Document 查询"></a>Default Document queries</h3><p>By default, the <code>objects</code> attribute of a <code>Document</code> returns a <code>QuerySet</code> that applies no filtering and returns every document in the collection. You can change this by defining a method on the <code>document</code> that modifies the <code>queryset</code>. The method takes two arguments, <code>doc_cls</code> and <code>queryset</code>: the first is the <code>Document</code> class on which the method is defined (in this sense the method behaves more like a <code>classmethod()</code> than a regular method), and the second is the initial <code>queryset</code>. The method must be decorated with <code>queryset_manager()</code> to be recognized.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">BlogPost</span>(<span class="title class_ inherited__">Document</span>):</span><br><span class="line">    title = StringField()</span><br><span class="line">    date = DateTimeField()</span><br><span class="line"></span><br><span class="line"><span class="meta">    @queryset_manager</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">objects</span>(<span class="params">doc_cls, queryset</span>):</span><br><span class="line">        <span class="keyword">return</span> queryset.order_by(<span class="string">&#x27;-date&#x27;</span>)</span><br></pre></td></tr></table></figure><p>The method does not have to be called <code>objects</code>; you can define as many custom manager methods as you like, for example:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">BlogPost</span>(<span class="title class_ inherited__">Document</span>):</span><br><span class="line">    title = StringField()</span><br><span class="line">    published = BooleanField()</span><br><span class="line"></span><br><span class="line"><span class="meta">    @queryset_manager</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">live_posts</span>(<span class="params">doc_cls, queryset</span>):</span><br><span class="line">        <span class="keyword">return</span> queryset.<span class="built_in">filter</span>(published=<span class="literal">True</span>)</span><br><span class="line"></span><br><span class="line">BlogPost(title=<span class="string">&#x27;test1&#x27;</span>, published=<span class="literal">False</span>).save()</span><br><span class="line">BlogPost(title=<span class="string">&#x27;test2&#x27;</span>, published=<span class="literal">True</span>).save()</span><br><span class="line"><span class="keyword">assert</span> <span class="built_in">len</span>(BlogPost.objects) == <span class="number">2</span></span><br><span class="line"><span class="keyword">assert</span> <span class="built_in">len</span>(BlogPost.live_posts()) == <span class="number">1</span></span><br></pre></td></tr></table></figure><h3 id="自定义-QuerySets"><a href="#自定义-QuerySets" class="headerlink" title="自定义 QuerySets"></a>Custom QuerySets</h3><p>When you want to define your own methods for filtering a <code>document</code>, subclassing <code>QuerySet</code> is a good approach. To use a custom 
<code>QuerySet</code> class on a <code>document</code>, set <code>queryset_class</code> in the document's <code>meta</code> dictionary.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">AwesomerQuerySet</span>(<span class="title class_ inherited__">QuerySet</span>):</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">get_awesome</span>(<span class="params">self</span>):</span><br><span class="line">        <span class="keyword">return</span> <span class="variable language_">self</span>.<span class="built_in">filter</span>(awesome=<span class="literal">True</span>)</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">Page</span>(<span class="title class_ inherited__">Document</span>):</span><br><span class="line">    awesome = BooleanField()</span><br><span class="line">    meta = &#123;<span class="string">&#x27;queryset_class&#x27;</span>: AwesomerQuerySet&#125;</span><br><span class="line"></span><br><span class="line">Page.objects.get_awesome()</span><br></pre></td></tr></table></figure><h3 id="Aggregation-聚合"><a href="#Aggregation-聚合" class="headerlink" title="Aggregation 聚合"></a>Aggregation</h3><p>MongoDB provides some aggregation methods out of the box, though not as many as a typical <code>RDBMS</code>. <code>MongoEngine</code> wraps these built-in methods and also provides some of its own, which are implemented as <code>Javascript</code> code executed on the database server.</p><h4 id="结果计数"><a href="#结果计数" class="headerlink" title="结果计数"></a>Counting results</h4><p>Just as with limiting and skipping results, <code>QuerySet</code> provides a counting method, <code>count()</code>, but there is also a more <code>Pythonic</code> way:</p><figure 
class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">num_users = <span class="built_in">len</span>(User.objects)</span><br></pre></td></tr></table></figure><h4 id="更多功能"><a href="#更多功能" class="headerlink" title="更多功能"></a>More aggregation methods</h4><p>To total the values of a specific field across the matching <code>document</code>s, use <code>sum()</code>:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">yearly_expense = Employee.objects.<span class="built_in">sum</span>(<span class="string">&#x27;salary&#x27;</span>)</span><br></pre></td></tr></table></figure><p>To compute the average value of a field, use <code>average()</code>:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mean_age = User.objects.average(<span class="string">&#x27;age&#x27;</span>)</span><br></pre></td></tr></table></figure><p>MongoEngine also provides <code>item_frequencies()</code>, which returns how often each <code>item</code> occurs in a field across the collection. The following example could be used to build a tag cloud:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">Article</span>(<span class="title class_ inherited__">Document</span>):</span><br><span class="line">    tag = ListField(StringField())</span><br><span class="line"></span><br><span class="line">tag_freqs = Article.objects.item_frequencies(<span class="string">&#x27;tag&#x27;</span>, normalize=<span class="literal">True</span>)</span><br><span class="line"></span><br><span class="line"><span class="keyword">from</span> operator <span 
class="keyword">import</span> itemgetter</span><br><span class="line">top_tags = <span class="built_in">sorted</span>(tag_freqs.items(), key=itemgetter(<span class="number">1</span>), reverse=<span class="literal">True</span>)[:<span class="number">10</span>]</span><br></pre></td></tr></table></figure><h3 id="高级查询"><a href="#高级查询" class="headerlink" title="高级查询"></a>Advanced queries</h3><p>Sometimes a <code>QuerySet</code> built from keyword arguments is not enough for your query, for example when you need to combine conditions with <code>and</code> / <code>or</code>. For this, <code>MongoEngine</code> provides the <code>Q</code> class. A <code>Q</code> object represents part of a query and accepts the same keyword arguments you would use to query a <code>document</code>. To build a complex query, combine <code>Q</code> objects with the <code>&amp;</code> and <code>|</code> operators. For example:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> datetime <span class="keyword">import</span> datetime</span><br><span class="line"><span class="keyword">from</span> mongoengine.queryset.visitor <span class="keyword">import</span> Q</span><br><span class="line"></span><br><span class="line">Post.objects(Q(published=<span class="literal">True</span>) | Q(publish_date__lte=datetime.now()))</span><br><span class="line"></span><br><span class="line">Post.objects((Q(featured=<span class="literal">True</span>) &amp; Q(hits__gte=<span class="number">1000</span>)) | Q(hits__gte=<span class="number">5000</span>))</span><br></pre></td></tr></table></figure><h3 id="Atomic-updates（原子更新）"><a href="#Atomic-updates（原子更新）" class="headerlink" title="Atomic updates（原子更新）"></a>Atomic updates</h3><p>Documents can be updated atomically with the <code>update_one()</code>, <code>update()</code> and <code>modify()</code> methods of a <code>QuerySet</code>. The following modifiers can be used with these methods:</p><table><thead><tr><th align="center">Modifier</th><th align="center">Meaning</th></tr></thead><tbody><tr><td align="center">set</td><td align="center">Set to a given value</td></tr><tr><td align="center">unset</td><td 
align="center">Remove the field</td></tr><tr><td align="center">inc</td><td align="center">Increment the value by a given amount</td></tr><tr><td align="center">dec</td><td align="center">Decrement the value by a given amount</td></tr><tr><td align="center">push</td><td align="center">Append a value to the <code>list</code></td></tr><tr><td align="center">push_all</td><td align="center">Append several values to the <code>list</code></td></tr><tr><td align="center">pop</td><td align="center">Remove the first or last item of the <code>list</code>, depending on <code>val</code> in <code>pop__&lt;field&gt;=val</code>: a negative value (e.g. <code>-1</code>) removes the first item and a positive value (e.g. <code>1</code>) removes the last (see <a href="https://docs.mongodb.com/manual/reference/operator/update/pop/">mongodb $pop</a>)</td></tr><tr><td align="center">pull</td><td align="center">Remove a value from the <code>list</code></td></tr><tr><td align="center">pull_all</td><td align="center">Remove several values from the <code>list</code></td></tr><tr><td align="center">add_to_set</td><td align="center">Add a value to the <code>list</code> only if it is not already present</td></tr></tbody></table><p>The syntax for atomic updates is similar to the query syntax, except that the modifier comes before the field name rather than after it.</p><p>Increment the <code>page_views</code> field of a document by a given number:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">post = BlogPost(title=<span class="string">&#x27;Test&#x27;</span>, page_views=<span class="number">0</span>, tags=[<span class="string">&#x27;database&#x27;</span>])</span><br><span class="line">post.save()</span><br><span class="line">BlogPost.objects(<span class="built_in">id</span>=post.<span class="built_in">id</span>).update_one(inc__page_views=<span class="number">1</span>)</span><br><span class="line">post.reload()  <span class="comment"># the document has been changed, so we need to reload it</span></span><br><span class="line">post.page_views</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">1</span><br></pre></td></tr></table></figure><p>Set the document's <code>title</code> to <code>Example Post</code>:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">BlogPost.objects(<span class="built_in">id</span>=post.<span class="built_in">id</span>).update_one(set__title=<span class="string">&#x27;Example Post&#x27;</span>)</span><br><span class="line">post.reload()</span><br><span class="line">post.title</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&#x27;Example Post&#x27;</span><br></pre></td></tr></table></figure><p>Append a value to the document's <code>tags</code> list:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">BlogPost.objects(<span class="built_in">id</span>=post.<span class="built_in">id</span>).update_one(push__tags=<span class="string">&#x27;nosql&#x27;</span>)</span><br><span class="line">post.reload()</span><br><span class="line">post.tags</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[&#x27;database&#x27;, &#x27;nosql&#x27;]</span><br></pre></td></tr></table></figure><p>If no modifier is given, <code>$set</code> is used by default, so the following two statements are equivalent:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">BlogPost.objects(<span class="built_in">id</span>=post.<span
class="built_in">id</span>).update(title=<span class="string">&#x27;Example Post&#x27;</span>)</span><br><span class="line">BlogPost.objects(<span class="built_in">id</span>=post.<span class="built_in">id</span>).update(set__title=<span class="string">&#x27;Example Post&#x27;</span>)</span><br></pre></td></tr></table></figure><h3 id="服务器端-JavaScript-执行"><a href="#服务器端-JavaScript-执行" class="headerlink" title="服务器端 JavaScript 执行"></a>Server-side JavaScript execution</h3><p>You can write a JavaScript function and send it to the server to be executed. The result is the return value of that function. This is done with the <code>exec_js()</code> method of a <code>QuerySet</code>, which takes a string containing the JavaScript function as its first argument.</p><p>The remaining positional arguments are field names, which are passed to your JavaScript function as its arguments.</p><p>Within the scope of the JavaScript function, the following variables are available:</p><ul><li><p><code>collection</code>: the name of the collection that corresponds to the <code>Document</code> class being used</p></li><li><p><code>query</code>: an object representing the current query of the <code>QuerySet</code></p></li><li><p><code>options</code>: an object containing the keyword arguments passed to <code>exec_js()</code></p></li></ul><p>For example, the following function sums the values of a field over the documents matched by the query, skipping negative values unless <code>include_negatives</code> is <code>True</code>:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">sum_field</span>(<span class="params">document, field_name, include_negatives=<span class="literal">True</span></span>):</span><br><span class="line">    code = <span class="string">&quot;&quot;&quot;</span></span><br><span class="line"><span class="string">    function(sumField) &#123;</span></span><br><span class="line"><span class="string">      
  var total = 0.0;</span></span><br><span class="line"><span class="string">        db[collection].find(query).forEach(function(doc) &#123;</span></span><br><span class="line"><span class="string">            var val = doc[sumField];</span></span><br><span class="line"><span class="string">            if (val &gt;= 0.0 || options.includeNegatives) &#123;</span></span><br><span class="line"><span class="string">                total += val;</span></span><br><span class="line"><span class="string">            &#125;</span></span><br><span class="line"><span class="string">        &#125;);</span></span><br><span class="line"><span class="string">        return total;</span></span><br><span class="line"><span class="string">    &#125;</span></span><br><span class="line"><span class="string">    &quot;&quot;&quot;</span></span><br><span class="line">    options = &#123;<span class="string">&#x27;includeNegatives&#x27;</span>: include_negatives&#125;</span><br><span class="line">    <span class="keyword">return</span> document.objects.exec_js(code, field_name, **options)</span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;MongoEngine-查询&quot;&gt;&lt;a href=&quot;#MongoEngine-查询&quot; class=&quot;headerlink&quot; title=&quot;MongoEngine 查询&quot;&gt;&lt;/a&gt;MongoEngine queries&lt;/h2&gt;&lt;h3 id=&quot;过滤查询&quot;&gt;&lt;a href=&quot;#过</summary>
      
    
    
    
    <category term="数据库" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/"/>
    
    <category term="MongoDB" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/MongoDB/"/>
    
    
    <category term="PyMongo" scheme="https://sitoi.cn/tags/PyMongo/"/>
    
    <category term="MongoEngine" scheme="https://sitoi.cn/tags/MongoEngine/"/>
    
    <category term="CRUD" scheme="https://sitoi.cn/tags/CRUD/"/>
    
    <category term="NoSQL" scheme="https://sitoi.cn/tags/NoSQL/"/>
    
    <category term="Python" scheme="https://sitoi.cn/tags/Python/"/>
    
  </entry>
  
  <entry>
    <title>PyMongo Common Syntax Cheat Sheet</title>
    <link href="https://sitoi.cn/posts/37062.html"/>
    <id>https://sitoi.cn/posts/37062.html</id>
    <published>2020-06-10T14:53:42.000Z</published>
    <updated>2025-11-12T05:28:30.733Z</updated>
    
    <content type="html"><![CDATA[<h2 id="建立基本连接"><a href="#建立基本连接" class="headerlink" title="建立基本连接"></a>Establishing a basic connection</h2><p>First, create a connection. PyMongo connects to MongoDB through <code>MongoClient</code>; the default address is <code>mongodb://localhost:27017</code>.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pymongo <span class="keyword">import</span> MongoClient</span><br><span class="line"></span><br><span class="line">clinet = MongoClient(<span class="string">&quot;mongodb://localhost:27017&quot;</span>)</span><br><span class="line">db = clinet[<span class="string">&quot;demo&quot;</span>]</span><br><span class="line">col = db[<span class="string">&quot;demo&quot;</span>]</span><br></pre></td></tr></table></figure><p>The code above creates a client, a database object and a collection object:</p><ul><li>Client connection: <code>MongoClient</code></li><li>Database: <code>demo</code></li><li>Collection: <code>demo</code></li></ul><h2 id="基本命令"><a href="#基本命令" class="headerlink" title="基本命令"></a>Basic commands</h2><h3 id="查看数据库信息"><a href="#查看数据库信息" class="headerlink" title="查看数据库信息"></a>Viewing server information</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">server_info = clinet.server_info()</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;version&quot;</span><span class="punctuation">:</span> <span class="string">&quot;4.2.6&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;gitVersion&quot;</span><span class="punctuation">:</span> <span class="string">&quot;20364840b8f1af16917e4c23c1b5f5efd8b352f8&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;modules&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="punctuation">]</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;allocator&quot;</span><span class="punctuation">:</span> <span class="string">&quot;tcmalloc&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;javascriptEngine&quot;</span><span class="punctuation">:</span> <span class="string">&quot;mozjs&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;sysInfo&quot;</span><span class="punctuation">:</span> <span class="string">&quot;deprecated&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;versionArray&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span 
class="number">4</span><span class="punctuation">,</span> <span class="number">2</span><span class="punctuation">,</span> <span class="number">6</span><span class="punctuation">,</span> <span class="number">0</span><span class="punctuation">]</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;openssl&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">    <span class="attr">&quot;running&quot;</span><span class="punctuation">:</span> <span class="string">&quot;OpenSSL 1.1.1  11 Sep 2018&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;compiled&quot;</span><span class="punctuation">:</span> <span class="string">&quot;OpenSSL 1.1.1  11 Sep 2018&quot;</span></span><br><span class="line">  <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;buildEnvironment&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">    <span class="attr">&quot;distmod&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ubuntu1804&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;distarch&quot;</span><span class="punctuation">:</span> <span class="string">&quot;x86_64&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;cc&quot;</span><span class="punctuation">:</span> <span class="string">&quot;/opt/mongodbtoolchain/v3/bin/gcc: gcc (GCC) 8.2.0&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;ccflags&quot;</span><span class="punctuation">:</span> <span class="string">&quot;-fno-omit-frame-pointer -fno-strict-aliasing -ggdb -pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -Werror -O2 
-Wno-unused-local-typedefs -Wno-unused-function -Wno-deprecated-declarations -Wno-unused-const-variable -Wno-unused-but-set-variable -Wno-missing-braces -fstack-protector-strong -fno-builtin-memcmp&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;cxx&quot;</span><span class="punctuation">:</span> <span class="string">&quot;/opt/mongodbtoolchain/v3/bin/g++: g++ (GCC) 8.2.0&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;cxxflags&quot;</span><span class="punctuation">:</span> <span class="string">&quot;-Woverloaded-virtual -Wno-maybe-uninitialized -fsized-deallocation -std=c++17&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;linkflags&quot;</span><span class="punctuation">:</span> <span class="string">&quot;-pthread -Wl,-z,now -rdynamic -Wl,--fatal-warnings -fstack-protector-strong -fuse-ld=gold -Wl,--build-id -Wl,--hash-style=gnu -Wl,-z,noexecstack -Wl,--warn-execstack -Wl,-z,relro&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;target_arch&quot;</span><span class="punctuation">:</span> <span class="string">&quot;x86_64&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;target_os&quot;</span><span class="punctuation">:</span> <span class="string">&quot;linux&quot;</span></span><br><span class="line">  <span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;bits&quot;</span><span class="punctuation">:</span> <span class="number">64</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;debug&quot;</span><span class="punctuation">:</span> <span class="literal"><span class="keyword">false</span></span><span class="punctuation">,</span></span><br><span class="line">  <span 
class="attr">&quot;maxBsonObjectSize&quot;</span><span class="punctuation">:</span> <span class="number">16777216</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;storageEngines&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span><span class="string">&quot;biggie&quot;</span><span class="punctuation">,</span> <span class="string">&quot;devnull&quot;</span><span class="punctuation">,</span> <span class="string">&quot;ephemeralForTest&quot;</span><span class="punctuation">,</span> <span class="string">&quot;wiredTiger&quot;</span><span class="punctuation">]</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;ok&quot;</span><span class="punctuation">:</span> <span class="number">1</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><h3 id="显示当前数据库服务器上的数据库名"><a href="#显示当前数据库服务器上的数据库名" class="headerlink" title="显示当前数据库服务器上的数据库名"></a>Listing the databases on the server</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">database_names = clinet.list_database_names()</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[&#x27;admin&#x27;, &#x27;config&#x27;, &#x27;demo&#x27;, &#x27;local&#x27;]</span><br></pre></td></tr></table></figure><blockquote><p>If <code>demo</code> is missing from the list, that is because MongoDB does not create a database until data is inserted: the database and its collection are created automatically on the first insert.</p></blockquote><h3 id="显示当前数据库上的全部集合名"><a href="#显示当前数据库上的全部集合名" class="headerlink" title="显示当前数据库上的全部集合名"></a>Listing all collections in the current database</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">collection_names = 
db.list_collection_names()</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[&#x27;demo&#x27;]</span><br></pre></td></tr></table></figure><h2 id="插入文档"><a href="#插入文档" class="headerlink" title="插入文档"></a>Inserting documents</h2><h3 id="插入一个文档"><a href="#插入一个文档" class="headerlink" title="插入一个文档"></a>Inserting a single document</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">demo = &#123;</span><br><span class="line">    <span class="string">&quot;author&quot;</span>: <span class="string">&quot;Sitoi&quot;</span>,</span><br><span class="line">    <span class="string">&quot;age&quot;</span>: <span class="number">22</span>,</span><br><span class="line">    <span class="string">&quot;title&quot;</span>: <span class="string">&quot;Sitoi-blog&quot;</span>,</span><br><span class="line">    <span class="string">&quot;tags&quot;</span>: [<span class="string">&quot;man&quot;</span>, <span class="string">&quot;spider&quot;</span>]</span><br><span class="line">&#125;</span><br><span class="line">demo_id = col.insert_one(demo).inserted_id</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">5ee3806bb6c75d29c94aa9fc</span><br></pre></td></tr></table></figure><h3 id="插入多个文档"><a href="#插入多个文档" class="headerlink" title="插入多个文档"></a>Inserting multiple documents</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">demos = [</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;blog&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">18</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;blog&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;bash&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">30</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;bash&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;python&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">50</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;language&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;mongodb&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">80</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;NoSQL&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;pymongo&quot;</span>, <span 
class="string">&quot;age&quot;</span>: <span class="number">97</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;Python for MongoDB&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">]</span><br><span class="line">demo_ids = col.insert_many(demos).inserted_ids</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), ObjectId(&#x27;5ee3806bb6c75d29c94aa9fe&#x27;), ObjectId(&#x27;5ee3806bb6c75d29c94aa9ff&#x27;), ObjectId(&#x27;5ee3806bb6c75d29c94aaa00&#x27;), ObjectId(&#x27;5ee3806bb6c75d29c94aaa01&#x27;)]</span><br></pre></td></tr></table></figure><h2 id="查询文档"><a href="#查询文档" class="headerlink" title="查询文档"></a>Querying documents</h2><h3 id="查询单个文档"><a href="#查询单个文档" class="headerlink" title="查询单个文档"></a>Querying a single document</h3><p><code>find_one</code> returns the first document that matches the query filter, or <code>None</code> if nothing matches.</p><p>Parameters:</p><ul><li><code>filter</code>: the query filter (an empty dict matches every document)</li><li><code>projection</code>: the fields to include in or exclude from the result (<code>None</code> returns all fields)</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">result = col.find_one(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Result:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: 
&#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br></pre></td></tr></table></figure><h3 id="查询多个文档"><a href="#查询多个文档" class="headerlink" title="查询多个文档"></a>Querying multiple documents</h3><p><code>find</code> takes the same parameters but returns a <code>Cursor</code> object instead of a single document:</p><ul><li><code>filter</code>: the query filter</li><li><code>projection</code>: the fields to include in or exclude from the result</li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">result = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Result:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&lt;pymongo.cursor.Cursor object at 0x7f695ed56c88&gt;</span><br></pre></td></tr></table></figure><blockquote><p>Iterate over the cursor with a <code>for</code> loop to get the documents:</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">results = col.find()</span><br><span class="line"><span class="keyword">for</span> result <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(result)</span><br></pre></td></tr></table></figure><p>Output:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&#123;<span class="string">&#x27;_id&#x27;</span>: ObjectId(<span
class="string">&#x27;5ee3806bb6c75d29c94aa9fc&#x27;</span>), <span class="string">&#x27;author&#x27;</span>: <span class="string">&#x27;Sitoi&#x27;</span>, <span class="string">&#x27;age&#x27;</span>: <span class="number">22</span>, <span class="string">&#x27;title&#x27;</span>: <span class="string">&#x27;Sitoi-blog&#x27;</span>, <span class="string">&#x27;tags&#x27;</span>: [<span class="string">&#x27;man&#x27;</span>, <span class="string">&#x27;spider&#x27;</span>]&#125;</span><br><span class="line">&#123;<span class="string">&#x27;_id&#x27;</span>: ObjectId(<span class="string">&#x27;5ee3806bb6c75d29c94aa9fd&#x27;</span>), <span class="string">&#x27;author&#x27;</span>: <span class="string">&#x27;blog&#x27;</span>, <span class="string">&#x27;age&#x27;</span>: <span class="number">18</span>, <span class="string">&#x27;title&#x27;</span>: <span class="string">&#x27;blog&#x27;</span>, <span class="string">&#x27;text&#x27;</span>: <span class="string">&#x27;Sitoi Blog&#x27;</span>&#125;</span><br><span class="line">&#123;<span class="string">&#x27;_id&#x27;</span>: ObjectId(<span class="string">&#x27;5ee3806bb6c75d29c94aa9fe&#x27;</span>), <span class="string">&#x27;author&#x27;</span>: <span class="string">&#x27;bash&#x27;</span>, <span class="string">&#x27;age&#x27;</span>: <span class="number">30</span>, <span class="string">&#x27;title&#x27;</span>: <span class="string">&#x27;bash&#x27;</span>, <span class="string">&#x27;text&#x27;</span>: <span class="string">&#x27;Sitoi Blog&#x27;</span>&#125;</span><br><span class="line">&#123;<span class="string">&#x27;_id&#x27;</span>: ObjectId(<span class="string">&#x27;5ee3806bb6c75d29c94aa9ff&#x27;</span>), <span class="string">&#x27;author&#x27;</span>: <span class="string">&#x27;python&#x27;</span>, <span class="string">&#x27;age&#x27;</span>: <span class="number">50</span>, <span class="string">&#x27;title&#x27;</span>: <span class="string">&#x27;language&#x27;</span>, <span class="string">&#x27;text&#x27;</span>: 
<span class="string">&#x27;Sitoi Blog&#x27;</span>&#125;</span><br><span class="line">&#123;<span class="string">&#x27;_id&#x27;</span>: ObjectId(<span class="string">&#x27;5ee3806bb6c75d29c94aaa00&#x27;</span>), <span class="string">&#x27;author&#x27;</span>: <span class="string">&#x27;mongodb&#x27;</span>, <span class="string">&#x27;age&#x27;</span>: <span class="number">80</span>, <span class="string">&#x27;title&#x27;</span>: <span class="string">&#x27;NoSQL&#x27;</span>, <span class="string">&#x27;text&#x27;</span>: <span class="string">&#x27;Sitoi Blog&#x27;</span>&#125;</span><br><span class="line">&#123;<span class="string">&#x27;_id&#x27;</span>: ObjectId(<span class="string">&#x27;5ee3806bb6c75d29c94aaa01&#x27;</span>), <span class="string">&#x27;author&#x27;</span>: <span class="string">&#x27;pymongo&#x27;</span>, <span class="string">&#x27;age&#x27;</span>: <span class="number">97</span>, <span class="string">&#x27;title&#x27;</span>: <span class="string">&#x27;Python for MongoDB&#x27;</span>, <span class="string">&#x27;text&#x27;</span>: <span class="string">&#x27;Sitoi Blog&#x27;</span>&#125;</span><br></pre></td></tr></table></figure><h3 id="指定返回哪些字段"><a href="#指定返回哪些字段" class="headerlink" title="指定返回哪些字段"></a>Choosing which fields to return</h3><p>The <code>projection</code> parameter controls which fields are included in the returned documents.</p><h4 id="示例一：所有字段"><a href="#示例一：所有字段" class="headerlink" title="示例一：所有字段"></a>Example 1: all fields</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">results = col.find()</span><br></pre></td></tr></table></figure><p>Result:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), 
&#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fe&#x27;), &#x27;author&#x27;: &#x27;bash&#x27;, &#x27;age&#x27;: 30, &#x27;title&#x27;: &#x27;bash&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9ff&#x27;), &#x27;author&#x27;: &#x27;python&#x27;, &#x27;age&#x27;: 50, &#x27;title&#x27;: &#x27;language&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa00&#x27;), &#x27;author&#x27;: &#x27;mongodb&#x27;, &#x27;age&#x27;: 80, &#x27;title&#x27;: &#x27;NoSQL&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa01&#x27;), &#x27;author&#x27;: &#x27;pymongo&#x27;, &#x27;age&#x27;: 97, &#x27;title&#x27;: &#x27;Python for MongoDB&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h4 id="示例二：用字典指定要显示的哪几个字段"><a href="#示例二：用字典指定要显示的哪几个字段" class="headerlink" title="示例二：用字典指定要显示的哪几个字段"></a>Example 2: use a dict to choose the fields to include</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;&#125;</span><br><span class="line">projection = &#123;<span class="string">&quot;_id&quot;</span>: <span class="literal">True</span>, <span class="string">&quot;author&quot;</span>: <span
class="literal">True</span>&#125;</span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Result:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fe&#x27;), &#x27;author&#x27;: &#x27;bash&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9ff&#x27;), &#x27;author&#x27;: &#x27;python&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa00&#x27;), &#x27;author&#x27;: &#x27;mongodb&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa01&#x27;), &#x27;author&#x27;: &#x27;pymongo&#x27;&#125;</span><br></pre></td></tr></table></figure><h4 id="示例三：用字典指定去掉哪些字段"><a href="#示例三：用字典指定去掉哪些字段" class="headerlink" title="示例三：用字典指定去掉哪些字段"></a>Example 3: use a dict to choose the fields to exclude</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;&#125;</span><br><span class="line">projection = &#123;<span class="string">&quot;_id&quot;</span>: <span class="literal">False</span>, <span class="string">&quot;author&quot;</span>: <span class="literal">False</span>&#125;</span><br><span class="line">results = 
col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br><span class="line">&#123;&#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;age&#x27;: 30, &#x27;title&#x27;: &#x27;bash&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;age&#x27;: 50, &#x27;title&#x27;: &#x27;language&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;age&#x27;: 80, &#x27;title&#x27;: &#x27;NoSQL&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;age&#x27;: 97, &#x27;title&#x27;: &#x27;Python for MongoDB&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h4 id="示例四：用列表指定要显示哪几个字段"><a href="#示例四：用列表指定要显示哪几个字段" class="headerlink" title="示例四：用列表指定要显示哪几个字段"></a>Example 4: Include fields with a list</h4><blockquote><p><code>_id</code> is always returned unless it is explicitly set to <code>False</code>, which requires the dict form of <code>projection</code>.</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;&#125;</span><br><span class="line">projection = [<span class="string">&quot;author&quot;</span>, <span class="string">&quot;title&quot;</span>]</span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, 
projection=projection)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;title&#x27;: &#x27;blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fe&#x27;), &#x27;author&#x27;: &#x27;bash&#x27;, &#x27;title&#x27;: &#x27;bash&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9ff&#x27;), &#x27;author&#x27;: &#x27;python&#x27;, &#x27;title&#x27;: &#x27;language&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa00&#x27;), &#x27;author&#x27;: &#x27;mongodb&#x27;, &#x27;title&#x27;: &#x27;NoSQL&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa01&#x27;), &#x27;author&#x27;: &#x27;pymongo&#x27;, &#x27;title&#x27;: &#x27;Python for MongoDB&#x27;&#125;</span><br></pre></td></tr></table></figure><h3 id="指定查询条件"><a href="#指定查询条件" class="headerlink" title="指定查询条件"></a>Query conditions</h3><table><thead><tr><th align="center">Operator</th><th align="center">Meaning</th><th align="center">Example</th></tr></thead><tbody><tr><td align="center">$lt</td><td align="center">Less than</td><td align="center"><code>{&#39;age&#39;: {&#39;$lt&#39;: 18}}</code></td></tr><tr><td align="center">$gt</td><td align="center">Greater than</td><td align="center"><code>{&#39;age&#39;: {&#39;$gt&#39;: 18}}</code></td></tr><tr><td align="center">$lte</td><td align="center">Less than or equal to</td><td 
align="center"><code>{&#39;age&#39;: {&#39;$lte&#39;: 18}}</code></td></tr><tr><td align="center">$gte</td><td align="center">Greater than or equal to</td><td align="center"><code>{&#39;age&#39;: {&#39;$gte&#39;: 18}}</code></td></tr><tr><td align="center">$ne</td><td align="center">Not equal to</td><td align="center"><code>{&#39;age&#39;: {&#39;$ne&#39;: 18}}</code></td></tr><tr><td align="center">$in</td><td align="center">Equal to any value in the list</td><td align="center"><code>{&#39;age&#39;: {&#39;$in&#39;: [18, 22]}}</code></td></tr><tr><td align="center">$nin</td><td align="center">Equal to none of the listed values (or the field is missing)</td><td align="center"><code>{&#39;age&#39;: {&#39;$nin&#39;: [18, 22]}}</code></td></tr><tr><td align="center">$all</td><td align="center">Array contains all of the listed values</td><td align="center"><code>{&#39;tags&#39;: {&#39;$all&#39;: [&#39;man&#39;, &#39;spider&#39;]}}</code></td></tr></tbody></table><h4 id="示例：指定范围，大于等于，小于等于"><a href="#示例：指定范围，大于等于，小于等于" class="headerlink" title="示例：指定范围，大于等于，小于等于"></a>Example: Range query (greater than or equal to, less than or equal to)</h4><blockquote><p>10 &lt;&#x3D; age &lt;&#x3D; 30</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;age&quot;</span>: &#123;<span class="string">&quot;$gte&quot;</span>: <span class="number">10</span>, <span class="string">&quot;$lte&quot;</span>: <span class="number">30</span>&#125;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br><span 
class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fe&#x27;), &#x27;author&#x27;: &#x27;bash&#x27;, &#x27;age&#x27;: 30, &#x27;title&#x27;: &#x27;bash&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h3 id="并列查询"><a href="#并列查询" class="headerlink" title="并列查询"></a>并列查询</h3><h4 id="示例一：不同字段，并列条件"><a href="#示例一：不同字段，并列条件" class="headerlink" title="示例一：不同字段，并列条件"></a>示例一：不同字段，并列条件</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;Sitoi&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">22</span>&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>查询结果：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br></pre></td></tr></table></figure><h4 id="示例二：相同字段，并列条件"><a href="#示例二：相同字段，并列条件" class="headerlink" title="示例二：相同字段，并列条件"></a>示例二：相同字段，并列条件</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Wrong: a Python dict keeps only the last duplicate key, so the $gt condition is lost</span></span><br><span class="line">query = &#123;<span class="string">&quot;age&quot;</span>: &#123;<span class="string">&quot;$gt&quot;</span>: <span class="number">10</span>&#125;, <span class="string">&quot;age&quot;</span>: &#123;<span class="string">&quot;$lt&quot;</span>: <span class="number">20</span>&#125;&#125;</span><br><span class="line"><span class="comment"># Correct: put both operators in one dict</span></span><br><span class="line">query = &#123;<span class="string">&quot;age&quot;</span>: &#123;<span class="string">&quot;$gt&quot;</span>: <span class="number">10</span>, <span class="string">&quot;$lt&quot;</span>: <span class="number">20</span>&#125;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h3 id="或条件查询"><a href="#或条件查询" class="headerlink" title="或条件查询"></a>OR queries</h3><h4 id="示例一：不同字段，或条件"><a href="#示例一：不同字段，或条件" class="headerlink" title="示例一：不同字段，或条件"></a>Example 1: OR across different fields</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;$or&quot;</span>: [&#123;<span 
class="string">&quot;age&quot;</span>: <span class="number">22</span>&#125;, &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;blog&quot;</span>&#125;]&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>查询结果：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h4 id="示例二：相同字段，或条件"><a href="#示例二：相同字段，或条件" class="headerlink" title="示例二：相同字段，或条件"></a>示例二：相同字段，或条件</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;$or&quot;</span>: [&#123;<span class="string">&quot;age&quot;</span>: <span class="number">22</span>&#125;, &#123;<span class="string">&quot;age&quot;</span>: <span class="number">18</span>&#125;]&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>查询结果：</p><figure class="highlight text"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h3 id="字段是否存在"><a href="#字段是否存在" class="headerlink" title="字段是否存在"></a>Checking whether a field exists</h3><h4 id="示例一：字段不存在"><a href="#示例一：字段不存在" class="headerlink" title="示例一：字段不存在"></a>Example 1: Field is missing</h4><blockquote><p><code>{&quot;text&quot;: None}</code> matches documents where <code>text</code> is missing or is <code>null</code>; to match only documents that lack the field, use <code>{&quot;text&quot;: {&quot;$exists&quot;: False}}</code>.</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;text&quot;</span>: <span class="literal">None</span>&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br></pre></td></tr></table></figure><h4 id="示例二：字段存在"><a href="#示例二：字段存在" class="headerlink" title="示例二：字段存在"></a>Example 2: Field exists (and is not null)</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;text&quot;</span>: &#123;<span class="string">&quot;$ne&quot;</span>: <span class="literal">None</span>&#125;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fe&#x27;), &#x27;author&#x27;: &#x27;bash&#x27;, &#x27;age&#x27;: 30, &#x27;title&#x27;: &#x27;bash&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9ff&#x27;), &#x27;author&#x27;: &#x27;python&#x27;, &#x27;age&#x27;: 50, &#x27;title&#x27;: &#x27;language&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa00&#x27;), &#x27;author&#x27;: &#x27;mongodb&#x27;, &#x27;age&#x27;: 80, &#x27;title&#x27;: &#x27;NoSQL&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa01&#x27;), &#x27;author&#x27;: &#x27;pymongo&#x27;, &#x27;age&#x27;: 97, &#x27;title&#x27;: &#x27;Python for MongoDB&#x27;, &#x27;text&#x27;: 
&#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h3 id="正则查询"><a href="#正则查询" class="headerlink" title="正则查询"></a>Regex queries</h3><p>Use <code>$regex</code> to match a regular expression; for example, <code>^S.*</code> matches values that start with <code>S</code>.<br>The table below groups this operator with several other query operators.</p><table><thead><tr><th align="center">Operator</th><th align="center">Meaning</th><th align="center">Example</th><th align="center">What the example matches</th></tr></thead><tbody><tr><td align="center">$regex</td><td align="center">Matches a regular expression</td><td align="center"><code>{&#39;author&#39;: {&#39;$regex&#39;: &#39;^S.*&#39;}}</code></td><td align="center"><code>author</code> starts with <code>S</code></td></tr><tr><td align="center">$exists</td><td align="center">Field exists (or not)</td><td align="center"><code>{&#39;author&#39;: {&#39;$exists&#39;: True}}</code></td><td align="center">The <code>author</code> field exists</td></tr><tr><td align="center">$type</td><td align="center">BSON type check</td><td align="center"><code>{&#39;age&#39;: {&#39;$type&#39;: &#39;int&#39;}}</code></td><td align="center"><code>age</code> is of type <code>int</code></td></tr><tr><td align="center">$mod</td><td align="center">Modulo on a number</td><td align="center"><code>{&#39;age&#39;: {&#39;$mod&#39;: [5, 0]}}</code></td><td align="center"><code>age</code> mod <code>5</code> equals <code>0</code></td></tr><tr><td align="center">$text</td><td align="center">Full-text search</td><td align="center"><code>{&#39;$text&#39;: {&#39;$search&#39;: &#39;Sitoi&#39;}}</code></td><td align="center">A field covered by a text index contains <code>Sitoi</code></td></tr><tr><td align="center">$where</td><td align="center">JavaScript expression</td><td align="center"><code>{&#39;$where&#39;: &#39;obj.fans_count == obj.follows_count&#39;}</code></td><td align="center">The document&#39;s <code>fans_count</code> equals its <code>follows_count</code></td></tr></tbody></table><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;author&quot;</span>: &#123;<span 
class="string">&quot;$regex&quot;</span>: <span class="string">&quot;^S.*&quot;</span>&#125;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">result = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br></pre></td></tr></table></figure><h3 id="计数"><a href="#计数" class="headerlink" title="计数"></a>Counting</h3><p>Count the documents that match a filter (an empty filter counts the whole collection):</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;&#125;</span><br><span class="line">count = col.count_documents(<span class="built_in">filter</span>=query)</span><br></pre></td></tr></table></figure><p>Document count:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">6</span><br></pre></td></tr></table></figure><h3 id="排序"><a href="#排序" class="headerlink" title="排序"></a>Sorting</h3><p>Pass <code>sort()</code> a list of <code>(field, direction)</code> tuples: <code>[(field1, direction1), (field2, direction2)]</code>, where <code>1</code> (<code>pymongo.ASCENDING</code>) sorts ascending and <code>-1</code> (<code>pymongo.DESCENDING</code>) sorts descending. Strings are compared by binary value by default, which is why the capitalized <code>Sitoi</code> sorts before the lowercase names.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">result = col.find().sort([(<span class="string">&quot;author&quot;</span>, <span class="number">1</span>), (<span class="string">&quot;title&quot;</span>, <span class="number">1</span>)])</span><br></pre></td></tr></table></figure><p>Sorted results:</p><figure 
class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fe&#x27;), &#x27;author&#x27;: &#x27;bash&#x27;, &#x27;age&#x27;: 30, &#x27;title&#x27;: &#x27;bash&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa00&#x27;), &#x27;author&#x27;: &#x27;mongodb&#x27;, &#x27;age&#x27;: 80, &#x27;title&#x27;: &#x27;NoSQL&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa01&#x27;), &#x27;author&#x27;: &#x27;pymongo&#x27;, &#x27;age&#x27;: 97, &#x27;title&#x27;: &#x27;Python for MongoDB&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9ff&#x27;), &#x27;author&#x27;: &#x27;python&#x27;, &#x27;age&#x27;: 50, &#x27;title&#x27;: &#x27;language&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h3 id="跳过"><a href="#跳过" class="headerlink" title="跳过"></a>Skip</h3><p>Skip the first document (without <code>sort()</code>, the order is not guaranteed):</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">result = col.find().skip(<span class="number">1</span>)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fd&#x27;), &#x27;author&#x27;: &#x27;blog&#x27;, &#x27;age&#x27;: 18, &#x27;title&#x27;: &#x27;blog&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fe&#x27;), &#x27;author&#x27;: &#x27;bash&#x27;, &#x27;age&#x27;: 30, &#x27;title&#x27;: &#x27;bash&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9ff&#x27;), &#x27;author&#x27;: &#x27;python&#x27;, &#x27;age&#x27;: 50, &#x27;title&#x27;: &#x27;language&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa00&#x27;), &#x27;author&#x27;: &#x27;mongodb&#x27;, &#x27;age&#x27;: 80, &#x27;title&#x27;: &#x27;NoSQL&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aaa01&#x27;), &#x27;author&#x27;: &#x27;pymongo&#x27;, &#x27;age&#x27;: 97, &#x27;title&#x27;: &#x27;Python for MongoDB&#x27;, &#x27;text&#x27;: &#x27;Sitoi Blog&#x27;&#125;</span><br></pre></td></tr></table></figure><h3 id="限制"><a href="#限制" class="headerlink" title="限制"></a>Limit</h3><p>Return at most the given number of documents:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">result = col.find().limit(<span 
class="number">1</span>)</span><br></pre></td></tr></table></figure><p>Query results:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">&#123;&#x27;_id&#x27;: ObjectId(&#x27;5ee3806bb6c75d29c94aa9fc&#x27;), &#x27;author&#x27;: &#x27;Sitoi&#x27;, &#x27;age&#x27;: 22, &#x27;title&#x27;: &#x27;Sitoi-blog&#x27;, &#x27;tags&#x27;: [&#x27;man&#x27;, &#x27;spider&#x27;]&#125;</span><br></pre></td></tr></table></figure><h2 id="更新文档"><a href="#更新文档" class="headerlink" title="更新文档"></a>Updating documents</h2><h3 id="更新单个文档"><a href="#更新单个文档" class="headerlink" title="更新单个文档"></a>Update a single document</h3><p><code>update_one</code> updates only the first document that matches the filter.</p><p>Parameters:</p><ul><li><code>filter</code>: the query that selects the document to update</li><li><code>update</code>: the update operators (such as <code>$set</code>) and the new values</li><li><code>upsert</code>: whether to insert a new document when nothing matches <code>filter</code></li></ul><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;age&quot;</span>: <span class="number">18</span>&#125;</span><br><span class="line">update = &#123;<span class="string">&quot;$set&quot;</span>: &#123;<span class="string">&quot;age&quot;</span>: <span class="number">20</span>&#125;&#125;</span><br><span class="line">modified_count = col.update_one(<span class="built_in">filter</span>=query, update=update, upsert=<span class="literal">False</span>).modified_count</span><br></pre></td></tr></table></figure><p>Number of documents modified:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">1</span><br></pre></td></tr></table></figure><h3 id="更新多个文档"><a href="#更新多个文档" class="headerlink" title="更新多个文档"></a>Update multiple documents</h3><blockquote><p><code>update_many</code> takes the same parameters as <code>update_one</code> but updates every matching document.</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;</span><br><span class="line">update = &#123;<span class="string">&quot;$set&quot;</span>: &#123;<span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi PyMongo&quot;</span>&#125;&#125;</span><br><span class="line">modified_count = col.update_many(<span class="built_in">filter</span>=query, update=update, upsert=<span class="literal">False</span>).modified_count</span><br></pre></td></tr></table></figure><p>Number of documents modified:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">5</span><br></pre></td></tr></table></figure><h2 id="删除文档"><a href="#删除文档" class="headerlink" title="删除文档"></a>Deleting documents</h2><h3 id="删除单个文档"><a href="#删除单个文档" class="headerlink" title="删除单个文档"></a>Delete a single document</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">query = &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;Sitoi&quot;</span>&#125;</span><br><span class="line">result = col.delete_one(<span class="built_in">filter</span>=query).deleted_count</span><br></pre></td></tr></table></figure><p>Number of documents deleted:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">1</span><br></pre></td></tr></table></figure><h3 id="删除多个文档"><a href="#删除多个文档" class="headerlink" title="删除多个文档"></a>Delete multiple documents</h3><blockquote><p>An empty filter <code>{}</code> matches every document, so the call below empties the collection.</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">query = 
&#123;&#125;</span><br><span class="line">results = col.delete_many(<span class="built_in">filter</span>=query).deleted_count</span><br></pre></td></tr></table></figure><p>Number of documents deleted:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">5</span><br></pre></td></tr></table></figure><h2 id="附录（代码）"><a href="#附录（代码）" class="headerlink" title="附录（代码）"></a>Appendix (full code)</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span 
class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br><span class="line">93</span><br><span class="line">94</span><br><span class="line">95</span><br><span class="line">96</span><br><span class="line">97</span><br><span class="line">98</span><br><span class="line">99</span><br><span class="line">100</span><br><span class="line">101</span><br><span class="line">102</span><br><span class="line">103</span><br><span class="line">104</span><br><span class="line">105</span><br><span class="line">106</span><br><span class="line">107</span><br><span class="line">108</span><br><span 
class="line">109</span><br><span class="line">110</span><br><span class="line">111</span><br><span class="line">112</span><br><span class="line">113</span><br><span class="line">114</span><br><span class="line">115</span><br><span class="line">116</span><br><span class="line">117</span><br><span class="line">118</span><br><span class="line">119</span><br><span class="line">120</span><br><span class="line">121</span><br><span class="line">122</span><br><span class="line">123</span><br><span class="line">124</span><br><span class="line">125</span><br><span class="line">126</span><br><span class="line">127</span><br><span class="line">128</span><br><span class="line">129</span><br><span class="line">130</span><br><span class="line">131</span><br><span class="line">132</span><br><span class="line">133</span><br><span class="line">134</span><br><span class="line">135</span><br><span class="line">136</span><br><span class="line">137</span><br><span class="line">138</span><br><span class="line">139</span><br><span class="line">140</span><br><span class="line">141</span><br><span class="line">142</span><br><span class="line">143</span><br><span class="line">144</span><br><span class="line">145</span><br><span class="line">146</span><br><span class="line">147</span><br><span class="line">148</span><br><span class="line">149</span><br><span class="line">150</span><br><span class="line">151</span><br><span class="line">152</span><br><span class="line">153</span><br><span class="line">154</span><br><span class="line">155</span><br><span class="line">156</span><br><span class="line">157</span><br><span class="line">158</span><br><span class="line">159</span><br><span class="line">160</span><br><span class="line">161</span><br><span class="line">162</span><br><span class="line">163</span><br><span class="line">164</span><br><span class="line">165</span><br><span class="line">166</span><br><span class="line">167</span><br><span class="line">168</span><br><span 
class="line">169</span><br><span class="line">170</span><br><span class="line">171</span><br><span class="line">172</span><br><span class="line">173</span><br><span class="line">174</span><br><span class="line">175</span><br><span class="line">176</span><br><span class="line">177</span><br><span class="line">178</span><br><span class="line">179</span><br><span class="line">180</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> pymongo <span class="keyword">import</span> MongoClient</span><br><span class="line"></span><br><span class="line">client = MongoClient(<span class="string">&quot;mongodb://localhost:27017&quot;</span>)</span><br><span class="line"></span><br><span class="line">db = client[<span class="string">&quot;demo&quot;</span>]</span><br><span class="line">col = db[<span class="string">&quot;demo&quot;</span>]</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Server info&quot;</span>)</span><br><span class="line">server_info = client.server_info()</span><br><span class="line"><span class="built_in">print</span>(server_info)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Database names on this server&quot;</span>)</span><br><span class="line">database_names = client.list_database_names()</span><br><span class="line"><span class="built_in">print</span>(database_names)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; All collection names in the current database&quot;</span>)</span><br><span class="line">collection_names = db.list_collection_names()</span><br><span class="line"><span class="built_in">print</span>(collection_names)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Insert one document&quot;</span>)</span><br><span 
class="line">demo = &#123;</span><br><span class="line">    <span class="string">&quot;author&quot;</span>: <span class="string">&quot;Sitoi&quot;</span>,</span><br><span class="line">    <span class="string">&quot;age&quot;</span>: <span class="number">22</span>,</span><br><span class="line">    <span class="string">&quot;title&quot;</span>: <span class="string">&quot;Sitoi-blog&quot;</span>,</span><br><span class="line">    <span class="string">&quot;tags&quot;</span>: [<span class="string">&quot;man&quot;</span>, <span class="string">&quot;spider&quot;</span>]</span><br><span class="line">&#125;</span><br><span class="line">demo_id = col.insert_one(demo).inserted_id</span><br><span class="line"><span class="built_in">print</span>(demo_id)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Insert multiple documents&quot;</span>)</span><br><span class="line">demos = [</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;blog&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">18</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;blog&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;bash&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">30</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;bash&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;python&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">50</span>, <span 
class="string">&quot;title&quot;</span>: <span class="string">&quot;language&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;mongodb&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">80</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;NoSQL&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">    &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;pymongo&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">97</span>, <span class="string">&quot;title&quot;</span>: <span class="string">&quot;Python for MongoDB&quot;</span>, <span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;,</span><br><span class="line">]</span><br><span class="line">demo_ids = col.insert_many(demos).inserted_ids</span><br><span class="line"><span class="built_in">print</span>(demo_ids)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Find one document&quot;</span>)</span><br><span class="line">query = &#123;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">result = col.find_one(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="built_in">print</span>(result)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Find multiple documents&quot;</span>)</span><br><span class="line">query = &#123;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span 
class="line">result = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> result:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; All fields&quot;</span>)</span><br><span class="line"></span><br><span class="line">results = col.find()</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Use a dict to choose which fields to include&quot;</span>)</span><br><span class="line">query = &#123;&#125;</span><br><span class="line">projection = &#123;<span class="string">&quot;_id&quot;</span>: <span class="literal">True</span>, <span class="string">&quot;author&quot;</span>: <span class="literal">True</span>&#125;</span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Use a dict to choose which fields to exclude&quot;</span>)</span><br><span class="line">query = &#123;&#125;</span><br><span class="line">projection = &#123;<span class="string">&quot;_id&quot;</span>: <span class="literal">False</span>, <span class="string">&quot;author&quot;</span>: <span class="literal">False</span>&#125;</span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span 
class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Use a list to choose which fields to include&quot;</span>)</span><br><span class="line">query = &#123;&#125;</span><br><span class="line">projection = [<span class="string">&quot;author&quot;</span>, <span class="string">&quot;title&quot;</span>]</span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Range: greater than or equal to, less than or equal to&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;age&quot;</span>: &#123;<span class="string">&quot;$gte&quot;</span>: <span class="number">10</span>, <span class="string">&quot;$lte&quot;</span>: <span class="number">30</span>&#125;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Different fields, AND&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;Sitoi&quot;</span>, <span class="string">&quot;age&quot;</span>: <span class="number">22</span>&#125;</span><br><span class="line">projection = <span 
class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Same field, AND&quot;</span>)</span><br><span class="line"><span class="comment"># Wrong: a dict literal keeps only the last duplicate key, so only &#123;&quot;$lt&quot;: 100&#125; survives</span></span><br><span class="line">query = &#123;<span class="string">&quot;age&quot;</span>: &#123;<span class="string">&quot;$gt&quot;</span>: <span class="number">50</span>&#125;, <span class="string">&quot;age&quot;</span>: &#123;<span class="string">&quot;$lt&quot;</span>: <span class="number">100</span>&#125;&#125;</span><br><span class="line"><span class="comment"># Right: put both operators in the same dict for that field</span></span><br><span class="line">query = &#123;<span class="string">&quot;age&quot;</span>: &#123;<span class="string">&quot;$gt&quot;</span>: <span class="number">50</span>, <span class="string">&quot;$lt&quot;</span>: <span class="number">100</span>&#125;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Different fields, OR&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;$or&quot;</span>: [&#123;<span class="string">&quot;age&quot;</span>: <span class="number">22</span>&#125;, &#123;<span class="string">&quot;author&quot;</span>: <span 
class="string">&quot;blog&quot;</span>&#125;]&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Same field, OR&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;$or&quot;</span>: [&#123;<span class="string">&quot;age&quot;</span>: <span class="number">22</span>&#125;, &#123;<span class="string">&quot;age&quot;</span>: <span class="number">18</span>&#125;]&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Field does not exist&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;text&quot;</span>: &#123;<span class="string">&quot;$exists&quot;</span>: <span class="literal">False</span>&#125;&#125;  <span class="comment"># &#123;&quot;text&quot;: None&#125; would also match documents whose text is null</span></span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span 
class="string">&quot;&gt;&gt;&gt; Field exists&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;text&quot;</span>: &#123;<span class="string">&quot;$exists&quot;</span>: <span class="literal">True</span>&#125;&#125;  <span class="comment"># &#123;&quot;$ne&quot;: None&#125; would also skip documents whose text is null</span></span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Regex query&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;author&quot;</span>: &#123;<span class="string">&quot;$regex&quot;</span>: <span class="string">&quot;^S.*&quot;</span>&#125;&#125;</span><br><span class="line">projection = <span class="literal">None</span></span><br><span class="line">results = col.find(<span class="built_in">filter</span>=query, projection=projection)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Count&quot;</span>)</span><br><span class="line">query = &#123;&#125;</span><br><span class="line">count = col.count_documents(<span class="built_in">filter</span>=query)</span><br><span class="line"><span class="built_in">print</span>(count)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Sort&quot;</span>)</span><br><span class="line">results = col.find().sort([(<span class="string">&quot;author&quot;</span>, <span 
class="number">1</span>), (<span class="string">&quot;title&quot;</span>, -<span class="number">1</span>)])  <span class="comment"># author ascending, then title descending</span></span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Skip&quot;</span>)</span><br><span class="line">results = col.find().skip(<span class="number">1</span>)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Limit&quot;</span>)</span><br><span class="line">results = col.find().limit(<span class="number">1</span>)</span><br><span class="line"><span class="keyword">for</span> one <span class="keyword">in</span> results:</span><br><span class="line">    <span class="built_in">print</span>(one)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Update one document&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;age&quot;</span>: <span class="number">18</span>&#125;</span><br><span class="line">update = &#123;<span class="string">&quot;$set&quot;</span>: &#123;<span class="string">&quot;age&quot;</span>: <span class="number">20</span>&#125;&#125;</span><br><span class="line">modified_count = col.update_one(<span class="built_in">filter</span>=query, update=update, upsert=<span class="literal">False</span>).modified_count</span><br><span class="line"><span class="built_in">print</span>(modified_count)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; 
Update multiple documents&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi Blog&quot;</span>&#125;</span><br><span class="line">update = &#123;<span class="string">&quot;$set&quot;</span>: &#123;<span class="string">&quot;text&quot;</span>: <span class="string">&quot;Sitoi PyMongo&quot;</span>&#125;&#125;</span><br><span class="line">modified_count = col.update_many(<span class="built_in">filter</span>=query, update=update, upsert=<span class="literal">False</span>).modified_count</span><br><span class="line"><span class="built_in">print</span>(modified_count)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Delete one document&quot;</span>)</span><br><span class="line">query = &#123;<span class="string">&quot;author&quot;</span>: <span class="string">&quot;Sitoi&quot;</span>&#125;</span><br><span class="line">delete_count = col.delete_one(<span class="built_in">filter</span>=query).deleted_count</span><br><span class="line"><span class="built_in">print</span>(delete_count)</span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;&gt;&gt;&gt; Delete multiple documents&quot;</span>)</span><br><span class="line">query = &#123;&#125;  <span class="comment"># an empty filter matches every document, so this empties the collection</span></span><br><span class="line">delete_count = col.delete_many(<span class="built_in">filter</span>=query).deleted_count</span><br><span class="line"><span class="built_in">print</span>(delete_count)</span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;建立基本连接&quot;&gt;&lt;a href=&quot;#建立基本连接&quot; class=&quot;headerlink&quot; title=&quot;建立基本连接&quot;&gt;&lt;/a&gt;Establishing a basic connection&lt;/h2&gt;&lt;p&gt;First we need to establish a connection. To connect to MongoDB, we use the PyMongo library's MongoClie</summary>
      
    
    
    
    <category term="数据库" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/"/>
    
    <category term="MongoDB" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/MongoDB/"/>
    
    
    <category term="MongoDB" scheme="https://sitoi.cn/tags/MongoDB/"/>
    
    <category term="PyMongo" scheme="https://sitoi.cn/tags/PyMongo/"/>
    
    <category term="CRUD" scheme="https://sitoi.cn/tags/CRUD/"/>
    
    <category term="NoSQL" scheme="https://sitoi.cn/tags/NoSQL/"/>
    
    <category term="Python" scheme="https://sitoi.cn/tags/Python/"/>
    
  </entry>
  
  <entry>
    <title>Installing MongoDB on Fedora</title>
    <link href="https://sitoi.cn/posts/37161.html"/>
    <id>https://sitoi.cn/posts/37161.html</id>
    <published>2020-06-09T12:04:00.000Z</published>
    <updated>2025-11-12T05:28:30.732Z</updated>
    
    <content type="html"><![CDATA[<h1 id="MongoDB-安装"><a href="#MongoDB-安装" class="headerlink" title="MongoDB 安装"></a>Installing MongoDB</h1><h2 id="安装环境"><a href="#安装环境" class="headerlink" title="安装环境"></a>Environment</h2><ul><li>Fedora 29</li></ul><h2 id="安装步骤"><a href="#安装步骤" class="headerlink" title="安装步骤"></a>Steps</h2><ol><li>Install <code>mongodb</code> and <code>mongodb-server</code></li><li>Start the service</li></ol><h3 id="安装-MongoDB-和-MongoDB-Server"><a href="#安装-MongoDB-和-MongoDB-Server" class="headerlink" title="安装 MongoDB 和 MongoDB-Server"></a>Install MongoDB and MongoDB-Server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> dnf install mongodb mongodb-server</span><br></pre></td></tr></table></figure><h3 id="启动服务"><a href="#启动服务" class="headerlink" title="启动服务"></a>Start the service</h3><p>Start the <code>mongod</code> service:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> systemctl start mongod.service</span><br></pre></td></tr></table></figure><p>Enable the service to start automatically at boot:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> systemctl <span class="built_in">enable</span> mongod.service</span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;MongoDB-安装&quot;&gt;&lt;a href=&quot;#MongoDB-安装&quot; class=&quot;headerlink&quot; title=&quot;MongoDB 安装&quot;&gt;&lt;/a&gt;Installing MongoDB&lt;/h1&gt;&lt;h2 id=&quot;安装环境&quot;&gt;&lt;a href=&quot;#安装环境&quot; class=&quot;head</summary>
      
    
    
    
    <category term="数据库" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/"/>
    
    <category term="MongoDB" scheme="https://sitoi.cn/categories/%E6%95%B0%E6%8D%AE%E5%BA%93/MongoDB/"/>
    
    
    <category term="安装文档" scheme="https://sitoi.cn/tags/%E5%AE%89%E8%A3%85%E6%96%87%E6%A1%A3/"/>
    
    <category term="Linux" scheme="https://sitoi.cn/tags/Linux/"/>
    
    <category term="MongoDB" scheme="https://sitoi.cn/tags/MongoDB/"/>
    
    <category term="NoSQL" scheme="https://sitoi.cn/tags/NoSQL/"/>
    
  </entry>
  
  <entry>
    <title>Installing Squid for Windows</title>
    <link href="https://sitoi.cn/posts/53752.html"/>
    <id>https://sitoi.cn/posts/53752.html</id>
    <published>2020-06-09T07:21:59.000Z</published>
    <updated>2025-11-12T05:28:30.733Z</updated>
    
    <content type="html"><![CDATA[<h2 id="下载-Squid-for-Windows"><a href="#下载-Squid-for-Windows" class="headerlink" title="下载 Squid for Windows"></a>Download Squid for Windows</h2><p>Download page: <a href="https://squid.diladele.com/">https://squid.diladele.com/</a></p><h2 id="安装-squid-服务"><a href="#安装-squid-服务" class="headerlink" title="安装 squid 服务"></a>Install the Squid service</h2><ol><li><p>Double-click <code>squid.msi</code> to install Squid.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200608162038253.png" alt="squid.msi"></p></li><li><p>Click <code>Next</code> to continue.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200608162314774.png" alt="安装 Squid"></p></li><li><p>Check <code>I accept</code>, then click <code>Next</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200608162334357.png" alt="接受协议"></p></li><li><p>Choose the installation path, then click <code>Next</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200608162407927.png" alt="选择安装路径"></p></li><li><p>Click <code>Install</code> to start the installation.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200608162417741.png" alt="开始安装"></p></li><li><p>Wait for the installation to finish.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200608162435862.png" alt="安装中"></p></li><li><p>The installation is complete.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200608162507517.png" alt="完成安装"></p></li><li><p>Check the notification area at the bottom right of the taskbar to confirm that <code>Squid for Windows</code> is running.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200608162538460.png" alt="安装完成并运行"></p></li></ol><h2 id="安装完成"><a href="#安装完成" class="headerlink" title="安装完成"></a>Installation complete</h2><p>Open <a href="http://127.0.0.1:3128/">http://127.0.0.1:3128</a> in a browser. If the following page appears, the installation succeeded.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/squid/image-20200609153622275.png" alt="Squid Web"></p>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;下载-Squid-for-Windows&quot;&gt;&lt;a href=&quot;#下载-Squid-for-Windows&quot; class=&quot;headerlink&quot; title=&quot;下载 Squid for Windows&quot;&gt;&lt;/a&gt;Download Squid for Windows&lt;/h2&gt;&lt;</summary>
      
    
    
    
    <category term="爬虫" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/"/>
    
    <category term="Squid" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/Squid/"/>
    
    
    <category term="爬虫" scheme="https://sitoi.cn/tags/%E7%88%AC%E8%99%AB/"/>
    
    <category term="Squid" scheme="https://sitoi.cn/tags/Squid/"/>
    
    <category term="代理" scheme="https://sitoi.cn/tags/%E4%BB%A3%E7%90%86/"/>
    
  </entry>
  
  <entry>
    <title>A Selenium Cluster in a Multi-Machine Distributed Environment</title>
    <link href="https://sitoi.cn/posts/19006.html"/>
    <id>https://sitoi.cn/posts/19006.html</id>
    <published>2020-06-09T04:22:14.000Z</published>
    <updated>2025-11-12T05:28:30.735Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Introduction</h2><p>One of our crawler scenarios relies heavily on Selenium, so we needed to set up a cluster like this.</p><blockquote><p>Comparison</p></blockquote><ul><li>Standalone</li><li>Docker standalone</li><li>Docker single-machine cluster</li><li>Docker distributed cluster</li></ul><p>When using <code>selenium</code>, we generally work in one of the environments and modes above.</p><h2 id="单机版"><a href="#单机版" class="headerlink" title="单机版"></a>Standalone</h2><p>To run on a single machine, download the matching <code>webdriver</code>. For installation and configuration, see <a href="/posts/14489.html">Selenium &amp; ChromeDriver Installation Guide for All Platforms (Mac, Windows, Linux)</a>.</p><p>For small workloads, such as single-threaded use, you can use it directly once the environment is installed.</p><blockquote><p>Example (Chrome)</p></blockquote><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> selenium <span class="keyword">import</span> webdriver</span><br><span class="line"></span><br><span class="line">browser = webdriver.Chrome()</span><br><span class="line">browser.get(<span class="string">&#x27;https://sitoi.cn/&#x27;</span>)</span><br><span class="line">browser.get_screenshot_as_file(<span class="string">&quot;sitoi.cn.png&quot;</span>)</span><br><span class="line">browser.quit()  <span class="comment"># quit() ends the whole WebDriver session; close() only closes the current window</span></span><br></pre></td></tr></table></figure><h2 id="Docker-单机版"><a href="#Docker-单机版" class="headerlink" title="Docker 单机版"></a>Docker standalone</h2><p>Make sure <code>docker</code> and <code>docker-compose</code> are installed. Here we start a single instance directly from a <code>docker-compose.yml</code> file.</p><ol><li><p>Create a <code>docker-compose.yml</code> file with the following content:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span 
class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">version:</span> <span class="string">&quot;3&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="attr">services:</span></span><br><span class="line">  <span class="attr">chrome:</span></span><br><span class="line">    <span class="attr">image:</span> <span class="string">selenium/standalone-chrome:latest</span></span><br><span class="line">    <span class="attr">restart:</span> <span class="string">always</span></span><br><span class="line">    <span class="attr">environment:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">GRID_TIMEOUT=40</span></span><br><span class="line">    <span class="attr">ports:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="number">4444</span><span class="string">:4444</span></span><br></pre></td></tr></table></figure></li><li><p>In the directory containing <code>docker-compose.yml</code>, run:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> docker-compose up -d</span><br></pre></td></tr></table></figure></li><li><p>Check that the service is running</p><p>Open <a href="http://127.0.0.1:4444/">http://127.0.0.1:4444/</a> in a browser to see the hub page. The port matches the <code>ports</code> setting in <code>docker-compose.yml</code>.</p></li><li><p>Example (Chrome)</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span 
class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> selenium <span class="keyword">import</span> webdriver</span><br><span class="line"></span><br><span class="line">browser = webdriver.Remote(</span><br><span class="line">    command_executor=<span class="string">&#x27;http://127.0.0.1:4444/wd/hub&#x27;</span>,</span><br><span class="line">    desired_capabilities=&#123;</span><br><span class="line">        <span class="string">&#x27;browserName&#x27;</span>: <span class="string">&#x27;chrome&#x27;</span>,</span><br><span class="line">        <span class="string">&#x27;version&#x27;</span>: <span class="string">&#x27;&#x27;</span>,</span><br><span class="line">        <span class="string">&#x27;platform&#x27;</span>: <span class="string">&#x27;ANY&#x27;</span>,</span><br><span class="line">        <span class="string">&#x27;goog:chromeOptions&#x27;</span>: &#123;</span><br><span class="line">            <span class="string">&#x27;extensions&#x27;</span>: [],</span><br><span class="line">            <span class="string">&#x27;args&#x27;</span>: [<span class="string">&#x27;--no-sandbox&#x27;</span>, <span class="string">&#x27;-headless&#x27;</span>, <span class="string">&#x27;--disable-dev-shm-usage&#x27;</span>]&#125;</span><br><span class="line">    &#125;</span><br><span class="line">)</span><br><span class="line"></span><br><span class="line">browser.get(<span class="string">&quot;https://sitoi.cn&quot;</span>)</span><br><span class="line">browser.get_screenshot_as_file(<span class="string">&quot;sitoi.cn.png&quot;</span>)</span><br><span class="line">browser.quit()</span><br></pre></td></tr></table></figure></li></ol><blockquote><p>如果你要开很多个也可以，前台挂个 <code>nginx</code> 然后启用多个之后集群</p></blockquote><h2 id="Docker-单机集群版"><a href="#Docker-单机集群版" class="headerlink" title="Docker 单机集群版"></a>Docker 
single-machine cluster</h2><ol><li><p>Create a <code>docker-compose.yml</code> file with the following content:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">version:</span> <span class="string">&quot;2&quot;</span></span><br><span class="line"><span class="attr">services:</span></span><br><span class="line">  <span class="attr">hub:</span></span><br><span class="line">    <span class="attr">image:</span> <span class="string">selenium/hub:3.141.59</span></span><br><span class="line">    <span class="attr">ports:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">&quot;0.0.0.0:4445:4444&quot;</span></span><br><span class="line"></span><br><span class="line">  <span class="attr">chrome:</span></span><br><span class="line">    <span class="attr">image:</span> <span class="string">selenium/node-chrome:3.141.59</span></span><br><span class="line">    <span class="attr">restart:</span> <span class="string">always</span></span><br><span class="line">    <span class="attr">depends_on:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">hub</span></span><br><span class="line">    <span class="attr">environment:</span></span><br><span class="line">      <span 
class="attr">HUB_HOST:</span> <span class="string">hub</span></span><br><span class="line"></span><br><span class="line">  <span class="attr">firefox:</span></span><br><span class="line">    <span class="attr">image:</span> <span class="string">selenium/node-firefox:3.141.59</span></span><br><span class="line">    <span class="attr">restart:</span> <span class="string">always</span></span><br><span class="line">    <span class="attr">depends_on:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">hub</span></span><br><span class="line">    <span class="attr">environment:</span></span><br><span class="line">      <span class="attr">HUB_HOST:</span> <span class="string">hub</span></span><br></pre></td></tr></table></figure></li><li><p>In the directory containing <code>docker-compose.yml</code>, run:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> docker-compose up -d</span><br></pre></td></tr></table></figure></li><li><p>Check that the service is running</p><p>Open <a href="http://127.0.0.1:4445/">http://127.0.0.1:4445/</a> in a browser and you should see the hub page. The port is the one mapped under <code>ports</code> in <code>docker-compose.yml</code>.</p></li><li><p>Usage example (Firefox)</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> selenium <span 
class="keyword">import</span> webdriver</span><br><span class="line"></span><br><span class="line"><span class="comment"># Firefox options: &quot;-headless&quot; runs Firefox in headless mode</span></span><br><span class="line"><span class="comment"># (--no-sandbox and --disable-dev-shm-usage are Chrome flags, so they are left out)</span></span><br><span class="line">options = webdriver.FirefoxOptions()</span><br><span class="line">options.add_argument(<span class="string">&#x27;-headless&#x27;</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Connect to the hub, which hands the session to a registered Firefox node</span></span><br><span class="line">browser = webdriver.Remote(</span><br><span class="line">    command_executor=<span class="string">&#x27;http://127.0.0.1:4445/wd/hub&#x27;</span>,</span><br><span class="line">    options=options,</span><br><span class="line">)</span><br><span class="line"></span><br><span class="line">browser.get(<span class="string">&quot;https://sitoi.cn&quot;</span>)</span><br><span class="line">browser.get_screenshot_as_file(<span class="string">&quot;sitoi.cn.png&quot;</span>)</span><br><span class="line">browser.quit()</span><br></pre></td></tr></table></figure></li></ol><h2 id="多机集群"><a href="#多机集群" class="headerlink" title="多机集群"></a>Multi-machine cluster</h2><p>A single machine, even one running the single-machine cluster above, can only scale so far. To scale out, we can build a distributed Selenium cluster that puts the hub and the nodes on separate machines. When one machine runs short of memory or other resources, we simply add more machines.</p><h3 id="部署-hub-节点"><a href="#部署-hub-节点" class="headerlink" title="部署 hub 节点"></a>Deploy the hub</h3><p>Assume the hub runs on machine A, with IP address <code>10.10.1.1</code>.</p><ol><li><p>Create a <code>docker-compose.yml</code> file with the following content:</p><figure class="highlight 
yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">version:</span> <span class="string">&quot;2&quot;</span></span><br><span class="line"><span class="attr">services:</span></span><br><span class="line">  <span class="attr">hub:</span></span><br><span class="line">    <span class="attr">image:</span> <span class="string">selenium/hub:3.141.59</span></span><br><span class="line">    <span class="attr">restart:</span> <span class="string">always</span></span><br><span class="line">    <span class="attr">ports:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">&quot;0.0.0.0:4446:4444&quot;</span></span><br></pre></td></tr></table></figure></li><li><p>In the directory containing <code>docker-compose.yml</code>, run:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> docker-compose up -d</span><br></pre></td></tr></table></figure></li><li><p>Check that the service is running</p><p>Open <a href="http://10.10.1.1:4446/">http://10.10.1.1:4446/</a> in a browser and you should see the hub page. The port is the one mapped under <code>ports</code> in <code>docker-compose.yml</code>.</p></li></ol><h3 id="部署-node-节点（chrome）"><a href="#部署-node-节点（chrome）" class="headerlink" title="部署 node 节点（chrome）"></a>Deploy a node (Chrome)</h3><p>Assume the Chrome node runs on machine B, with IP address <code>10.10.2.1</code>.</p><ol><li><p>Create a <code>docker-compose.yml</code> file with the following content:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">version:</span> <span class="string">&quot;2&quot;</span></span><br><span class="line"><span class="attr">services:</span></span><br><span class="line">  <span class="attr">chrome:</span></span><br><span class="line">    <span class="attr">image:</span> <span class="string">selenium/node-chrome:3.141.59</span></span><br><span class="line">    <span class="attr">restart:</span> <span class="string">always</span></span><br><span class="line">    <span class="attr">environment:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">HUB_HOST=10.10.1.1</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">HUB_PORT=4446</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">REMOTE_HOST=http://10.10.2.1:5556</span></span><br><span class="line">    <span class="attr">ports:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="number">0.0</span><span class="number">.0</span><span class="number">.0</span><span class="string">:5556:5555</span></span><br></pre></td></tr></table></figure><div class="note info flat"><p><code>HUB_HOST</code>: the IP address of the hub machine<br><code>HUB_PORT</code>: the hub's port as published on that machine (<code>4446</code> here)<br><code>REMOTE_HOST</code>: the address at which the hub can reach this node, i.e. the node machine's IP plus the host port mapped to the node's <code>5555</code></p></div></li><li><p>In the directory containing <code>docker-compose.yml</code>, run:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> docker-compose up -d</span><br></pre></td></tr></table></figure></li></ol><h3 id="部署-node-节点（firefox）"><a href="#部署-node-节点（firefox）" class="headerlink" title="部署 node 节点（firefox）"></a>Deploy a node (Firefox)</h3><p>Assume the Firefox node runs on machine C, with IP address 
<code>10.10.3.1</code>.</p><ol><li><p>Create a <code>docker-compose.yml</code> file with the following content:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">version:</span> <span class="string">&quot;2&quot;</span></span><br><span class="line"><span class="attr">services:</span></span><br><span class="line">  <span class="attr">firefox:</span></span><br><span class="line">    <span class="attr">image:</span> <span class="string">selenium/node-firefox:3.141.59</span></span><br><span class="line">    <span class="attr">restart:</span> <span class="string">always</span></span><br><span class="line">    <span class="attr">environment:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">HUB_HOST=10.10.1.1</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">HUB_PORT=4446</span></span><br><span class="line">      <span class="bullet">-</span> <span class="string">REMOTE_HOST=http://10.10.3.1:5557</span></span><br><span class="line">    <span class="attr">ports:</span></span><br><span class="line">      <span class="bullet">-</span> <span class="number">0.0</span><span class="number">.0</span><span class="number">.0</span><span class="string">:5557:5555</span></span><br></pre></td></tr></table></figure></li><li><p>In the directory containing <code>docker-compose.yml</code>, run:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> docker-compose up 
-d</span><br></pre></td></tr></table></figure></li><li><p>Usage example (Firefox)</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> selenium <span class="keyword">import</span> webdriver</span><br><span class="line"></span><br><span class="line"><span class="comment"># Firefox options: &quot;-headless&quot; runs Firefox in headless mode</span></span><br><span class="line"><span class="comment"># (--no-sandbox and --disable-dev-shm-usage are Chrome flags, so they are left out)</span></span><br><span class="line">options = webdriver.FirefoxOptions()</span><br><span class="line">options.add_argument(<span class="string">&#x27;-headless&#x27;</span>)</span><br><span class="line"></span><br><span class="line"><span class="comment"># Connect to the hub on machine A; it hands the session to the Firefox node on machine C</span></span><br><span class="line">browser = webdriver.Remote(</span><br><span class="line">    command_executor=<span class="string">&#x27;http://10.10.1.1:4446/wd/hub&#x27;</span>,</span><br><span class="line">    options=options,</span><br><span class="line">)</span><br><span class="line"></span><br><span class="line">browser.get(<span class="string">&quot;https://sitoi.cn&quot;</span>)</span><br><span class="line">browser.get_screenshot_as_file(<span class="string">&quot;sitoi.cn.png&quot;</span>)</span><br><span class="line">browser.quit()</span><br></pre></td></tr></table></figure></li></ol><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><ul><li>Docker hides most of the OS-specific differences you would otherwise run into when deploying Selenium; the steps are short and easy to configure.</li><li>You no longer need a local browser and driver setup for Selenium. Just connect to the remote environment, which keeps the environment consistent everywhere.</li></ul>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;Once again, a crawler scenario needed heavy use of Selenium, so I needed to set up a cluster like this.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Comparison&lt;/p&gt;
</summary>
      
    
    
    
    <category term="爬虫" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/"/>
    
    <category term="Selenium" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/Selenium/"/>
    
    
    <category term="Docker" scheme="https://sitoi.cn/tags/Docker/"/>
    
    <category term="Selenium" scheme="https://sitoi.cn/tags/Selenium/"/>
    
    <category term="Selenium Grid" scheme="https://sitoi.cn/tags/Selenium-Grid/"/>
    
    <category term="Chrome" scheme="https://sitoi.cn/tags/Chrome/"/>
    
    <category term="Firefox" scheme="https://sitoi.cn/tags/Firefox/"/>
    
    <category term="webdriver" scheme="https://sitoi.cn/tags/webdriver/"/>
    
  </entry>
  
  <entry>
    <title>Selenium &amp; ChromeDriver 全平台安装教程（Mac、Windows、Linux）</title>
    <link href="https://sitoi.cn/posts/14489.html"/>
    <id>https://sitoi.cn/posts/14489.html</id>
    <published>2020-06-08T14:37:35.000Z</published>
    <updated>2025-11-12T05:28:30.733Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p>This tutorial shows how to install Selenium on Windows, Mac and Linux and configure the matching WebDriver, using ChromeDriver as the example.</p><h2 id="安装-Selenium（全平台通用）"><a href="#安装-Selenium（全平台通用）" class="headerlink" title="安装 Selenium（全平台通用）"></a>Install Selenium (same on all platforms)</h2><p>Install it from PyPI with pip by running the command below. The <code>-i</code> option points pip at the Tsinghua University PyPI mirror; drop it to use the default index.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pip3 install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple --user</span><br></pre></td></tr></table></figure><p>The output looks like this:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">Looking <span class="keyword">in</span> indexes: https://pypi.tuna.tsinghua.edu.cn/simple</span><br><span class="line">Collecting selenium</span><br><span class="line">  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/80/d6/4294f0b4bce4de0abf13e17190289f9d0613b0a44e5dd6a7f5ca98459853/selenium-3.141.0-py2.py3-none-any.whl (904kB)</span><br><span class="line">    100% |████████████████████████████████| 911kB 11.7MB/s</span><br><span class="line">Requirement already satisfied: urllib3 <span class="keyword">in</span> ./Library/Python/3.7/lib/python/site-packages (from selenium) (1.24.3)</span><br><span class="line">Installing collected packages: selenium</span><br><span class="line">Successfully installed selenium-3.141.0</span><br></pre></td></tr></table></figure><p>A line like <code>Successfully installed selenium-3.141.0</code> means the installation succeeded (the version number depends on when you install).</p><h2 id="配置-ChromeDrvier"><a href="#配置-ChromeDrvier" class="headerlink" title="配置 ChromeDrvier"></a>Configure ChromeDriver</h2><h3 id="查看-Chrome-版本号"><a href="#查看-Chrome-版本号" class="headerlink" 
title="查看 Chrome 版本号"></a>Check the Chrome version</h3><p>Open Chrome and enter <code>chrome://version</code> in the address bar.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/chrome_version.jpg" alt="Chrome Version"></p><p>Note the version number and the operating system. You need both to download the matching <code>ChromeDriver</code>, whose major version must match that of your Chrome.</p><h3 id="下载-ChromeDriver"><a href="#下载-ChromeDriver" class="headerlink" title="下载 ChromeDriver"></a>Download ChromeDriver</h3><blockquote><p>Download sites:</p></blockquote><ul><li>Taobao mirror (recommended): <a href="https://npm.taobao.org/mirrors/chromedriver/">https://npm.taobao.org/mirrors/chromedriver/</a></li><li>Official site: <a href="https://chromedriver.chromium.org/downloads">https://chromedriver.chromium.org/downloads</a></li></ul><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/chromedrivers.png" alt="ChromeDriver 版本列表"></p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/chromedriver_version.png" alt="ChromeDriver 平台版本"></p><h3 id="安装-ChromeDriver"><a href="#安装-ChromeDriver" class="headerlink" title="安装 ChromeDriver"></a>Install ChromeDriver</h3><h4 id="MAC-版本"><a href="#MAC-版本" class="headerlink" title="MAC 版本"></a>Mac</h4><ol><li><p>Unzip the downloaded <code>chromedriver_mac64.zip</code> to get <code>chromedriver</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/mac_chromedriver.png" alt="Mac ChromeDriver"></p></li><li><p>Copy <code>chromedriver</code> into <code>/usr/local/bin/</code> by running:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cp</span> chromedriver /usr/local/bin</span><br></pre></td></tr></table></figure></li></ol><h4 id="Windows-版本"><a href="#Windows-版本" class="headerlink" title="Windows 版本"></a>Windows</h4><ol><li><p>Unzip the downloaded <code>chromedriver_win32.zip</code> to get <code>chromedriver.exe</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/win_chromedriver.png" alt="Windows ChromeDriver"></p></li><li><p>Move 
<code>chromedriver.exe</code> into the Google Chrome installation directory.</p><p>Default location: <code>C:\Program Files (x86)\Google\Chrome\Application</code><br><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/win_chrome_path.png" alt="Google Chrome 根目录"></p></li><li><p>Add this directory to the Windows <code>PATH</code> environment variable:</p><ol><li><p>Right-click <code>This PC</code> and choose <code>Properties</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/win_property.png" alt="此电脑 属性"></p></li><li><p>Click <code>Advanced system settings</code> on the right.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/win_settings.png" alt="高级系统设置"></p></li><li><p>Open the <code>Advanced</code> tab, then click <code>Environment Variables</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/win_env.png" alt="环境变量"></p></li><li><p>Under <code>System variables</code>, double-click <code>PATH</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/path_env.png" alt="编辑 PATH"></p></li><li><p>Click <code>New</code> in the upper-right corner and enter the directory that contains <code>chromedriver.exe</code> (<code>C:\Program Files (x86)\Google\Chrome\Application</code>).<br><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/add_path_env.png" alt="添加 PATH"></p></li><li><p>Click <code>OK</code> to save.</p></li></ol></li></ol><blockquote><p>Note: if your editor (or terminal) was already open, restart it; otherwise it will not see the new environment variable.</p></blockquote><h4 id="Linux-版本"><a href="#Linux-版本" class="headerlink" title="Linux 版本"></a>Linux</h4><ol><li><p>Unzip the downloaded <code>chromedriver_linux64.zip</code> to get <code>chromedriver</code>.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/selenium/linux_chromedriver.png" alt="Linux ChromeDriver"></p></li><li><p>Copy <code>chromedriver</code> into <code>/usr/bin/</code> by running:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> <span class="built_in">cp</span> chromedriver /usr/bin</span><br></pre></td></tr></table></figure></li><li><p>Make <code>chromedriver</code> executable:</p><figure class="highlight bash"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> <span class="built_in">chmod</span> +x /usr/bin/chromedriver</span><br></pre></td></tr></table></figure></li></ol><h3 id="测试是否安装成功"><a href="#测试是否安装成功" class="headerlink" title="测试是否安装成功"></a>Verify the installation</h3><ol><li><p>Create a <code>.py</code> file, paste in the code below, and run it.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> selenium <span class="keyword">import</span> webdriver</span><br><span class="line"></span><br><span class="line">browser = webdriver.Chrome()  <span class="comment"># looks up chromedriver on PATH</span></span><br><span class="line">browser.get(<span class="string">&#x27;https://sitoi.cn/&#x27;</span>)</span><br><span class="line">browser.get_screenshot_as_file(<span class="string">&quot;sitoi.cn.png&quot;</span>)</span><br><span class="line">browser.quit()  <span class="comment"># close the browser and stop the chromedriver process</span></span><br></pre></td></tr></table></figure></li><li><p>If a browser window opens by itself and an image named <code>sitoi.cn.png</code> appears in the directory you ran the script from, the installation succeeded.</p></li></ol>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Preface&lt;/h2&gt;&lt;p&gt;This tutorial shows how to install Selenium on Windows, Mac and Linux and configure the matching WebDriver, using ChromeDrive</summary>
      
    
    
    
    <category term="爬虫" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/"/>
    
    <category term="Selenium" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/Selenium/"/>
    
    
    <category term="Mac" scheme="https://sitoi.cn/tags/Mac/"/>
    
    <category term="Windows" scheme="https://sitoi.cn/tags/Windows/"/>
    
    <category term="Linux" scheme="https://sitoi.cn/tags/Linux/"/>
    
    <category term="Selenium" scheme="https://sitoi.cn/tags/Selenium/"/>
    
    <category term="ChromeDriver" scheme="https://sitoi.cn/tags/ChromeDriver/"/>
    
  </entry>
  
  <entry>
    <title>Scrapy 小技巧（一）：使用 scrapy 自带的函数（follow &amp; follow_all）优雅的生成下一个请求</title>
    <link href="https://sitoi.cn/posts/61836.html"/>
    <id>https://sitoi.cn/posts/61836.html</id>
    <published>2020-06-06T12:57:01.000Z</published>
    <updated>2025-11-12T05:28:30.733Z</updated>
    
    <content type="html"><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><p>How can you elegantly take the links on a page that point to further pages of the same site and turn them into Scrapy requests?</p><h2 id="样例"><a href="#样例" class="headerlink" title="样例"></a>Example</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> urllib <span class="keyword">import</span> parse</span><br><span class="line"></span><br><span class="line"><span class="keyword">import</span> scrapy</span><br><span class="line"></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">SitoiSpider</span>(scrapy.Spider):</span><br><span class="line">    name = <span class="string">&quot;sitoi&quot;</span></span><br><span class="line"></span><br><span class="line">    start_urls = [</span><br><span class="line">        <span class="string">&#x27;https://sitoi.cn&#x27;</span>,</span><br><span class="line">    ]</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">parse</span>(<span class="params">self, response</span>):</span><br><span class="line">        href_list = response.xpath(<span class="string">&quot;//div[@class=&#x27;card&#x27;]/a/@href&quot;</span>).extract()</span><br><span 
class="line">        <span class="keyword">for</span> href <span class="keyword">in</span> href_list:</span><br><span class="line">            url = parse.urljoin(response.url, href)</span><br><span class="line">            <span class="keyword">yield</span> scrapy.Request(url=url, callback=<span class="variable language_">self</span>.parse_next)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">parse_next</span>(<span class="params">self, response</span>):</span><br><span class="line">        <span class="built_in">print</span>(response.url)</span><br></pre></td></tr></table></figure><h3 id="方式一：使用-urllib-库来拼接-URL"><a href="#方式一：使用-urllib-库来拼接-URL" class="headerlink" title="方式一：使用 urllib 库来拼接 URL"></a>Method 1: join URLs with the urllib library</h3><p>This method uses the <code>urllib</code> library to turn each href into a complete URL, then crawls the next page with <code>scrapy.Request</code>.</p><p><strong>Pros</strong></p><ol><li>You can add custom logic while handling each href (for example, recording which page you are on).</li></ol><p><strong>Cons</strong></p><ol><li>Requires importing an extra module (<code>urllib.parse</code>).</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">parse</span>(<span class="params">self, response</span>):</span><br><span class="line">    href_list = response.xpath(<span class="string">&quot;//div[@class=&#x27;card&#x27;]/a/@href&quot;</span>).extract()</span><br><span class="line">    <span class="keyword">for</span> href <span class="keyword">in</span> href_list:</span><br><span class="line">        url = parse.urljoin(response.url, href)</span><br><span class="line">        <span class="keyword">yield</span> scrapy.Request(url=url, callback=<span class="variable 
language_">self</span>.parse_next)</span><br></pre></td></tr></table></figure><h3 id="方式二：使用-response-自带的-urljoin"><a href="#方式二：使用-response-自带的-urljoin" class="headerlink" title="方式二：使用 response 自带的 urljoin"></a>Method 2: use the response's built-in urljoin</h3><p>This method uses the Scrapy response's own <code>urljoin</code> to turn each href into a complete URL, then crawls the next page with <code>scrapy.Request</code> (essentially the same as Method 1).</p><p><strong>Pros</strong></p><ol><li>No extra import is needed in the spider file.</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">parse</span>(<span class="params">self, response</span>):</span><br><span class="line">    href_list = response.xpath(<span class="string">&quot;//div[@class=&#x27;card&#x27;]/a/@href&quot;</span>).extract()</span><br><span class="line">    <span class="keyword">for</span> href <span class="keyword">in</span> href_list:</span><br><span class="line">        url = response.urljoin(href)</span><br><span class="line">        <span class="keyword">yield</span> scrapy.Request(url=url, callback=<span class="variable language_">self</span>.parse_next)</span><br></pre></td></tr></table></figure><h3 id="方式三：使用-response-自带的-follow"><a href="#方式三：使用-response-自带的-follow" class="headerlink" title="方式三：使用 response 自带的 follow"></a>Method 3: use the response's built-in follow</h3><p>This method crawls the next page with the Scrapy response's own <code>follow</code>.</p><p><strong>Pros</strong></p><ol><li>No extra import is needed in the spider file.</li><li>No need to call <code>extract()</code> to get the href strings; you can pass the href <code>Selector</code> itself (optional).</li><li>No manual URL joining: <code>follow</code> resolves relative URLs against the response URL.</li><li>The <code>xpath</code> can stop at the <code>a</code> tag and omit <code>@href</code>. Instead of the href <code>Selector</code>, pass the <code>a</code> element's <code>Selector</code>, and <code>follow</code> uses its <code>href</code> attribute (optional).</li></ol><figure class="highlight python"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">parse</span>(<span class="params">self, response</span>):</span><br><span class="line">    href_list = response.xpath(<span class="string">&quot;//div[@class=&#x27;card&#x27;]/a/@href&quot;</span>).extract()</span><br><span class="line">    <span class="keyword">for</span> href <span class="keyword">in</span> href_list:</span><br><span class="line">        <span class="keyword">yield</span> response.follow(url=href, callback=<span class="variable language_">self</span>.parse_next)</span><br></pre></td></tr></table></figure><p><strong>Variant 1</strong></p><ol><li>Skip <code>extract()</code> and pass the href <code>Selector</code> instead of a string.</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">parse</span>(<span class="params">self, response</span>):</span><br><span class="line">    href_list = response.xpath(<span class="string">&quot;//div[@class=&#x27;card&#x27;]/a/@href&quot;</span>)</span><br><span class="line">    <span class="keyword">for</span> href <span class="keyword">in</span> href_list:</span><br><span class="line">        <span class="keyword">yield</span> response.follow(url=href, callback=<span class="variable language_">self</span>.parse_next)</span><br></pre></td></tr></table></figure><p><strong>Variant 2</strong></p><ol><li>Skip <code>extract()</code> and pass a <code>Selector</code> instead of a string.</li><li>Drop <code>@href</code> from the <code>xpath</code> and pass the <code>a</code> element's <code>Selector</code> directly.</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">parse</span>(<span class="params">self, response</span>):</span><br><span class="line">    href_list = response.xpath(<span class="string">&quot;//div[@class=&#x27;card&#x27;]/a&quot;</span>)</span><br><span class="line">    <span class="keyword">for</span> href <span class="keyword">in</span> href_list:</span><br><span class="line">        <span class="keyword">yield</span> response.follow(url=href, callback=<span class="variable language_">self</span>.parse_next)</span><br></pre></td></tr></table></figure><h3 id="方式四：使用-response-自带的-follow-all"><a href="#方式四：使用-response-自带的-follow-all" class="headerlink" title="方式四：使用 response 自带的 follow_all"></a>Method 4: use the response's built-in follow_all</h3><p>This method crawls the next pages with <code>follow_all</code>, which Scrapy responses provide out of the box (added in Scrapy 2.0).</p><p><strong>Pros</strong></p><ol><li>No extra libraries need to be imported in the spider file.</li><li>No need to call <code>extract()</code> to get the href strings; you can pass the href selectors directly (optional).</li><li>No need to join URLs by hand.</li><li>The XPath can stop at the <code>a</code> tag and omit <code>@href</code>: instead of a <code>SelectorList</code> of href attributes, pass the <code>SelectorList</code> of <code>a</code> elements directly (optional).</li><li>No loop is needed; pass the whole <code>SelectorList</code> of matched links at once.</li></ol><p><strong>Cons</strong></p><ol><li>It is less suitable when extra logic has to run between links (for example, recording which page is currently being crawled).</li></ol><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">parse</span>(<span class="params">self, response</span>):</span><br><span class="line">    href_list = response.xpath(<span class="string">&quot;//div[@class=&#x27;card&#x27;]/a&quot;</span>)</span><br><span class="line">    <span class="keyword">yield</span> <span 
class="keyword">from</span> response.follow_all(urls=href_list, callback=<span class="variable language_">self</span>.parse_next)</span><br></pre></td></tr></table></figure><p><strong>Variant</strong></p><blockquote><p>Note: the best part comes last.</p></blockquote><p>Do it all in a single statement by passing the XPath straight to <code>follow_all</code> through its <code>xpath</code> argument.</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">parse</span>(<span class="params">self, response</span>):</span><br><span class="line">    <span class="keyword">yield</span> <span class="keyword">from</span> response.follow_all(xpath=<span class="string">&quot;//div[@class=&#x27;card&#x27;]/a&quot;</span>, callback=<span class="variable language_">self</span>.parse_next)</span><br></pre></td></tr></table></figure>]]></content>
    
    
      
      
    <summary type="html">&lt;h2 id=&quot;前言&quot;&gt;&lt;a href=&quot;#前言&quot; class=&quot;headerlink&quot; title=&quot;前言&quot;&gt;&lt;/a&gt;Introduction&lt;/h2&gt;&lt;p&gt;How can you elegantly collect the next links to crawl on the same site and turn them into new Scrapy requests?&lt;/p&gt;
&lt;h2 id=&quot;样例&quot;&gt;&lt;a hre</summary>
      
    
    
    
    <category term="爬虫" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/"/>
    
    <category term="Scrapy" scheme="https://sitoi.cn/categories/%E7%88%AC%E8%99%AB/Scrapy/"/>
    
    
    <category term="scrapy" scheme="https://sitoi.cn/tags/scrapy/"/>
    
    <category term="爬虫" scheme="https://sitoi.cn/tags/%E7%88%AC%E8%99%AB/"/>
    
  </entry>
  
  <entry>
    <title>Creating a Linux Bootable USB Drive on a Mac</title>
    <link href="https://sitoi.cn/posts/28583.html"/>
    <id>https://sitoi.cn/posts/28583.html</id>
    <published>2020-05-01T08:57:00.000Z</published>
    <updated>2025-11-12T05:28:30.732Z</updated>
    
    <content type="html"><![CDATA[<h1 id="Mac-制作-Linux-启动盘"><a href="#Mac-制作-Linux-启动盘" class="headerlink" title="Mac 制作 Linux 启动盘"></a>Creating a Linux Bootable USB Drive on a Mac</h1><blockquote><p>Prerequisites</p></blockquote><ol><li>A Mac</li><li>A USB flash drive (8 GB or larger)</li><li>A downloaded Linux system image (ISO file)</li></ol><blockquote><p>Steps</p></blockquote><ol><li>Mount the USB drive</li><li>Unmount the USB drive</li><li>Write the system image to the USB drive</li><li>Done</li></ol><h2 id="一、挂载-U-盘"><a href="#一、挂载-U-盘" class="headerlink" title="一、挂载 U 盘"></a>1. Mount the USB drive</h2><p>Insert the USB drive, open Terminal, and run the command below to check that the drive is mounted. You can also confirm in Finder that it has been recognized. The output is shown in the screenshot below. Note the drive's device identifier, <code>/dev/disk3</code> in this example, and double-check it, because the later steps overwrite that disk.</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">diskutil list</span><br></pre></td></tr></table></figure><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/mac-u-linux/diskutil_list.png" alt="diskutil list"></p><h2 id="二、解挂-U-盘"><a href="#二、解挂-U-盘" class="headerlink" title="二、解挂 U 盘"></a>2. Unmount the USB drive</h2><p>Unmount the drive with <code>diskutil unmountDisk</code>:</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">diskutil unmountDisk /dev/disk3</span><br></pre></td></tr></table></figure><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/mac-u-linux/diskutil_unmountDisk.png" alt="diskutil unmountDisk"></p><p>The drive is now inserted but unmounted: it no longer appears in Finder, but <code>diskutil list</code> still shows it.</p><h2 id="三、写系统镜像到-U-盘"><a href="#三、写系统镜像到-U-盘" class="headerlink" title="三、写系统镜像到 U 盘"></a>3. Write the system image to the USB drive</h2><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">sudo</span> <span class="built_in">dd</span> <span class="keyword">if</span>=/Users/shitao/Downloads/CentOS-7-x86_64-DVD-2003.iso of=/dev/disk3 bs=1m</span><br></pre></td></tr></table></figure><p><code>if=</code> is the path to the Linux ISO file.</p><p><code>of=</code> is the device name of the USB drive.</p><p><code>bs</code> is the block size for each write. It can be set to 2m, but should not be too large.</p><p>Tip: because the command runs with <code>sudo</code>, you will be prompted for your password; type it and press Enter.</p><h2 id="四、完成"><a href="#四、完成" class="headerlink" title="四、完成"></a>4. Done</h2><p>After a few minutes, output like the following means the image has been written successfully:</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">4560+0 records in</span><br><span class="line">4560+0 records out</span><br><span class="line">4781506560 bytes transferred in 291.165911 secs (16421931 bytes/sec)</span><br></pre></td></tr></table></figure><p>Each record is one <code>bs</code>-sized block, so 4560 records × 1 MiB = 4,781,506,560 bytes; the <code>+0</code> means there were no partial blocks.</p><p><img src="https://cdn.jsdelivr.net/gh/Sitoi/cdn/img/mac-u-linux/sudo_dd_if_of_bs.png" alt="dd output"></p>]]></content>
    
    
      
      
    <summary type="html">&lt;h1 id=&quot;Mac-制作-Linux-启动盘&quot;&gt;&lt;a href=&quot;#Mac-制作-Linux-启动盘&quot; class=&quot;headerlink&quot; title=&quot;Mac 制作 Linux 启动盘&quot;&gt;&lt;/a&gt;Creating a Linux Bootable USB Drive on a Mac&lt;/h1&gt;&lt;blockquote&gt;
&lt;p&gt;前</summary>
      
    
    
    
    <category term="开发环境" scheme="https://sitoi.cn/categories/%E5%BC%80%E5%8F%91%E7%8E%AF%E5%A2%83/"/>
    
    <category term="Mac" scheme="https://sitoi.cn/categories/%E5%BC%80%E5%8F%91%E7%8E%AF%E5%A2%83/Mac/"/>
    
    
    <category term="Mac" scheme="https://sitoi.cn/tags/Mac/"/>
    
    <category term="Linux" scheme="https://sitoi.cn/tags/Linux/"/>
    
    <category term="启动盘" scheme="https://sitoi.cn/tags/%E5%90%AF%E5%8A%A8%E7%9B%98/"/>
    
  </entry>
  
</feed>
