<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: BeautifulSoup count occurrences of string in Python Questions</title>
    <link>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454525#M35698</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Seems like the easiest way to find out what it's doing is to just print out the results of&amp;nbsp;&lt;/P&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;&lt;CODE&gt;soup&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;.&lt;/SPAN&gt;find_all&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;(&lt;/SPAN&gt;string&lt;SPAN class="" style="color: #a67f59; background: rgba(255, 255, 255, 0.5); border: 0px; font-weight: inherit;"&gt;=&lt;/SPAN&gt;re&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;.&lt;/SPAN&gt;compile&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;(&lt;/SPAN&gt;&lt;SPAN class="" style="color: #669900; border: 0px; font-weight: inherit;"&gt;"COVID-19"&lt;/SPAN&gt;&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;)&lt;/SPAN&gt;&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;right? Seems that the function returns a list of all matches, so you could just do something like the following:&lt;/P&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;&lt;CODE&gt;matches &lt;SPAN class="operator token"&gt;=&lt;/SPAN&gt; soup&lt;SPAN class="punctuation token"&gt;.&lt;/SPAN&gt;find_all&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;string&lt;SPAN class="operator token"&gt;=&lt;/SPAN&gt;re&lt;SPAN class="punctuation token"&gt;.&lt;/SPAN&gt;compile&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;&lt;SPAN class="string token"&gt;"COVID-19"&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;
&lt;SPAN class="keyword token"&gt;for&lt;/SPAN&gt; match &lt;SPAN class="keyword token"&gt;in&lt;/SPAN&gt; matches&lt;SPAN class="punctuation token"&gt;:&lt;/SPAN&gt;
    &lt;SPAN class="keyword token"&gt;print&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;match&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Also, I had a peek at that webpage, and I think it uses "Covid-19" in some places and "COVID-19" in others. Your regular expression is case-sensitive, so it would only match the latter (compiling it with re.IGNORECASE would match both).&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Sat, 11 Dec 2021 20:16:21 GMT</pubDate>
    <dc:creator>JoshuaSharp-Heward</dc:creator>
    <dc:date>2021-12-11T20:16:21Z</dc:date>
    <item>
      <title>BeautifulSoup count occurrences of string</title>
      <link>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454522#M35695</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I think this may be giving me the length of the string "COVID-19" rather than a count of its occurrences, because it prints 8.&lt;/P&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;&lt;CODE&gt;&lt;SPAN class="keyword token"&gt;import&lt;/SPAN&gt; requests
&lt;SPAN class="keyword token"&gt;from&lt;/SPAN&gt; bs4 &lt;SPAN class="keyword token"&gt;import&lt;/SPAN&gt; BeautifulSoup
&lt;SPAN class="keyword token"&gt;import&lt;/SPAN&gt; re

url &lt;SPAN class="operator token"&gt;=&lt;/SPAN&gt; r&lt;SPAN class="string token"&gt;'https://www.bolingbrook.com/coronavirus'&lt;/SPAN&gt;

&lt;SPAN class="comment token"&gt;# request the webpage and parse it&lt;/SPAN&gt;
soup &lt;SPAN class="operator token"&gt;=&lt;/SPAN&gt; BeautifulSoup&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;requests&lt;SPAN class="punctuation token"&gt;.&lt;/SPAN&gt;get&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;url&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;.&lt;/SPAN&gt;content&lt;SPAN class="punctuation token"&gt;,&lt;/SPAN&gt; &lt;SPAN class="string token"&gt;"lxml"&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;
&lt;SPAN class="comment token"&gt;# count the text nodes that contain the string&lt;/SPAN&gt;
&lt;SPAN class="keyword token"&gt;print&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;len&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;soup&lt;SPAN class="punctuation token"&gt;.&lt;/SPAN&gt;find_all&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;string&lt;SPAN class="operator token"&gt;=&lt;/SPAN&gt;re&lt;SPAN class="punctuation token"&gt;.&lt;/SPAN&gt;compile&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;&lt;SPAN class="string token"&gt;"COVID-19"&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;

&lt;SPAN class="comment token"&gt;#prints&lt;/SPAN&gt;
&lt;SPAN class="operator token"&gt;&amp;gt;&amp;gt;&lt;/SPAN&gt;&lt;SPAN class="operator token"&gt;&amp;gt;&lt;/SPAN&gt; &lt;SPAN class="number token"&gt;8&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;When I do a CTRL+F for "COVID-19" on the webpage I get a count of 5 occurrences. When I do a CTRL+F for "COVID-19" in the Developer tools I get 15.&lt;/P&gt;&lt;P&gt;&lt;IMG __jive_id="503979" alt="" class="jive-emoji image-1 jive-image j-img-original" src="https://community.esri.com/legacyfs/online/503979_co.png" /&gt;&lt;/P&gt;&lt;P&gt;I'm trying to get the count for the total occurrences of the string "COVID-19". How can I set up the code to do that?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 11 Dec 2021 20:16:18 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454522#M35695</guid>
      <dc:creator>JaredPilbeam2</dc:creator>
      <dc:date>2021-12-11T20:16:18Z</dc:date>
    </item>
    <item>
      <title>Re: BeautifulSoup count occurrences of string</title>
      <link>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454523#M35696</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I can't say I fully understand what your COVID-19 searches are meant to reflect, but I am quite certain the results from the Developer tools are not what you are after. HTML is stylized content, and there are plenty of reasons COVID-19 could show up in parts of the HTML that are not what you are trying to capture with your search.&lt;/P&gt;&lt;P&gt;To extract meaning from scraped HTML pages, you need a decent understanding, ahead of time, of the HTML structure of the site or its pages. Most web pages today are dynamically generated from content management systems with a fairly organized structure, i.e., there are set banners, sidebars, footers, ..., and main content. Which part of the page you want to analyze, and how you analyze it, varies from question to question.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 20 Aug 2020 03:22:24 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454523#M35696</guid>
      <dc:creator>JoshuaBixby</dc:creator>
      <dc:date>2020-08-20T03:22:24Z</dc:date>
    </item>
    <item>
      <title>Re: BeautifulSoup count occurrences of string</title>
      <link>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454524#M35697</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Apologies if I wasn't very clear. I was more or less wondering whether the len() function was doing what I intended, which was to count the occurrences of the COVID-19 element, not the length of the string.&lt;/P&gt;&lt;P&gt;The script is meant to find the number of times "COVID-19" occurs at that URL. If that number changes the next time the script runs (through Task Scheduler), it triggers an email to one of the GIS staff members here (I left those parts out for clarity).&lt;/P&gt;&lt;P&gt;I see what you mean, though. You have to really study the HTML in order to get the script to find things accurately. It's a little difficult to find the time to get to know a website that well when you're looking at every municipal and school district website in the county, however. They change constantly.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 20 Aug 2020 14:45:02 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454524#M35697</guid>
      <dc:creator>JaredPilbeam2</dc:creator>
      <dc:date>2020-08-20T14:45:02Z</dc:date>
    </item>
    <item>
      <title>Re: BeautifulSoup count occurrences of string</title>
      <link>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454525#M35698</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Seems like the easiest way to find out what it's doing is to just print out the results of&amp;nbsp;&lt;/P&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;&lt;CODE&gt;soup&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;.&lt;/SPAN&gt;find_all&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;(&lt;/SPAN&gt;string&lt;SPAN class="" style="color: #a67f59; background: rgba(255, 255, 255, 0.5); border: 0px; font-weight: inherit;"&gt;=&lt;/SPAN&gt;re&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;.&lt;/SPAN&gt;compile&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;(&lt;/SPAN&gt;&lt;SPAN class="" style="color: #669900; border: 0px; font-weight: inherit;"&gt;"COVID-19"&lt;/SPAN&gt;&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;)&lt;/SPAN&gt;&lt;SPAN class="" style="color: #999999; border: 0px; font-weight: inherit;"&gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;right? Seems that the function returns a list of all matches, so you could just do something like the following:&lt;/P&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;&lt;CODE&gt;matches &lt;SPAN class="operator token"&gt;=&lt;/SPAN&gt; soup&lt;SPAN class="punctuation token"&gt;.&lt;/SPAN&gt;find_all&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;string&lt;SPAN class="operator token"&gt;=&lt;/SPAN&gt;re&lt;SPAN class="punctuation token"&gt;.&lt;/SPAN&gt;compile&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;&lt;SPAN class="string token"&gt;"COVID-19"&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;
&lt;SPAN class="keyword token"&gt;for&lt;/SPAN&gt; match &lt;SPAN class="keyword token"&gt;in&lt;/SPAN&gt; matches&lt;SPAN class="punctuation token"&gt;:&lt;/SPAN&gt;
    &lt;SPAN class="keyword token"&gt;print&lt;/SPAN&gt;&lt;SPAN class="punctuation token"&gt;(&lt;/SPAN&gt;match&lt;SPAN class="punctuation token"&gt;)&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Also, I had a peek at that webpage, and I think it uses "Covid-19" in some places and "COVID-19" in others. Your regular expression is case-sensitive, so it would only match the latter (compiling it with re.IGNORECASE would match both).&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 11 Dec 2021 20:16:21 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454525#M35698</guid>
      <dc:creator>JoshuaSharp-Heward</dc:creator>
      <dc:date>2021-12-11T20:16:21Z</dc:date>
    </item>
    <item>
      <title>Re: BeautifulSoup count occurrences of string</title>
      <link>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454526#M35699</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You're right. That's exactly what I did. I put the print statement in, and simply counted the results. There are 8 of them. I was thinking it was counting the length of "COVID-19" because there are 8 characters there too.&lt;/P&gt;&lt;PRE class="lia-code-sample line-numbers language-none"&gt;&lt;CODE&gt;#printed this
&amp;gt;&amp;gt;&amp;gt;
COVID-19 Message
Village of Bolingbrook COVID-19 Update 05.06.20
COVID-19 Message
Village of Bolingbrook COVID-19 Update 05.06.20
The Village of Bolingbrook has been in constant communication with both Amita Bolingbrook Hospital and Edward Hospital, along with the Will County and Illinois Department of Public Health to monitor the possible spread of the Coronavirus (COVID-19) in the Bolingbrook area. We are following the most current recommendations for treatment and preventing the potential spread of infection. Protective equipment and procedures are in place to provide emergency medical treatment and safe transport of individuals to the hospital.&amp;nbsp;
COVID-19 HELP
Will County COVID-19 Cases
Here in Northern Illinois, we have had 41 blood drives canceled due to coronavirus concerns, resulting in approximately 1,500 uncollected blood donations.&amp;nbsp; As the number of COVID-19 cases grow, we do expect that number to increase unfortunately.&amp;nbsp; That’s why we are asking organizations to please keep their blood drives and for donors to continue to give.&amp;nbsp; Together, we must ensure a readily available blood supply for patients who are counting on us.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Also, good point. CTRL+F in the developer tools seems to not be case-sensitive. I'll go ahead and mark your answer correct since it was the closest thing to my not-so-clear question.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 11 Dec 2021 20:16:23 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454526#M35699</guid>
      <dc:creator>JaredPilbeam2</dc:creator>
      <dc:date>2021-12-11T20:16:23Z</dc:date>
    </item>
    <item>
      <title>Re: BeautifulSoup count occurrences of string</title>
      <link>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454527#M35700</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I also noticed for the first time today that CTRL+F isn't case-sensitive! Glad I could help out though, mate.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 20 Aug 2020 19:25:32 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/beautifulsoup-count-occurences-of-string/m-p/454527#M35700</guid>
      <dc:creator>JoshuaSharp-Heward</dc:creator>
      <dc:date>2020-08-20T19:25:32Z</dc:date>
    </item>
  </channel>
</rss>

