Response from _get_results(query) contains NoneType, which causes parsing to fail #35
Open
Hi Matt,
While trying to scrape Google, I followed your blog post on scraping Google in 3 lines and got the following error:
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 results = seo.get_serps("stupid")
2 print(results)
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:144, in get_serps(query, output)
133 """Return the first 10 Google search results for a given query.
134
135 Args:
(...)
140 results (dict): Results of query.
141 """
143 response = _get_results(query)
--> 144 results = _parse_search_results(response)
146 if results:
147 if output == "dataframe":
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:124, in _parse_search_results(response)
118 output = []
120 for result in results:
121 item = {
122 'title': result.find(css_identifier_title, first=True).text,
123 'link': result.find(css_identifier_link, first=True).attrs['href'],
--> 124 'text': result.find(css_identifier_text, first=True).text
...
125 }
127 output.append(item)
129 return output
AttributeError: 'NoneType' object has no attribute 'text'
Then I tried your other blog post on scraping with Python, which does not rely on the ecommercetools package, and followed it to the letter.
Here is the interesting part:
results = google_search("stupid")
results
yields normal output. Rerunning this Jupyter cell with a different keyword,
results = google_search("allergy")
results
yields
AttributeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 results = google_search("allergy")
2 results
Cell In[8], line 3, in google_search(query)
1 def google_search(query):
2 response = get_results(query)
----> 3 return parse_results(response)
Cell In[7], line 17, in parse_results(response)
10 output = []
12 for result in results:
14 item = {
15 'title': result.find(css_identifier_title, first=True).text,
16 'link': result.find(css_identifier_link, first=True).attrs['href'],
---> 17 'text': result.find(css_identifier_text, first=True).text
18 }
20 output.append(item)
22 return output
AttributeError: 'NoneType' object has no attribute 'text'
So it seems that result.find(css_identifier_text, first=True) sometimes returns None instead of an element, even with first=True set?
I have no idea under which circumstances this None arises, but the behavior is as follows:
seo.get_serps() from ecommercetools consistently throws the error, while the hand-written equivalent is keyword sensitive: for example, "allergy" throws the error, but "keyword sensitive" does not.
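A possible workaround, sketched here under the assumption that result.find(..., first=True) returns None whenever the selector matches nothing inside a given result block: check each lookup before reading .text or .attrs. The helper names first_text, first_attr, and parse_result are hypothetical and not part of ecommercetools; the selectors are passed in as arguments and stand in for css_identifier_title, css_identifier_link, and css_identifier_text from the tracebacks above.

```python
def first_text(result, selector):
    """Return the text of the first match, or None if nothing matches."""
    element = result.find(selector, first=True)
    return element.text if element is not None else None


def first_attr(result, selector, name):
    """Return attribute `name` of the first match, or None if it is missing."""
    element = result.find(selector, first=True)
    return element.attrs.get(name) if element is not None else None


def parse_result(result, title_sel, link_sel, text_sel):
    """Build one result dict without raising when a selector finds nothing."""
    return {
        'title': first_text(result, title_sel),
        'link': first_attr(result, link_sel, 'href'),
        'text': first_text(result, text_sel),
    }
```

Inside the loop, item = parse_result(result, css_identifier_title, css_identifier_link, css_identifier_text) would then leave None in the 'text' field for results that have no snippet, instead of aborting the whole parse. This only hides the symptom; it does not explain why the text selector fails to match for some keywords.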