Open
Description
While using tarantula, I noticed it tried to crawl a `tel:` link and failed because the path of the parsed URI was nil. A simple fix would be to add `tel` to the `skip_uri_patterns` list in `Crawler#initialize`. However, the crawler would hit the same problem with any other URI scheme not listed in `skip_uri_patterns` (for example `sms:` or `ftp:`), so a more general approach may be better. Would it make more sense to skip every URI that starts with a scheme name, or is there a reason you chose to skip only the `javascript`, `mailto`, and `http` schemes?
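A minimal sketch of the more general approach, assuming a plain regex check is acceptable: RFC 3986 defines a scheme as a letter followed by letters, digits, `+`, `-`, or `.`, then a `:`. One pattern built from that grammar matches `tel:`, `mailto:`, `javascript:`, and `http(s):` alike while leaving relative links such as `/users/1` or `#top` alone. `ANY_SCHEME` and `skip_uri?` are hypothetical names for illustration, not part of tarantula's API.

```ruby
# Hypothetical constant (not tarantula API): matches any URI that begins
# with an RFC 3986 scheme, i.e. ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
ANY_SCHEME = /\A[a-z][a-z0-9+.\-]*:/i

# Hypothetical helper: true when the link has an explicit scheme and should
# therefore not be crawled as a relative path within the app.
def skip_uri?(url)
  url.match?(ANY_SCHEME)
end

puts skip_uri?("tel:+15555551234")   # scheme present, skipped
puts skip_uri?("/users/1")           # relative path, crawled
```

If `skip_uri_patterns` is an array of regexes matched against each link, as the existing `javascript`/`mailto`/`http` patterns suggest, adding a pattern like `ANY_SCHEME` there would cover every scheme at once; I have not verified that wiring against the current source.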