Current Behaviour
Currently, scraping each documentation site requires its own ruleset. The process needs to be unified as much as possible (see ./documentations/template.sh for a template), while always keeping the flexibility to write a custom ruleset. There are always edge cases: the spidering is not straightforward, the parsing differs because the pages need JS, something else has to be checked, etc.
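A rough sketch of how a shared template plus per-site overrides could be modelled. It is written in Python rather than shell for readability. Everything here is an assumption for illustration: the field names (`start_urls`, `include_subpaths`, `title_selector`, `body_selector`, `needs_js`), their defaults, and `build_ruleset` are hypothetical and not taken from ./documentations/template.sh.

```python
import copy
from typing import Any, Callable, Optional

# Hypothetical shared defaults, playing the role of ./documentations/template.sh.
# Field names and values are illustrative assumptions, not the project's real keys.
DEFAULT_RULESET: dict[str, Any] = {
    "start_urls": [],          # link or list of links to spider from
    "include_subpaths": [],    # subpaths kept in the Filtering stage
    "title_selector": "h1",
    "body_selector": "main",
    "needs_js": False,         # pages that only render with JS
}

def build_ruleset(overrides: dict[str, Any],
                  custom_extractor: Optional[Callable[[str], dict]] = None) -> dict[str, Any]:
    """Merge one documentation site's overrides onto the shared defaults."""
    unknown = set(overrides) - set(DEFAULT_RULESET)
    if unknown:
        # Reject typos instead of silently ignoring them.
        raise KeyError(f"unknown ruleset keys: {sorted(unknown)}")
    # Deep-copy so per-site changes never mutate the shared default lists.
    ruleset = {**copy.deepcopy(DEFAULT_RULESET), **overrides}
    # Escape hatch for edge cases: a site-specific extractor replaces the
    # default selector-based parsing without forking the whole template.
    ruleset["custom_extractor"] = custom_extractor
    return ruleset
```

For example, `build_ruleset({"start_urls": ["https://docs.example.com/"], "body_selector": "article"})` keeps the default `h1` title selector and only swaps the body selector. Most sites would then only declare what differs, while the `custom_extractor` hook covers the edge cases.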
For the bulk of the documentation sites, however, the process is straightforward, and it only ends up requiring human intervention in the following steps:
- Given a link or a list of links to start from,
- Select the subpaths of interest from the spidered .html pages.
- Apply a ruleset of selectors to all the .html pages and, where the pages differ, create per-subpath rules.
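The manual steps above could be sketched roughly as follows. This is a minimal illustration under several assumptions: the spidering itself is already done and the pages are available as a `url -> html` dict, subpaths are plain URL-path prefixes, a per-subpath rule is picked by the longest matching prefix, and the selector matcher only supports `tag`, `.class`, `tag.class` and `#id` (a real implementation would more likely use a full CSS selector library). All function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

def filter_subpaths(urls, include_subpaths):
    """Keep only spidered URLs whose path starts with a selected subpath."""
    return [u for u in urls if any(urlparse(u).path.startswith(p) for p in include_subpaths)]

def selectors_for(path, defaults, per_subpath):
    """Return (title_selector, body_selector); the longest matching subpath rule wins."""
    matches = [p for p in per_subpath if path.startswith(p)]
    return per_subpath[max(matches, key=len)] if matches else defaults

class SelectorText(HTMLParser):
    """Collect the text of the first element matching a tiny selector subset:
    'tag', '.class', 'tag.class' or '#id' (not full CSS)."""

    def __init__(self, selector):
        super().__init__()
        head, _, cls = selector.partition(".")
        self.want_id = head[1:] if head.startswith("#") else None
        self.want_tag = head if head and self.want_id is None else None
        self.want_cls = cls or None
        self.match_tag, self.depth, self.done, self.parts = None, 0, False, []

    def _matches(self, tag, attrs):
        a = dict(attrs)
        if self.want_id is not None:
            return a.get("id") == self.want_id
        if self.want_tag and tag != self.want_tag:
            return False
        return not self.want_cls or self.want_cls in (a.get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0 and self._matches(tag, attrs):
            self.match_tag, self.depth = tag, 1
        elif self.depth and tag == self.match_tag:
            self.depth += 1  # same tag nested inside the matched element

    def handle_endtag(self, tag):
        if self.depth and tag == self.match_tag:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)

def extract(html, selector):
    """Return the whitespace-normalised text of the first match, or ''."""
    parser = SelectorText(selector)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

def scrape(pages, include_subpaths, defaults, per_subpath):
    """pages maps URL -> HTML; returns URL -> {'title': ..., 'body': ...}."""
    result = {}
    for url in filter_subpaths(pages, include_subpaths):
        title_sel, body_sel = selectors_for(urlparse(url).path, defaults, per_subpath)
        result[url] = {"title": extract(pages[url], title_sel),
                       "body": extract(pages[url], body_sel)}
    return result
```

With `defaults=("h1", "main")` and `per_subpath={"/api/": ("h2", "article.doc")}`, pages under `/api/` use the override while everything else falls back to the defaults. Pages outside `include_subpaths` are dropped before any extraction happens.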
Intended Behaviour
Have the LLM assist in selecting the subpaths of interest for the Filtering stage and in determining the Title and Body selectors for each subpath.
Have the LLM update the ruleset over time.
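One possible shape for the LLM-assisted part, again only a sketch. `ask_llm` is a hypothetical `prompt -> str` callable standing in for whatever LLM client gets used, and `extract` is any `(html, selector) -> str` function. The prompt wording, the JSON answer format and all function names are assumptions. The idea is that the LLM only proposes subpaths and selectors: invented subpaths are discarded, a selector rule is accepted only if it extracts non-empty text from every sample page, and existing rules are replaced only once they stop working. That is one way to let the ruleset update over time without silently breaking.

```python
import json

# Hypothetical interfaces (assumptions, not an existing API):
#   ask_llm(prompt: str) -> str               wraps whatever LLM client is used
#   extract(html: str, selector: str) -> str  selector-based text extractor

def propose_subpaths(candidates, ask_llm):
    """Ask the LLM which spidered subpaths hold documentation; keep only real candidates."""
    prompt = ("From these URL subpaths of a documentation site, return a JSON list "
              "of the ones that contain documentation pages:\n" + "\n".join(candidates))
    try:
        picked = json.loads(ask_llm(prompt))
    except json.JSONDecodeError:
        return []
    # Drop anything the LLM invented that was not actually spidered.
    return [p for p in picked if p in candidates] if isinstance(picked, list) else []

def build_selector_prompt(subpath, html_samples):
    samples = "\n\n---\n\n".join(s[:4000] for s in html_samples)  # keep the prompt small
    return ("You are helping configure a documentation scraper.\n"
            f"For pages under the subpath {subpath!r}, answer only with JSON like "
            '{"title_selector": "...", "body_selector": "..."} using CSS selectors '
            "that isolate the page title and the main documentation body.\n\n"
            f"Sample pages:\n{samples}")

def works(rule, html_samples, extract):
    """A rule is accepted only if both selectors extract text from every sample."""
    return all(extract(h, rule["title_selector"]) and extract(h, rule["body_selector"])
               for h in html_samples)

def propose_rule(subpath, html_samples, ask_llm, extract):
    """Ask the LLM for Title/Body selectors; return None if the answer is unusable."""
    try:
        rule = json.loads(ask_llm(build_selector_prompt(subpath, html_samples)))
        rule = {"title_selector": rule["title_selector"],
                "body_selector": rule["body_selector"]}
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    return rule if works(rule, html_samples, extract) else None

def refresh_ruleset(ruleset, samples_by_subpath, ask_llm, extract):
    """Keep rules that still work; ask the LLM to replace missing or broken ones."""
    updated = dict(ruleset)
    for subpath, samples in samples_by_subpath.items():
        current = updated.get(subpath)
        if current is not None and works(current, samples, extract):
            continue
        proposal = propose_rule(subpath, samples, ask_llm, extract)
        if proposal is not None:  # otherwise leave it for a human to fix
            updated[subpath] = proposal
    return updated
```

Running `refresh_ruleset` periodically against freshly spidered sample pages would only call the LLM for subpaths whose selectors no longer extract anything. Rejected proposals fall back to human intervention rather than being written into the ruleset.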