What it is and how it can be used:
The URL component fetches content from one or more web pages and can follow links recursively up to a configurable depth. It acts as a web scraper: it crawls websites, downloads page content, and extracts data, navigating across pages by following hyperlinks. This enables automated web content retrieval and site crawling within workflows.
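
The recursive link following described above can be pictured as a depth-limited, breadth-first crawl. The following is a minimal sketch and not the component's actual implementation: the names `crawl`, `LinkCollector`, `max_depth`, and `prevent_outside` are hypothetical, the `fetch` callable is injected so the sketch runs without network access, and it assumes that a depth of 1 means "only the starting page".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=1, prevent_outside=True):
    """Breadth-first crawl; `fetch` is any callable mapping a URL to an HTML string.

    Assumption: max_depth=1 fetches only start_url, max_depth=2 also fetches
    the pages it links to, and so on.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [start_url]
    pages = {}
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for url in frontier:
            html = fetch(url)
            pages[url] = html
            if depth == max_depth:
                continue  # depth limit reached: keep the page, skip its links
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.links:
                link = urljoin(url, href)  # resolve relative links
                if prevent_outside and urlparse(link).netloc != start_host:
                    continue  # stay on the starting host
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return pages
```

Passing a dictionary lookup as `fetch` is enough to try the function on a fake site; a real run would pass a function that performs the HTTP request.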

When and how the component should be used:
- Use when you need to fetch and extract content from one or more URLs, process it, and return it in one of several output formats.
- Ideal for web scraping and data extraction from websites.
- Use Chat Input to supply a valid URL pointing to the desired content.
- The component then fetches the page and extracts its content.
- Pass the output to Split Text for chunking if the content is long.
- Send the chunks to the Embedding Model or Language Model for processing.
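
The extract-then-chunk part of the steps above can be sketched in plain Python. This is only an illustration of the idea, not how the URL or Split Text components are implemented: `html_to_text`, `split_text`, `chunk_size`, and `overlap` are hypothetical names, and character-based chunking with overlap is an assumed strategy.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and skips the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce an HTML document to its visible text, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def split_text(text, chunk_size=1000, overlap=200):
    """Split text into character chunks; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks
```

Each chunk returned by `split_text` would then be passed on to an embedding or language model, as in the last step of the list.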
Connections with other components:
- Chat Output
- Text Input
- Agent Core
- API Request
- Directory
- News Search
- RSS Reader
- SQL Database
- Web Search
- Language Model
- If-Else
- Batch Run
- DataFrame Operations
- LLM Router
- Parser
- Python Interpreter
- Save File
- Smart Function
- Split Text
- Structured Output
- Type Convert
- Listen
- Loop
- Notify
- Smart Router
- Calculator
- Anonymization
- Guardrail
- Human-in-the-loop
- Bing Search API
- ChromaDB
In tool mode:
- Agent Core
- Human-in-the-loop
Configurable settings:
- URLs (write the URLs)
- Depth
Default settings:
- URLs (write the URLs)
- Depth
Control Section:
- URLs
- Depth
- Prevent Outside
- Use Async
- Output Format
- Timeout
- Headers
- Filter Text/HTML
- Continue on Failure
- Check Response Status
- Autoset Encoding
- Actions in tool mode
Default values:
- Depth = 1
- Prevent Outside = on
- Use Async = on
- Output Format = Text
- Timeout = 30
- Filter Text/HTML = on
- Continue on Failure = on
- Autoset Encoding = on
In tool mode:
- Actions = FETCH_CONTENT
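
To make the defaults above easier to reason about, they can be written down as a small settings object. This is a hedged sketch, not the component's real configuration API: `UrlSettings`, `fetch_all`, and `http_fetch` are hypothetical names, the timeout is assumed to be in seconds, and the values for Headers and Check Response Status are assumptions because the list above gives no default for them.

```python
from dataclasses import dataclass, field
import urllib.request


@dataclass
class UrlSettings:
    depth: int = 1
    prevent_outside: bool = True
    use_async: bool = True
    output_format: str = "Text"
    timeout: int = 30                      # assumed to be seconds
    headers: dict = field(default_factory=dict)   # assumption: empty by default
    filter_text_html: bool = True
    continue_on_failure: bool = True
    check_response_status: bool = False    # assumption: no default is listed
    autoset_encoding: bool = True


def http_fetch(url, timeout, headers=None):
    """Fetch one URL with the standard library and decode it to text."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls, settings, fetch=http_fetch):
    """Fetch every URL; with continue_on_failure, collect errors instead of stopping."""
    results, errors = {}, {}
    for url in urls:
        try:
            results[url] = fetch(url, settings.timeout)
        except Exception as exc:
            if not settings.continue_on_failure:
                raise  # strict mode: the first failure aborts the run
            errors[url] = str(exc)
    return results, errors
```

Keeping results keyed by URL also fits the desired behaviour of showing the source of each piece of loaded content.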
Desired Behaviour:
- Load the content from the given URLs.
- Clearly show the source of the loaded content.
