diff --git a/apify-api/openapi/openapi.yaml b/apify-api/openapi/openapi.yaml index 7ddfa96cf..77a6922c3 100644 --- a/apify-api/openapi/openapi.yaml +++ b/apify-api/openapi/openapi.yaml @@ -122,7 +122,7 @@ info: ``` However, there are a few explicitly described exceptions, such as - Dataset [Get items](#/reference/datasets/item-collection/get-items) or + [Get dataset items](#/reference/datasets/item-collection/get-items) or Key-value store [Get record](#/reference/key-value-stores/record/get-record) API endpoints, which return data in other formats. In case of an error, the response has the HTTP status code in the range of diff --git a/apify-api/openapi/paths/actor-tasks/actor-tasks@{actorTaskId}@runs.yaml b/apify-api/openapi/paths/actor-tasks/actor-tasks@{actorTaskId}@runs.yaml index 5c7c80418..1e9cc1237 100644 --- a/apify-api/openapi/paths/actor-tasks/actor-tasks@{actorTaskId}@runs.yaml +++ b/apify-api/openapi/paths/actor-tasks/actor-tasks@{actorTaskId}@runs.yaml @@ -153,7 +153,7 @@ post: To fetch the Actor run results that are typically stored in the default dataset, you'll need to pass the ID received in the `defaultDatasetId` field received in the response JSON to the - [Get items](#/reference/datasets/item-collection/get-items) API endpoint. + [Get dataset items](#/reference/datasets/item-collection/get-items) API endpoint. 
operationId: actorTask_runs_post parameters: - name: actorTaskId diff --git a/apify-api/openapi/paths/actors/acts@{actorId}@runs.yaml b/apify-api/openapi/paths/actors/acts@{actorId}@runs.yaml index 3e39d6a6c..e2df1ce35 100644 --- a/apify-api/openapi/paths/actors/acts@{actorId}@runs.yaml +++ b/apify-api/openapi/paths/actors/acts@{actorId}@runs.yaml @@ -167,7 +167,7 @@ post: To fetch the Actor run results that are typically stored in the default dataset, you'll need to pass the ID received in the `defaultDatasetId` field - received in the response JSON to the [Get items](#/reference/datasets/item-collection/get-items) + received in the response JSON to the [Get dataset items](#/reference/datasets/item-collection/get-items) API endpoint. operationId: act_runs_post parameters: diff --git a/apify-api/openapi/paths/datasets/datasets@{datasetId}@items.yaml b/apify-api/openapi/paths/datasets/datasets@{datasetId}@items.yaml index bde66cf2c..73217531e 100644 --- a/apify-api/openapi/paths/datasets/datasets@{datasetId}@items.yaml +++ b/apify-api/openapi/paths/datasets/datasets@{datasetId}@items.yaml @@ -1,7 +1,7 @@ get: tags: - Storage/Datasets - summary: Get items + summary: Get dataset items description: | Returns data stored in the dataset in a desired format. diff --git a/sources/academy/index.mdx b/sources/academy/index.mdx index 22dbc61fa..f100aeb4c 100644 --- a/sources/academy/index.mdx +++ b/sources/academy/index.mdx @@ -1,5 +1,5 @@ --- -title: Web Scraping Academy +title: Apify Academy description: Learn everything about web scraping and automation with our free courses that will turn you into an expert scraper developer. 
sidebar_position: 0 slug: / diff --git a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md index a11e82f12..d85205996 100644 --- a/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md +++ b/sources/academy/tutorials/api/run_actor_and_retrieve_data_via_api.md @@ -254,7 +254,7 @@ The **run info** JSON also contains the IDs of the default [dataset](/platform/s > If you are scraping products, or any list of items with similar fields, the [dataset](/platform/storage/dataset) should be your storage of choice. Don't forget though, that dataset items are immutable. This means that you can only add to the dataset, and not change the content that is already inside it. -To retrieve the data from a dataset, send a GET request to the [**Get items**](/api/v2/dataset-items-get) endpoint and pass the `defaultDatasetId` into the URL. For a GET request to the default dataset, no token is needed. +To retrieve the data from a dataset, send a GET request to the [**Get dataset items**](/api/v2/dataset-items-get) endpoint and pass the `defaultDatasetId` into the URL. For a GET request to the default dataset, no token is needed. ```cURL https://api.apify.com/v2/datasets/DATASET_ID/items diff --git a/sources/legal/index.mdx b/sources/legal/index.mdx index b1cfdb6e5..c7060cc0c 100644 --- a/sources/legal/index.mdx +++ b/sources/legal/index.mdx @@ -10,7 +10,7 @@ hide_table_of_contents: true -## Company details (Impressum) +## Legal info (Imprint) **Apify Technologies s.r.o.**
Registered seat: Vodickova 704/36, 110 00 Prague 1, Czech Republic
diff --git a/sources/legal/sidebars.js b/sources/legal/sidebars.js index 5e6971852..678a12cae 100644 --- a/sources/legal/sidebars.js +++ b/sources/legal/sidebars.js @@ -2,7 +2,7 @@ module.exports = { legal: [ { type: 'link', - label: 'Company details (Impressum)', + label: 'Legal info (Imprint)', href: '/legal', }, { diff --git a/sources/platform/actors/development/actor_definition/input_schema/secret_input.md b/sources/platform/actors/development/actor_definition/input_schema/secret_input.md index 061cfd48e..bfde32d4c 100644 --- a/sources/platform/actors/development/actor_definition/input_schema/secret_input.md +++ b/sources/platform/actors/development/actor_definition/input_schema/secret_input.md @@ -69,7 +69,7 @@ If you read the `INPUT` key from the Actor run's default key-value store directl > await Actor.getValue('INPUT'); { username: 'username', - password: 'ENCRYPTED_VALUE:Hw/uqRMRNHmxXYYDJCyaQX6xcwUnVYQnH4fWIlKZL2Vhtq1rZmtoGXQSnhIXmF58+DjKlMZpTlK2zN3YUXk1ylzU6LfXyysOG/PISAfwm27FUgy3IfdgMyQggQ4MydLzdlzefX0mPRyixBviRcFhRTC+K7nK9lkATt3wJpj91YAZm104ZYkcd5KmsU2JX39vxN0A0lX53NjIenzs3wYPaPYLdjKIe+nqG9fHlL7kALyi7Htpy91ZgnQJ1s9saJRkKfWXvmLYIo5db69zU9dGCeJzUc0ca154O+KYYP7QTebJxqZNQsC8EH6sVMQU3W0qYKjuN8fUm1fRzyw/kKFacQ==:VfQd2ZbUt3S0RZ2ciywEWYVBbTTZOTiy' + password: 'ENCRYPTED_VALUE:Hw/uqRMRNHmxXYYDJCyaQX6xcwUnVYQnH4fWIlKZL...' 
} ``` diff --git a/sources/platform/actors/development/actor_definition/input_schema/specification.md b/sources/platform/actors/development/actor_definition/input_schema/specification.md index 6e51c95a1..cd137101c 100644 --- a/sources/platform/actors/development/actor_definition/input_schema/specification.md +++ b/sources/platform/actors/development/actor_definition/input_schema/specification.md @@ -11,11 +11,18 @@ sidebar_label: Input schema specification --- -The Actor input schema serves three main purposes: +Actor input schema is a JSON file which defines the schema and description of the input object and its properties accepted by the +Actor on start. The file adheres to [JSON schema](https://json-schema.org/) with our extensions, +and describes a single Actor input object +and its properties, including documentation, default value, and user interface definition. + +The Actor input schema file is used to: + +- Validate the passed input JSON object on Actor run, so that Actors don't need to perform input validation and error handling in their code. +- Render user interface for Actors to make it easy for users to run and test them manually. +- Generate Actor API documentation and integration code examples on the web or in CLI, making Actors easy to integrate for users. +- Simplify integration of Actors into automation workflows such as Zapier or Make, by providing smart connectors that smartly pre-populate and link Actor input properties. -- It ensures the input data supplied to the Actor adhere to specified requirements and validation rules. -- It is used by the Apify platform to generate a user-friendly interface for configuring and running your Actor. -- It simplifies invoking your Actors from external systems by generating calling code and connectors for integrations. To define an input schema for an Actor, set `input` field in the `.actor/actor.json` file to an input schema object (described below), or path to a JSON file containing the input schema object. 
For backwards compatibility, if the `input` field is omitted, the system looks for an `INPUT_SCHEMA.json` file either in the `.actor` directory or the Actor's top-level directory—but note that this functionality is deprecated and might be removed in the future. The maximum allowed size for the input schema file is 500 kB. @@ -114,7 +121,7 @@ Even though the structure of the Actor input schema is similar to JSON schema, t ::: -## Fields +## Input fields Each field of your input is described under its key in the `inputSchema.properties` object. The field might have `integer`, `string`, `array`, `object`, or `boolean` type, and its specification contains the following properties: @@ -142,33 +149,38 @@ Here is a rule of thumb for whether an input field should have a `prefill`, `def In summary, you can use each option independently or use a combination of **Prefill + Required** or **Prefill + Default**, but the combination of **Default + Required** doesn't make sense to use. -## Additional properties +## Input types Most types also support additional properties defining, for example, the UI input editor. ### String -#### Code input +String is the most common input field type, and provide +a number of editors and validations properties: -Example of a code input: +| Property | Value | Required | Description | +|----------|--------|-----------|| +| `editor` | One of:
`textfield`, `textarea`, `javascript`, `python`, `select`, `datepicker`, `fileupload`, `hidden` | Yes | Visual editor used for the input field. |
+| `pattern` | String | No | Regular expression that will be used to validate the input. If validation fails, the Actor will not run. |
+| `minLength` | Integer | No | Minimum length of the string. |
+| `maxLength` | Integer | No | Maximum length of the string. |
+| `enum` | [String] | Required if `editor` is `select` | Using this field, you can limit values to the given array of strings. Input will be displayed as select box. |
+| `enumTitles` | [String] | No | Titles for the `enum` keys described. |
+| `nullable` | Boolean | No | Specifies whether `null` is an allowed value. |
+| `isSecret` | Boolean | No | Specifies whether the input field will be stored encrypted. Only available with `textfield`, `textarea` and `hidden` editors. |
+| `dateType` | One of | No | This property, which is only available with `datepicker` editor, specifies what date format should visual editor accept (The JSON editor accepts any string without validation.).


Defaults to `absolute`. | -```json -{ - "title": "Page function", - "type": "string", - "description": "Function executed for each request", - "editor": "javascript", - "prefill": "async () => { return $('title').text(); }" -} -``` +:::note Regex escape -Rendered input: +When using escape characters `\` for the regular expression in the `pattern` field, be sure to escape them to avoid invalid JSON issues. For example, the regular expression +`https:\/\/(www\.)?apify\.com\/.+` would become `https:\\/\\/(www\\.)?apify\\.com\\/.+`. -![Apify Actor input schema page function](./images/input-schema-page-function.png) +::: -#### Country selection +#### Select -Example of country selection using a select input: +Enables you to provide a list of predefined values for the string, including display titles. +Here's an example of `countryCode` input field with a country selection: ```json { @@ -182,11 +194,34 @@ Example of country selection using a select input: } ``` -Rendered input: +The `select` editor is rendered as drop-down in user interface: ![Apify Actor input schema - country input](./images/input-schema-country.png) -#### `datepicker` editor + +#### Code editor + +If the input string is code, you can use either `javascript` or `python` editor +for syntax highlighting. + +For example: + +```json +{ + "title": "Page function", + "type": "string", + "description": "Function executed for each request", + "editor": "javascript", + "prefill": "async () => { return $('title').text(); }" +} +``` + +Rendered input: + +![Apify Actor input schema page function](./images/input-schema-page-function.png) + + +#### Date picker Example of date selection using absolute and relative `datepicker` editor: @@ -230,48 +265,29 @@ The `anyDate` property renders a date picker that accepts both absolute and rela ![Apify Actor input schema - country input](./images/input-schema-date-both.png) -#### `fileupload` editor - -The `fileupload` editor enables users to specify a file as input. 
The input is passed to the Actor as a string. It is the Actor author's responsibility to interpret this string, including validating its existence and format. - -The editor makes it easier to users to upload the file to a key-value store of their choice. - -![Apify Actor input schema - fileupload input](./images/input-schema-fileupload-input.png) +#### Advanced date and time handling -The user provides either a URL or uploads the file to a key-value store (existing or new). +While the `datepicker` editor doesn't support setting time values visually, you can allow users to handle more complex datetime formats and pass them via JSON. The following regex allows users to optionally extend the date with full ISO datetime format or pass `hours` and `minutes` as a relative date: -![Apify Actor input schema - fileupload input options](./images/input-schema-fileupload-modal.png) +`"pattern": "^(\\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])(T[0-2]\\d:[0-5]\\d(:[0-5]\\d)?(\\.\\d+)?Z?)?$|^(\\d+)\\s*(minute|hour|day|week|month|year)s?$"` -Properties: +When implementing time-based fields, make sure to explain to your users through the description that the time values should be provided in UTC. This helps prevent timezone-related issues. 
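The pattern above can be checked locally before it goes into a schema. The following is a minimal sketch, not part of the change itself: it uses Python's `re` module as a stand-in for the platform's validator (an assumption, since regex dialects differ slightly), and the names `SCHEMA_FRAGMENT`, `DATE_PATTERN`, and `is_valid_date_input` are hypothetical. It also shows the JSON escaping from the "Regex escape" note being undone by the JSON parser.

```python
import json
import re

# The "pattern" property as it would be written inside the input schema JSON,
# with every backslash doubled for JSON escaping. A raw string keeps the
# doubled backslashes intact in the Python source.
SCHEMA_FRAGMENT = r'''
{
  "pattern": "^(\\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])(T[0-2]\\d:[0-5]\\d(:[0-5]\\d)?(\\.\\d+)?Z?)?$|^(\\d+)\\s*(minute|hour|day|week|month|year)s?$"
}
'''

# Parsing the JSON turns each "\\" back into a single "\", which yields
# the plain regular expression.
DATE_PATTERN = re.compile(json.loads(SCHEMA_FRAGMENT)["pattern"])


def is_valid_date_input(value: str) -> bool:
    """Return True if the value matches the absolute or relative date form."""
    return DATE_PATTERN.search(value) is not None


if __name__ == "__main__":
    samples = ["2024-05-17", "2024-05-17T13:45:00Z", "3 days", "tomorrow"]
    for sample in samples:
        print(sample, is_valid_date_input(sample))
```

Note that the pattern checks the format only, not the calendar: values such as `2024-02-31` or an hour of `29` still match, so an Actor that relies on it would likely need to parse the value as well.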
-| Property | Value | Required | Description |
-|----------|--------|-----------|-------------|
-| `editor` | One of: `textfield`, `textarea`, `javascript`, `python`, `select`, `datepicker`, `fileupload`, `hidden` | Yes | Visual editor used for the input field. |
-| `pattern` | String | No | Regular expression that will be used to validate the input. If validation fails, the Actor will not run. |
-| `minLength` | Integer | No | Minimum length of the string. |
-| `maxLength` | Integer | No | Maximum length of the string. |
-| `enum` | [String] | Required if `editor` is `select` | Using this field, you can limit values to the given array of strings. Input will be displayed as select box. |
-| `enumTitles` | [String] | No | Titles for the `enum` keys described. |
-| `nullable` | Boolean | No | Specifies whether `null` is an allowed value. |
-| `isSecret` | Boolean | No | Specifies whether the input field will be stored encrypted. Only available with `textfield`, `textarea` and `hidden` editors. |
-| `dateType` | One of | No | This property, which is only available with `datepicker` editor, specifies what date format should visual editor accept (The JSON editor accepts any string without validation.).


Defaults to `absolute`. | -:::note Regex escape +#### File upload -When using escape characters `\` for the regular expression in the `pattern` field, be sure to escape them to avoid invalid JSON issues. For example, the regular expression -`https:\/\/(www\.)?apify\.com\/.+` would become `https:\\/\\/(www\\.)?apify\\.com\\/.+`. +The `fileupload` editor enables users to specify a file as input. The input is passed to the Actor as a string. It is the Actor author's responsibility to interpret this string, including validating its existence and format. -::: +The editor makes it easier to users to upload the file to a key-value store of their choice. -#### Advanced date and time handling +![Apify Actor input schema - fileupload input](./images/input-schema-fileupload-input.png) -While the `datepicker` editor doesn't support setting time values visually, you can allow users to handle more complex datetime formats and pass them via JSON. The following regex allows users to optionally extend the date with full ISO datetime format or pass `hours` and `minutes` as a relative date: +The user provides either a URL or uploads the file to a key-value store (existing or new). -`"pattern": "^(\\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])(T[0-2]\\d:[0-5]\\d(:[0-5]\\d)?(\\.\\d+)?Z?)?$|^(\\d+)\\s*(minute|hour|day|week|month|year)s?$"` +![Apify Actor input schema - fileupload input options](./images/input-schema-fileupload-modal.png) -When implementing time-based fields, make sure to explain to your users through the description that the time values should be provided in UTC. This helps prevent timezone-related issues. -### Boolean +### Boolean type Example options with group caption: @@ -343,7 +359,7 @@ Properties: | `unit` | String | No | Unit displayed next to the field in UI,
for example _second_, _MB_, etc. | | `nullable` | Boolean | No | Specifies whether null is an allowed value. | -### Object +### Object type Example of proxy configuration: diff --git a/sources/platform/actors/running/index.md b/sources/platform/actors/running/index.md index 56541d822..88be596a3 100644 --- a/sources/platform/actors/running/index.md +++ b/sources/platform/actors/running/index.md @@ -118,4 +118,4 @@ print(dataset_items) The newly started Actor runs under the account associated with the provided `token`, and therefore all resources consumed are charged to this user account. -Internally, the `call()` function invokes the [Run Actor](/api/v2/#/reference/actors/run-collection/run-actor) API endpoint, waits for the Actor to finish, and reads its output using the [Get items](/api/v2/#/reference/datasets/item-collection/get-items) API endpoint. +Internally, the `call()` function invokes the [Run Actor](/api/v2/#/reference/actors/run-collection/run-actor) API endpoint, waits for the Actor to finish, and reads its output using the [Get dataset items](/api/v2/#/reference/datasets/item-collection/get-items) API endpoint. diff --git a/sources/platform/index.mdx b/sources/platform/index.mdx index 50204a168..2886d8dc8 100644 --- a/sources/platform/index.mdx +++ b/sources/platform/index.mdx @@ -10,9 +10,10 @@ import Card from "@site/src/components/Card"; import CardGrid from "@site/src/components/CardGrid"; import homepageContent from "./homepage_content.json"; -> **Apify** is a cloud platform that helps you build reliable web scrapers, fast, and automate anything you can do manually in a web browser. + +> **Apify** is a cloud platform and marketplace of tools for web data extraction and automation. > -> **Actors** are serverless cloud programs running on the Apify platform that can easily crawl websites with millions of pages, but also perform arbitrary computing jobs such as sending emails or data transformations. 
They can be started manually, using our API or scheduler, and they can be easily integrated with other apps. +> **Actors** are serverless programs that run in the cloud. They can perform anything from simple actions such as filling out a web form or sending an email, to complex operations such as crawling a website with a million pages, or removing duplicates from a large dataset. Actors can persist their state and be restarted, and thus they can run as short or as long as necessary, from seconds to hours, even infinitely. ## Getting started diff --git a/sources/platform/integrations/ai/mcp.md b/sources/platform/integrations/ai/mcp.md index 03a84d6b5..f3751c752 100644 --- a/sources/platform/integrations/ai/mcp.md +++ b/sources/platform/integrations/ai/mcp.md @@ -12,7 +12,8 @@ toc_max_heading_level: 4 import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; -The _Apify Model Context Protocol (MCP) Server_ enables AI applications to connect to Apify's extensive library of Actors. Tools allowing your AI agents to perform web scraping, data extraction, and automation tasks in real time. +The Apify's [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) server allows AI applications and agents to search and run Actors from [Apify Store](https://apify.com/store) as tools for web scraping, data extraction, or automation, +as well as access Apify documentation and tutorials from your AI coding environments. ![Apify MCP Server](../../images/apify_mcp_server.png) diff --git a/sources/platform/quick-start/build_with_ai.md b/sources/platform/quick-start/build_with_ai.md index 8e0a6b48b..d73849125 100644 --- a/sources/platform/quick-start/build_with_ai.md +++ b/sources/platform/quick-start/build_with_ai.md @@ -1,12 +1,13 @@ --- title: Build Actors with AI sidebar_position: 3 -description: Use pre-built prompts, refer to Apify docs via llms.txt, and follow best practices for effective vibe coding. 
+sidebar_label: Build with AI +description: Learn how to build new Actors or improving existing ones using AI code generation and vibe coding tools. slug: /actors/development/quick-start/build-with-ai toc_max_heading_level: 4 --- -**Use pre-built prompts, reference Apify documentation through `/llms.txt`, and follow best practices to build Actors efficiently with AI coding assistants.** +**Learn how to develop new Actors or improve existing ones using AI code generation and vibe coding tools.** --- @@ -15,7 +16,9 @@ import PromptButton from "@site/src/components/PromptButton"; import InstallMCPButton from "@site/src/components/InstallMCPButton"; import copyForAI from "./images/copy-for-ai.png"; -You will learn several approaches to building Apify Actors with the help of AI coding assistants. This guide includes independent instructions, tools, and best practices that you can use individually or combine together. Each section focuses on a specific part of the process such as prompt usage, Actor templates, Apify MCP server tools, or documentation integration, so you can follow only the parts that fit your development style. +This guide provides best practices for building new Actors or improving existing ones using AI code generation +and vibe coding tools such as Cursor, Claude Code, or Visual Studio Code, +by providing the AI agents with the right instructions and context. ## AI coding assistant instructions @@ -66,30 +69,17 @@ Every page in the Apify documentation has a **Copy for LLM** button. You can use Copy for LLM -## Use `llms.txt` and `llms-full.txt` - -Search engines weren't built for Large Language Models (LLMs), but LLMs need context. That's why we've created [`llms.txt`](https://docs.apify.com/llms.txt) and [`llms-full.txt`](https://docs.apify.com/llms-full.txt) for our documentation. These files can provide additional context if you link them. - - - - - - - - - - - - - - - - - - -
-| File | Purpose |
-|------|---------|
-| llms.txt | Contains index of the docs page in Markdown, with links to all subpages in Markdown. |
-| llms-full.txt | Contains a full dump of documentation in Markdown. |
+## Use `/llms.txt` files + +The entire Apify documentation is available in Markdown format to make it easy to +digest by LLMs and AI coding tools. There are two special files: + +- **https://docs.apify.com/llms.txt**: A Markdown file with an index to all documentation pages in Markdown format. This is based on the [llmstxt.org](https://llmstxt.org/) standard. +- **https://docs.apify.com/llms-full.txt**: A single Markdown file with a complete dump of the entire Apify documentation. + +Note that for each Apify documentation page, you can get the Markdown version by adding `.md` to the URL. For example: + +https://docs.apify.com/platform/actors => https://docs.apify.com/platform/actors.md :::note Provide link to AI assistants diff --git a/sources/platform/quick-start/index.mdx b/sources/platform/quick-start/index.mdx index d5fb70b65..35f0b4dbd 100644 --- a/sources/platform/quick-start/index.mdx +++ b/sources/platform/quick-start/index.mdx @@ -1,5 +1,6 @@ --- -title: Quick start +title: Actor development quick start +sidebar_label: Quick start sidebar_position: 0.5 description: Create your first Actor using the Apify Web IDE or locally in your IDE. slug: /actors/development/quick-start diff --git a/sources/platform/quick-start/start_locally.md b/sources/platform/quick-start/start_locally.md index a73f4ca74..aaed7069e 100644 --- a/sources/platform/quick-start/start_locally.md +++ b/sources/platform/quick-start/start_locally.md @@ -1,5 +1,6 @@ --- -title: Local development +title: Local Actor development +sidebar_label: Local development sidebar_position: 1 description: Create your first Actor locally on your machine, deploy it to the Apify platform, and run it in the cloud. slug: /actors/development/quick-start/locally diff --git a/src/pages/index.tsx b/src/pages/index.tsx index ffb16bf36..12c6cf80d 100644 --- a/src/pages/index.tsx +++ b/src/pages/index.tsx @@ -265,8 +265,8 @@ export default function Home() {