4 changes: 3 additions & 1 deletion pages/data-lab/concepts.mdx
@@ -38,13 +38,15 @@ Lighter is a technology that enables SparkMagic commands to be readable and exec

A notebook for an Apache Spark cluster is an interactive, web-based tool that allows users to write and execute code, visualize data, and share results in a collaborative environment. It connects to an Apache Spark cluster to run large-scale data processing tasks directly from the notebook interface, making it easier to develop and test data workflows.

Adding a notebook to your cluster requires 1 GB of storage.

## Persistent volume

A Persistent Volume (PV) is a cluster-wide storage resource that ensures data persistence beyond the lifecycle of individual Pods. Persistent volumes abstract the underlying storage details, allowing administrators to use various storage solutions.

Apache Spark® executors require storage space for various operations, particularly to shuffle data during wide operations such as sorting, grouping, and aggregation. Wide operations are transformations that require data from different partitions to be combined, often resulting in data movement across the cluster. During the map phase, executors write data to shuffle storage, which is then read by reducers.

A PV sized properly ensures a smooth execution of your workload.
A properly sized persistent volume ensures smooth execution of your workload.
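
To make the shuffle behavior concrete, here is a minimal PySpark sketch of a wide transformation; the session setup and sample data are hypothetical and only serve to illustrate the concept:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("shuffle-example").getOrCreate()

# Hypothetical sales records, spread across the cluster's partitions.
sales = spark.createDataFrame(
    [("fr", 120.0), ("de", 80.0), ("fr", 45.5), ("es", 60.0)],
    ["country", "amount"],
)

# groupBy is a wide transformation: rows sharing a key must be co-located,
# so executors write map output to shuffle storage (backed by the persistent
# volume) before reducers read it back to compute each aggregate.
totals = sales.groupBy("country").agg(F.sum("amount").alias("total"))
totals.show()
```

The larger the shuffles produced by sorting, grouping, and aggregation, the more important it is to size the persistent volume accordingly.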

## SparkMagic

32 changes: 32 additions & 0 deletions pages/data-lab/how-to/access-notebook.mdx
@@ -0,0 +1,32 @@
---
title: How to access and use the notebook of a Data Lab cluster
description: Step-by-step guide to access and use the notebook environment in a Data Lab for Apache Spark™ on Scaleway.
tags: data lab apache spark notebook environment jupyterlab
dates:
validation: 2025-12-04
posted: 2025-12-04
---

import Requirements from '@macros/iam/requirements.mdx'

This page explains how to access and use the notebook environment of your Data Lab for Apache Spark™ cluster.

<Requirements />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- Created a [Data Lab for Apache Spark™ cluster](/data-lab/how-to/create-data-lab/) with a notebook
- Created an [IAM API key](/iam/how-to/create-api-keys/)

## How to access the notebook of your cluster

1. Click **Data Lab** under **Data & Analytics** on the side menu. The Data Lab for Apache Spark™ page displays.

2. Click the name of the desired Data Lab cluster. The overview tab of the cluster displays.

3. Click the **Open notebook** button. A login page displays.

4. Enter the **secret key** of your API key, then click **Authenticate**. The notebook dashboard displays.
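
Once authenticated, you can create a notebook and run cells against the cluster. The following is a minimal sanity-check sketch, assuming the notebook's PySpark setup exposes an active session as `spark` (the names used here are illustrative, not prescriptive):

```python
# Quick check that the notebook can reach the Apache Spark cluster.
# `spark` is assumed to be provided by the notebook's PySpark/SparkMagic setup.
df = spark.range(0, 1000)   # DataFrame of 1,000 consecutive integers
print(df.count())           # prints 1000 if the session is healthy
print(spark.version)        # prints the Apache Spark version in use
```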



31 changes: 31 additions & 0 deletions pages/data-lab/how-to/access-spark-ui.mdx
@@ -0,0 +1,31 @@
---
title: How to access the Apache Spark™ UI
description: Step-by-step guide to access and use the Apache Spark™ UI in a Data Lab for Apache Spark™ on Scaleway.
tags: data lab apache spark ui gui console
dates:
validation: 2025-12-04
posted: 2025-12-04
---

import Requirements from '@macros/iam/requirements.mdx'

This page explains how to access the Apache Spark™ UI of your Data Lab for Apache Spark™ cluster.

<Requirements />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- Created a [Data Lab for Apache Spark™ cluster](/data-lab/how-to/create-data-lab/)
- Created an [IAM API key](/iam/how-to/create-api-keys/)

## How to access the Apache Spark™ UI of your cluster

1. Click **Data Lab** under **Data & Analytics** on the side menu. The Data Lab for Apache Spark™ page displays.

2. Click the name of the desired Data Lab cluster. The overview tab of the cluster displays.

3. Click the **Open Apache Spark™ UI** button. A login page displays.

4. Enter the **secret key** of your API key, then click **Authenticate**. The Apache Spark™ UI dashboard displays.

From this view, you can monitor worker nodes, executors, and applications.

Refer to the [official Apache Spark™ documentation](https://spark.apache.org/docs/latest/web-ui.html) for comprehensive information on how to use the web UI.
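
As a complement to the console flow above, an active Spark session can also report its own web UI address and application ID from a notebook cell. This is a hedged sketch assuming a session named `spark`; the **Open Apache Spark™ UI** button remains the supported access path:

```python
# The SparkContext exposes the URL of the web UI for the running application;
# on a managed cluster this internal address may not be directly reachable.
print(spark.sparkContext.uiWebUrl)

# Application ID, as listed in the Spark UI's applications view.
print(spark.sparkContext.applicationId)
```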
8 changes: 5 additions & 3 deletions pages/data-lab/how-to/create-data-lab.mdx
@@ -21,9 +21,11 @@ Data Lab for Apache Spark™ is a product designed to assist data scientists and

2. Click **Create Data Lab cluster**. The creation wizard displays.

3. Complete the following steps in the wizard:
- Choose an Apache Spark version from the drop-down menu.
- Select a worker node configuration.
3. Choose an Apache Spark version from the drop-down menu.

4. Choose a main node type. If you plan to add a notebook to your cluster, select the **DDL-PLAY2-MICRO** configuration to provision sufficient resources for it.

5. Select a worker node configuration.
- Enter the desired number of worker nodes.
<Message type="note">
Provisioning zero worker nodes lets you retain and access your cluster and notebook configurations, but will not allow you to run calculations.
41 changes: 41 additions & 0 deletions pages/data-lab/how-to/use-private-networks.mdx
@@ -0,0 +1,41 @@
---
title: How to use Private Networks with your Data Lab cluster
description: This page explains how to use Private Networks with Scaleway Data Lab for Apache Spark™
tags: private-networks private networks data lab spark apache cluster vpc
dates:
validation: 2025-06-25
posted: 2021-06-25
---
import Requirements from '@macros/iam/requirements.mdx'


[Private Networks](/vpc/concepts/#private-networks) allow your Data Lab for Apache Spark™ cluster to communicate in an isolated and secure network without needing to be connected to the public internet.

For full information about Scaleway Private Networks and VPC, see our [dedicated documentation](/vpc/) and [best practices guide](/vpc/reference-content/getting-most-private-networks/).

<Requirements />

- A Scaleway account logged into the [console](https://console.scaleway.com)
- [Owner](/iam/concepts/#owner) status or [IAM permissions](/iam/concepts/#permission) allowing you to perform actions in the intended Organization
- [Created a Private Network](/vpc/how-to/create-private-network/)


## How to create a Private Network

This action must be carried out from the Private Networks section of the console. Follow the procedure detailed in our [dedicated Private Networks documentation](/vpc/how-to/create-private-network/).

## How to attach and detach a cluster to/from a Private Network

At the moment, Data Lab clusters can only be attached to a Private Network during their creation, and cannot be detached or reattached to another Private Network afterward.

Refer to the [dedicated documentation](/data-lab/how-to/create-data-lab/) for comprehensive information on how to create a Data Lab for Apache Spark™ cluster.

## How to delete a Private Network

<Message type="note">
Before deleting a Private Network, you must [detach](/vpc/how-to/attach-resources-to-pn/#how-to-detach-a-resource-from-a-private-network) all resources attached to it.
</Message>

This must be carried out from the Private Networks section of the console. Follow the procedure detailed in our [dedicated Private Networks documentation](/vpc/how-to/delete-private-network/).