Commit 7d357a0 (1 parent: 8d94434)

Moved spark_utils into own module to remove dependency on pyspark (#55)

8 files changed: 164 additions & 64 deletions

README.md: 17 additions & 12 deletions
@@ -4,18 +4,12 @@ This provides a library of Python utility functions and classes, generally in th
 
 ## Sub-modules
 
-### `pyspark.utilities`
-
-**⚠️ Note: This module requires the 'pyspark' extra to be installed: `corvus-python[pyspark]`**
-
-Includes utility functions when working with PySpark to build data processing solutions. Primary API interfaces:
+### `spark_utils`
 
 | Component Name | Object Type | Description | Import syntax |
 |----------------|-------------|-------------|---------------|
-| <code>get_or_create_spark_session</code> | Function | Gets or creates a Spark Session, depending on the environment. Supports Synapse or a Local Spark Session configuration. | <code>from corvus_python.pyspark.utilities import get_or_create_spark_session</code> |
-| <code>get_spark_utils</code> | Function | Returns spark utility functions corresponding to current environment (local/Synapase) based on mssparkutils API. Useful for local development. <b>Note:</b> Config file required for local development - see [section below](#configuration). | <code>from corvus_python.pyspark.utilities import get_spark_utils</code> |
-| <code>null_safe_join</code> | Function | Joins two Spark DataFrames incorporating null-safe equality. | <code>from corvus_python.pyspark.utilities import null_safe_join</code> |
-| | | | |
+| <code>get_spark_utils</code> | Function | Returns spark utility functions corresponding to current environment (local/Synapse) based on mssparkutils API. Useful for local development. <b>Note:</b> Config file required for local development - see [section below](#configuration). | <code>from corvus_python.spark_utils import get_spark_utils</code> |
+
 
 #### `get_spark_utils()`
 

@@ -60,6 +54,17 @@ Below shows the current, complete specification of the config file for the suppo
 
 By default, a file in the root of the current working directory with file name `local-spark-utils-config.json` will be automatically discovered. If the file resides in a different location, and/or has a different file name, then the absolute path must be specified when calling `get_spark_utils()`.
 
+
+### `pyspark.utilities`
+
+**⚠️ Note: This module requires the 'pyspark' extra to be installed: `corvus-python[pyspark]`**
+
+Includes utility functions when working with PySpark to build data processing solutions. Primary API interfaces:
+
+| Component Name | Object Type | Description | Import syntax |
+|----------------|-------------|-------------|---------------|
+| <code>get_or_create_spark_session</code> | Function | Gets or creates a Spark Session, depending on the environment. Supports Synapse or a Local Spark Session configuration. | <code>from corvus_python.pyspark.utilities import get_or_create_spark_session</code> |
+| <code>null_safe_join</code> | Function | Joins two Spark DataFrames incorporating null-safe equality. | <code>from corvus_python.pyspark.utilities import null_safe_join</code> |
 ---
 
 ### `pyspark.synapse`
@@ -107,7 +112,7 @@ Includes utility functions when working with Synapse Analytics. Primary API inte
 
 ---
 
-### Auth
+### `auth`
 
 Includes utility functions when working with authentication libraries within Python. Primary API interfaces:
 
@@ -116,7 +121,7 @@ Includes utility functions when working with authentication libraries within Pyt
 | <code>get_az_cli_token</code> | Function | Gets an Entra ID token from the Azure CLI for a specified resource (/audience) and tenant. Useful for local development. | <code>from corvus_python.auth import get_az_cli_token</code> |
 | | | | |
 
-### SharePoint
+### `sharepoint`
 
 Includes utility functions when working with SharePoint REST API. Primary API interfaces:
 
@@ -126,7 +131,7 @@ Includes utility functions when working with SharePoint REST API. Primary API in
 
 ---
 
-### Email
+### `email`
 
 Includes utility classes and models for sending emails using Azure Communication Services (ACS). Primary API interfaces:
 
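The README describes a `local-spark-utils-config.json` file that is auto-discovered from the working directory. A hedged sketch of what such a file might contain, with key names taken from the typed config introduced in this commit (`credentials.getSecretWithLS`, `credentials.getToken.tenantId`, `env.getWorkspaceName`); the linked-service name `my_key_vault_ls`, the secret name, and all values are illustrative only, not part of any real spec.

```python
import json

# Illustrative config shape; "my_key_vault_ls" and "my-secret" are made-up names.
local_config = {
    "credentials": {
        "getSecretWithLS": {
            "my_key_vault_ls": {
                "my-secret": {"type": "static", "value": "not-a-real-secret"},
            },
        },
        "getToken": {"tenantId": "00000000-0000-0000-0000-000000000000"},
    },
    "env": {"getWorkspaceName": "local-dev-workspace"},
}

# Serialise to the JSON text you would save as local-spark-utils-config.json.
text = json.dumps(local_config, indent=2)
```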

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+"""
+Deprecated: This module has been moved to corvus_python.spark_utils.
+
+This module provides backward compatibility aliases for the spark utility functions
+that have been moved from corvus_python.pyspark.utilities to corvus_python.spark_utils.
+New code should import directly from corvus_python.spark_utils.
+"""
+
+import warnings
+
+# Re-export everything from the new location
+from corvus_python.spark_utils import get_spark_utils  # noqa F401
+
+# Issue a deprecation warning when this module is imported
+warnings.warn(
+    "get_spark_utils in corvus_python.pyspark.utilities is deprecated. "
+    "Import from corvus_python.spark_utils instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
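The shim above warns once, at import time. A minimal, self-contained sketch of how a caller would observe that warning; since the real package may not be installed, the module-level shim body is inlined here as a stand-in function:

```python
import warnings


def _deprecated_import():
    """Stand-in for the module-level code that runs when the old module is imported."""
    warnings.warn(
        "get_spark_utils in corvus_python.pyspark.utilities is deprecated. "
        "Import from corvus_python.spark_utils instead.",
        DeprecationWarning,
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    # DeprecationWarning is ignored by default outside __main__, so opt in.
    warnings.simplefilter("always")
    _deprecated_import()

assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```

Note that `stacklevel=2` points the warning at the importing code rather than the shim itself, which is why the message shows up at the caller's import line.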
Lines changed: 15 additions & 32 deletions
@@ -1,36 +1,19 @@
-"""Copyright (c) Endjin Limited. All rights reserved."""
+"""
+Deprecated: This module has been moved to corvus_python.spark_utils.
 
-import os
-import json
-from .local_spark_utils import LocalSparkUtils
+This module provides backward compatibility aliases for the spark utility functions
+that have been moved from corvus_python.pyspark.utilities to corvus_python.spark_utils.
+New code should import directly from corvus_python.spark_utils.
+"""
 
+import warnings
 
-def get_spark_utils(local_spark_utils_config_file_path: str = f"{os.getcwd()}/local-spark-utils-config.json"):
-    """Returns spark utility functions corresponding to the current environment.
+from corvus_python.spark_utils.spark_utils import *  # noqa F401
 
-    Args:
-        local_spark_utils_config_file_path (str): Path to the config used to instantiate the `LocalSparkUtils` class.
-            Defaults to a file located in the root of the current working directory.
-
-    Returns:
-        object: An instance of the spark utility functions.
-
-    Raises:
-        FileNotFoundError: If the local-spark-utils-config.json file is not found at the specified path.
-    """
-    if os.environ.get("MMLSPARK_PLATFORM_INFO") == "synapse":
-        from notebookutils import mssparkutils
-        return mssparkutils
-    else:
-        try:
-            with open(local_spark_utils_config_file_path) as f:
-                config = json.load(f)
-        except FileNotFoundError:
-            raise FileNotFoundError(
-                f"""
-                Could not find local-spark-utils-config.json at {local_spark_utils_config_file_path}.
-                Please ensure a config file is at this location or pass in an absolute path to the file if it is located elsewhere.
-                Please see `https://github.com/corvus-dotnet/Corvus.Python` for more information.
-                """)
-
-    return LocalSparkUtils(config)
+# Issue a deprecation warning when this module is imported
+warnings.warn(
+    "corvus_python.pyspark.utilities.spark_utils.spark_utils is deprecated. "
+    "Import from corvus_python.spark_utils instead.",
+    DeprecationWarning,
+    stacklevel=2,
+)
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from .spark_utils import get_spark_utils
+
+__all__ = ["get_spark_utils"]

src/corvus_python/pyspark/utilities/spark_utils/local_spark_utils.py renamed to src/corvus_python/spark_utils/local_spark_utils.py

Lines changed: 73 additions & 20 deletions
@@ -1,8 +1,60 @@
 """Copyright (c) Endjin Limited. All rights reserved."""
 
+from typing import Dict, TypedDict, Literal, Optional
 from corvus_python.auth import get_az_cli_token
 
 
+class StaticSecretConfig(TypedDict):
+    """Configuration for a static secret."""
+
+    type: Literal["static"]
+    value: str
+
+
+class LinkedServiceSecretsConfig(TypedDict, total=False):
+    """Configuration for secrets within a linked service.
+
+    Keys are secret names, values are secret configurations.
+    """
+
+    pass  # This allows any string key with StaticSecretConfig values
+
+
+class GetSecretWithLSConfig(TypedDict, total=False):
+    """Configuration for getSecretWithLS method.
+
+    Keys are linked service names, values are their secret configurations.
+    """
+
+    pass  # This allows any string key with LinkedServiceSecretsConfig values
+
+
+class GetTokenConfig(TypedDict):
+    """Configuration for getToken method."""
+
+    tenantId: str
+
+
+class CredentialsConfig(TypedDict):
+    """Configuration for credentials utilities."""
+
+    getSecretWithLS: Dict[str, Dict[str, StaticSecretConfig]]
+    getToken: Optional[GetTokenConfig]
+
+
+class EnvConfig(TypedDict):
+    """Configuration for environment utilities."""
+
+    getWorkspaceName: str
+
+
+class LocalSparkUtilsConfig(TypedDict):
+    """Main configuration for LocalSparkUtils."""
+
+    credentials: CredentialsConfig
+    env: EnvConfig
+
+
 class LSRLinkedServiceFailure(Exception):
     """Exception raised when the Linked Service can't be found.
 
@@ -46,23 +98,23 @@ def __init__(self, secret_name: str):
         super().__init__(self.message)
 
 
-class LocalCredentialUtils():
+class LocalCredentialUtils:
     """Class which mirrors elements of the mssparkutils.credentials API. Intentionally not a full representation -
     additional methods will be added to it as and when the need arises.
 
     Attributes:
-        config (dict): Dictionary representing configuration required for `credentials` API. See
+        config (CredentialsConfig): Configuration required for `credentials` API. See
             https://github.com/corvus-dotnet/Corvus.Python/blob/main/README.md for details.
     """
 
-    def __init__(self, config: dict):
+    def __init__(self, config: CredentialsConfig):
         """Constructor method
 
         Args:
-            config (dict): Dictionary representing configuration required for `credentials` API. See
+            config (CredentialsConfig): Configuration required for `credentials` API. See
                 https://github.com/corvus-dotnet/Corvus.Python/blob/main/README.md for details.
         """
-        self.config = config
+        self.config: CredentialsConfig = config
 
     def getSecretWithLS(self, linked_service: str, secret_name: str) -> str:
         lookup = self.config.get("getSecretWithLS")
@@ -81,8 +133,7 @@ def getSecretWithLS(self, linked_service: str, secret_name: str) -> str:
             case "static":
                 return target_secret.get("value")
             case _:
-                raise ValueError(
-                    f"Unknown secret type {target_secret.get('type')}")
+                raise ValueError(f"Unknown secret type {target_secret.get('type')}")
 
     def getToken(self, audience: str) -> str:
         scopes = {
@@ -107,49 +158,51 @@ def getToken(self, audience: str) -> str:
         if get_token_config:
             tenant_id = get_token_config.get("tenantId")
 
-        return get_az_cli_token(scope, tenant_id=tenant_id)
+        return get_az_cli_token(scope, tenant_id=tenant_id)  # type: ignore
 
 
-class LocalEnvUtils():
+class LocalEnvUtils:
     """Class which mirrors elements of the mssparkutils.env API. Intentionally not a full representation - additional
     methods will be added to it as and when the need arises.
 
     Attributes:
-        config (dict): Dictionary representing configuration required for `env` API. See
+        config (EnvConfig): Configuration required for `env` API. See
             https://github.com/corvus-dotnet/Corvus.Python/blob/main/README.md for details.
 
     """
-    def __init__(self, config: dict):
+
+    def __init__(self, config: EnvConfig):
         """Constructor method
 
         Args:
-            config (dict): Dictionary representing configuration required for `env` API. See
+            config (EnvConfig): Configuration required for `env` API. See
                 https://github.com/corvus-dotnet/Corvus.Python/blob/main/README.md for details.
         """
 
-        self.config = config
+        self.config: EnvConfig = config
 
     def getWorkspaceName(self) -> str:
         return self.config.get("getWorkspaceName")
 
 
-class LocalSparkUtils():
+class LocalSparkUtils:
     """Class which mirrors elements of the mssparkutils API. Intentionally not a full representation - additional
     sub-classes will be added to it as and when the need arises.
 
     Attributes:
-        config (dict): Dictionary representing full `LocalSparkUtils` configuration. See
+        config (LocalSparkUtilsConfig): Full `LocalSparkUtils` configuration. See
             https://github.com/corvus-dotnet/Corvus.Python/blob/main/README.md for details.
-        credentials (dict): LocalCredentialUtils instance.
-        env (dict): LocalEnvUtils instance.
+        credentials (LocalCredentialUtils): LocalCredentialUtils instance.
+        env (LocalEnvUtils): LocalEnvUtils instance.
     """
-    def __init__(self, local_config: dict):
+
+    def __init__(self, local_config: LocalSparkUtilsConfig):
         """Constructor method
 
         Args:
-            local_config (dict): Dictionary representing full `LocalSparkUtils` configuration. See
+            local_config (LocalSparkUtilsConfig): Full `LocalSparkUtils` configuration. See
                 https://github.com/corvus-dotnet/Corvus.Python/blob/main/README.md for details.
         """
-        self.config = local_config
+        self.config: LocalSparkUtilsConfig = local_config
         self.credentials = LocalCredentialUtils(local_config.get("credentials"))
         self.env = LocalEnvUtils(local_config.get("env"))
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+"""Copyright (c) Endjin Limited. All rights reserved."""
+
+import os
+import json
+from .local_spark_utils import LocalSparkUtils
+
+
+def get_spark_utils(local_spark_utils_config_file_path: str = f"{os.getcwd()}/local-spark-utils-config.json"):
+    """Returns spark utility functions corresponding to the current environment.
+
+    Args:
+        local_spark_utils_config_file_path (str): Path to the config used to instantiate the `LocalSparkUtils` class.
+            Defaults to a file located in the root of the current working directory.
+
+    Returns:
+        object: An instance of the spark utility functions.
+
+    Raises:
+        FileNotFoundError: If the local-spark-utils-config.json file is not found at the specified path.
+    """
+    if os.environ.get("MMLSPARK_PLATFORM_INFO") == "synapse":
+        from notebookutils import mssparkutils
+        return mssparkutils
+    else:
+        try:
+            with open(local_spark_utils_config_file_path) as f:
+                config = json.load(f)
+        except FileNotFoundError:
+            raise FileNotFoundError(
+                f"""
+                Could not find local-spark-utils-config.json at {local_spark_utils_config_file_path}.
+                Please ensure a config file is at this location or pass in an absolute path to the file if it is located elsewhere.
+                Please see `https://github.com/corvus-dotnet/Corvus.Python` for more information.
+                """)
+
+    return LocalSparkUtils(config)
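A sketch of the local path through `get_spark_utils` above: with `MMLSPARK_PLATFORM_INFO` unset, it opens the JSON config and wraps it in `LocalSparkUtils`. Here only the discovery/parse step is replayed against a temp directory; the real call is shown commented out because it requires `corvus-python` to be installed, and the config values are illustrative.

```python
import json
import os
import tempfile

config = {
    "credentials": {"getSecretWithLS": {}, "getToken": None},
    "env": {"getWorkspaceName": "local-dev-workspace"},
}

with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "local-spark-utils-config.json")
    with open(config_path, "w") as f:
        json.dump(config, f)

    # With the package installed, this would return a LocalSparkUtils instance:
    # from corvus_python.spark_utils import get_spark_utils
    # utils = get_spark_utils(config_path)
    # utils.env.getWorkspaceName()

    # Replay just the load step get_spark_utils performs locally:
    with open(config_path) as f:
        loaded = json.load(f)
```

Passing an absolute path, as here, is exactly the escape hatch the docstring describes for configs that don't live in the current working directory.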

tests/unit/test_email_errors.py

Whitespace-only changes.

tests/unit/test_email_integration.py

Whitespace-only changes.
