8 changes: 6 additions & 2 deletions .github/workflows/gradle.yml
@@ -2,9 +2,13 @@ name: BoundingBox CI

on:
push:
branches: [main]
branches:
- main
- 'test_cases_with_boolean_param'
pull_request:
branches: [main]
branches:
- main
- 'test_cases_with_boolean_param'

jobs:
build:
150 changes: 73 additions & 77 deletions README.md
@@ -1,90 +1,58 @@
# Bounding Box

## Requirements
This console app takes input from stdin with the following properties:
- Input is split into lines delimited by newline characters.
- Every line has the same length.
- Every line consists of an arbitrary sequence of hyphens ("-") and asterisks ("\*").
- The final line of input is terminated by a newline character.

Each character in the input will have coordinates defined by `(line number, character number)`, starting at the top and left. So the first character on the first line will have the coordinates `(1,1)`
and the fifth character on line 3 will have the coordinates `(3,5)`.

The program should find a box (or boxes) in the input with the following properties:
- The box must be defined by two pairs of coordinates corresponding to its top left and bottom right corners.
- It must be the **minimum bounding box** for some contiguous group of asterisks, with each asterisk in the
group being horizontally or vertically (but not diagonally) adjacent to each other. A single, detached asterisk
is considered to be a valid box.
The box should not _strictly_ bound the group, so the coordinates for the box in the following input
should be `(2,2)(3,3)` not `(1,1)(4,4)`.
```
----
-**-
-**-
----
```
- It should not overlap (i.e. share any characters with) any other minimum bounding boxes.
- Of all the non-overlapping, minimum bounding boxes in the input, _return the largest by area_.

If any boxes satisfying the conditions can be found in the input, the program should return an exit code
of 0 and, for each box, print a line to stdout with the two pairs of coordinates.

So, given the file “groups.txt” with the following content:
```
**-------***
-*--**--***-
-----***--**
-------***--
```
## Overview

Running this program manually:
```
> ./bounding-box < groups.txt
```
Outputs:
BoundingBox is a Java program that reads a 2D ASCII grid from standard input and reports either
the largest non-overlapping bounding box or all non-overlapping bounding boxes enclosing contiguous
regions of asterisks (`*`). It is designed to handle large inputs efficiently and uses a
Disjoint Set (Union-Find) data structure to identify connected components.

Each bounding box is defined by the minimum and maximum row and column coordinates
(with 1-based indexing) that surround a connected group of `*` characters.

### Example Input

```txt
*--*
-**-
----
*--*
```
(1,1)(2,2)
```
### Output (largest non-overlapping box)

This is because the larger groups on the right of the input have overlapping bounding boxes,
so the returned coordinates bound the smaller group on the top left.
`(2,2)(2,3)`

---

## Design/Implementation
## Features

Chose to use [DSU (Disjoint Set Union) w/ Union Find](https://en.wikipedia.org/wiki/Disjoint_sets) and [Sweep Line](https://en.wikipedia.org/wiki/Sweep_line_algorithm) algorithms
instead of [Depth-first Search (DFS)](https://en.wikipedia.org/wiki/Depth-first_search) due to:
• Detects all connected components of * characters using Union-Find.

1. `Efficiency in Grouping`
- DSU's near-constant amortized time per operation _(O(α(N)))_ makes it highly efficient for grouping cells, especially in sparse grids where many cells are not asterisks.
- DFS, while _O(R⋅C)_, requires explicit traversal and can be slower due to recursive overhead or iterative queue management.
• Computes the minimum bounding box for each component.

2. `Modularity`
- DSU separates the grouping phase (union operations) from the bounds computation (updating min and max), making the code more modular and easier to maintain.
- DFS combines exploration and bounds tracking, which can make the code less clean and harder to modify.
• Filters for the largest non-overlapping bounding box.

3. `Dynamic Updates`
- DSU's merge operations for bounds (min and max) are efficient and straightforward, allowing easy tracking of bounding box coordinates.
- DFS requires tracking min/max coordinates during traversal, which adds complexity and may require additional data structures.
• Optionally returns all non-overlapping boxes in sorted order.

4. `Scalability`
- DSU scales well for large grids due to its amortized constant-time operations and lack of recursive overhead.
- DFS may face stack overflow for very large grids (in recursive implementations) or require careful management in iterative versions.
• Handles malformed input with a clear "Error" output.

5. `Code Simplicity`
- DSU's iterative nature and use of maps make it concise for this problem, especially with the merge method for bounds.
- DFS requires explicit traversal logic, which can be more verbose and error-prone when tracking additional properties like bounds.
• Efficient for large grids (10,000 × 10,000).

In conclusion, the `BoundingBox` code solves the problem by using DSU to group contiguous asterisks and a sweep line algorithm
to find non-overlapping bounding boxes. For R rows, C columns, and K bounding boxes, the time complexity is approximately
_O(R⋅C + K^2)_ and the space complexity is _O(R⋅C)_. DSU is preferred over DFS for its efficiency, modularity, and
ease of tracking bounding box coordinates. The sweep line algorithm finds non-overlapping bounding boxes by processing boxes in order of their x-coordinates.
See [Requirements.md](Requirements.md) for more details.

---

## Technical Requirements
## Design/Implementation

• Uses a Union-Find (DisjointSet) to track connected components of *.

• Calculates bounding box coordinates during the find-union phase.

• Filters and compares boxes using area and coordinates.
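
The points above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: the class name `BoundsUnionFind`, its fields, and the `boxes` helper are hypothetical, and the grid is assumed to have already passed validation (equal-length rows containing only `*` and `-`).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the repository's implementation.
final class BoundsUnionFind {
    private final int[] parent;
    private final int[] size;
    // Bounds are 1-based and only meaningful at a component's root.
    private final int[] minRow, minCol, maxRow, maxCol;

    BoundsUnionFind(int rows, int cols) {
        int n = rows * cols;
        parent = new int[n];
        size = new int[n];
        minRow = new int[n];
        minCol = new int[n];
        maxRow = new int[n];
        maxCol = new int[n];
        for (int id = 0; id < n; id++) {
            parent[id] = id;
            size[id] = 1;
            minRow[id] = maxRow[id] = id / cols + 1;
            minCol[id] = maxCol[id] = id % cols + 1;
        }
    }

    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    void union(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (size[ra] < size[rb]) { int t = ra; ra = rb; rb = t; } // union by size
        parent[rb] = ra;
        size[ra] += size[rb];
        // Merge the absorbed component's bounds into the surviving root.
        minRow[ra] = Math.min(minRow[ra], minRow[rb]);
        minCol[ra] = Math.min(minCol[ra], minCol[rb]);
        maxRow[ra] = Math.max(maxRow[ra], maxRow[rb]);
        maxCol[ra] = Math.max(maxCol[ra], maxCol[rb]);
    }

    /** Returns one {minRow, minCol, maxRow, maxCol} array per group of asterisks. */
    static List<int[]> boxes(String[] grid) {
        int rows = grid.length;
        int cols = rows == 0 ? 0 : grid[0].length();
        BoundsUnionFind uf = new BoundsUnionFind(rows, cols);
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (grid[r].charAt(c) != '*') continue;
                int id = r * cols + c;
                // Only right and down neighbours are needed; diagonals never join.
                if (c + 1 < cols && grid[r].charAt(c + 1) == '*') uf.union(id, id + 1);
                if (r + 1 < rows && grid[r + 1].charAt(c) == '*') uf.union(id, id + cols);
            }
        }
        List<int[]> result = new ArrayList<>();
        for (int id = 0; id < rows * cols; id++) {
            if (grid[id / cols].charAt(id % cols) == '*' && uf.find(id) == id) {
                result.add(new int[] {uf.minRow[id], uf.minCol[id], uf.maxRow[id], uf.maxCol[id]});
            }
        }
        return result;
    }
}
```

Keeping the bounds at the root means each merge updates four integers, so no second pass over the cells is needed to compute a component's box.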

---

## Machine / Target Environment

Local / target machine should have the following software installed:

@@ -112,21 +80,49 @@ Locate `BoundingBox/app/build/libs/bounding-box` and invoke the following comman

`./bounding-box < groups.txt` (the data input files are located inside `BoundingBox/app/src/test/resources`)

### Run via GitHub Actions and download the generated artifact file.
---

### Run via GitHub Actions and download the generated Artifact file.

1. Run via [GitHub Actions](https://github.com/unnsse/BoundingBox/actions) CI/CD

2. Once downloaded, unzip the artifact:
2. Once downloaded, unzip the artifact: `unzip bounding-box.zip`

```bash
unzip bounding-box.zip
```
---

3. Test data using `stdin`
## Usage

```bash
./bounding-box < input.txt
```
./bounding-box < groups.txt
(1,1)(2,2)
To return all non-overlapping bounding boxes:
```java
new BoundingBox().largestNonOverlappingBox(lines, true);

```

Here's the link to the first successful [run](https://github.com/unnsse/BoundingBox/actions/runs/14742728397).
---

## Time and Space Complexity

| Operation | Time Complexity | Space Complexity |
| --------- | --------------- | ---------------- |
| Parsing and validation | O(n × m) | O(1) |
| Union-Find operations | O(α(n × m)) amortized per operation | O(n × m) |
| Bounding box updates | O(k) | O(k) |
| Box overlap checks | O(k²) | O(k) |
| Final sorting & filtering | O(k log k) | O(k) |

Where:

• n = number of rows

• m = number of columns

• k = number of connected components (bounding boxes)

• α is the inverse Ackermann function (nearly constant in practice)
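
As an illustration of the O(k²) overlap-check row, a pairwise filter might look like the sketch below. It is an assumption-laden example rather than the project's code: `OverlapFilter`, `Box`, `nonOverlapping`, and `largest` are hypothetical names, boxes use 1-based inclusive `(row, column)` corners, and ties on area keep the first box encountered, which the requirements do not specify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the repository's implementation.
final class OverlapFilter {
    static final class Box {
        final int top, left, bottom, right; // 1-based, inclusive

        Box(int top, int left, int bottom, int right) {
            this.top = top;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
        }

        int area() {
            return (bottom - top + 1) * (right - left + 1);
        }

        // Two boxes overlap when they share at least one character cell.
        boolean overlaps(Box o) {
            return top <= o.bottom && o.top <= bottom && left <= o.right && o.left <= right;
        }

        @Override
        public String toString() {
            return "(" + top + "," + left + ")(" + bottom + "," + right + ")";
        }
    }

    /** Keeps only boxes that overlap no other box: k * (k - 1) comparisons at worst. */
    static List<Box> nonOverlapping(List<Box> boxes) {
        List<Box> kept = new ArrayList<>();
        for (int i = 0; i < boxes.size(); i++) {
            boolean clear = true;
            for (int j = 0; j < boxes.size() && clear; j++) {
                if (i != j && boxes.get(i).overlaps(boxes.get(j))) clear = false;
            }
            if (clear) kept.add(boxes.get(i));
        }
        return kept;
    }

    /** Returns the largest non-overlapping box by area, or null if there is none. */
    static Box largest(List<Box> boxes) {
        Box best = null;
        for (Box b : nonOverlapping(boxes)) {
            if (best == null || b.area() > best.area()) best = b;
        }
        return best;
    }
}
```

With the three boxes from the `groups.txt` example in Requirements.md, the two right-hand boxes eliminate each other and only the top-left box survives.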

---

Feel free to fork and/or add to this!! :smile: :coffee:
88 changes: 88 additions & 0 deletions Requirements.md
@@ -0,0 +1,88 @@
# Requirements

This console app takes input from stdin with the following properties:

- Input is split into lines delimited by newline characters.

- Every line has the same length.

- Every line consists of an arbitrary sequence of hyphens ("-") and asterisks ("\*").

- The final line of input is terminated by a newline character.

Each character in the input will have coordinates defined by `(line number, character number)`,
starting at the top and left. So the first character on the first line will have the coordinates `(1,1)`
and the fifth character on line 3 will have the coordinates `(3,5)`.

The program should find a box (or boxes) in the input with the following properties:

- The box must be defined by two pairs of coordinates corresponding to its top left and bottom right corners.

- It must be the **minimum bounding box** for some contiguous group of asterisks, with each asterisk in the
group being horizontally or vertically (but not diagonally) adjacent to each other. A single, detached asterisk
is considered to be a valid box. The box should not _strictly_ bound the group, so the coordinates for the box
in the following input should be `(2,2)(3,3)` not `(1,1)(4,4)`.
```txt
----
-**-
-**-
----
```

- It should not overlap (i.e. share any characters with) any other minimum bounding boxes.

- Of all the non-overlapping, minimum bounding boxes in the input, _return the largest by area_.

If any boxes satisfying the conditions can be found in the input, the program should return an exit code
of 0 and, for each box, print a line to stdout with the two pairs of coordinates.

So, given the file “groups.txt” with the following content:
```
**-------***
-*--**--***-
-----***--**
-------***--
```

Running this program manually:
```
> ./bounding-box < groups.txt
```
Outputs:

```txt
(1,1)(2,2)
```

This is because the larger groups on the right of the input have overlapping bounding boxes,
so the returned coordinates bound the smaller group on the top left.

---

## Input

• Grid of characters (* and - only).

• All lines must be of equal length.

• Input is read from standard input.

---

## Output

• Default: largest non-overlapping bounding box as `(x1,y1)(x2,y2)`.

• If `returnAllBoxes=true`: concatenated string of all non-overlapping boxes sorted by top-left coordinate.

• Returns "Error" if input is malformed.

---

## Error Handling

Returns "Error" if:

• Input contains invalid characters.

• Rows are not the same length.
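
The two error conditions above can be checked in a single pass over the input. The sketch below is illustrative only: `GridValidator` and `isValid` are hypothetical names, and treating empty input as malformed is an assumption that the requirements do not state.

```java
import java.util.List;

// Hypothetical sketch, not the repository's implementation.
final class GridValidator {
    /** Returns true when every row has the same length and contains only '*' or '-'. */
    static boolean isValid(List<String> lines) {
        // Assumption: empty input (or an empty first row) counts as malformed.
        if (lines.isEmpty() || lines.get(0).isEmpty()) return false;
        int width = lines.get(0).length();
        for (String line : lines) {
            if (line.length() != width) return false; // rows are not the same length
            for (int i = 0; i < line.length(); i++) {
                char ch = line.charAt(i);
                if (ch != '*' && ch != '-') return false; // invalid character
            }
        }
        return true;
    }
}
```

A caller could print `Error` whenever `isValid` returns false and skip the grouping phase entirely.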