Refactor #1

Conversation
…ependency in output formatting and enhance row reading efficiency in reader. Update version to 0.1.0 in project files.
…r to support progress callbacks for improved user experience during file operations.
…ns for improved table presentation in the console.
Summary of Changes

Hello @Tendo33, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces significant performance and memory optimizations across the …
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a Rich progress UI to the CLI split flow, a …
Sequence Diagram(s)

A condensed code sketch of this flow follows the diagram.

```mermaid
sequenceDiagram
autonumber
actor User
participant CLI as parq/cli.py
participant Progress as Rich Progress
participant Reader as ParquetReader
User->>CLI: invoke split command
CLI->>Progress: start Progress context & add task
CLI->>Reader: call split_file(..., progress_callback)
loop per batch
Reader->>Reader: process batch
Reader->>CLI: progress_callback(current, total)
CLI->>Progress: update task progress
end
Reader-->>CLI: return split results
CLI->>Progress: stop Progress context
CLI-->>User: print results / exit
```
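For orientation, a condensed sketch of the flow above (a sketch only: the exact option names, console wiring, and return values in parq/cli.py may differ):

```python
from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeRemainingColumn

from parq.output import console
from parq.reader import ParquetReader


def split(input_path: str, output_pattern: str, file_count: int) -> None:
    # Illustrative wiring of the progress callback described in the diagram.
    reader = ParquetReader(input_path)

    with Progress(
        SpinnerColumn(),
        TextColumn("[bold blue]{task.description}"),
        BarColumn(),
        TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
        TimeRemainingColumn(),
        console=console,
    ) as progress:
        task = progress.add_task("Splitting", total=reader.num_rows)

        def on_progress(current: int, total: int) -> None:
            # Invoked by ParquetReader.split_file after each batch is written.
            progress.update(task, completed=current, total=total)

        output_files = reader.split_file(
            output_pattern,
            file_count=file_count,
            progress_callback=on_progress,
        )

    for path in output_files:
        console.print(str(path))
```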
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ 1 failed check (inconclusive), ✅ 2 passed.
Code Review
This pull request is a great step forward in optimizing the tool by removing the pandas dependency. The performance and memory usage improvements, especially in read_head, read_tail, and print_table, are significant. The addition of a progress bar for the split command is also a nice touch for user experience. My review includes a few suggestions to further enhance code style, maintainability, and performance in some areas. Overall, excellent work on this refactor!
```python
from rich.progress import (
    BarColumn,
    Progress,
    SpinnerColumn,
    TextColumn,
    TimeRemainingColumn,
)

from parq.output import console
```
For better code style and maintainability, it's recommended to move imports to the top of the file, as per PEP 8 guidelines [1]. While local imports can sometimes be useful for lazy loading or avoiding circular dependencies, in this case rich is a core part of the CLI, and moving these imports to the top would improve consistency and readability. I've suggested removing them here so you can add them at the top of the file (a sketch of the hoisted imports follows the footnote).

Style Guide References

Footnotes

[1] PEP 8: Imports are always put at the top of the file, just after any module comments and docstrings, and before global variables and constants.
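As an illustration only (the file's real import layout is an assumption), the hoisted block at the top of parq/cli.py could look like this:

```python
# parq/cli.py — illustrative top-of-module layout; other existing imports omitted

# Standard library
from typing import Optional

# Third-party
from rich.progress import (
    BarColumn,
    Progress,
    SpinnerColumn,
    TextColumn,
    TimeRemainingColumn,
)

# Local
from parq.output import console
from parq.reader import ParquetReader
```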
```python
for row_idx in range(arrow_table.num_rows):
    row_values = []
    for col_name in arrow_table.column_names:
        value = arrow_table[col_name][row_idx].as_py()
        row_values.append(str(value))
    table.add_row(*row_values)
```
This implementation iterates through the table cell by cell, which can be inefficient. A more performant and Pythonic approach is to convert columns to Python lists first and then iterate over the transposed data. This leverages PyArrow's optimized columnar operations.
```diff
-for row_idx in range(arrow_table.num_rows):
-    row_values = []
-    for col_name in arrow_table.column_names:
-        value = arrow_table[col_name][row_idx].as_py()
-        row_values.append(str(value))
-    table.add_row(*row_values)
+# Add rows by transposing columns for efficiency
+column_data = [col.to_pylist() for col in arrow_table.columns]
+for row in zip(*column_data):
+    # Convert all values in the row to string before adding
+    table.add_row(*[str(item) for item in row])
```
parq/output.py (Outdated)

```python
table.add_row(*row_values)

console.print(table)
# {{END MODIFICATIONS}}
```
parq/reader.py (Outdated)
```python
return table.slice(0, min(n, self.num_rows))

# Fast path: small files or single row group
# Avoids overhead of row group calculation
if self.num_rows <= n * 10 or self.num_row_groups == 1:
```
The number 10 here is a magic number, making the condition's intent less clear. Consider extracting it into a named constant at the module level (e.g., FAST_PATH_ROWS_MULTIPLIER = 10) with a comment explaining this heuristic. This would improve readability and maintainability. This same magic number is also used in the read_tail method on line 161.
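A sketch of what that might look like (the constant name follows this suggestion; surrounding code is abbreviated):

```python
# parq/reader.py (illustrative)
# Heuristic: when the requested head/tail covers a large share of the file,
# reading the whole table once is cheaper than selective row-group reads.
FAST_PATH_ROWS_MULTIPLIER = 10

# ... then, inside read_head and read_tail:
#     if self.num_rows <= n * FAST_PATH_ROWS_MULTIPLIER or self.num_row_groups == 1:
#         ...  # fast path: read everything and slice
```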
```python
output_pattern: str,
file_count: Optional[int] = None,
record_count: Optional[int] = None,
progress_callback: Optional[callable] = None,
```
The type hint callable is quite generic. For better type safety and clarity, you can use typing.Callable to specify the signature of the callback function. You will need to add from typing import Callable at the top of the file.
```diff
-    progress_callback: Optional[callable] = None,
+    progress_callback: Optional["Callable[[int, int], None]"] = None,
```
tests/test_reader.py (Outdated)
```python
assert df.iloc[1]["id"] == 5

# Verify it's the last rows (using PyArrow directly)
# Extract id column values
id_values = [table["id"][i].as_py() for i in range(len(table))]
```
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
parq/reader.py (1)
262-313: Lazy‑open writers and ensure final progress hits 100%

Open each writer when its first rows are written; close as you advance. Also emit a final progress update.
```diff
-        # Create writers for each output file
-        writers = []
-        try:
-            for output_path in output_files:
-                # Create parent directories if needed
-                output_path.parent.mkdir(parents=True, exist_ok=True)
-
-                # Create writer with same schema and compression as source
-                writer = pq.ParquetWriter(
-                    output_path,
-                    self.schema,
-                    compression=self._get_compression_type(),
-                )
-                writers.append(writer)
-
-            # Read and distribute data in batches
-            current_file_idx = 0
-            current_file_rows = 0
-            rows_processed = 0
+        # Read and distribute data in batches
+        writers: dict[int, pq.ParquetWriter] = {}
+        try:
+            current_file_idx = 0
+            current_file_rows = 0
+            rows_processed = 0
@@
-                    if rows_to_write == batch_rows and batch_offset == 0:
+                    # Lazily open current writer
+                    if current_file_idx not in writers:
+                        out_path = output_files[current_file_idx]
+                        out_path.parent.mkdir(parents=True, exist_ok=True)
+                        writers[current_file_idx] = pq.ParquetWriter(
+                            out_path,
+                            self.schema,
+                            compression=self._get_compression_type(),
+                        )
+
+                    if rows_to_write == batch_rows and batch_offset == 0:
                         # Write entire batch
-                        writers[current_file_idx].write_batch(batch)
+                        writers[current_file_idx].write_batch(batch)
                     else:
                         # Write partial batch
                         slice_batch = batch.slice(batch_offset, rows_to_write)
-                        writers[current_file_idx].write_batch(slice_batch)
+                        writers[current_file_idx].write_batch(slice_batch)
@@
-                    if progress_callback:
-                        progress_callback(rows_processed, total_rows)
+                    if progress_callback:
+                        progress_callback(rows_processed, total_rows)
@@
-                        current_file_idx += 1
+                        # Close finished writer before moving on
+                        if current_file_idx in writers:
+                            writers[current_file_idx].close()
+                        current_file_idx += 1
                         current_file_rows = 0
+
+            # Ensure final 100% update
+            if progress_callback and rows_processed != total_rows:
+                progress_callback(total_rows, total_rows)
         finally:
-            # Always close writers
-            for writer in writers:
-                if writer:
-                    writer.close()
+            # Always close any open writers
+            for w in writers.values():
+                w.close()
```

This avoids empty files and reduces open file handles when num_files is large.
🧹 Nitpick comments (11)
tests/test_reader.py (1)
74-78: Prefer vectorized column extraction via to_pylist()

Current loop indexes each scalar. You can simplify and speed this up:
```diff
-    # Extract id column values
-    id_values = [table["id"][i].as_py() for i in range(len(table))]
-    assert id_values[0] == 4
-    assert id_values[1] == 5
+    # Extract id column values
+    id_values = table["id"].to_pylist()
+    assert id_values == [4, 5]
```

parq/output.py (4)
106-108: Docstring wording: not “zero‑copy”

Per‑cell .as_py() materializes Python objects; this isn’t zero‑copy. Suggest: “Avoids pandas conversion; reads directly from PyArrow for lower overhead.”
118-120: Table styling defaults

show_lines=True and extra padding look nice but can slow rendering for large previews and wrap on narrow terminals. Consider show_lines=False and rely on zebra row styles, or gate this via a flag.
122-134: Avoid per‑cell scalar access; batch to Python lists once

Indexing each scalar via arrow_table[col_name][row_idx].as_py() is O(rows×cols) Python dispatch. For small head/tail it’s fine; for wider tables it’s noticeably slower. Convert each column once, then zip rows:

```diff
-        # Add rows directly from PyArrow table (zero-copy access)
-        # This avoids the overhead of pandas conversion and iterrows()
-        for row_idx in range(arrow_table.num_rows):
-            row_values = []
-            for col_name in arrow_table.column_names:
-                value = arrow_table[col_name][row_idx].as_py()
-                row_values.append(str(value))
-            table.add_row(*row_values)
+        # Add rows: convert columns to Python lists once, then zip rows
+        columns_py = [arrow_table[name].to_pylist() for name in arrow_table.column_names]
+        for row in zip(*columns_py):
+            table.add_row(*(str(v) for v in row))
```

If you need custom formatting (e.g., bytes, timestamps, decimals), add a small formatter per type before str().
197-245: Split summary UX nits
- “Rows per file” is a floor; you already prefix with “~”, good. Consider displaying “min–max rows per file” if cheap to compute.
- Start file index at 1 for user-facing table.
```diff
-        table.add_column("#", style="cyan", no_wrap=True)
+        table.add_column("#", style="cyan", no_wrap=True)
 ...
-        for idx, file_path in enumerate(output_files):
+        for idx, file_path in enumerate(output_files, start=1):
             if file_path.exists():
                 size = OutputFormatter._format_file_size(file_path.stat().st_size)
-                table.add_row(str(idx), str(file_path), size)
+                table.add_row(str(idx), str(file_path), size)
```

parq/cli.py (2)
233-251: Progress UI: handle non‑TTY and interrupts gracefully

- Fallback to plain text when not attached to a TTY to avoid escape sequences in logs/CI. Use console.is_terminal or honor env flags (TTY_COMPATIBLE / TTY_INTERACTIVE). Based on learnings.
- Catch KeyboardInterrupt to close progress cleanly and emit a friendly message.

I can draft a small wrapper to choose between Progress vs. simple prints (see the sketch below).
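For illustration, one possible shape of that wrapper (the split_progress name and the plain-text fallback policy are assumptions, not the PR's code):

```python
from contextlib import contextmanager

from rich.console import Console
from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeRemainingColumn

console = Console()


@contextmanager
def split_progress(description: str, total: int):
    """Yield an update(current, total) callable; plain prints when not on a TTY."""
    if console.is_terminal:
        # Rich progress bar; the `with` block also stops cleanly on KeyboardInterrupt.
        with Progress(
            SpinnerColumn(),
            TextColumn("[bold blue]{task.description}"),
            BarColumn(),
            TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
            TimeRemainingColumn(),
            console=console,
        ) as progress:
            task = progress.add_task(description, total=total)

            def update(current: int, new_total: int) -> None:
                progress.update(task, completed=current, total=new_total)

            yield update
    else:
        # Logs/CI: avoid escape sequences, print roughly every 10% of rows.
        step = max(total // 10, 1)

        def update(current: int, new_total: int) -> None:
            if current % step == 0 or current >= new_total:
                print(f"{description}: {current}/{new_total} rows")

        yield update
```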
253-255: Nit: unused parameter in update_progress

total isn’t used; drop it for clarity, or use it to compute rate/ETA if you render it.
197-208: Type hint progress_callback precisely

Use typing.Callable for better editor/static‑type support:

```diff
-    progress_callback: Optional[callable] = None,
+    from typing import Callable, Optional
+    progress_callback: Optional[Callable[[int, int], None]] = None,
```
244-260: Eager file preparation can create empty outputs

When file_count > total_rows, this creates and closes empty Parquet files. Prefer lazy writer creation on first write, and only return paths that received data. If you must keep indices stable, at least avoid touching the filesystem for files that get 0 rows.
I can supply a lazy-writer refactor; see next comment for a concrete diff.
80-85: Metadata key naming

You store a single compression under "compression_types". Consider "compression" (singular) or a mapping per column if you plan to extend it.
336-339: Optional: preserve per‑column compression

If source columns use mixed compression, capture a mapping and pass it to ParquetWriter:

```python
comp = {}
for i, field in enumerate(self.schema):
    comp[field.name] = self.metadata.row_group(0).column(i).compression
return comp or "SNAPPY"
```

And then call ParquetWriter(..., compression=comp). Falls back to a string if uniform.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
- parq/cli.py (1 hunks)
- parq/output.py (1 hunks)
- parq/reader.py (5 hunks)
- pyproject.toml (0 hunks)
- tests/test_reader.py (1 hunks)
💤 Files with no reviewable changes (1)
- pyproject.toml
🧰 Additional context used
🧬 Code graph analysis (1)
parq/cli.py (1)
parq/reader.py (3)
ParquetReader (13-339), num_rows (40-42), split_file (192-322)
… edge cases; optimize row reading performance in OutputFormatter by utilizing columnar access.
… reader.py for cleaner code; streamline the codebase by eliminating unnecessary comments.
Actionable comments posted: 0
♻️ Duplicate comments (3)
parq/output.py (1)
126-135: Use zip to avoid per‑cell indexed lookups.

Zipping the column lists is simpler and typically faster than double indexing; also handle None cleanly.

```diff
-        # Transpose and iterate over rows
-        for row_idx in range(arrow_table.num_rows):
-            row_values = [
-                str(columns_data[col_idx][row_idx]) for col_idx in range(len(columns_data))
-            ]
-            table.add_row(*row_values)
+        # Transpose and iterate rows via zip for efficiency
+        for row in zip(*columns_data):
+            table.add_row(*[(str(v) if v is not None else "") for v in row])
```

parq/cli.py (1)
227-233: Move Rich/console imports to module top.

PEP 8: keep imports at top; Rich is core here and also used elsewhere. Improves readability and static analysis.

```diff
-    from rich.progress import (
-        BarColumn,
-        Progress,
-        SpinnerColumn,
-        TextColumn,
-        TimeRemainingColumn,
-    )
-    from parq.output import console
+    # (Move these to the top of the file with other imports)
+    # from rich.progress import BarColumn, Progress, SpinnerColumn, TextColumn, TimeRemainingColumn
+    # from parq.output import console
```

Also applies to: 235-235
parq/reader.py (1)
224-224: Tighten the callback type hint.

Use typing.Callable for clarity and tooling.

```diff
-from typing import List, Optional
+from typing import List, Optional, Callable
@@
-    progress_callback: Optional[callable] = None,
+    progress_callback: Optional["Callable[[int, int], None]"] = None,
```
🧹 Nitpick comments (5)
parq/output.py (1)
118-119: Avoid heavy UI/render and peak memory on large previews.

Always-on show_lines and full-column to_pylist can be expensive for big tables. Gate lines and consider chunking.

```diff
-    table = Table(
+    show_lines_arg = arrow_table.num_rows <= 200
+    table = Table(
         title=f"[bold blue]📄 {title}[/bold blue]",
         box=box.ROUNDED,
         show_header=True,
         header_style="bold magenta",
-        padding=(0, 1),
-        show_lines=True,
+        padding=(0, 1),
+        show_lines=show_lines_arg,
     )
```

Optional: when arrow_table.num_rows > 50_000, build rows from small batches (iter_batches) to cap peak memory; I can provide a patch if desired — a rough sketch follows below.
Also applies to: 126-129
parq/cli.py (2)
247-249: Use the provided total when updating.

Keeps the task accurate if the source adjusts totals.

```diff
-        def update_progress(current: int, total: int):
-            progress.update(task, completed=current)
+        def update_progress(current: int, total: int):
+            progress.update(task, completed=current, total=total)
```
237-245: Apply Progress hardening for CI/non‑TTY environments.

Enable transient completion and auto‑disable when not a terminal:

```diff
     with Progress(
         SpinnerColumn(),
         TextColumn("[bold blue]{task.description}"),
         BarColumn(),
         TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
         TimeRemainingColumn(),
         console=console,
+        transient=True,
+        disable=not console.is_terminal,
     ) as progress:
```

Console.is_terminal is a read-only boolean property that indicates whether the Console is writing to a terminal, making this approach valid for detecting and disabling progress in non-TTY environments. Rich 14.x adds TTY_COMPATIBLE and TTY_INTERACTIVE environment variables if you need to override auto-detection in CI pipelines.
parq/reader.py (2)
282-341: Optional: clean up partial outputs on failure.

On write errors, the current code closes writers but leaves partial files. Consider deleting any created files to avoid corrupt artifacts.

Sketch:

```python
try:
    ...
except Exception:
    for p in output_files:
        try:
            if p.exists():
                p.unlink()
        except Exception:
            pass
    raise
finally:
    for w in writers:
        if w:
            w.close()
```

I can send a precise patch if you want this behavior.
343-353: Normalize compression value for consistency with PyArrow documentation examples.

PyArrow's ParquetWriter accepts the compression parameter case-insensitively, so the current code is functionally correct. However, documentation examples commonly use lowercase names like "snappy" and "gzip", and normalizing ensures consistency. Additionally, metadata.row_group(0).column(0).compression may return an enum object that requires conversion to a string.

The suggested refactor is sound practice:

```diff
-        if self.num_row_groups > 0:
-            compression = self.metadata.row_group(0).column(0).compression
-            return compression
-        return "SNAPPY"  # Default compression
+        if self.num_row_groups > 0:
+            comp = self.metadata.row_group(0).column(0).compression
+            # Enum -> name, else str; normalize to lowercase for consistency.
+            name = getattr(comp, "name", str(comp)).lower()
+            return name
+        return "snappy"  # Default
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- parq/cli.py (1 hunks)
- parq/output.py (1 hunks)
- parq/reader.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
parq/cli.py (1)
parq/reader.py (2)
num_rows (45-47), split_file (219-341)
🔇 Additional comments (3)
parq/output.py (1)
106-108: Good move away from pandas conversion.

Direct PyArrow access reduces copies and speeds up preview.
parq/reader.py (2)
12-16: Nice: extracted FAST_PATH_ROWS_MULTIPLIER.

Clearer intent and avoids magic numbers.
136-149: Solid guards and optimized row‑group reads.

Input validation + zero‑row fast return + selective row‑group reads look good.
Also applies to: 179-186
Refactor output and reader for optimized performance
Summary by CodeRabbit
New Features
Performance Improvements
Bug Fixes
Tests
Chores