Improve CSV chunk ingestion reliability, add HMSDMS RA/Dec support, and include Smith Southern Sky compilation tutorial #16
Merged
Yong2Sheng merged 2 commits into `develop` on Mar 3, 2026
…s; fix a bug when counting the number of lines and chunks for the tqdm progress bar; make the code compatible with RA/Dec in hmsdms format.
Summary
This PR contains three practical improvements for my standard star catalog ingestion pipeline:
- … `tqdm(total=...)` for `pd.read_csv(..., chunksize=...)`, so the progress bar total stays consistent with the actual reader behavior when `header`, `names`, and `skiprows` are configured in different combinations.
- … `hmsdms` format when writing the HDF5 standard star table. When enabled, the function converts coordinates to degrees before computing the HEALPix `ipix` and the coarse `bucket` id.

Motivation
- … `skiprows`. This worked for older catalogs where I used `header=None` and `skiprows=1`, but it became fragile when switching to catalogs with a proper header (`header=0`) or when combining `header=0` with `names=...` to override column names.
- … `pd.read_csv` iterator caused `tqdm` to end early relative to `total`, showing a red progress bar even though no exception was raised.
- … `hmsdms`, so I needed a first-class way to ingest those without adding ad hoc conversions outside the writer.

Changes
1) Robust progress bar total estimation for chunked CSV reads
- … `read_csv_kwargs` (for example, accounting for the header line when `header=0`).
- … `total_chunks = ceil(n_data_lines / chunksize)` and returns `None` when it cannot safely infer a total (for example, when `skiprows` is list-like or callable). In that case I pass `total=None` to `tqdm` to avoid incorrect totals.

Example usage:
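A minimal sketch of the estimation described above, not the code merged in this PR: `count_nonblank_lines` and `estimate_total_chunks` are hypothetical names, and the sketch assumes pandas' documented semantics (`header="infer"` acts like `header=0`, or like `header=None` when `names` is given; `header=k` consumes rows 0..k after `skiprows`; blank lines are skipped by default).

```python
import math
from numbers import Integral
from typing import Optional


def count_nonblank_lines(path: str) -> int:
    """Count non-empty lines in a text file (pandas skips blank lines by default)."""
    with open(path, "rb") as fh:
        return sum(1 for line in fh if line.strip())


def estimate_total_chunks(n_lines: int, chunksize: int, read_csv_kwargs: dict) -> Optional[int]:
    """Hypothetical helper: how many chunks pd.read_csv(..., chunksize=chunksize) should yield.

    Returns None whenever the row count cannot be inferred safely, so the caller
    can fall back to tqdm(total=None).
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")

    skiprows = read_csv_kwargs.get("skiprows")
    # A list-like or callable skiprows can drop arbitrary rows: don't guess.
    if skiprows is not None and not isinstance(skiprows, Integral):
        return None
    n_skip = int(skiprows or 0)

    header = read_csv_kwargs.get("header", "infer")
    # pandas: header="infer" acts like header=0, or like header=None if names is given.
    if header == "infer":
        header = None if read_csv_kwargs.get("names") is not None else 0
    if header is None:
        n_header = 0
    elif isinstance(header, Integral):
        n_header = int(header) + 1  # header=k consumes rows 0..k after skiprows
    else:
        return None  # multi-row (list) headers are out of scope for this sketch

    n_data = max(n_lines - n_skip - n_header, 0)
    return math.ceil(n_data / chunksize)
```

The result would be passed as `tqdm(pd.read_csv(path, chunksize=chunksize, **read_csv_kwargs), total=estimate_total_chunks(count_nonblank_lines(path), chunksize, read_csv_kwargs))`. Quoted fields containing newlines or `comment=`-style lines would make the line count too high, so a real implementation would likely need to return `None` for those cases as well.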
2) Add `ra_dec_hmsdms` support in `write_std_h5`
- … `ra_dec_hmsdms`.
- … `True`, I convert the RA/Dec columns from `(hourangle, deg)` to degrees using `astropy.coordinates.SkyCoord`, then proceed with HEALPix indexing as usual.

Key logic:
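For reference, converting `hmsdms` strings with `SkyCoord(ra, dec, unit=(u.hourangle, u.deg))` amounts to the arithmetic below: each sexagesimal triple a, b, c becomes a + b/60 + c/3600, and RA hours are multiplied by 15. This dependency-free sketch is illustrative only and is not the PR's code; `hmsdms_to_deg` is a hypothetical helper, and it assumes three-field strings such as `"12h30m00s"`, `"12:30:00"` or `"-45 30 00"`.

```python
import re

# Matches unsigned numeric fields such as "12", "30", "00.5"; separators
# (":", " ", "h", "d", "m", "s") are simply skipped.
_FIELD = re.compile(r"\d+(?:\.\d*)?")


def _sexagesimal(text: str) -> float:
    """Combine three sexagesimal fields into a + b/60 + c/3600 (sign from the leading char)."""
    text = text.strip()
    # Take the sign from the string, not the first field, so "-00:30:00" stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    fields = _FIELD.findall(text)
    if len(fields) != 3:
        raise ValueError(f"expected three sexagesimal fields, got {text!r}")
    a, b, c = (float(f) for f in fields)
    return sign * (a + b / 60.0 + c / 3600.0)


def hmsdms_to_deg(ra: str, dec: str) -> tuple[float, float]:
    """Hypothetical helper: RA in hours/min/sec and Dec in deg/arcmin/arcsec -> degrees."""
    ra_deg = 15.0 * _sexagesimal(ra)  # 24 h of RA = 360 deg, so 1 h = 15 deg
    dec_deg = _sexagesimal(dec)
    return ra_deg, dec_deg


print(hmsdms_to_deg("12h30m00s", "-45d30m00s"))  # (187.5, -45.5)
```

In the writer itself `SkyCoord` remains the better choice, since it validates the input, works on whole columns at once, and yields the degree values that the HEALPix `ipix` computation expects.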
3) Add tutorial notebook for Smith Southern Sky compilation
Testing
- … (`header=0`)
- … (`header=0, names=colnames`)
- … (`header=None, names=colnames, skiprows=1`)
- … `write_std_h5` on a catalog providing RA/Dec as `hmsdms` and confirmed:
  - `ipix` and `bucket` columns are computed as expected

Notes
… `bugfix` branch