Make loading features from storage robust to order. #9

Open
edpizzi wants to merge 1 commit into main from vsc-storage-order

@edpizzi (Contributor) commented Dec 19, 2022

The current `load_features` implementation relies on features from each video (same `video_id`) being in a contiguous block. This matches how `store_features` organizes feature files.

Update `load_features` to accept descriptors in any order by sorting by `video_id` (then by start timestamp) before constructing `VideoFeature` structures. Also change `store_features` to sort by `video_id` before storing features.
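
To illustrate the approach described above, here is a minimal, standard-library-only sketch of sorting flat descriptor rows by `video_id` and then start timestamp before grouping them per video. The `Descriptor` record and `group_descriptors` helper are hypothetical names used for illustration; they are not the repository's `VideoFeature` or `load_features`.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class Descriptor:
    """Hypothetical flat record: one stored descriptor row."""
    video_id: int
    start: float
    end: float


def group_descriptors(rows: List[Descriptor]) -> List[Tuple[int, List[Descriptor]]]:
    """Group rows per video regardless of the order they were stored in."""
    # Sort by video_id, then start timestamp, so that each video forms
    # exactly one contiguous, time-ordered block.
    ordered = sorted(rows, key=lambda r: (r.video_id, r.start))
    # groupby only merges adjacent equal keys, which is why the sort is required.
    return [
        (video_id, list(block))
        for video_id, block in groupby(ordered, key=lambda r: r.video_id)
    ]


# Rows deliberately interleaved across two videos.
rows = [
    Descriptor(video_id=2, start=5.0, end=6.0),
    Descriptor(video_id=1, start=3.0, end=4.0),
    Descriptor(video_id=2, start=0.0, end=1.0),
    Descriptor(video_id=1, start=0.0, end=1.0),
]
grouped = group_descriptors(rows)
```

With the sort in place, `grouped` holds one entry per video even though the input rows were interleaved.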

@facebook-github-bot added the CLA Signed label on Dec 19, 2022. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)

restored = load_features(f.name)

features.sort(key=lambda x: x.video_id)
restored.sort(key=lambda x: x.video_id)

Should we test that `restored` is already properly sorted when it is loaded with `load_features`? I'm not sure we should sort it here.
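
One way to act on this suggestion would be to assert on the order of the loaded result directly instead of sorting it in the test. The sketch below is an assumption about how such a check could look; `is_sorted_by_video_id` is a hypothetical helper, and `SimpleNamespace` objects stand in for the real feature objects.

```python
from types import SimpleNamespace


def is_sorted_by_video_id(features) -> bool:
    """Return True if video_id never decreases from one item to the next."""
    ids = [f.video_id for f in features]
    return all(a <= b for a, b in zip(ids, ids[1:]))


# Stand-ins for what a load call might return (hypothetical data).
restored = [SimpleNamespace(video_id=v) for v in (1, 2, 5)]
shuffled = [SimpleNamespace(video_id=v) for v in (2, 1, 5)]

assert is_sorted_by_video_id(restored)
assert not is_sorted_by_video_id(shuffled)
```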

@chrisjkuch (Contributor) commented

For the sake of completeness, we also tracked down what we believe caused the memory error.

for video_id, start, end in same_value_ranges(video_ids):

In `load_features`, we iterate over `same_value_ranges(video_ids)`, which yields one range per run of identical adjacent ids. For an unsorted array of video ids, the resulting list of `VideoFeature`s therefore has a length close to, or exactly equal to, the length of the array, rather than the number of videos.
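
The effect can be reproduced with a small stand-in. The `same_value_ranges` below is a hypothetical reimplementation written only from the usage shown above (it yields `(value, start, end)` for each run of equal adjacent values); the repository's actual function may differ.

```python
from typing import Iterator, Sequence, Tuple


def same_value_ranges(values: Sequence[int]) -> Iterator[Tuple[int, int, int]]:
    """Yield (value, start, end) for each maximal run of equal adjacent values."""
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of the input or when the value changes.
        if i == len(values) or values[i] != values[start]:
            yield values[start], start, i
            start = i


sorted_ids = [1, 1, 1, 2, 2, 2]       # two videos, stored contiguously
interleaved_ids = [1, 2, 1, 2, 1, 2]  # same descriptors, interleaved

sorted_ranges = list(same_value_ranges(sorted_ids))
interleaved_ranges = list(same_value_ranges(interleaved_ids))

print(len(sorted_ranges))       # one range per video
print(len(interleaved_ranges))  # one range per descriptor
```

In the interleaved case every descriptor becomes its own range, so the number of ranges equals the array length instead of the number of videos.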

num_candidates = int(AGGREGATED_CANDIDATES_PER_QUERY * len(query_features))

The computed number of query candidates to generate for a given input query is then more than an order of magnitude larger than we intend. When we exhaustively search for and return this many candidates in our exponential iterator, we return increasingly large copies of matrices until we run out of memory.
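
A rough worked example of the inflation, using made-up numbers: the value of `AGGREGATED_CANDIDATES_PER_QUERY`, the video count, and the descriptors-per-video figure below are all placeholders chosen for illustration, not values from the repository.

```python
# Placeholder constant; the real value in the codebase is not shown in this thread.
AGGREGATED_CANDIDATES_PER_QUERY = 20


def num_candidates(n_query_features: int) -> int:
    """Mirror of the formula quoted above, applied to a given feature count."""
    return int(AGGREGATED_CANDIDATES_PER_QUERY * n_query_features)


n_videos = 100               # hypothetical number of query videos
descriptors_per_video = 30   # hypothetical descriptors stored per video

# Sorted input: one VideoFeature per video.
intended = num_candidates(n_videos)
# Fully interleaved input: roughly one VideoFeature per descriptor.
inflated = num_candidates(n_videos * descriptors_per_video)

print(intended, inflated, inflated // intended)
```

Under these assumptions the candidate count grows by the number of descriptors per video, which is consistent with the "more than an order of magnitude" figure described above.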

@facebook-github-bot

Hi @edpizzi!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

3 participants