
Conversation

@mroth (Owner) commented Sep 2, 2014

Move the `.char_to_unified` and `.unified_to_char` fundamental conversions into their own module, then precompile the results for all known Emoji character values. This makes these conversions effectively zero-cost for all known Emoji values; if nothing is matched, we fall back to the actual conversion functions.

This adds a fair amount of code complexity: to precompile anything, every function the compile-time code depends on must already live in a different, already-compiled module, hence the nested Util modules here.
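As a rough sketch of the technique (the module names, helper functions, and the two sample emoji below are hypothetical, not the actual Exmoji internals): a Util-style module holds the real conversion logic, and a second module calls it at compile time to unroll one function head per known value, with a catch-all fallback clause.

```elixir
defmodule Sketch.Util do
  # Hypothetical stand-ins for the real conversion helpers. They must live
  # in a separate module so they are already compiled when the comprehension
  # in Sketch.Converter runs at compile time.
  def all_chars, do: ["\u{1F604}", "\u{2764}"]

  # Convert an Emoji char (a UTF-8 binary) to its unified hex codepoint ID.
  def compute_unified(char) do
    char
    |> String.to_charlist()
    |> Enum.map(&Integer.to_string(&1, 16))
    |> Enum.join("-")
  end
end

defmodule Sketch.Converter do
  # Unroll one function head per known Emoji at compile time, so lookups
  # for known values become a single pattern match at runtime.
  for char <- Sketch.Util.all_chars() do
    def char_to_unified(unquote(char)), do: unquote(Sketch.Util.compute_unified(char))
  end

  # Fallback: anything unmatched goes through the actual conversion function.
  def char_to_unified(char), do: Sketch.Util.compute_unified(char)
end
```

So `Sketch.Converter.char_to_unified("\u{1F604}")` returns `"1F604"` via a precompiled clause, while unknown input still produces a correct answer through the fallback.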

Speed comparison can be seen here, check out the Exmoji PR#1 column:
https://docs.google.com/spreadsheets/d/1T08I6dlyFNqqdtvQykNt43tdT85lJy4US2SqErcpZ7g/edit?usp=sharing

The most interesting benefit I see in the speed increases is the knock-on effect of making `.scan/1` even faster, so the amount of text that a single node can scan is even higher. That said, we were already pretty darn fast, so the code complexity (especially the potential to introduce ordering gotchas in compilation) might not be worth it here.

@mroth (Owner, Author) commented Sep 2, 2014

Madness?

Although... even after spending a day working on this, I'm currently leaning towards the tradeoff in code complexity not being worth it until someone actually hits this as the performance bottleneck, and leaving this unmerged. My guess is that even in a tight loop, I/O might be more of a bottleneck at this point?

Comments appreciated.

We don’t actually need to define the variants first for `char_to_unified`: unlike `Scanner.bscan/1`, it matches against an entire binary rather than just the head, so there are no similar out-of-order issues to worry about.
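To illustrate the distinction (the clauses below are samples chosen for illustration, not `bscan/1`'s actual ones): when each clause matches a whole binary, every match is exact and the order of emoji clauses cannot cause shadowing; when clauses match only the head of a binary, a shorter prefix clause would also match its variant form, so the longer variant pattern must be defined first.

```elixir
defmodule MatchOrder do
  # Matching an ENTIRE binary: each clause matches exactly one value, so
  # the bare heart clause coming first cannot shadow the variant form.
  def exact("\u{2764}"), do: :heart
  def exact("\u{2764}\u{FE0F}"), do: :heart_variant
  def exact(_), do: :unknown

  # Matching only the HEAD of a binary (as a scanner does): if the bare
  # "\u{2764}" clause were listed first, it would also match input that
  # starts with the variant form, so the longer pattern must come first.
  def head_scan("\u{2764}\u{FE0F}" <> rest), do: {:heart_variant, rest}
  def head_scan("\u{2764}" <> rest), do: {:heart, rest}
  def head_scan(<<_::utf8, rest::binary>>), do: {:skip, rest}
end
```

Here `MatchOrder.exact("\u{2764}\u{FE0F}")` returns `:heart_variant` despite the clause order, while `head_scan` only does the right thing because the variant clause is defined first.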
