-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Issue
Currently there is no Rope data structure implemented in the .Net library. It has been suggested that one be created to make appending, prepending, and insertion of large strings more efficient. However, there has been a second suggestion that possibly the already-existing StringBuilder data structure is sufficiently efficient to handle these types of large string processing. Finally, a third class, ImmutableList, is available from Microsoft and might also be an already-existing candidate.
Overview of considerations and recommendations are below; a more detailed document with the results of comparative testing of rope and non-rope functions is attached.
Considerations
Rope structures appear to be by far more efficient in multiple situations when manipulating very large strings. In various commentaries, manipulation of such large strings has been defined as an ‘edge’ case, which, several years ago was probably an adequate description, and even up until recently might have been adequately re-defined as a ‘specialty’ case.
More recently, Gartner has suggested that by 2020 the number of connected devices may reach 25 billion; this explosion in the IoT has the potential to push greater numbers of programmers to scenarios requiring pre-processing of big data at the edge of the network, on remote devices, as well as increased pre-processing in the cloud – and some of these scenarios will require intensive processing of large strings.
Some of this processing could commonly require, e.g.,:
- Collection of very large set(s) of strings followed by some or all of the following:
- String search - for keywords and/or phrases
- String split - subsetting of strings based on various inputs
- String categorization - tagging of strings and substrings based on various inputs
- String bucketization – grouping and/or concatenation of strings and substrings through appends/inserts
- String packaging – sorting and formatting for delivery based on various output requirements
There are earlier investigations of developing a rope structure for the .Net library, as well as attempts to develop rope structures in some MS languages. For example, conversations in 2009 questioned the value of adding a rope structure to the library, and a later blog entry in 2011 by Eric Lippert (then with MSFT) stating he had worked on rewriting VBScript internal string manipulations to use ropes and that the code base probably still existed, commented out, as “Fancy_String”, and so I will be investigating this.
Recommendation
I suggest that a rope data structure be developed for the library which would, at start, contain methods that allow the following types of manipulations:
- Add different string object types to the end of a rope object.
- Insert different string object types at start or anywhere within body of a rope.
- Remove all or portions of a string from a rope object.
- Split a string with overloads which allow for various scenarios.
- Order a string or subset of a string with overloads which allow for various scenarios.
- Count the number of strings and characters in a rope.