Add tiered stats to request cache response #8
peteralfonsi wants to merge 10 commits into framework-serialized from
Fixing initialization issue that broke IT tests
…ing the domain to run
| private long evictions; | ||
| private long hitCount; | ||
| private long missCount; | ||
| private Map<String, StatsHolder> map; |
Maybe initialize this map inline here. That way you don't need to worry about it not being initialized.
Like:
    map = new HashMap<>() {{
        for (TierType tierType : TierType.values()) {
            put(tierType.getStringValue(), new StatsHolder());
        }
    }};
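The inline initialization suggested above can also be written without double-brace initialization, which creates an anonymous HashMap subclass. The following is a minimal sketch, not the PR's code: the TierType string values, the StatsHolder fields, and the TieredStats class with its emptyStatsByTier helper are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the PR's TierType; the string values are assumed.
enum TierType {
    ON_HEAP("on_heap"),
    DISK("disk");

    private final String stringValue;

    TierType(String stringValue) {
        this.stringValue = stringValue;
    }

    String getStringValue() {
        return stringValue;
    }
}

// Hypothetical stand-in for the PR's StatsHolder; all counters start at 0.
class StatsHolder {
    long hitCount;
    long missCount;
    long evictions;
}

// Hypothetical holder class showing the field-level initialization.
class TieredStats {
    // Initialized at the declaration, so every constructor sees a populated map.
    private final Map<String, StatsHolder> map = emptyStatsByTier();

    // Builds one zeroed StatsHolder per tier with a plain loop.
    static Map<String, StatsHolder> emptyStatsByTier() {
        Map<String, StatsHolder> stats = new LinkedHashMap<>();
        for (TierType tierType : TierType.values()) {
            stats.put(tierType.getStringValue(), new StatsHolder());
        }
        return stats;
    }

    Map<String, StatsHolder> getMap() {
        return Collections.unmodifiableMap(map);
    }

    public static void main(String[] args) {
        TieredStats stats = new TieredStats();
        System.out.println(stats.getMap().keySet());
    }
}
```

With the map populated at the field declaration, a constructor that reads from a stream would no longer need to call this() just to set it up.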
| evictions = in.readVLong(); | ||
| hitCount = in.readVLong(); | ||
| missCount = in.readVLong(); | ||
| this(); |
We are calling this() to initialize the map, but that looks error-prone and logically wrong: every variable inside this constructor should be initialized from StreamInput values. You can initialize the map inline as suggested above, and then we don't need this call.
| this.missCount = missCount; | ||
| public RequestCacheStats(Map<TierType, StatsHolder> inputMap) { | ||
| // Create a RequestCacheStats with multiple tiers' statistics | ||
| this(); |
| import java.io.IOException; | ||
| import java.io.Serializable; | ||
|
|
||
| public class StatsHolder implements Serializable, Writeable, ToXContentFragment { |
Since this is specific to RequestCacheStats, it would be better to move it inside RequestCacheStats itself.
This is also used in ShardRequestCache - I moved it out from this class because I wanted it to be used by RequestCacheStats as well
But would this StatsHolder be used elsewhere? It looks pretty generic, yet it will only be used for the request cache. Since it is specific to the request cache, maybe we can keep it inside ShardRequestCache as a public class and use it from RequestCacheStats. That should be fine.
| // on a node with a maximum request cache size that we set. | ||
|
|
||
| @OpenSearchIntegTestCase.ClusterScope(scope = OpenSearchIntegTestCase.Scope.TEST, numDataNodes = 0) | ||
| public class IndicesRequestCacheDiskTierIT extends OpenSearchIntegTestCase { |
Ideally, we should add these tests as part of IndicesRequestCacheIT. Let's check what is needed to do that.
|
|
||
| Settings.Builder builder = Settings.builder() | ||
| .put(IndicesRequestCache.INDEX_CACHE_REQUEST_ENABLED_SETTING.getKey(), true) | ||
| .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1) | ||
| .put(IndexMetadata.SETTING_NUMBER_OF_REPLICAS, 0); | ||
|
|
||
| assertAcked(client.admin().indices().prepareCreate("index").setMapping("k", "type=keyword").setSettings(builder).get()); |
I guess this change is not required? Remove?
| assertSearchResponse(resp); | ||
| IndicesRequestCacheIT.assertCacheState(client, "index", 0, i + 1, TierType.ON_HEAP, false); | ||
| IndicesRequestCacheIT.assertCacheState(client, "index", 0, i + 1, TierType.DISK, false); | ||
| System.out.println("request number " + i); |
| System.out.println("request number " + i); | ||
| } | ||
|
|
||
| System.out.println("Num requests = " + numRequests); |
| public class IndicesRequestCacheDiskTierIT extends OpenSearchIntegTestCase { | ||
| public void testDiskTierStats() throws Exception { | ||
| int heapSizeBytes = 1800; // enough to fit 2 queries, as each is 687 B | ||
| int requestSize = 687; // each request is 687 B |
How did we calculate this? Manually, I guess?
Would it be possible to create a request and then estimate its size? That way we could generate this value dynamically.
….java Signed-off-by: Peter Alfonsi <petealft@amazon.com>
| double getTimeEWMA = getTimeEWMAIfDisk(cachingTier); | ||
| if (value != null) { | ||
| tieredCacheEventListener.onHit(key, value, cachingTier.getTierType()); | ||
| tieredCacheEventListener.onHit(key, value, cachingTier.getTierType(), getTimeEWMA); | ||
| return new CacheValue<>(value, cachingTier.getTierType()); | ||
| } | ||
| tieredCacheEventListener.onMiss(key, cachingTier.getTierType()); | ||
| tieredCacheEventListener.onMiss(key, cachingTier.getTierType(), getTimeEWMA); |
This doesn't seem right. Ideally, we should track these get times inside the disk caching tier itself, so that a different implementation of TieredService doesn't have to duplicate this work.
And, as discussed, having a separate DiskCacheStats should solve this.
| public interface TieredCacheEventListener<K, V> { | ||
|
|
||
| void onMiss(K key, TierType tierType); | ||
| void onMiss(K key, TierType tierType, double getTimeEWMA); |
Adding getTimeEWMA here isn't needed, since it is only used for stats. We need to rethink the low-level design. For stats that are specific to a particular tier, we can instead create a separate stats object, for example a DiskTierStats associated with the disk tier, and keep accumulating the relevant stats there in memory. ShardRequestCacheStats can then pull the values from it.
More values may need to be added to the stats later, so this solution isn't extensible: we can't keep adding parameters here.
This makes sense. I just wasn't sure where to actually fetch the values from the disk tier, and I thought onHit and onMiss would be reasonable since that's when getTimeEWMA will actually change. But I agree it's not extensible to new stats which might change on some other frequency. Should we instead have some sort of background job to periodically gather stats from the disk tier?
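One way to read the suggestion in this thread is sketched below: the disk tier times its own lookups and accumulates them in a stats object it owns, so the onHit and onMiss callbacks need no extra parameters, and a stats reader pulls the values whenever the stats API is called. This is a rough sketch under stated assumptions, not the PR's design: DiskTierStats is only named in the review, its methods here are invented, and SketchDiskTier is a hypothetical in-memory stand-in for the real disk tier.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stats object owned by the disk tier; method names are illustrative.
class DiskTierStats {
    private final AtomicLong getCount = new AtomicLong();
    private final AtomicLong totalGetTimeNanos = new AtomicLong();

    // Called by the tier after every lookup, hit or miss.
    void recordGet(long tookNanos) {
        getCount.incrementAndGet();
        totalGetTimeNanos.addAndGet(tookNanos);
    }

    long getCount() {
        return getCount.get();
    }

    long totalGetTimeNanos() {
        return totalGetTimeNanos.get();
    }
}

// Hypothetical tier backed by an in-memory map instead of a real disk store.
class SketchDiskTier<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final DiskTierStats stats = new DiskTierStats();

    V get(K key) {
        long start = System.nanoTime();
        try {
            return store.get(key);
        } finally {
            // Timing stays inside the tier, so callers and listeners never see it.
            stats.recordGet(System.nanoTime() - start);
        }
    }

    void put(K key, V value) {
        store.put(key, value);
    }

    // A stats reader pulls from here on demand.
    DiskTierStats stats() {
        return stats;
    }

    public static void main(String[] args) {
        SketchDiskTier<String, String> tier = new SketchDiskTier<>();
        tier.put("query", "cached response");
        tier.get("query");
        tier.get("unknown");
        System.out.println("gets recorded: " + tier.stats().getCount());
    }
}
```

Because the reader pulls the accumulated values when stats are requested, this arrangement would not need a background job; whether that fits the real disk tier is left open here.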
| private RemovalListener<IndicesRequestCache.Key, BytesReference> removalListener; | ||
| private ExponentiallyWeightedMovingAverage getTimeMillisEWMA; | ||
| private static final double GET_TIME_EWMA_ALPHA = 0.3; // This is the value used elsewhere in OpenSearch | ||
| private static final double GET_TIME_EWMA_ALPHA = 0.3; // This is the value used elsewhere in OpenSearch |
ExponentiallyWeightedMovingAverage(GET_TIME_EWMA_ALPHA, 10): keeping 10 as the initial average doesn't seem right. I don't know the right value, but would 0 be better?
I somewhat arbitrarily picked 10, since we expect a disk seek to take ~10 ms on a spinning disk, but 0 might be better, yeah.
| @@ -52,7 +56,7 @@ public class EhcacheDiskCachingTier implements DiskCachingTier<IndicesRequestCac | |||
| private final String diskCacheFP; // the one to use for this node | |||
| private RemovalListener<IndicesRequestCache.Key, BytesReference> removalListener; | |||
| private ExponentiallyWeightedMovingAverage getTimeMillisEWMA; | |||
Let's also have a normal average. The EWMA is useful for recent stats, and a normal average gives an overall view of get time. As discussed, creating a separate DiskStats might be better; we can move such stats there.
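To make the difference between the two averages concrete, here is a small sketch that tracks both for a stream of get times. It uses the standard EWMA update, avg = alpha * x + (1 - alpha) * avg, and an incremental cumulative mean. GetTimeAverages is a hypothetical class written for this example; it is not OpenSearch's ExponentiallyWeightedMovingAverage, and it is not thread-safe.

```java
// Hypothetical helper tracking both a recent-weighted and an overall average.
class GetTimeAverages {
    private final double alpha;
    private double ewma;
    private double mean;
    private long count;

    GetTimeAverages(double alpha, double initialAvg) {
        this.alpha = alpha;
        this.ewma = initialAvg; // the starting value biases early EWMA readings
    }

    void add(double millis) {
        // Standard EWMA update: recent samples weigh more than older ones.
        ewma = alpha * millis + (1 - alpha) * ewma;
        // Incremental cumulative mean: every sample weighs the same.
        count++;
        mean += (millis - mean) / count;
    }

    double ewma() {
        return ewma;
    }

    double mean() {
        return mean;
    }

    public static void main(String[] args) {
        GetTimeAverages fromZero = new GetTimeAverages(0.3, 0.0);
        GetTimeAverages fromTen = new GetTimeAverages(0.3, 10.0);
        double[] samplesMillis = { 2.0, 2.0, 2.0, 20.0 };
        for (double sample : samplesMillis) {
            fromZero.add(sample);
            fromTen.add(sample);
        }
        System.out.println("EWMA from 0: " + fromZero.ewma() + ", mean: " + fromZero.mean());
        System.out.println("EWMA from 10: " + fromTen.ewma() + ", mean: " + fromTen.mean());
    }
}
```

The initial average only affects the EWMA, and its influence shrinks by a factor of (1 - alpha) with every sample, while the cumulative mean ignores it entirely.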
Description
Modifies the request cache's API to return statistics for additional cache tiers, such as the upcoming disk tier, and adds the number of entries to the response. Stats for the existing on-heap tier stay where they were, in the "request_cache" object. This object has a new field, the "tiers" object, which holds the stats for each tier other than the on-heap tier. If a tier is not enabled, its statistics are still returned, with all values set to 0.
Calling _nodes/stats/indices/request_cache now returns the following:
Tested with unit tests for the overhauled RequestCacheStats, an integration test, and manual testing with the API.
Related Issues
Part of larger tiered caching feature.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.