OOM but memory is enough #11747

@FelixYBW

Description

Backend

VL (Velox)

Bug description

org.apache.gluten.exception.GlutenException: org.apache.gluten.exception.GlutenException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: TableScan, plan node ID: 0]: Error during calling Java code from native code: org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 1024.0 KiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled). 
Current config settings: 
	spark.memory.offHeap.enabled=True
	spark.gluten.memory.dynamic.offHeap.sizing.enabled=N/A
	spark.gluten.memory.offHeap.size.in.bytes=93.6 GiB
	spark.gluten.memory.task.offHeap.size.in.bytes=6.2 GiB
	spark.gluten.memory.conservative.task.offHeap.size.in.bytes=3.1 GiB
Memory consumer stats: 
	Task.36963:                                                  Current used bytes:  361.0 MiB, peak bytes:        N/A
	\- Gluten.Tree.3307:                                         Current used bytes:  361.0 MiB, peak bytes:  424.0 MiB
	   \- Capacity[8.0 EiB].3306:                                Current used bytes:  361.0 MiB, peak bytes:  424.0 MiB
	      +- NativePlanEvaluator-3530.0:                         Current used bytes:  360.0 MiB, peak bytes:  416.0 MiB
	      |  \- single:                                          Current used bytes:  360.0 MiB, peak bytes:  416.0 MiB
	      |     +- root:                                         Current used bytes:  352.4 MiB, peak bytes:  416.0 MiB
	      |     |  +- task.Gluten_Stage_466_TID_36963_VTID_3537: Current used bytes:  352.4 MiB, peak bytes:  416.0 MiB
	      |     |  |  +- node.0:                                 Current used bytes:  351.9 MiB, peak bytes:  352.0 MiB
	      |     |  |  |  +- op.0.0.0.TableScan:                  Current used bytes:  351.9 MiB, peak bytes:  352.0 MiB
	      |     |  |  |  \- op.0.0.0.TableScan.test-hive:        Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  +- node.2:                                 Current used bytes:  432.0 KiB, peak bytes:  416.0 MiB
	      |     |  |  |  +- op.2.0.0.HashProbe:                  Current used bytes:  288.0 KiB, peak bytes:  320.0 KiB
	      |     |  |  |  \- op.2.1.0.HashBuild:                  Current used bytes:  144.0 KiB, peak bytes:  396.3 MiB
	      |     |  |  +- node.1:                                 Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  \- op.1.1.0.ValueStream:                Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  +- node.4:                                 Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  |  \- op.4.0.0.FilterProject:              Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |  \- node.3:                                 Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  |     \- op.3.0.0.FilterProject:              Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     |  \- default_leaf:                              Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |     \- gluten::MemoryAllocator:                      Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |        \- default:                                   Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- BuildSideRelation#deserialized.1632:                Current used bytes: 1024.0 KiB, peak bytes:    8.0 MiB
	      |  \- single:                                          Current used bytes: 1024.0 KiB, peak bytes:    8.0 MiB
	      |     +- root:                                         Current used bytes:   54.8 KiB, peak bytes: 1024.0 KiB
	      |     |  \- default_leaf:                              Current used bytes:   54.8 KiB, peak bytes:  861.0 KiB
	      |     \- gluten::MemoryAllocator:                      Current used bytes:      0.0 B, peak bytes:      0.0 B
	      |        \- default:                                   Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- BuildSideRelation#deserialized.1632.OverAcquire.0:  Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- ArrowContextInstance.2695:                          Current used bytes:      0.0 B, peak bytes:    8.0 MiB
	      +- IteratorMetrics.3077.OverAcquire.0:                 Current used bytes:      0.0 B, peak bytes:      0.0 B
	      +- NativePlanEvaluator-3530.0.OverAcquire.0:           Current used bytes:      0.0 B, peak bytes:      0.0 B
	      \- IteratorMetrics.3077:                               Current used bytes:      0.0 B, peak bytes:      0.0 B
	         \- single:                                          Current used bytes:      0.0 B, peak bytes:      0.0 B
	            +- root:                                         Current used bytes:      0.0 B, peak bytes:      0.0 B
	            |  \- default_leaf:                              Current used bytes:      0.0 B, peak bytes:      0.0 B
	            \- gluten::MemoryAllocator:                      Current used bytes:      0.0 B, peak bytes:      0.0 B
	               \- default:                                   Current used bytes:      0.0 B, peak bytes:      0.0 B

	at org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget.borrow(ThrowOnOomMemoryTarget.java:154)
	at org.apache.gluten.memory.listener.ManagedReservationListener.reserve(ManagedReservationListener.java:49)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
	at org.apache.gluten.vectorized.ColumnarBatchOutIterator.hasNext0(ColumnarBatchOutIterator.java:58)
	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:36)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:154)
	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:66)
	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:38)
	at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:95)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.sql.execution.BroadcastUtils$.serializeStream(BroadcastUtils.scala:176)
	at org.apache.gluten.backendsapi.velox.VeloxSparkPlanExecApi.$anonfun$createBroadcastRelation$1(VeloxSparkPlanExecApi.scala:685)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:858)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:858)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

Retriable: False
Function: operator()
File: /velox/velox/exec/Driver.cpp
Line: 595
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE4_clEv.cold
# 4  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 5  _ZN8facebook5velox4exec6Driver4nextEPN5folly10SemiFutureINS3_4UnitEEERPNS1_8OperatorERNS1_14BlockingReasonE
# 6  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
# 7  _ZN6gluten24WholeStageResultIterator4nextEv
# 8  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 9  0x00007fb588e1f8db

	at org.apache.gluten.iterator.ClosableIterator.hasNext(ClosableIterator.java:38)
	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
	at org.apache.gluten.iterator.IteratorsV1$InvocationFlowProtection.hasNext(IteratorsV1.scala:154)
	at org.apache.gluten.iterator.IteratorsV1$IteratorCompleter.hasNext(IteratorsV1.scala:66)
	at org.apache.gluten.iterator.IteratorsV1$PayloadCloser.hasNext(IteratorsV1.scala:38)
	at org.apache.gluten.iterator.IteratorsV1$LifeTimeAccumulator.hasNext(IteratorsV1.scala:95)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:513)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at scala.collection.AbstractIterator.to(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
	at org.apache.spark.sql.execution.BroadcastUtils$.serializeStream(BroadcastUtils.scala:176)
	at org.apache.gluten.backendsapi.velox.VeloxSparkPlanExecApi.$anonfun$createBroadcastRelation$1(VeloxSparkPlanExecApi.scala:685)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:858)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:858)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
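The error message itself suggests two knobs: enlarging `spark.memory.offHeap.size`, or enabling `spark.gluten.memory.dynamic.offHeap.sizing.enabled`. A minimal sketch of what trying those looks like on a `spark-submit` invocation — the sizes below are illustrative placeholders, not tuned values for this workload, and whether either actually avoids this OOM (the log shows only 1024 KiB granted despite a 6.2 GiB per-task quota, so spill/arbitration behavior may matter more than the total) is exactly what this issue is about:

```shell
# Hedged example only; adjust sizes to your executor memory budget.
spark-submit \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=128g \
  ...

# Alternatively, per the error text, let Gluten size off-heap dynamically:
spark-submit \
  --conf spark.gluten.memory.dynamic.offHeap.sizing.enabled=true \
  ...
```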

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), triage
