Question about a TensorFlow error when running Parallax #27

@dostos

Hello,
While working on the assignment, I ran into a problem that occurs only when running under Parallax, so I would like to ask about it.

[Two images: screenshots of the training code, not reproduced here]

When I run training with the code above, the following error occurs only under Parallax.

```
(128, 784)
Traceback (most recent call last):
  File "/hw2/code/run_parallax.py", line 71, in <module>
    cost = autoencoder.partial_fit(sess, batch)
  File "/hw2/code/autoencoder/autoencoder_models/Autoencoder.py", line 65, in partial_fit
    cost, opt = sess.run((self.cost, self.optimizer), feed_dict={self.x: X})
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 671, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1148, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1239, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1224, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1296, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1076, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/parallax/core/python/common/session_context.py", line 40, in _parallax_run
    return self._run_internal(fetches, feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1086, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (784,) for Tensor u'Placeholder:0', which has shape '(?, 784)'

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[59384,1],0]
  Exit code: 1
--------------------------------------------------------------------------
```

The problem seems to be the shape of the tensor passed to session run,
but other scripts that feed data the same way run fine, and I confirmed that the data has the shape the target tensor expects before it is fed.
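To make the mismatch in the error message concrete, here is a minimal pure-Python sketch of the kind of shape check that fails. It is not TensorFlow's implementation; `shape_of` and `is_compatible` are hypothetical helpers written only for this illustration, and `None` stands in for the `?` dimension of the placeholder.

```python
def shape_of(value):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(value, (list, tuple)):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def is_compatible(value_shape, placeholder_shape):
    """True if value_shape fits placeholder_shape; None plays the role of '?'."""
    if len(value_shape) != len(placeholder_shape):
        return False
    return all(p is None or p == v
               for v, p in zip(value_shape, placeholder_shape))


# A batch shaped like the one printed before the traceback: (128, 784).
batch = [[0.0] * 784 for _ in range(128)]
placeholder_shape = (None, 784)  # corresponds to '(?, 784)' in the error

# The full batch fits the placeholder.
print(shape_of(batch), is_compatible(shape_of(batch), placeholder_shape))

# A single row has rank 1, so it does not fit a rank-2 placeholder.
print(shape_of(batch[0]), is_compatible(shape_of(batch[0]), placeholder_shape))
```

The full batch passes the check while a single row, shaped `(784,)`, does not, which matches the value shape reported in the `ValueError`.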

https://github.com/snuspl/parallax/blob/cpu_enable/parallax/parallax/core/python/common/session_context.py#L40

I would like to know whether Parallax transforms the feed data internally after that line.
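As an illustration of how such a transformation could produce the reported shape, the sketch below indexes a feed value once per replica. This is purely an assumption for discussion: `per_replica_values` and `num_replicas` are hypothetical names, and nothing here is taken from the Parallax source.

```python
def shape_of(value):
    """Return the shape of a non-empty rectangular nested list."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]
    return tuple(shape)


def per_replica_values(feed_value, num_replicas):
    # Hypothetical behavior: replica i receives feed_value[i].
    # This is NOT confirmed to be what Parallax does.
    return [feed_value[i] for i in range(num_replicas)]


batch = [[0.0] * 784 for _ in range(128)]  # shape (128, 784)
num_replicas = 2  # assumed value, for illustration only

# Indexing the batch itself hands each replica a single row.
rows = per_replica_values(batch, num_replicas)
print([shape_of(v) for v in rows])

# Indexing a list of batches hands each replica a full batch.
batches = per_replica_values([batch] * num_replicas, num_replicas)
print([shape_of(v) for v in batches])
```

If something like this happened after the linked line, a `(128, 784)` batch would reach the placeholder as `(784,)`, whereas a list holding one batch per replica would keep the `(128, 784)` shape.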
