-
Notifications
You must be signed in to change notification settings - Fork 76
Update handling of mxOPAQUE_CLASS objects and Subsystem in MAT_v5.jl #205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
9f44ce5 to
89572a7
Compare
|
With these changes, full support is added for loading classdef objects in MAT-files in both v5 and HDF5 formats. Classdef objects are returned as a The changes support different MAT-file and subsystem versions. It also supports loading all types of Some notes:
Edit: |
* MAT_subsys.jl: New file MAT_subsys with methods to set, parse and retrieve subsystem data * MAT_v5.jl: New method "read_opaque" to handle mxOPAQUE_CLASS * MAT_v5.jl: New method "read_subsystem" to handle subsystem data * MAT.jl (matread): Update to clear subsystem and object cache after load Support for loading mxOPAQUE_CLASS objects in v7.3 HDF5 format * MAT_HDF5.jl (matopen): New argument Endian indicator, Reads and parses subsystem on load * MAT_HDF5.jl (close): Update to write endian header based on system endianness * MAT_HDF5.jl (m_read::HDF5.Dataset): Update to handle MATLAB_object_decode (mxOPAQUE_CLASS) types * MAT_HDF5.jl (m_read::HDF5.Group): Update to read subsystem data and function_handles * MAT.jl (matopen): Update function calls Updated test for struct_table_datetime.mat to ensure accurate deserialization (including nested properties) in both v7 and v7.3 formats * test/read.jl: Update tests for "function_handles.mat" and "struct_table_datetime.mat"
|
Hi there, I like your PR and I hope you still want to work on it. I have my own PR-207 and I will use Is there a good way to separate normal struct arrays from MATLAB classes in the MAT.jl interface? Would the distinction come from the existence of the (I also hope an active contributor/maintainer pops up for MAT.jl to review our work. I am trying to get in contact with them via email) |
|
Hey! It does look like there will be a conflict in our PRs. I aimed to provide read-write support in this module, but I only included read support for MATLAB classes in this PR to make it reasonable to review. Coming to the typing, my opinion is that the same types should be used for both read and write, and the typing should be consistent across the different MAT-file versions. I worked on a Pythonic version of this module, and there I used a wrapper class Ultimately, it's a design choice that would benefit from a discussion surrounding user requirements. It would be good if a maintainer (@ViralBShah ?) could pitch in their thoughts. As I understand there's not much activity here, but I would be happy to help out in any way I can! |
|
I like the idea of a wrapper type, at least for read/write consistency. Having obscure key names like I wonder if it would be smart to have another wrapper type for the struct arrays? Some kind of mutable named tuple like type would be great, like: Looking at matio for inspiration, I found a struct array test, but I cannot quickly infer what Python type you are reading/writing? How do you differentiate between cell arrays of structs and struct arrays? Right now MAT.jl does not have read/write consistency for struct arrays (they get re-written as a struct with cell arrays per field name). |
|
@matthijscox is a maintainer and we can add others too. |
So my main point was to keep it consistent with For example, The only place this doesn't work is for empty structs which don't have field names. In this case, I've used a wrapper class
I do agree with this. In fact, I went through several iterations in my Python module before arriving at the current data representations. The main problem I had was ensuring separate types for simple read-write operations whilst keeping it user-friendly and scalable for the newer MATLAB classdef-based types. This lead to me using a bunch of wrapper classes that keep a balance between allowing numpy array operations and differentiability. Something of that sort can be done here if it aligns with the scope of this package, and I would be happy to help. I'm not familiar with Julia typing and will need some time, but in the meantime you can read the type conversions I used in |
src/MAT_subsys.jl
Outdated
|
|
||
| object_arr = Array{Dict{String,Any}}(undef, convert(Vector{Int}, dims)...) | ||
|
|
||
| for i = 1:length(object_arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor issue: more Julian syntax would be for (i, oid) in enumerate(object_ids)
src/MAT_subsys.jl
Outdated
| end | ||
|
|
||
| const subsys_cache = Ref{Union{Nothing,Subsys}}(nothing) | ||
| const object_cache = Ref{Union{Nothing, Dict{UInt32, Dict{String,Any}}}}(nothing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is pretty complicated, so I cannot oversee the decisions here, but I am curious why you need this stateful cache? Is it not possible to rewrite the code to just pass the subsys_cache and object_cache through each function?
I think we now we have the risk that someone tries to read multiple .mat files in multiple threads and this will break down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been a while so I'm not sure exactly, but I think it had to with it involving a whole bunch of function definition changes to incorporate both the caches. There's a lot of recursive calls so the caches would have to be passed to all functions even if they don't actually use it.
I would prefer a better alternative to handle thread safety, but your suggestion works as well.
|
I've begun reading the code. The classes seem to be read mostly here as dicts in MAT_subsys.jl R285-R288. I suppose we could define a struct MatlabOpaque{T}
classname::Symbol
properties::Vector{String} # or Vector{Symbol}
values::Vector{T}
endOptionally I wonder if we want to be able to expose the class "type" to the Julia type system. (These Val types do come with a small performance penalty I remember) struct MatlabOpaque{C<:Val, T}
properties::Vector{String} # or Vector{Symbol}
values::Vector{T}
end
# outer constructor
function MatlabOpaque(classname::Symbol, properties::Vector{String}, values::Vector{T}) where T
return MatlabOpaque{Val{classname}, T}(properties, values)
endNow you can identify a vector of same class (class/struct array?) from a vector different classes (cell array?) MatlabOpaque{Val{:table}, Any}[] # class array of Matlab tables
MatlabOpaque[] # class (cell) array of different classes
Any[] # cell array of any kind of (matlab) typeThis also means we can write automatic converters: Base.convert(::Type{DataFrame}, MatlabOpaque{Val{:table}}) = ...or better yet add the Tables.jl interface: Tables.istable(::Type{MatlabOpaque{Val{:table}}) = true
Tables.istable(::Type{MatlabOpaque}) = falseOfcourse instead of the Val type parameter, we could just convert immediately to different known types during the reading. Then we have to define the multiple MAT.jl internal types we need, or immediately convert to the most sensible Julia type. struct MatlabTable
...
end
Tables.istable(::Type{MatlabTable}) = true
struct MatlabDuration <: Dates.AbstractTime # or use Dates.Millisecond type
...
endI would propose to first implement the |
|
If I'm understanding correctly, we'd have Some points to clarify/get clarified:
The rest sounds good. Doesn't sound like it diverges too much from the current state of the PR. To summarize the main edits:
|
Indeed. Note I now created a
You mean that you want to index by name Basically we can define methods like: function Base.getindex(object::MatlabOpaque, prop::AbstractString)
idx = findfirst(isequal(prop), object.properties)
return object.values[idx]
end
Pff, I will have to check. You also make me worried about these empty sized struct arrays for my PR. Is this the way to construct them?
Yeah, well
I very new to MAT.jl myself I admit. I saw the It's possible we're going to have merge conflicts with my PR, since I've been refactoring a bit there. So if you have any time to review my PR that would be great. |
To quickly follow-up on your question: The bad newsCurrently (MAT v0.10) we cannot. An empty >> s00
s00 =
struct with fields:
c: {}
b: {}
a: {}Good newsIn my new struct array PR I can write them! (reading goes wrong, I'll fix that): In Julia: empty_sarr = MAT.MatlabStructArray(["a", "b", "c"], [Matrix{Any}(undef, 0, 0), Matrix{Any}(undef, 0, 0), Matrix{Any}(undef, 0, 0)])
matwrite("test.mat", Dict("s00" => empty_sarr))In MATLAB: >> load('test.mat')
>> s00
s00 =
0×0 empty struct array with fields:
a
b
c |
Haha yeah, sorry. It's an edge case from a user perspective, but MATLAB uses these zero-element dimensions as placeholders, and getting them wrong results in unexpected errors when loading in MATLAB. Ideally if the
yeah, sounds good.
I believe I did notice some differences between the HDF5 and v5 versions. The HDF5 file isn't as extensive as the v5 file, and I honestly think it needs a rewrite to allow consistency between versions. But that's for another time. |
|
I now support empty struct arrays in my PR #207. Create one easily via I also created the I think that's all we need to continue with your PR. I will probably merge soon and then if you want I can help with this PR to solve any merge conflicts for you. |
|
Thanks! I'm a bit swamped till the end of the month and will most probably take a look at this after, will try to squeeze something in as soon as I can |
| merge!(prop_dict, get_properties(subsys, oid)) | ||
| # cache it | ||
| obj = MatlabOpaque(prop_dict, classname) | ||
| subsys.object_cache[oid] = obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The object should be cached before merge!. Else get_properties can result in an infinite recursion for handle class objects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see... Should we add a unit test for these handle class objects?
|
|
||
| function get_default_properties(subsys::Subsystem, class_id::UInt32) | ||
| prop_vals_class = subsys.prop_vals_defaults[class_id+1, 1] | ||
| # is it always a MatlabStructArray? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think prop_vals_class will be a MatlabStructArray. It should always be a scalar 1x1 struct. Did you come across an example that goes against this?
Also, we can get rid of the copy statement. Instead, something like this to identify nested objects should work.
for (k, v) in prop_vals_class
prop_vals_class[k] = update_nested_props!(v, subsys)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had errors for 0x0 MatlabStructArray so this case definitely happens in the test data somewhere
| return DateTime[] | ||
| end | ||
| if !isempty(obj["tz"]) | ||
| @warn "no timezone conversion yet for datetime objects. timezone ignored" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be a good idea to display the timezone the datetime object was saved with in the warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Note I didn't want to add TimeZones.jl yet because I remember that it's a massive dependency, adding lots of precompile or artifact download time
| if num_strings==1 | ||
| return first(strings) | ||
| else | ||
| return reshape(strings, shape...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the job is failing here. Maybe explicitly cast to a tuple?
|
Somehow I increased SnoopCompile invalidations a bit, this might mean package loading is slightly slower, but it's all due to dependencies (mostly HDF5.jl). Not sure I know how to fix this, so maybe we'll just take the hit. |
Spinoff from #23 to read the undocumented datatype
mxOPAQUE_CLASS. Part of a series of changes to read and write these types across formatsv5andv7.3.Some context :
uint8array. However, this looks like another MAT-file that needs to be converted and read into. This needs to parsed before reading any variables in file.Edit:
Added a new file "MAT_subsys.jl" which contains methods for caching, parsing, and retrieving subsystem data to be assigned to an object. With this it should successfully load classdef objects. Additional context regarding how subsystem data is organized below:
uint32integers even though its written in asuint8-- Block 1 is a version indicator and some offset values
-- Block 2 is a list of class and property names as uint8 integers (null terminated)
-- Block 3 is a list of class IDs
-- Blocks 4 and 6 contain some metadata about how linking property names and property values
-- Block 5 is a list of object ID metadata
-- Block 7 is a list of dynamic properties attached to the object
-- Blocks 8 and 9 are unknown