RFC: Refactor interrupts #18
base: main
Conversation
Currently the vector ID and queue ID are the same and the ID starts at 0. User interrupts are allocated after the queue interrupts. This results in the user interrupt ending up at different locations depending on the number of vectors available in the system. This is a problem because the user interrupt vector is hardcoded in the shell and not configurable from SW. This patch reorders the interrupts so that they match the QDMA reference driver:

0. mailbox interrupt (optional - defaults to unused)
1. user interrupt (optional - defaults to unused)
2. error interrupt on PF0
3. queue 0 interrupt
4. queue 1 interrupt
...

Hence, if the mailbox interrupt is disabled, the user interrupt ends up at interrupt vector 0, and if it is enabled, at interrupt vector 1.

Signed-off-by: Lars Munch <lars@segv.dk>
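To make the new ordering concrete, here is a minimal sketch of how the vector indices fall out under this scheme. The struct and function names are hypothetical, chosen only for illustration, and are not the driver's actual code:

```c
#include <stdbool.h>

/*
 * Sketch of the vector numbering described above (illustration only).
 * Optional interrupts that are left unused do not consume a vector,
 * which is why the user interrupt lands at vector 0 or 1 depending on
 * whether the mailbox interrupt is enabled.
 */
struct onic_irq_layout {
	int mailbox_vec;   /* -1 when the mailbox interrupt is unused */
	int user_vec;      /* -1 when the user interrupt is unused    */
	int error_vec;     /* error interrupt (PF0 only)              */
	int queue0_vec;    /* queue q then uses queue0_vec + q        */
};

static void onic_compute_irq_layout(bool mailbox_en, bool user_en,
                                    struct onic_irq_layout *l)
{
	int vec = 0;

	l->mailbox_vec = mailbox_en ? vec++ : -1;  /* vector 0 if enabled */
	l->user_vec    = user_en    ? vec++ : -1;  /* vector 0 or 1       */
	l->error_vec   = vec++;                    /* error interrupt     */
	l->queue0_vec  = vec;                      /* queues follow       */
}
```

With the mailbox interrupt disabled and the user interrupt enabled this yields user_vec == 0, matching the vector hardcoded in the shell; enabling the mailbox shifts it to 1, and the queue vectors always follow the error interrupt.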
|
Hi Lars, I tried testing your changes briefly this afternoon, but I noticed some packet loss (30-50%) when sending pings between my two test machines (each with a U250 version of open-nic). Did you see any packet loss in your tests? I'll be out the rest of the week, and will reply next week if you respond. Thanks,
|
The problem does not seem to be due to a bug in your code.
|
Interesting. I do not see any packet drops. I do, though, get one Error Interrupt on driver loading: Ideally, the driver should be able to reserve user interrupt vectors (like the XDMA reference driver can), because everything else can be configured, but I figured I would start with these changes. @Hyunok-Kim that is super interesting about the new driver, especially since the road to SRIOV support will be way shorter. To be honest, I was a bit surprised to see that the opennic driver did not use libqdma. What are the plans for your new driver? Replacing the current opennic driver? Thanks
|
@Hyunok-Kim I have a quick question: would the libqdma version require an extra set of data copies for receiving and transmitting? I think the person at Xilinx who wrote the initial open-nic driver had started experimenting with libqdma, but ended up not using it in the end. Does the performance with your new driver seem better, worse, or about the same? If your new driver turns out to be better overall, could we replace the existing code under open-nic-driver with your new driver?
|
While the qep-driver allows flexible configuration using a device tree as firmware, the performance of my driver is worse than open-nic-driver's. Anyway, I will test it again after I implement poll mode.
|
@Hyunok-Kim When I try your modifications to the shell and the new driver, I get pretty similar performance to the current driver: current = 13.1 Gb/s and new = 12.4 Gb/s. This is on my local test setup using two machines, each with a U250. I would say it's pretty good, but I am interested in whether the polling mode improves performance.
|
Ideally, the driver should be able to reserve user interrupt vectors (like the XDMA reference driver can), because the error, mailbox and queue interrupt vectors can all be configured dynamically while user interrupts cannot. So eventually something like this patch needs to go in. Whether it should be merged in its current form is up to you. With regards to the two drivers, I am leaning towards the new driver, as the road to SRIOV is shorter and I believe it could be easier to use the new driver on a PF while running DPDK on a VF. I have not yet had the time to try the new driver, though. The benefit I see of the current driver is that it is way smaller and it performs better (at least at the moment).
|
@Hyunok-Kim I noticed that you just updated your libqdma-based driver with some nice JSON configuration. Did you try poll mode for TX to see if you can get performance similar to the current OpenNIC driver? What are your plans going forward with this driver?
|
@lmunch Throughput in poll mode wasn't better than in interrupt mode in my test setup. But my setup does not seem suitable for throughput testing. @cneely-xlnx is doing performance tests with my driver and more powerful CPUs. He will report the results and make a decision.
|
@Hyunok-Kim thanks for the update. Do you have any other features planned for the driver, e.g. SRIOV? I will see if I can find the time to try it out before Christmas.
|
@lmunch Unfortunately, I'm not familiar with SRIOV and it is not something I'm interested in. I'm still interested in throughput improvements.
|
In my test setup, I'm running n instances of iperf3, each configured for 5 Gb/s, between two local machines. I'm planning to try this on some more powerful server machines, but haven't gotten to that yet. This is a work in progress.

(Results table: CPU utilization and transfer rate for a single process and for n=15 processes.)
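For reference, a run like this can be driven with something along the following lines. This is only a sketch: the address, port range, and duration are placeholders, and it assumes one iperf3 server is already listening on each port of the remote machine:

```bash
# Launch n=15 iperf3 clients in parallel, each capped at 5 Gb/s.
# 192.0.2.1 is a placeholder address; ports 5201-5215 each need their
# own iperf3 server instance on the remote side.
for i in $(seq 0 14); do
    iperf3 -c 192.0.2.1 -p $((5201 + i)) -b 5G -t 30 &
done
wait
```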
|
@Hyunok-Kim, I just had some time to try your new driver. I only had one issue with the code (I will send you a pull request later). The performance on my system is not that great yet (I know it's a work in progress). I use a 10G connection. With the original OpenNIC driver I get: [results not shown] and similar in the other direction. When using your driver I get: [results not shown] and: [results not shown]
|
@lmunch To measure network throughput, I referenced the following articles:
|
@Hyunok-Kim thanks for the links. I know my setup is not good for throughput measurements, but it's good enough for a direct comparison of the two drivers. I run the exact same FPGA code (I modified the rx/tx descriptors in the OpenNIC driver), so I can simply swap drivers and re-test. As you can see from the numbers above, something is not right in onic-driver (especially in the last test).
|
@lmunch @cneely-amd Is it correct that this PR needs an open-nic-shell patch for the interrupt reordering?
|
I do not recall having to change anything in open-nic-shell to use this patch. If you use user interrupts, then you of course need to set that up correctly. BTW, we never figured out why @cneely-amd was seeing packet loss. I have now been using this patch successfully for several years.
|
@lmunch With this patch I do not see any interrupts; open-nic-shell is not patched. With the main branch, netperf works fine with a lower number of queues (assuming the netperf threads go to different RX queues). On increasing the number of netperf streams, the driver goes into a state with 30% packet loss and does not recover even when there is no traffic. I'm not sure if that is what @cneely-amd saw, but I can reproduce it. If I limit the number of queues to 1 in the driver, then it operates pretty much stably at about 20 Gbit/s, at least for the first 20 minutes. After that an error interrupt happens: onic 0000:01:00.0: Error IRQ (BH) fired on Funtion#00000: vector=181 (/proc/interrupts also shows that single error interrupt), and the HW stops operating. (The HW is an Alveo U250 in 2-port mode, 2 PFs.) Is there any doc where I can understand what the error interrupt means?