- 09 Sep, 2017 1 commit
Davidlohr Bueso authored

Allow interval trees to quickly check for overlaps to avoid unnecessary tree lookups in interval_tree_iter_first(). As of this patch, all interval tree flavors will require using an 'rb_root_cached' so that we can have the leftmost node easily available. While most users will make use of this feature, those with special functions (in addition to the generic insert, delete, search calls) will avoid using the cached option, as they can do funky things with insertions -- for example, vma_interval_tree_insert_after().

[jglisse@redhat.com: fix deadlock from typo vm_lock_anon_vma()]
Link: http://lkml.kernel.org/r/20170808225719.20723-1-jglisse@redhat.com
Link: http://lkml.kernel.org/r/20170719014603.19029-12-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
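
A standalone sketch of the pre-check this enables (names illustrative, not the kernel's): with the leftmost node cached, two O(1) comparisons can prove that nothing overlaps the query before any O(log n) descent is attempted.

    #include <stdbool.h>

    /* Illustrative node: the kernel's interval_tree_node keeps start/last
     * plus an augmented "max last in this subtree" value at every node. */
    struct itnode {
        unsigned long start, last;
        unsigned long subtree_last;    /* max 'last' below this node */
    };

    /* 'leftmost' is the cached smallest-start node, 'root' the tree root.
     * If the query ends before the smallest start, or begins after the
     * largest last, no interval can overlap, and the O(log n) walk in
     * interval_tree_iter_first() is skipped entirely. */
    static bool may_overlap(const struct itnode *leftmost,
                            const struct itnode *root,
                            unsigned long start, unsigned long last)
    {
        if (!leftmost)
            return false;              /* empty tree */
        if (last < leftmost->start)
            return false;              /* query ends before all intervals */
        if (start > root->subtree_last)
            return false;              /* query begins after all intervals */
        return true;                   /* a real lookup is warranted */
    }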
- 29 Jul, 2017 1 commit
Jason Wang authored

This reverts commit 809ecb9b, since it was reported to break vhost_net. The reverted patch cached the used event and used it to check for notification; the assumption was that the guest won't move the event idx backwards, but this can in fact happen when the 16-bit index wraps around after 64K entries.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
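
A compilable illustration of the wraparound hazard, built around the standard modulo-2^16 event-index window test (the same check virtio's vring_need_event() performs):

    #include <stdint.h>
    #include <stdio.h>

    /* Notify iff event_idx lies in the half-open window (old, new].
     * All arithmetic is modulo 2^16 on purpose. */
    static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
    {
        return (uint16_t)(new_idx - event_idx - 1) <
               (uint16_t)(new_idx - old_idx);
    }

    int main(void)
    {
        /* Just after the 16-bit index wraps: old = 0xfffe, new = 0x0002. */
        uint16_t old_idx = 0xfffe, new_idx = 0x0002, event = 0xffff;

        /* Modular test: 0xffff is inside (0xfffe, 0x0002], so notify. */
        printf("modular test says notify: %d\n",
               need_event(event, new_idx, old_idx));

        /* A cached-value shortcut assuming the index never moves
         * "backwards" compares 0xffff > 0x0002 and wrongly skips the
         * notification -- the breakage behind this revert. */
        printf("naive 'event is ahead' check says skip: %d\n",
               event > new_idx);
        return 0;
    }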
- 20 Jun, 2017 1 commit
Ingo Molnar authored

Rename: wait_queue_t => wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue", but in reality it's a queue *entry*. The 'real' queue is the wait queue head, which had to carry the name. Start sorting this out by renaming it to 'wait_queue_entry_t'. This also allows the real structure name 'struct __wait_queue' to lose its double underscore and become 'struct wait_queue_entry', which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
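
An abridged before/after sketch of the rename (the real definitions, with locks, flags and wakeup callbacks, live in include/linux/wait.h):

    /* before: the "queue" name sat on what is really a single waiter */
    typedef struct __wait_queue wait_queue_t;

    /* after: the entry is named for what it is */
    typedef struct wait_queue_entry wait_queue_entry_t;   /* one waiter */
    /* struct wait_queue_head remains the actual queue: lock + entry list */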
- 01 Mar, 2017 1 commit
Jason Wang authored

When the device IOTLB is enabled, all address translations are stored in an interval tree. The O(lg N) search time can be slow for virtqueue metadata (avail, used and descriptor rings), since it is accessed much more often than other addresses. So this patch introduces an O(1) array which points to the interval tree nodes that store the translations of the vq metadata. The array is updated during vq IOTLB prefetching and reset on each invalidation and TLB update. Each time we want to access vq metadata, this small array is queried before the interval tree. This is sufficient for static mappings but not for dynamic mappings; we could do optimizations on top.

Tests were done with l2fwd in guest (2M hugepage):

       noiommu   | before        | after
    tx 1.32Mpps  | 1.06Mpps(82%) | 1.30Mpps(98%)
    rx 2.33Mpps  | 1.46Mpps(63%) | 2.29Mpps(98%)

We can almost reach the same performance as noiommu mode.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
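
A sketch of the resulting fast path (names hypothetical; the real code lives in drivers/vhost/vhost.c): a fixed three-slot array shadows the interval tree for the descriptor, avail and used ring translations.

    #include <stddef.h>

    struct iotlb_node;                  /* interval tree node, opaque here */

    enum vq_meta { META_DESC, META_AVAIL, META_USED, META_NUM };

    struct meta_cache {
        /* Filled during vq IOTLB prefetch; every slot is reset to NULL
         * on each invalidation or TLB update. */
        const struct iotlb_node *node[META_NUM];
    };

    /* O(1) lookup tried before the O(lg N) interval tree search. */
    static const struct iotlb_node *
    meta_translate(const struct meta_cache *c, enum vq_meta what)
    {
        return c->node[what];           /* NULL => fall back to the tree */
    }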
- 15 Dec, 2016 1 commit
Jason Wang authored

When the event index is enabled, we need to fetch the used event from userspace memory each time. This userspace fetch (with its memory barrier) can sometimes be saved by 1) caching the used event and 2) observing that if the used event is ahead of new, and the old-to-new update does not cross it, we are sure there is no need to notify the guest. This is useful for heavy tx load: e.g. a guest pktgen test with the Linux driver shows a ~3.5% improvement.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
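
A sketch of the skip condition (names hypothetical). Note that the 29 Jul 2017 entry above reverts this exact scheme: the comparison is modulo 2^16, so "ahead of new" stops being meaningful once the 16-bit index wraps.

    #include <stdint.h>

    /* Skip the barriered userspace fetch only when the cached used event
     * is outside the (old, new] window just published. */
    static int can_skip_fetch(uint16_t cached_event,
                              uint16_t old_idx, uint16_t new_idx)
    {
        /* "used event ahead of new, and old->new does not cross it" */
        return !((uint16_t)(new_idx - cached_event - 1) <
                 (uint16_t)(new_idx - old_idx));
    }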
- 02 Aug, 2016 1 commit
Jason Wang authored

This patch implements a device IOTLB for vhost. This can be used with a userspace (QEMU) implementation of DMA remapping to emulate an IOMMU for the guest.

The idea is simple: cache the translations in a software device IOTLB (implemented as an interval tree) in vhost, and use the vhost_net file descriptor for reporting IOTLB misses and for IOTLB updates/invalidations. When vhost hits an IOTLB miss, the fault address, size and access can be read from the file. After userspace finishes the translation, it writes the translated address to the vhost_net file to update the device IOTLB.

When the device IOTLB is enabled by setting VIRTIO_F_IOMMU_PLATFORM, all vq addresses set by ioctl are treated as iova instead of virtual addresses, and accesses can only be done through the IOTLB instead of direct userspace memory access. Before each round of vq processing, all vq metadata is prefetched into the device IOTLB to make sure no translation fault happens during vq processing.

In most cases, virtqueues are contiguous even in virtual address space. The IOTLB translation for the virtqueue itself may make it a little slower. We might add a fast path cache on top of this patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
[mst: use virtio feature bit: VHOST_F_DEVICE_IOTLB -> VIRTIO_F_IOMMU_PLATFORM]
[mst: fix build warnings]
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
[weiyj.lk: missing unlock on error]
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
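
An abridged view of the miss/update exchange (the authoritative message layout is in the uapi vhost headers; the fields below are illustrative): userspace read()s a miss from the vhost_net fd, resolves the iova through its IOMMU model, then write()s the translation back.

    #include <stdint.h>

    struct iotlb_msg_sketch {
        uint64_t iova;    /* I/O virtual address that missed */
        uint64_t size;    /* length of the mapping */
        uint64_t uaddr;   /* host user address, filled in by the update */
        uint8_t  perm;    /* read/write access the device needs */
        uint8_t  type;    /* miss, update or invalidate */
    };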
- 01 Aug, 2016 2 commits
Jason Wang authored

The current pre-sorted memory region array has some limitations for the future device IOTLB conversion:

1) adding or removing a single region needs extra work, and is expected to be slow because of the sorting or memory re-allocation involved.
2) removing a large range which may intersect several regions of different sizes needs extra work.
3) a replacement policy like LRU needs tricks.

To overcome these shortcomings, this patch converts the array to an interval tree, which can easily address the above issues with almost no extra work. The patch could be used to:

- Extend the current API and only let userspace send diffs of the memory table.
- Simplify the device IOTLB implementation.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
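
A kernel-style sketch of shortcoming 2) solved (abridged, using the lib/interval_tree API of that era with a plain rb_root; the 09 Sep 2017 entry above later converts this to rb_root_cached):

    #include <linux/interval_tree.h>

    /* Drop every region intersecting [start, last]: the walk visits only
     * the overlapping nodes -- no sorting, no array re-allocation. */
    static void drop_range(struct rb_root *root,
                           unsigned long start, unsigned long last)
    {
        struct interval_tree_node *node, *next;

        for (node = interval_tree_iter_first(root, start, last);
             node; node = next) {
            next = interval_tree_iter_next(node, start, last);
            interval_tree_remove(node, root);
            /* free the region embedding 'node' here */
        }
    }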
Jason Wang authored

We currently use a spinlock to synchronize the work list, which may cause unnecessary contention. This patch switches to an llist to remove that contention. Pktgen tests show about a 5% improvement:

Before: ~1300000 pps
After:  ~1370000 pps

Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
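
A kernel-style sketch of the pattern (abridged from the vhost approach, using the <linux/llist.h> primitives): producers push lock-free, and the worker detaches the whole list in one atomic exchange.

    #include <linux/llist.h>

    struct work_item {
        struct llist_node node;
        void (*fn)(struct work_item *w);
    };

    static LLIST_HEAD(work_list);

    static void queue_item(struct work_item *w)
    {
        llist_add(&w->node, &work_list);   /* lock-free push */
    }

    static void run_all(void)
    {
        /* Detach everything at once; no spinlock on either side. */
        struct llist_node *batch = llist_del_all(&work_list);
        struct work_item *w, *tmp;

        /* Entries come back newest-first; the real code also calls
         * llist_reverse_order() to preserve FIFO ordering. */
        llist_for_each_entry_safe(w, tmp, batch, node)
            w->fn(w);
    }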
- 11 Mar, 2016 3 commits
Jason Wang authored

This patch polls for newly added tx buffers or the socket receive queue for a while at the end of tx/rx processing. The maximum time spent on polling is specified through a new kind of vring ioctl.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
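
The rough shape of the resulting loop (abridged; 'budget_left' is hypothetical, while the two helpers are the subject of the next two entries in this log):

    /* Spin until the user-configured budget expires or there is
     * evidence that spinning has become pointless. */
    while (budget_left(endtime) &&         /* time budget from the new ioctl */
           !vhost_has_work(dev) &&         /* stop if work got queued */
           vhost_vq_avail_empty(dev, vq))  /* stop once buffers show up */
        cpu_relax();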
Jason Wang authored

This patch introduces a helper which returns true if we're sure that the available ring is empty for a specific vq. When we're not sure, e.g. on vq access failure, it returns false instead. This can be used by busy polling code to exit the busy loop.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jason Wang authored

This patch introduces a helper which gives a hint on whether or not there is work queued in the work list. This can be used by busy polling code to exit the busy loop.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 02 Mar, 2016 1 commit
Greg Kurz authored

Looking at how callers use this, maybe we should just rename init_used to vhost_vq_init_access. The _used suffix was a hint that we access the vq used ring, but maybe what callers care about is that it must be called after access_ok. Also, this function manipulates the vq->is_le field, which isn't related to the vq used ring. This patch simply renames vhost_init_used() to vhost_vq_init_access(), as suggested by Michael. No behaviour change.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 28 Oct, 2015 1 commit
Michael S. Tsirkin authored

Commit 2751c988 ("vhost: cross-endian support for legacy devices") introduced a minor regression: even with cross-endian disabled, and even on an LE host, vhost_is_little_endian checks the is_le flag, so there's always a branch. To fix, simply check virtio_legacy_is_little_endian first.

Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
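
A sketch of the fixed check (abridged from the description above, not necessarily the exact kernel code): on an LE host without cross-endian support, virtio_legacy_is_little_endian() is a compile-time constant true, so the || short-circuits and vq->is_le is never read.

    static inline bool vhost_is_little_endian(struct vhost_virtqueue *vq)
    {
        return virtio_legacy_is_little_endian() || vq->is_le;
    }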
- 16 Sep, 2015 1 commit
Michael S. Tsirkin authored

virtio 1 and "any layout" are core features; move them there. This fixes the vhost test.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 01 Jun, 2015 3 commits
Greg Kurz authored

This patch brings cross-endian support to vhost when used to implement legacy virtio devices. Since this is a relatively rare situation, the feature availability is controlled by a kernel config option (not set by default).

The vq->is_le boolean field is added to cache the endianness to be used for ring accesses. It defaults to native endian, as expected by legacy virtio devices. When the ring becomes active, we force little endian if the device is modern. When the ring is deactivated, we revert to the native-endian default.

If cross-endian support was compiled in, a vq->user_be boolean field is added so that userspace may request a specific endianness. This field is used to override the default when activating the ring of a legacy device. It has no effect on modern devices.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
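
A standalone sketch of the selection rules just described (struct and helper names hypothetical):

    #include <stdbool.h>

    struct vq_endian_sketch {
        bool is_modern;   /* device negotiated VIRTIO_F_VERSION_1 */
        bool user_be;     /* userspace asked for big endian (legacy only,
                             present only when the config option is set) */
        bool is_le;       /* cached: byteswap ring accesses or not */
    };

    static bool host_is_le(void)
    {
        const union { unsigned short u; unsigned char b[2]; } p = { 1 };
        return p.b[0] == 1;
    }

    static void vq_init_endianness(struct vq_endian_sketch *vq)
    {
        if (vq->is_modern)
            vq->is_le = true;           /* modern rings are always LE */
        else if (vq->user_be)
            vq->is_le = false;          /* legacy + explicit BE request */
        else
            vq->is_le = host_is_le();   /* native-endian default */
    }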
Greg Kurz authored

The current memory accessor logic is:
- little endian if little_endian
- native endian (i.e. no byteswap) if !little_endian

If we want to fully support cross-endian vhost, we also need to be able to convert to big endian. Instead of changing the little_endian argument to some 3-value enum, this patch changes the logic to:
- little endian if little_endian
- big endian if !little_endian

The native-endian case is handled by all users with a trivial helper. This patch doesn't change any functionality, nor does it add overhead.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
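
A standalone illustration of the two-case logic plus the trivial native-endian helper (names hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    static bool host_le(void)
    {
        const union { uint16_t u; uint8_t b[2]; } p = { 1 };
        return p.b[0] == 1;
    }

    static uint16_t bswap16(uint16_t v)
    {
        return (uint16_t)((v << 8) | (v >> 8));
    }

    /* little_endian now selects the *source* endianness: LE if true,
     * BE if false -- no "no byteswap" case hidden in the flag. */
    static uint16_t to_cpu16(bool little_endian, uint16_t v)
    {
        if (little_endian)
            return host_le() ? v : bswap16(v);
        return host_le() ? bswap16(v) : v;
    }

    /* the trivial helper every former native-endian caller uses */
    static uint16_t native_to_cpu16(uint16_t v)
    {
        return to_cpu16(host_le(), v);
    }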
Greg Kurz authored

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
- 09 Dec, 2014 3 commits
Jason Wang authored

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Michael S. Tsirkin authored

Add guest memory access wrappers to handle virtio endianness conversions.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Michael S. Tsirkin authored

We need to use bit 32 for virtio 1.0. Make vhost_has_feature bool to avoid discarding high bits.

Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>
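
A compilable demonstration of why the return type matters once feature bits pass 31 (VIRTIO_F_VERSION_1 is bit 32):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* old shape: returning int truncates the 64-bit mask */
    static int has_feature_int(uint64_t features, int bit)
    {
        return (int)(features & (1ULL << bit));  /* high bits discarded */
    }

    /* fixed shape: conversion to bool tests the full value against zero */
    static bool has_feature_bool(uint64_t features, int bit)
    {
        return features & (1ULL << bit);
    }

    int main(void)
    {
        uint64_t features = 1ULL << 32;   /* virtio 1.0 feature bit */

        printf("int: %d  bool: %d\n",
               has_feature_int(features, 32),    /* 0 -- the bug */
               has_feature_bool(features, 32));  /* 1 -- correct */
        return 0;
    }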
- 09 Jun, 2014 2 commits
Michael S. Tsirkin authored

Commit 2ae76693b8bcabf370b981cd00c36cd41d33fabc ("vhost: replace rcu with mutex") replaced the RCU sync for memory accesses with VQ mutex lock/unlock. This is correct, since all accesses are under the VQ mutex, but incomplete: we still do useless RCU lock/unlock operations, and someone might copy this code into some other context where this wouldn't be right. This use of RCU is also non-standard and hard to understand. Let's copy the pointer to each VQ structure instead; this way the access rules become straightforward, and there's no need for RCU anymore.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
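
A sketch of the resulting access rule (field names illustrative): the per-VQ copy is written only under the VQ mutex and read the same way, so plain loads and stores replace rcu_read_lock()/rcu_dereference().

    struct vhost_memory_sketch;        /* the translation table, opaque */

    struct vhost_vq_sketch {
        /* written only with the vq mutex held, read the same way --
         * no RCU involved anywhere */
        struct vhost_memory_sketch *memory;
    };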
Michael S. Tsirkin authored

Refactor code to make sure features are only accessed under the VQ mutex. This makes everything simpler; no need for RCU here anymore.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 06 Dec, 2013 1 commit
Zhi Yong Wu authored

Since vhost_dev_init() always returns 0, some branches are never run and can therefore be removed.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- 11 Jul, 2013 1 commit
Asias He authored

Now, vq->private_data is always accessed under the vq mutex. No need to play the vhost rcu trick.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 07 Jul, 2013 1 commit
Asias He authored

Currently, vhost-net and vhost-scsi are sharing the vhost core code; however, vhost-scsi shares the code by including the vhost.c file directly. Making vhost a separate module makes it easier to share code with other vhost devices.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 11 Jun, 2013 1 commit
Michael S. Tsirkin authored

If the device has an owner, we shouldn't touch ubuf_info since it might be in use.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- 06 May, 2013 4 commits
Asias He authored

It is net.c specific.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He authored

It is supposed to be removed when hdr is moved into vhost_net_virtqueue.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He authored

vhost.h should not depend on device-specific macros like VHOST_NET_F_VIRTIO_NET_HDR and VIRTIO_NET_F_MRG_RXBUF.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He authored

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 01 May, 2013 4 commits
Michael S. Tsirkin authored

The RESET_OWNER ioctl would leave the fd in a bad state if memory allocation failed: the device is stopped but the owner is not reset. Make state changes only after allocating memory, so that a failed ioctl has no effect.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
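
A standalone sketch of the ordering the fix establishes (all names hypothetical): every fallible step happens before the first state change, so a failure leaves the device untouched.

    #include <errno.h>
    #include <stdlib.h>

    struct dev_sketch;
    struct mem_sketch { int placeholder; };

    void stop_device(struct dev_sketch *d);
    void install_reset_state(struct dev_sketch *d, struct mem_sketch *m);

    long reset_owner_sketch(struct dev_sketch *d)
    {
        /* the fallible step comes first: no side effects yet */
        struct mem_sketch *m = malloc(sizeof(*m));
        if (!m)
            return -ENOMEM;    /* device still running, owner intact */

        /* state changes only once success is assured */
        stop_device(d);
        install_reset_state(d, m);
        return 0;
    }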
Michael S. Tsirkin authored

This will remove the need for vhost-scsi to pull in virtio-net.h.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He authored

On top of 'vhost: Allow device specific fields per vq', we can move device-specific fields from the vhost virtqueue to the device virtqueue.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Asias He authored

This is useful for any device which wants device-specific fields per vq. For example, tcm_vhost wants a per-vq field to track requests which are in flight on the vq. Also, on top of this we can add patches to move things like ubufs from vhost.h out to net.c.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 29 Jan, 2013 1 commit
Jason Wang authored

Currently, polling errors are ignored, which can lead to the following issues:

- vhost removes itself unconditionally from the waitqueue when stopping the poll; this may crash the kernel, since the previous attempt at starting the poll may have failed to add itself to the waitqueue.
- userspace may think the backend was set successfully even though the polling failed.

Solve this by:

- checking poll->wqh before trying to remove from the waitqueue
- reporting polling errors in vhost_poll_start() and tx_poll_start(); the return value is checked and propagated when userspace wants to set the backend

After this fix, there can still be a polling failure after the backend is set; that will be addressed by the next patch.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- 06 Dec, 2012 1 commit
Michael S. Tsirkin authored

Vring changes already do a flush internally where appropriate, so we do not need a second flush. It's currently not very expensive, but a follow-up patch makes flush more heavy-weight, so remove the extra flush here to avoid regressing performance if call or kick fds are changed on the data path.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
- 03 Nov, 2012 4 commits
Michael S. Tsirkin authored

Zerocopy handling code is vhost-net specific. Move it from vhost.c/vhost.h out to net.c.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin authored

This will be used to disable zerocopy when the error rate is high.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael S. Tsirkin authored

Better document the macros for DMA tracking. Add an explicit one for DMA in progress instead of relying on the user supplying len != 1.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
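
The shape of the documented scheme (macro names and values illustrative; the point is a named in-progress marker instead of the old implicit "len != 1" convention):

    /* per-buffer DMA completion state, stored in the length field */
    #define DMA_CLEAR_LEN    0   /* slot unused, no DMA outstanding */
    #define DMA_IN_PROGRESS  1   /* explicit marker: completion not seen */
    #define DMA_DONE_LEN     2   /* completed successfully */
    #define DMA_FAILED_LEN   3   /* completed with an error */

    #define DMA_IS_DONE(len) ((len) >= DMA_DONE_LEN)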
Michael S. Tsirkin authored

Even if an skb is marked for zero copy, the net core might still decide to copy it later, which is somewhat slower than a copy in user context: besides copying the data, we need to pin/unpin the pages. Add a parameter reporting such cases through the zero copy callback: if this happens a lot, the device can take it into account and switch to copying in user context. This patch updates all users but ignores the passed value for now: it will be used by follow-up patches.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
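
A sketch of the reporting hook (names hypothetical; the real callback gains a bool parameter): counting how often the core fell back to copying gives the follow-up patches their signal for disabling zerocopy.

    #include <stdbool.h>

    struct zc_stats {
        unsigned long sent_zerocopy;
        unsigned long sent_copied;
    };

    /* invoked when the skb is finally released */
    static void zerocopy_complete(struct zc_stats *s, bool zerocopy_success)
    {
        if (zerocopy_success)
            s->sent_zerocopy++;
        else
            s->sent_copied++;  /* copied after all: data copy + pin/unpin */
    }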