1
Commit Graph

31732 Commits

Author SHA1 Message Date
Rusty Russell
0a8a69dd77 Virtio helper routines for a descriptor ringbuffer implementation
These helper routines supply most of the virtqueue_ops for hypervisors
which want to use a ring for virtio.  Unlike the previous lguest
implementation:

1) The rings are variable sized (2^n-1 elements).
2) They have an unfortunate limit of 65535 bytes per sg element.
3) The page numbers are always 64 bit (PAE anyone?)
4) They no longer place used[] on a separate page, just a separate
   cacheline.
5) We do a modulo on a variable.  We could be tricky if we cared.
6) Interrupts and notifies are suppressed using flags within the rings.

Users need only get the ring pages and provide a notify hook (KVM
wants the guest to allocate the rings, lguest does it sanely).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
2007-10-23 15:49:55 +10:00
Rusty Russell
b01d9f2863 Module autoprobing support for virtio drivers.
This adds the logic to convert the virtio ids into module aliases, and
includes a modalias entry in sysfs and the env var to make probing work.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:55 +10:00
Rusty Russell
31610434bc Virtio console driver
This is an hvc-based virtio console driver.  It's suboptimal becuase
hvc expects to have raw access to interrupts and virtio doesn't assume
that, so it currently polls.

There are two solutions: expose hvc's "kick" interface, or wean off hvc.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:55 +10:00
Rusty Russell
e467cde238 Block driver using virtio.
The block driver uses scatter-gather lists with sg[0] being the
request information (struct virtio_blk_outhdr) with the type, sector
and inbuf id.  The next N sg entries are the bio itself, then the last
sg is the status byte.  Whether the N entries are in or out depends on
whether it's a read or a write.

We accept the normal (SCSI) ioctls: they get handed through to the other
side which can then handle it or reply that it's unsupported.  It's
not clear that this actually works in general, since I don't know
if blk_pc_request() requests have an accurate rq_data_dir().

Although we try to reply -ENOTTY on unsupported commands, ioctl(fd,
CDROMEJECT) returns success to userspace.  This needs a separate
patch.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <jens.axboe@oracle.com>
2007-10-23 15:49:54 +10:00
Rusty Russell
296f96fcfc Net driver using virtio
The network driver uses two virtqueues: one for input packets and one
for output packets.  This has nice locking properties (ie. we don't do
any for recv vs send).

TODO:
	1) Big packets.
	2) Multi-client devices (maybe separate driver?).
	3) Resolve freeing of old xmit skbs (Christian Borntraeger)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: netdev@vger.kernel.org
2007-10-23 15:49:54 +10:00
Rusty Russell
ec3d41c4db Virtio interface
This attempts to implement a "virtual I/O" layer which should allow
common drivers to be efficiently used across most virtual I/O
mechanisms.  It will no-doubt need further enhancement.

The virtio drivers add buffers to virtio queues; as the buffers are consumed
the driver "interrupt" callbacks are invoked.

There is also a generic implementation of config space which drivers can query
to get setup information from the host.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
Cc: Arnd Bergmann <arnd@arndb.de>
2007-10-23 15:49:54 +10:00
Rusty Russell
47436aa4ad Boot with virtual == physical to get closer to native Linux.
1) This allows us to get alot closer to booting bzImages.

2) It means we don't have to know page_offset.

3) The Guest needs to modify the boot pagetables to create the
   PAGE_OFFSET mapping before jumping to C code.

4) guest_pa() walks the page tables rather than using page_offset.

5) We don't use page_offset to figure out whether to emulate: it was
   always kinda quesationable, and won't work for instructions done
   before remapping (bzImage unpacking in particular).

6) We still want the kernel address for tlb flushing: have the initial
   hypercall give us that, too.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:54 +10:00
Rusty Russell
c18acd73ff Allow guest to specify syscall vector to use.
(Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch).

This patch allows Guests to specify what system call vector they want,
and we try to reserve it.  We only allow one non-Linux system call
vector, to try to avoid DoS on the Host.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:53 +10:00
Rusty Russell
ee3db0f2b6 Rename "cr3" to "gpgdir" to avoid x86-specific naming.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:53 +10:00
Matias Zabaljauregui
df29f43e65 Pagetables to use normal kernel types
This is my first step in the migration of page_tables.c to the kernel
types and functions/macros (2.6.23-rc3).  Seems to be working OK.

Signed-off-by: Matias Zabaljauregui <matias.zabaljauregui@cern.ch>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:53 +10:00
Jes Sorensen
d612cde060 Move register setup into i386_core.c
Move setup_regs() to lguest_arch_setup_regs() in i386_core.c given
that this is very architecture specific.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:52 +10:00
Jes Sorensen
511801dc31 Change example launcher to use unsigned long not u32
Apply Clue 2x4 to lguest userland<->kernel handling code and the
lguest launcher. Pointers are not to be passed in u32's!

Basic rule of thumb: Anything passing u32's back and forth should be
passing unsigned longs to be portable to 64 bit archs.

For those who forgotten already, I repeat: NO POINTERS IN u32!

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:52 +10:00
Jes Sorensen
b410e7b149 Make hypercalls arch-independent.
Clean up the hypercall code to make the code in hypercalls.c
architecture independent. First process the common hypercalls and
then call lguest_arch_do_hcall() if the call hasn't been handled.
Rename struct hcall_ring to hcall_args.

This patch requires the previous patch which reorganize the layout of
struct lguest_regs on i386 so they match the layout of struct
hcall_args.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:52 +10:00
Rusty Russell
cc6d4fbcef Introduce "hcall" pointer to indicate pending hypercall.
Currently we look at the "trapnum" to see if the Guest wants a
hypercall.  But once the hypercall is done we have to reset trapnum to
a bogus value, otherwise if we exit to userspace and return, we'd run
the same hypercall twice (that was a nasty bug to find!).

This has two main effects:

1) When Jes's patch changes the hypercall args to be a generic "struct
   hcall_args" we simply change the type of "lg->hcall".  It's set by
   arch code, so if it has to copy args or something it can do so, and
   point "hcall" into lg->arch somewhere.

2) Async hypercalls only get run when an actual hypercall is pending.
   This simplfies the code a little and is a more logical semantic.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:52 +10:00
Jes Sorensen
4614a3a3b6 Reorder guest saved regs to match hyperall order
Move eax next to ebx/ecx/edx in struct lguest_regs on i386, so they
will be located together and allow it to map directly to a struct
hcall_ring entry (which will be renamed struct hcall_args as in a
subsequent patch).

This is in preparation for making the code hcall code architecture
independent.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:51 +10:00
Jes Sorensen
625efab1cd Move i386 part of core.c to x86/core.c.
Separate i386 architecture specific from core.c and move it to
x86/core.c and add x86/lguest.h header file to match.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:51 +10:00
Rusty Russell
56adbe9ddc Make shadow IDT a complete IDT with 256 entries.
This simplifies the code a little, in preparation for allowing
alternate system call vectors in guests (Plan 9 uses 0x40).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:51 +10:00
Rusty Russell
48245cc070 Remove fixed limit on number of guests, and lguests array.
Back when we had all the Guest state in the switcher, we had a fixed
array of them.  This is no longer necessary.

If we switch the network code to using random_ether_addr (46 bits is
enough to avoid clashes), we can get rid of the concept of "guest id"
altogether.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:51 +10:00
Rusty Russell
3c6b5bfa3c Introduce guest mem offset, static link example launcher
In order to avoid problematic special linking of the Launcher, we give
the Host an offset: this means we can use any memory region in the
Launcher as Guest memory rather than insisting on mmap() at 0.

The result is quite pleasing: a number of casts are replaced with
simple additions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:50 +10:00
Rusty Russell
1f4e1de4f2 Rename switcher.S to x86/switcher_32.S
lguest uses a "switcher" shim mapped high to bounce between host and
guest.  As lguest becomes less i386-centric, we separate this code
into a subdir.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:50 +10:00
Rusty Russell
34b8867a03 Move lguest guest support to arch/x86.
Lguest has two sides: host support (to launch guests) and guest
support (replacement boot path and paravirt_ops).  This moves the
guest side to arch/x86/lguest where it's closer to related code.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
2007-10-23 15:49:50 +10:00
Tony Breeds
05aa026a62 Clocksource is continuous regardless of the state of the host's TSC.
Currently lguest will spend a lot of of time waking up the host, as it
cannot go tickless (if the [host] TSC has been marked unstable). On my
laptop I was getting ~40% of wakeups from lguest.

With this patch applied, my laptop is much happier!

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:49 +10:00
Rusty Russell
ebac52524d lguest_devices belongs in lguest_bus.c: it's not i386-specific.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:49 +10:00
Rusty Russell
141341cdae Lguest currently depends on 32-bit x86, not just x86.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:48 +10:00
Jes Sorensen
891ff65ff5 Use copy_to_user() not put_user for struct timespec
Use copy_to_user() when copying a struct timespec to the guest -
put_user() cannot handle two long's in one go on a 64bit arch.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
2007-10-23 15:49:48 +10:00
Rusty Russell
25e82eba3a Remove binfmts.h include from lg.h
It wasn't needed since a very early prototype of lguest.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2007-10-23 15:49:47 +10:00
Rusty Russell
9525ca0286 Consolidate host virtualization support under Virtualization menu
Move lguest under the virtualization menu.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Avi Kivity <avi@qumranet.com>
2007-10-23 15:49:47 +10:00
Rusty Russell
d3d1c4bdf1 Normalize config options for guest support
1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST
   menu.
2) Make those options select CONFIG_PARAVIRT, as suggested by Andi.
3) Make kconfig help titles consistent.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
2007-10-23 15:49:47 +10:00
Linus Torvalds
81f8320f62 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: appletouch - apply idle reset logic to all touchpads
  Input: usbtouchscreen - add support for GoTop tablet devices
  Input: bf54x-keys - return real error when request_irq() fails
  Input: i8042 - export i8042_command()
2007-10-22 19:29:58 -07:00
Linus Torvalds
0fd56c7033 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs()
  sched: don't clear PF_VCPU in scheduler
  KVM: Improve local apic timer wraparound handling
  KVM: Fix local apic timer divide by zero
  KVM: Move kvm_guest_exit() after local_irq_enable()
  KVM: x86 emulator: fix access registers for instructions with ModR/M byte and Mod = 3
  KVM: VMX: Force vm86 mode if setting flags during real mode
  KVM: x86 emulator: implement 'movnti mem, reg'
  KVM: VMX: Reset mmu context when entering real mode
  KVM: VMX: Handle NMIs before enabling interrupts and preemption
  KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte()
  KVM: x86 emulator: fix repne/repnz decoding
  KVM: x86 emulator: fix merge screwup due to emulator split
2007-10-22 19:24:17 -07:00
Linus Torvalds
56d61a0e26 Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] 4level-fixup cleanup
  [S390] Cleanup page table definitions.
  [S390] Introduce follow_table in uaccess_pt.c
  [S390] Remove unused user_seg from thread structure.
  [S390] tlb flush fix.
  [S390] kernel: Fix dump on panic for DASDs under LPAR.
  [S390] struct class_device -> struct device conversion.
  [S390] cio: Fix incomplete commit for uevent suppression.
  [S390] cio: Use to_channelpath() for device to channel path conversion.
  [S390] Add per-cpu idle time / idle count sysfs attributes.
  [S390] Update default configuration.
2007-10-22 19:23:34 -07:00
Linus Torvalds
f09cc910fe Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits)
  [IPSEC] IPV6: Fix to add tunnel mode SA correctly.
  [NET]: Cut off the queue_mapping field from sk_buff
  [NET]: Hide the queue_mapping field inside netif_subqueue_stopped
  [NET]: Make and use skb_get_queue_mapping
  [NET]: Use the skb_set_queue_mapping where appropriate
  [INET]: Use MODULE_ALIAS_NET_PF_PROTO_TYPE where possible.
  [INET]: Let inet_diag and friends autoload
  [NIU]: Cleanup PAGE_SIZE checks a bit
  [NET]: Fix SKB_WITH_OVERHEAD calculation
  [ATM]: Fix clip module reload crash.
  [TG3]: Update version to 3.85
  [TG3]: PCI command adjustment
  [TG3]: Add management FW version to ethtool report
  [TG3]: Add 5723 support
  [Bluetooth] Convert RFCOMM to use kthread API
  [Bluetooth] Add constant for Bluetooth socket options level
  [Bluetooth] Add support for handling simple eSCO links
  [Bluetooth] Add address and channel attribute to RFCOMM TTY device
  [Bluetooth] Fix wrong argument in debug code of HIDP
  [Bluetooth] Add generic driver for Bluetooth USB devices
  ...
2007-10-22 19:22:33 -07:00
Linus Torvalds
8b0eaccab4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] Enable restart support for lite5200 board
  [POWERPC] Add restart support for mpc52xx based platforms
  [POWERPC] Update device tree binding for mpc5200 gpt
  [POWERPC] Add mpc52xx_find_and_map_path(), refactor utility functions
  [POWERPC] bestcomm: Restrict bus prefetch bugfix to original mpc5200 silicon.
2007-10-22 19:21:54 -07:00
Linus Torvalds
0c326331c8 Merge git://git.infradead.org/battery-2.6
* git://git.infradead.org/battery-2.6:
  apm_power: calculate to_full/to_empty time using energy
  apm_power: improve battery finding algorithm
  apm_power: fix obviously wrong logic for time reporting
2007-10-22 19:20:52 -07:00
Linus Torvalds
ad792f4f46 Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb
* 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb: (37 commits)
  V4L/DVB (6382): saa7134: fix NULL dereference at suspend time for cards without IR receiver
  V4L/DVB (6380): ivtvfb: Removal of the 'osd_compat' module option
  V4L/DVB (6379): patch which improves GotView Saa7135 remote control
  V4L/DVB (6378b): Updates info about the removal of V4L1 at feature-removal-schedule.txt
  V4L/DVB (6378a): Removal of VIDIOC_[G|S]_MPEGCOMP from feature-removal-schedule.txt
  V4L/DVB (6378): DiB0700-device: Using 1.10 firmware
  V4L/DVB (6357): pvrusb2: Improve encoder chip health tracking
  V4L/DVB (6356): "while (!ca->wakeup)" breaks the CAM initialisation
  V4L/DVB (6352): ir-kbd-i2c: Missing break statement
  V4L/DVB (6350): V4L: possible leak in em28xx_init_isoc
  V4L/DVB (6348): ivtv: undo video mute when closing the radio
  V4L/DVB (6347): ivtv: fix video mute when radio is used
  V4L/DVB (6346): ivtvfb: YUV output size fix when ivtvfb is not loaded
  V4L/DVB (6345): ivtvfb: YUV handling of an image which is not visible in the display area
  V4L/DVB (6343): ivtvfb: check return value of unregister_framebuffer
  V4L/DVB (6342): ivtv: fix circular locking (bug 9037)
  V4L/DVB (6341): ivtv: fix resizing MPEG1 streams
  V4L/DVB (6340): ivtvfb: screen mode change sometimes goes wrong
  V4L/DVB (6339): ivtv: set the video color to black instead of green when capturing from the radio
  V4L/DVB (6338): ivtv: fix incorrect EBUSY return
  ...
2007-10-22 19:20:22 -07:00
Linus Torvalds
9747170120 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
  firewire: fw-ohci: shut up a superfluous compiler warning
  firewire: fw-ohci: log a note about unsupported features
2007-10-22 19:14:05 -07:00
Linus Torvalds
69450bb5eb Merge branch 'sg' of git://git.kernel.dk/linux-2.6-block
* 'sg' of git://git.kernel.dk/linux-2.6-block:
  Add CONFIG_DEBUG_SG sg validation
  Change table chaining layout
  Update arch/ to use sg helpers
  Update swiotlb to use sg helpers
  Update net/ to use sg helpers
  Update fs/ to use sg helpers
  [SG] Update drivers to use sg helpers
  [SG] Update crypto/ to sg helpers
  [SG] Update block layer to use sg helpers
  [SG] Add helpers for manipulating SG entries
2007-10-22 19:11:06 -07:00
Paul Mackerras
3cfa8f6c54 Merge branch 'for-2.6.24' of git://git.secretlab.ca/git/linux-2.6-mpc52xx 2007-10-23 08:45:23 +10:00
Jens Axboe
45711f1af6 [SG] Update drivers to use sg helpers
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-10-22 21:19:53 +02:00
Stefan Richter
4b6d51ec62 firewire: fw-ohci: shut up a superfluous compiler warning
New warning since commit ab88ca488b,
"firewire: fw-ohci: missing dma_unmap_single":
drivers/firewire/fw-ohci.c: In function 'at_context_transmit':
drivers/firewire/fw-ohci.c:609: warning: 'payload_bus' may be used
 uninitialized in this function

Access to payload_bus is conditional on packet->payload_length > 0,
and that won't change while in at_context_queue_packet.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2007-10-22 19:48:56 +02:00
Stefan Richter
c74e92c209 firewire: fw-ohci: log a note about unsupported features
because there seems to be more time needed to implement this.
Also, change related error return values to more appropriate ones.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
2007-10-22 19:48:55 +02:00
Laurent Vivier
49d3bd7e2b KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs()
In kvm_flush_remote_tlbs(), replace a loop using smp_call_function_single()
by a single call to smp_call_function_mask() (which is new for x86_64).

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2007-10-22 17:21:54 +02:00
FUJITA Tomonori
c03ab37cbe intel-iommu sg chaining support
x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't
support sg chaining, we will get data corruption.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:19 -07:00
Keshavamurthy, Anil S
358dd8ac53 intel-iommu: fix for IOMMU early crash
pci_dev's->sysdata is highly overloaded and currently IOMMU is broken due
to IOMMU code depending on this field.

This patch introduces new field in pci_dev's dev.archdata struct to hold
IOMMU specific per device IOMMU private data.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:19 -07:00
Keshavamurthy, Anil S
f76aec76ec intel-iommu: optimize sg map/unmap calls
This patch adds PageSelectiveInvalidation support replacing existing
DomainSelectiveInvalidation for intel_{map/unmap}_sg() calls and also
enables to mapping one big contiguous DMA virtual address which is mapped
to discontiguous physical address for SG map/unmap calls.

"Doamin selective invalidations" wipes out the IOMMU address translation
cache based on domain ID where as "Page selective invalidations" wipes out
the IOMMU address translation cache for that address mask range which is
more cache friendly when compared to Domain selective invalidations.

Here is how it is done.
1) changes to iova.c
alloc_iova() now takes a bool size_aligned argument, which
when when set, returns the io virtual address that is
naturally aligned to 2 ^ x, where x is the order
of the size requested.

Returning this io vitual address which is naturally
aligned helps iommu to do the "page selective
invalidations" which is IOMMU cache friendly
over "domain selective invalidations".

2) Changes to driver/pci/intel-iommu.c
Clean up intel_{map/unmap}_{single/sg} () calls so that
s/g map/unamp calls is no more dependent on
intel_{map/unmap}_single()

intel_map_sg() now computes the total DMA virtual address
required and allocates the size aligned total DMA virtual address
and maps the discontiguous physical address to the allocated
contiguous DMA virtual address.

In the intel_unmap_sg() case since the DMA virtual address
is contiguous and size_aligned, PageSelectiveInvalidation
is used replacing earlier DomainSelectiveInvalidations.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Greg KH <greg@kroah.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Suresh B <suresh.b.siddha@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:19 -07:00
Keshavamurthy, Anil S
49a0429e53 Intel IOMMU: Iommu floppy workaround
This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the floppy device so
that the floppy device which does not use DMA api's will continue to work.

Once the floppy driver starts using DMA api's this config option can be turn
off or this patch can be yanked out of kernel at that time.

[akpm@linux-foundation.org: cleanups, rename things, build fix]
[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:19 -07:00
Keshavamurthy, Anil S
e820482cd2 Intel IOMMU: Iommu Gfx workaround
When we fix all the opensource gfx drivers to use the DMA api's, at that time
we can yank this config options out.

[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:19 -07:00
Keshavamurthy, Anil S
3460a6d9ce Intel IOMMU: DMAR fault handling support
MSI interrupt handler registrations and fault handling support for Intel-IOMMU
hadrware.

This patch enables the MSI interrupts for the DMA remapping units and in the
interrupt handler read the fault cause and outputs the same on to the console.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:19 -07:00
Keshavamurthy, Anil S
7d3b03ce7b Intel IOMMU: Intel iommu cmdline option - forcedac
Introduce intel_iommu=forcedac commandline option.  This option is helpful to
verify the pci device capability of handling physical dma'able address greater
than 4G.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:18 -07:00
Keshavamurthy, Anil S
eb3fa7cb51 Intel IOMMU: Avoid memory allocation failures in dma map api calls
Intel IOMMU driver needs memory during DMA map calls to setup its internal
page tables and for other data structures.  As we all know that these DMA map
calls are mostly called in the interrupt context or with the spinlock held by
the upper level drivers(network/storage drivers), so in order to avoid any
memory allocation failure due to low memory issues, this patch makes memory
allocation by temporarily setting PF_MEMALLOC flags for the current task
before making memory allocation calls.

We evaluated mempools as a backup when kmem_cache_alloc() fails
and found that mempools are really not useful here because
 1) We don't know for sure how much to reserve in advance
 2) And mempools are not useful for GFP_ATOMIC case (as we call
    memory alloc functions with GFP_ATOMIC)

(akpm: point 2 is wrong...)

With PF_MEMALLOC flag set in the current->flags, the VM subsystem avoids any
watermark checks before allocating memory thus guarantee'ing the memory till
the last free page.  Further, looking at the code in mm/page_alloc.c in
__alloc_pages() function, looks like this flag is useful only in the
non-interrupt context.

If we are in the interrupt context and memory allocation in IOMMU driver fails
for some reason, then the DMA map api's will return failure and it is up to
the higher level drivers to retry.  Suppose, if upper level driver programs
the controller with the buggy DMA virtual address, the IOMMU will block that
DMA transaction when that happens thus preventing any corruption to main
memory.

So far in our test scenario, we were unable to create any memory allocation
failure inside dma map api calls.

Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-22 08:13:18 -07:00