linux/fs/nfs
Neil Horman e9e3d724e2 nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3)
The "bad_page()" page allocator sanity check was reported recently (call
chain as follows):

  bad_page+0x69/0x91
  free_hot_cold_page+0x81/0x144
  skb_release_data+0x5f/0x98
  __kfree_skb+0x11/0x1a
  tcp_ack+0x6a3/0x1868
  tcp_rcv_established+0x7a6/0x8b9
  tcp_v4_do_rcv+0x2a/0x2fa
  tcp_v4_rcv+0x9a2/0x9f6
  do_timer+0x2df/0x52c
  ip_local_deliver+0x19d/0x263
  ip_rcv+0x539/0x57c
  netif_receive_skb+0x470/0x49f
  :virtio_net:virtnet_poll+0x46b/0x5c5
  net_rx_action+0xac/0x1b3
  __do_softirq+0x89/0x133
  call_softirq+0x1c/0x28
  do_softirq+0x2c/0x7d
  do_IRQ+0xec/0xf5
  default_idle+0x0/0x50
  ret_from_intr+0x0/0xa
  default_idle+0x29/0x50
  cpu_idle+0x95/0xb8
  start_kernel+0x220/0x225
  _sinittext+0x22f/0x236

It occurs because an skb with a fraglist was freed from the tcp
retransmit queue when it was acked, but a page on that fraglist had
PG_slab set (indicating it was allocated from the slab allocator), which
means the free path above can't safely free it via put_page.
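
As a rough userspace illustration of the failure mode just described (not
kernel code): the sketch below models a page as a flag word plus a
reference count, and treats dropping the last reference on a slab-owned
page as a "bad page".  Every name here (toy_frag_page, TOY_PG_SLAB,
toy_release_frag) is hypothetical and exists only for this example.

```c
#include <assert.h>

#define TOY_PG_SLAB (1u << 0)   /* hypothetical stand-in for PG_slab */

/* Hypothetical, heavily simplified model of a page on an skb frag list. */
struct toy_frag_page {
    unsigned int flags;
    int refcount;
};

enum toy_free_result {
    TOY_STILL_REFERENCED,   /* other holders remain, nothing freed */
    TOY_FREED,              /* last reference dropped, page returned */
    TOY_BAD_PAGE            /* last reference dropped on a slab-owned page */
};

/* Models the put_page-style release done when the frame is acked. */
static enum toy_free_result toy_release_frag(struct toy_frag_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount > 0)
        return TOY_STILL_REFERENCED;
    /* The page allocator must not be handed memory the slab still owns. */
    if (p->flags & TOY_PG_SLAB)
        return TOY_BAD_PAGE;
    return TOY_FREED;
}
```

In this model a page that came from kmalloc carries TOY_PG_SLAB, so the
ack-time release reports TOY_BAD_PAGE instead of freeing it.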

We tracked this back to an nfsv4 setacl operation, in which the nfs code
attempted to convert the passed-in buffer to an array of pages in
__nfs4_proc_set_acl, which gets used by the skb->frags list in
xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
to a page struct via virt_to_page, but the vfs allocates the buffer via
kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
kmalloc and free it later in the tcp ack path with put_page, so we need
to either:

1) ensure that when we create the list of pages, no page struct has
   PG_slab set

 or

2) not use a page list to send this data

Given that these buffers can be multiple pages and arbitrarily sized, I
think (1) is the right way to go.  I've written the below patch to
allocate a page from the buddy allocator directly and copy the data over
to it.  This ensures that we have a put_page free-able page for every
entry that winds up on an skb frag list, so it can be safely freed when
the frame is acked.  We do a put_page on each entry after the
rpc_call_sync call so as to drop our own reference to the page,
leaving only the reference taken by tcp_sendpages.  This way the data
will be properly freed when the ack comes in.
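
The copy-then-drop-reference scheme can be sketched in userspace C as
follows.  This is an analogue under stated assumptions, not the patch
itself: calloc stands in for the buddy allocator, TOY_PAGE_SIZE is an
assumed page size, and struct toy_page, toy_put_page and
toy_buf_to_pages_copy are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical stand-in for a page: a reference count plus its data. */
struct toy_page {
    int refcount;
    unsigned char data[TOY_PAGE_SIZE];
};

/* Drop one reference; free on the last one.  Returns 1 if freed. */
static int toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p);
        return 1;
    }
    return 0;
}

/*
 * Copy buf into freshly allocated pages, one page-sized chunk at a time,
 * so that no entry in pages[] aliases the caller's buffer.
 * Returns the number of pages filled, or -1 after releasing everything
 * on allocation failure or if max_pages is too small.
 */
static int toy_buf_to_pages_copy(const void *buf, size_t buflen,
                                 struct toy_page **pages, size_t max_pages)
{
    const unsigned char *src = buf;
    size_t n = 0;

    while (buflen > 0) {
        size_t len = buflen < TOY_PAGE_SIZE ? buflen : TOY_PAGE_SIZE;
        struct toy_page *p = n < max_pages ? calloc(1, sizeof(*p)) : NULL;

        if (p == NULL) {
            /* Unwind: drop the reference on every page taken so far. */
            while (n > 0)
                toy_put_page(pages[--n]);
            return -1;
        }
        p->refcount = 1;              /* the caller's own reference */
        memcpy(p->data, src, len);
        pages[n++] = p;
        src += len;
        buflen -= len;
    }
    return (int)n;
}
```

A caller would hand pages[] to the transport (which takes its own
reference on each page), then call toy_put_page on every entry once the
synchronous call returns, so the last reference is the transport's and
the page is released when the data is acked.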

Successfully tested by myself to solve the above oops.

Note, as this is the result of a setacl operation that exceeded a page
of data, I think this amounts to a local DoS triggerable by an
unprivileged user, so I'm CCing security on this as well.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: security@kernel.org
CC: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-03-04 17:28:52 -08:00
..
cache_lib.c
cache_lib.h
callback_proc.c NFS fix cb_sequence error processing 2011-01-25 15:26:51 -05:00
callback_xdr.c NFS do not find client in NFSv4 pg_authenticate 2011-01-25 15:26:51 -05:00
callback.c NFS do not find client in NFSv4 pg_authenticate 2011-01-25 15:26:51 -05:00
callback.h NFS do not find client in NFSv4 pg_authenticate 2011-01-25 15:26:51 -05:00
client.c NFS do not find client in NFSv4 pg_authenticate 2011-01-25 15:26:51 -05:00
delegation.c NFS: Fix an NFS client lockdep issue 2011-01-28 13:37:09 -05:00
delegation.h NFS: Move cl_delegations to the nfs_server struct 2011-01-06 14:57:46 -05:00
dir.c NFS: Use d_automount() rather than abusing follow_link() 2011-01-15 20:07:34 -05:00
direct.c NFS: Fix "kernel BUG at fs/aio.c:554!" 2011-01-25 15:24:47 -05:00
dns_resolve.c sunrpc: use seconds since boot in expiry cache 2010-09-07 19:21:20 -04:00
dns_resolve.h NFS: Use kernel DNS resolver [ver #2] 2010-08-11 17:11:28 +00:00
file.c NFS: Fix fcntl F_GETLK not reporting some conflicts 2010-12-07 19:30:43 -05:00
fscache-index.c
fscache.c NFS: Squelch compiler warning 2010-05-14 15:09:31 -04:00
fscache.h
getroot.c switch nfs to ->s_d_op 2011-01-12 20:02:45 -05:00
idmap.c nfs: fix mispelling of idmap CONFIG symbol 2011-01-04 13:10:39 -05:00
inode.c NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount 2011-01-25 15:28:21 -05:00
internal.h NFS do not find client in NFSv4 pg_authenticate 2011-01-25 15:26:51 -05:00
iostat.h NFS: Squelch compiler warning in nfs_add_server_stats() 2010-05-14 15:09:31 -04:00
Kconfig lockd: push lock_flocks down 2010-10-27 21:39:39 +02:00
Makefile NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure 2010-10-24 18:07:11 -04:00
mount_clnt.c NFS: Remove redundant unlikely() 2010-12-21 11:51:23 -05:00
namespace.c Unexport do_add_mount() and add in follow_automount(), not ->d_automount() 2011-01-15 20:07:48 -05:00
nfs2xdr.c Merge branch 'bugfixes' into nfs-for-2.6.38 2011-01-10 14:48:02 -05:00
nfs3acl.c NFS: Prevent memory allocation failure in nfsacl_encode() 2011-01-25 15:24:47 -05:00
nfs3proc.c NFS: readdir with vmapped pages 2010-10-23 15:27:35 -04:00
nfs3xdr.c NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" 2011-01-25 15:24:47 -05:00
nfs4_fs.h NFS: Move cl_state_owners and related fields to the nfs_server struct 2011-01-06 14:47:57 -05:00
nfs4filelayout.c pnfs: add prefix to struct pnfs_layout_hdr fields 2011-01-06 14:46:31 -05:00
nfs4filelayout.h NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure 2010-10-24 18:07:11 -04:00
nfs4filelayoutdev.c NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). 2011-01-25 15:24:46 -05:00
nfs4namespace.c
nfs4proc.c nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) 2011-03-04 17:28:52 -08:00
nfs4renewd.c NFS: Move cl_delegations to the nfs_server struct 2011-01-06 14:57:46 -05:00
nfs4state.c NFS do not find client in NFSv4 pg_authenticate 2011-01-25 15:26:51 -05:00
nfs4xdr.c NFS: NFSv4 readdir loses entries 2011-01-28 13:41:35 -05:00
nfsroot.c NFS: Fix a compile issue in nfs_root 2010-10-26 13:56:42 -04:00
pagelist.c nfs: Take advantage of kmem_cache_zalloc() in nfs_page_alloc() 2010-12-21 11:51:24 -05:00
pnfs.c NFS improve pnfs_put_deviceid_cache debug print 2011-01-25 15:26:51 -05:00
pnfs.h pnfs: layout roc code 2011-01-06 14:46:32 -05:00
proc.c NFS: Don't leak in nfs_proc_symlink() 2011-01-04 13:10:36 -05:00
read.c nfs: remove extraneous and problematic calls to nfs_clear_request 2010-12-07 23:02:44 -05:00
super.c switch nfs to ->s_d_op 2011-01-12 20:02:45 -05:00
symlink.c
sysctl.c NFS: new idmapper 2010-10-07 18:48:49 -04:00
unlink.c Merge branch 'nfs-for-2.6.38' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 2011-01-11 15:11:56 -08:00
write.c NFS: fix handling of malloc failure during nfs_flush_multi() 2011-01-19 15:37:49 -05:00