linux/fs
Al Viro 678379e1d4 close_range(): fix the logics in descriptor table trimming
Cloning a descriptor table picks the size that would cover all currently
opened files.  That's fine for clone() and unshare(), but for close_range()
there's an additional twist - we clone before we close, and it would be
a shame to have
	close_range(3, ~0U, CLOSE_RANGE_UNSHARE)
leave us with a huge descriptor table when we are not going to keep
anything past stderr, just because some large file descriptor used to
be open before our call has taken it out.
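
From userspace, the call quoted above might be issued as in the sketch below. It assumes a Linux system. The helper name close_past_stderr() is hypothetical. The raw syscall is used because older C libraries have no close_range() wrapper, and where the syscall number is unknown at build time the helper just reports ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/* Flag value as defined in the Linux UAPI header <linux/close_range.h>. */
#ifndef CLOSE_RANGE_UNSHARE
#define CLOSE_RANGE_UNSHARE (1U << 1)
#endif

/*
 * Hypothetical helper: unshare the descriptor table and close everything
 * past stderr in one call.  Returns 0 on success, -1 with errno set.
 */
static int close_past_stderr(void)
{
#ifdef SYS_close_range
    long ret = syscall(SYS_close_range, 3U, ~0U, CLOSE_RANGE_UNSHARE);

    /* On success descriptor 3 can no longer refer to an open file. */
    if (ret == 0)
        assert(fcntl(3, F_GETFD) == -1);
    return (int)ret;
#else
    errno = ENOSYS;    /* syscall number unknown at build time */
    return -1;
#endif
}
```

A caller would typically invoke close_past_stderr() right before exec and treat a -1 return with errno set to ENOSYS as "kernel too old, fall back to closing descriptors one by one".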

Unfortunately, it had been dealt with in an inherently racy way -
sane_fdtable_size() gets a "don't copy anything past that" argument
(passed via unshare_fd() and dup_fd()), close_range() decides how much
should be trimmed and passes that to unshare_fd().

The problem is, a range that used to extend to the end of descriptor
table back when close_range() had looked at it might very well have stuff
grown after it by the time dup_fd() has allocated a new files_struct
and started to figure out the capacity of fdtable to be attached to that.

That leads to interesting pathological cases; at the very least it's a
QoI issue, since unshare(CLONE_FILES) is atomic in the sense that it takes
a snapshot of descriptor table one might have observed at some point.
Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination
of unshare(CLONE_FILES) with plain close_range(), ending up with a
weird state that would never occur with unshare(2) is confusing, to put
it mildly.

It's not hard to get rid of - all it takes is passing both ends of the
range down to sane_fdtable_size().  There we are under ->file_lock,
so the race is trivially avoided.
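
The sizing decision described above can be modelled in userspace roughly as follows. This is a simplified sketch, not the kernel code. All model_* names are hypothetical, a bool array stands in for the open_fds bitmap, the caller is assumed to hold whatever lock protects the table, and 64 is assumed for both the word size and the minimum table size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_BITS_PER_LONG   64u
#define MODEL_NR_OPEN_DEFAULT 64u   /* assumed minimum table size */

/* Inclusive range of descriptors that is about to be closed. */
struct model_fd_range {
    unsigned int from, to;
};

/* Highest open descriptor below 'limit', or 'limit' if there is none. */
static unsigned int model_find_last(const bool *open_fds, unsigned int limit)
{
    for (unsigned int i = limit; i > 0; i--)
        if (open_fds[i - 1])
            return i - 1;
    return limit;
}

/*
 * Size needed for a copy of the table.  If the highest open descriptor
 * falls inside the range to be punched out, only what lies below the
 * range has to be covered.  Both ends of the range are examined here,
 * in one place, against one consistent view of the table.
 */
static unsigned int model_sane_size(const bool *open_fds, unsigned int max_fds,
                                    const struct model_fd_range *punch_hole)
{
    unsigned int last;

    assert(open_fds != NULL);
    last = model_find_last(open_fds, max_fds);
    if (last == max_fds)                /* nothing is open at all */
        return MODEL_NR_OPEN_DEFAULT;
    if (punch_hole && punch_hole->from <= last && last <= punch_hole->to) {
        last = model_find_last(open_fds, punch_hole->from);
        if (last == punch_hole->from)   /* nothing open below the range */
            return MODEL_NR_OPEN_DEFAULT;
    }
    /* Round up to a whole number of bitmap words. */
    return (last + 1 + MODEL_BITS_PER_LONG - 1) / MODEL_BITS_PER_LONG
           * MODEL_BITS_PER_LONG;
}
```

With descriptors 0, 1, 2 and 200 open in a 256-slot table, the model yields 256 when no range is given, 64 for the range 3..~0U, and 256 again for the range 3..100, since descriptor 200 lies past that range and still has to be covered.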

So we do the following:
	* switch close_range() from calling unshare_fd() to calling
dup_fd().
	* undo the calling convention change done to unshare_fd() in
60997c3d45 "close_range: add CLOSE_RANGE_UNSHARE"
	* introduce struct fd_range, pass a pointer to that to dup_fd()
and sane_fdtable_size() instead of "trim everything past that point"
they are currently getting.  NULL means "we are not going to be punching
any holes"; NR_OPEN_MAX is gone.
	* make sane_fdtable_size() use find_last_bit() instead of
open-coding it; it's easier to follow that way.
	* while we are at it, have dup_fd() report errors by returning
ERR_PTR(), no need to use a separate int *errorp argument.
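
The last item, returning an error-encoding pointer instead of filling in a separate int *errorp, can be illustrated with a small userspace imitation of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. Everything named model_* below is hypothetical, and the size limit is an arbitrary assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095
#define MODEL_MAX_SLOTS (1u << 20)  /* arbitrary limit for this sketch */

/* Encode a negative errno value in the top page of the address space. */
static inline void *model_err_ptr(long error)
{
    assert(error < 0 && error >= -MODEL_MAX_ERRNO);
    return (void *)(intptr_t)error;
}

static inline long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_files {
    unsigned int max_fds;
};

/*
 * Stand-in for a dup_fd()-style allocator: one return value carries
 * either a valid object or the reason for the failure.
 */
static struct model_files *model_dup_table(unsigned int nr_slots)
{
    struct model_files *newf;

    if (nr_slots > MODEL_MAX_SLOTS)
        return model_err_ptr(-EMFILE);
    newf = malloc(sizeof(*newf));
    if (!newf)
        return model_err_ptr(-ENOMEM);
    newf->max_fds = nr_slots;
    return newf;
}
```

A caller checks model_is_err() on the result and, if it is set, propagates model_ptr_err() as the error code. Otherwise the pointer is used as usual and released with free().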

Fixes: 60997c3d45 "close_range: add CLOSE_RANGE_UNSHARE"
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-09-29 21:52:29 -04:00
..
9p netfs: Speed up buffered reading 2024-09-12 12:20:41 +02:00
adfs
affs affs-for-6.12-tag 2024-09-16 13:07:59 +02:00
afs netfs: Speed up buffered reading 2024-09-12 12:20:41 +02:00
autofs autofs: add per dentry expire timeout 2024-08-30 08:22:36 +02:00
bcachefs bcachefs fixes for 6.11-rc1 2024-09-29 09:17:44 -07:00
befs
bfs
btrfs for-6.12-tag 2024-09-23 11:49:02 -07:00
cachefiles cachefiles, netfs: Fix write to partial block at EOF 2024-09-12 12:20:41 +02:00
ceph Three CephFS fixes from Xiubo and Luis and a bunch of assorted 2024-09-28 08:40:36 -07:00
coda coda: use param->file for FSCONFIG_SET_FD 2024-08-19 13:45:03 +02:00
configfs
cramfs
crypto
debugfs [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
devpts
dlm [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
ecryptfs
efivarfs [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
efs
erofs erofs: reject inodes with negative i_size 2024-09-12 23:00:09 +08:00
exfat exfat: resolve memory leak from exfat_create_upcase_table() 2024-09-23 21:38:13 +09:00
exportfs
ext2 vfs-6.12.file 2024-09-16 09:14:02 +02:00
ext4 struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
f2fs f2fs-6.12-rc1 2024-09-24 15:12:38 -07:00
fat
freevxfs
fuse [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
gfs2 gfs2 changes 2024-09-23 11:55:17 -07:00
hfs
hfsplus
hostfs
hpfs
hugetlbfs
iomap vfs-6.12.blocksize 2024-09-20 17:53:17 -07:00
isofs isofs: Annotate struct SL_component with __counted_by() 2024-09-02 15:52:56 +02:00
jbd2 jbd2: remove unneeded check of ret in jbd2_fc_get_buf 2024-08-26 23:49:15 -04:00
jffs2 jffs2: Use a folio in jffs2_garbage_collect_dnode() 2024-08-19 13:40:00 +02:00
jfs A few fixes for jfs 2024-09-19 06:38:43 +02:00
kernfs
lockd sunrpc: allow svc threads to fail initialisation cleanly 2024-09-20 19:31:03 -04:00
minix
netfs netfs: Fix write oops in generic/346 (9p) and generic/074 (cifs) 2024-09-26 17:45:20 -05:00
nfs NFS Client Updates for Linux 6.12 2024-09-24 15:44:18 -07:00
nfs_common nfs: add LOCALIO support 2024-09-23 15:03:30 -04:00
nfsd nfsd: implement server support for NFS_LOCALIO_PROGRAM 2024-09-23 15:03:30 -04:00
nilfs2 Many singleton patches - please see the various changelogs for details. 2024-09-21 08:20:50 -07:00
nls
notify struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
ntfs3
ocfs2 ocfs2: fix uninit-value in ocfs2_get_block() 2024-09-26 14:01:45 -07:00
omfs
openpromfs
orangefs orangefs: Constify struct kobj_type 2024-09-20 19:34:00 -07:00
overlayfs ovl: fix file leak in ovl_real_fdget_meta() 2024-09-27 12:38:47 -07:00
proc Summary 2024-09-24 11:08:40 -07:00
pstore drm next for 6.12-rc1 2024-09-19 10:18:15 +02:00
qnx4
qnx6
quota 2024-09-23 10:49:28 -07:00
ramfs
reiserfs
romfs romfs: fix romfs_read_folio() 2024-08-21 22:32:58 +02:00
smb 5 smb3 server fixes 2024-09-28 08:35:21 -07:00
squashfs Many singleton patches - please see the various changelogs for details. 2024-09-21 08:20:50 -07:00
sysfs
sysv
tests
tracefs eventfs: Use list_del_rcu() for SRCU protected list variable 2024-09-05 10:18:48 -04:00
ubifs [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
udf vfs-6.12.file 2024-09-16 09:14:02 +02:00
ufs vfs-6.12.file 2024-09-16 09:14:02 +02:00
unicode
vboxsf
verity fsverity: expose verified fsverity built-in signatures to LSMs 2024-08-20 14:03:18 -04:00
xfs struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
zonefs iomap: add a private argument for iomap_file_buffered_write 2024-09-03 15:01:23 +02:00
aio.c fs/aio: Fix __percpu annotation of *cpu pointer in struct kioctx 2024-08-19 13:45:03 +02:00
anon_inodes.c
attr.c nfsd-6.11 fixes: 2024-08-29 06:20:44 +12:00
backing-file.c backing-file: convert to using fops->splice_write 2024-08-23 13:08:31 +02:00
bad_inode.c
binfmt_elf_fdpic.c binfmt_elf_fdpic: fix AUXV size calculation when ELF_HWCAP2 is defined 2024-08-26 13:00:38 -07:00
binfmt_elf.c Revert "binfmt_elf, coredump: Log the reason of the failed core dumps" 2024-09-26 11:39:02 -07:00
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
bpf_fs_kfuncs.c
buffer.c vfs-6.12.folio 2024-09-16 08:54:30 +02:00
char_dev.c
compat_binfmt_elf.c
coredump.c Revert "binfmt_elf, coredump: Log the reason of the failed core dumps" 2024-09-26 11:39:02 -07:00
d_path.c
dax.c
dcache.c vfs-6.12.misc 2024-09-16 08:35:09 +02:00
direct-io.c fs/direct-io: Remove linux/prefetch.h include 2024-08-19 13:45:02 +02:00
drop_caches.c
eventfd.c
eventpoll.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
exec.c Along with the usual shower of singleton patches, notable patch series in 2024-09-21 07:29:05 -07:00
fcntl.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
fhandle.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
file_table.c slab updates for 6.12 2024-09-18 08:53:53 +02:00
file.c close_range(): fix the logics in descriptor table trimming 2024-09-29 21:52:29 -04:00
filesystems.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c inode: port __I_SYNC to var event 2024-08-30 08:22:39 +02:00
fsopen.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
init.c
inode.c bcachefs changes for 6.12-rc1 2024-09-23 10:05:41 -07:00
internal.h file: reclaim 24 bytes from f_owner 2024-08-28 13:05:39 +02:00
ioctl.c
Kconfig NFS Client Updates for Linux 6.12 2024-09-24 15:44:18 -07:00
Kconfig.binfmt
kernel_read_file.c
libfs.c vfs-6.12.folio 2024-09-16 08:54:30 +02:00
locks.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
Makefile
mbcache.c
mnt_idmapping.c fuse update for 6.12 2024-09-24 15:29:42 -07:00
mount.h vfs-6.12.mount 2024-09-16 11:15:26 +02:00
mpage.c
namei.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
namespace.c fuse update for 6.12 2024-09-24 15:29:42 -07:00
nsfs.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
open.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
pidfs.c
pipe.c [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
pnode.c
pnode.h
posix_acl.c fs: Use in_group_or_capable() helper to simplify the code 2024-08-30 08:22:37 +02:00
proc_namespace.c
read_write.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
readdir.c
remap_range.c
select.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
seq_file.c
signalfd.c struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
splice.c
stack.c
stat.c
statfs.c
super.c vfs-6.12.misc 2024-09-16 08:35:09 +02:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c mm/hugetlb: remove hugetlb_follow_page_mask() leftover 2024-09-01 20:25:57 -07:00
utimes.c
xattr.c