summaryrefslogtreecommitdiffstats
path: root/fs/namespace.c
Commit message (Collapse)AuthorAgeFilesLines
* fs: fix overflow in sys_mount() for in-kernel callsVegard Nossum2009-09-241-30/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | sys_mount() reads/copies a whole page for its "type" parameter. When do_mount_root() passes a kernel address that points to an object which is smaller than a whole page, copy_mount_options() will happily go past this memory object, possibly dereferencing "wild" pointers that could be in any state (hence the kmemcheck warning, which shows that parts of the next page are not even allocated). (The likelihood of something going wrong here is pretty low -- first of all this only applies to kernel calls to sys_mount(), which are mostly found in the boot code. Secondly, I guess if the page was not mapped, exact_copy_from_user() _would_ in fact handle it correctly because of its access_ok(), etc. checks.) But it is much nicer to avoid the dubious reads altogether, by stopping as soon as we find a NUL byte. Is there a good reason why we can't do something like this, using the already existing strndup_from_user()? [akpm@linux-foundation.org: make copy_mount_string() static] [AV: fix compat mount breakage, which involves undoing akpm's change above] Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: al <al@dizzy.pdmi.ras.ru>
* vfs: mnt_want_write_file(): fix special file handlingOGAWA Hirofumi2009-08-071-1/+2
| | | | | | | | | | | | | I suspect that mnt_want_write_file() may have wrong assumption. I think mnt_want_write_file() is assuming it increments ->mnt_writers if (file->f_mode & FMODE_WRITE). But, if it's special_file(), it is false? Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Acked-by: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* headers: mnt_namespace.h reduxAlexey Dobriyan2009-07-081-0/+1
| | | | | | | | | | | | | Fix various silly problems wrt mnt_namespace.h: - exit_mnt_ns() isn't used, remove it - done that, sched.h and nsproxy.h inclusions aren't needed - mount.h inclusion was need for vfsmount_lock, but no longer - remove mnt_namespace.h inclusion from files which don't use anything from mnt_namespace.h Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ... and the same for vfsmount id/mount group idAl Viro2009-06-241-4/+22
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Switch init_mount_tree() to use the new create_mnt_ns() helperTrond Myklebust2009-06-241-9/+2
| | | | | | | Eliminates some duplicated code... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Add VFS helper functions for setting up private namespacesTrond Myklebust2009-06-221-8/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of this patch is to improve the remote mount path lookup support for distributed filesystems such as the NFSv4 client. When given a mount command of the form "mount server:/foo/bar /mnt", the NFSv4 client is required to look up the filehandle for "server:/", and then look up each component of the remote mount path "foo/bar" in order to find the directory that is actually going to be mounted on /mnt. Following that remote mount path may involve following symlinks, crossing server-side mount points and even following referrals to filesystem volumes on other servers. Since the standard VFS path lookup code already supports walking paths that contain all these features (using in-kernel automounts for following referrals) we would like to be able to reuse that rather than duplicate the full path traversal functionality in the NFSv4 client code. This patch therefore defines a VFS helper function create_mnt_ns(), that sets up a temporary filesystem namespace and attaches a root filesystem to it. It exports the create_mnt_ns() and put_mnt_ns() function for use by filesystem modules. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* VFS: Uninline the function put_mnt_ns()Trond Myklebust2009-06-221-2/+6
| | | | | | | In order to allow modules to use it without having to export vfsmount_lock. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Push BKL down into do_remount_sb()Al Viro2009-06-111-8/+2
| | | | | | [folded fix from Jiri Slaby] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Push BKL down beyond VFS-only parts of do_mount()Al Viro2009-06-111-3/+6
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Push BKL into do_mount()Al Viro2009-06-111-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* dcache: extrace and use d_unlinked()Alexey Dobriyan2009-06-111-4/+4
| | | | | | | | | d_unlinked() will be used in middle-term to ban checkpointing when opened but unlinked file is detected, and in long term, to detect such situation and special case on it. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: introduce mnt_clone_writenpiggin@suse.de2009-06-111-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch speeds up lmbench lat_mmap test by about another 2% after the first patch. Before: avg = 462.286 std = 5.46106 After: avg = 453.12 std = 9.58257 (50 runs of each, stddev gives a reasonable confidence) It does this by introducing mnt_clone_write, which avoids some heavyweight operations of mnt_want_write if called on a vfsmount which we know already has a write count; and mnt_want_write_file, which can call mnt_clone_write if the file is open for write. After these two patches, mnt_want_write and mnt_drop_write go from 7% on the profile down to 1.3% (including mnt_clone_write). [AV: mnt_want_write_file() should take file alone and derive mnt from it; not only all callers have that form, but that's the only mnt about which we know that it's already held for write if file is opened for write] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: mnt_want_write speedupnpiggin@suse.de2009-06-111-177/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch speeds up lmbench lat_mmap test by about 8%. lat_mmap is set up basically to mmap a 64MB file on tmpfs, fault in its pages, then unmap it. A microbenchmark yes, but it exercises some important paths in the mm. Before: avg = 501.9 std = 14.7773 After: avg = 462.286 std = 5.46106 (50 runs of each, stddev gives a reasonable confidence, but there is quite a bit of variation there still) It does this by removing the complex per-cpu locking and counter-cache and replaces it with a percpu counter in struct vfsmount. This makes the code much simpler, and avoids spinlocks (although the msync is still pretty costly, unfortunately). It results in about 900 bytes smaller code too. It does increase the size of a vfsmount, however. It should also give a speedup on large systems if CPUs are frequently operating on different mounts (because the existing scheme has to operate on an atomic in the struct vfsmount when switching between mounts). But I'm most interested in the single threaded path performance for the moment. [AV: minor cleanup] Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch lookup_mnt()Al Viro2009-06-111-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch follow_down()Al Viro2009-06-111-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Switch collect_mounts() to struct pathAl Viro2009-06-111-2/+2
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Don't bother with check_mnt() in do_add_mount() on shrinkable onesAl Viro2009-06-111-1/+1
| | | | | | | | These guys are what we add as submounts; checks for "is that attached in our namespace" are simply irrelevant for those and counterproductive for use of private vfsmount trees a-la what NFS folks want. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Fix races around the access to ->s_optionsAl Viro2009-05-091-3/+18
| | | | | | | | | Put generic_show_options read access to s_options under rcu_read_lock, split save_mount_options() into "we are setting it the first time" (uses in foo_fill_super()) and "we are relacing and freeing the old one", synchronize_rcu() before kfree() in the latter. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: umount_begin BKL pushdownAlessio Igor Bogani2009-05-091-2/+0
| | | | | | | Push BKL down into ->umount_begin() Signed-off-by: Alessio Igor Bogani <abogani@texware.it> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Touch all affected namespaces on propagation of mountAl Viro2009-04-201-1/+1
| | | | | | | We shouldn't just touch the namespace of current process Caught-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Don't set relatime when noatime is specifiedAndi Kleen2009-04-191-2/+3
| | | | | | | | | | | | | | | | | Since commit 0a1c01c9477602ee8b44548a9405b2c1d587b5a2 ("Make relatime default") when a file system is mounted explicitely with noatime it gets both the MNT_RELATIME and MNT_NOATIME bits set. This shows up like this in /proc/mounts: /dev/xxx /yyy ext3 rw,noatime,relatime,errors=continue,data=writeback 0 0 That looks strange. The VFS uses noatime in this case, but both flags are set. So it's more a cosmetic issue, but still better to fix. Cc: mjg@redhat.com Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Get rid of indirect include of fs_struct.hAl Viro2009-03-311-0/+1
| | | | | | | | Don't pull it in sched.h; very few files actually need it and those can include directly. sched.h itself only needs forward declaration of struct fs_struct; Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Take fs_struct handling to new file (fs/fs_struct.c)Al Viro2009-03-311-68/+0
| | | | | | | | | | Pure code move; two new helper functions for nfsd and daemonize (unshare_fs_struct() and daemonize_fs_struct() resp.; for now - the same code as used to be in callers). unshare_fs_struct() exported (for nfsd, as copy_fs_struct()/exit_fs() used to be), copy_fs_struct() and exit_fs() don't need exports anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Get rid of bumping fs_struct refcount in pivot_root(2)Al Viro2009-03-311-9/+17
| | | | | | | | Not because execve races with _that_ are serious - we really need a situation when final drop of fs_struct refcount is done by something that used to have it as current->fs. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2009-03-271-2/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...
| * vfs: simple_set_mnt() should return voidSukadev Bhattiprolu2009-03-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | simple_set_mnt() is defined as returning 'int' but always returns 0. Callers assume simple_set_mnt() never fails and don't properly cleanup if it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev() should: up_write(sb->s_unmount); deactivate_super(sb); if simple_set_mnt() fails. Since simple_set_mnt() never fails, would be cleaner if it did not return anything. [akpm@linux-foundation.org: fix build] Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Make relatime defaultMatthew Garrett2009-03-261-2/+3
| | | | | | | | | | | | | | | | | | Change the default behaviour of the kernel to use relatime for all filesystems. This can be overridden with the "strictatime" mount option. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Add a strictatime mount optionMatthew Garrett2009-03-261-1/+5
|/ | | | | | | | | Add support for explicitly requesting full atime updates. This makes it possible for kernels to default to relatime but still allow userspace to override it. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix incomplete __mntput lockingAl Viro2009-02-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Getting this wrong caused WARNING: at fs/namespace.c:636 mntput_no_expire+0xac/0xf2() due to optimistically checking cpu_writer->mnt outside the spinlock. Here's what we really want: * we know that nobody will set cpu_writer->mnt to mnt from now on * all changes to that sucker are done under cpu_writer->lock * we want the laziest equivalent of spin_lock(&cpu_writer->lock); if (likely(cpu_writer->mnt != mnt)) { spin_unlock(&cpu_writer->lock); continue; } /* do stuff */ that would make sure we won't miss earlier setting of ->mnt done by another CPU. Anyway, for now we just move the spin_lock() earlier and move the test into the properly locked region. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reported-and-tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [CVE-2009-0029] System call wrappers part 14Heiko Carstens2009-01-141-2/+2
| | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* [CVE-2009-0029] System call wrappers part 10Heiko Carstens2009-01-141-5/+4
| | | | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
* fs/namespace.c: drop code after returnJulia Lawall2008-12-311-1/+1
| | | | | | | | | | | The extra semicolon serves no purpose. Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Richard Genoud <richard.genoud@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'master' into nextJames Morris2008-11-141-2/+2
|\ | | | | | | | | | | | | | | | | | | | | Conflicts: security/keys/internal.h security/keys/process_keys.c security/keys/request_key.c Fixed conflicts above by using the non 'tsk' versions. Signed-off-by: James Morris <jmorris@namei.org>
| * vfs: fix shrink_submountsEric W. Biederman2008-11-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the last refactoring of shrink_submounts a variable was not completely renamed. So finish the renaming of mnt to m now. Without this if you attempt to mount an nfs mount that has both automatic nfs sub mounts on it, and has normal mounts on it. The unmount will succeed when it should not. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@ZenIV.linux.org.uk Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | CRED: Wrap task credential accesses in the filesystem subsystemDavid Howells2008-11-141-1/+1
|/ | | | | | | | | | | | | | | | | Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: James Morris <jmorris@namei.org>
* [RFC PATCH] touch_mnt_namespace when the mount flags changeDan Williams2008-10-231-1/+6
| | | | | | | | | | Daemons that need to be launched while the rootfs is read-only can now poll /proc/mounts to be notified when their O_RDWR requests may no longer end in EROFS. Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* [PATCH] no need for noinline stuff in fs/namespace.c anymoreAl Viro2008-10-231-12/+5
| | | | | | | Stack footprint from hell had been due to many struct nameidata in there. No more. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] finally get rid of nameidata in namespace.cAl Viro2008-10-231-60/+59
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] pass struct path * to do_add_mount()Al Viro2008-08-011-8/+8
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] sanitize __user_walk_fd() et.al.Al Viro2008-07-261-38/+36
| | | | | | | | | | | | | * do not pass nameidata; struct path is all the callers want. * switch to new helpers: user_path_at(dfd, pathname, flags, &path) user_path(pathname, &path) user_lpath(pathname, &path) user_path_dir(pathname, &path) (fail if not a directory) The last 3 are trivial macro wrappers for the first one. * remove nameidata in callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] vfs: use kstrdup() and check failing allocationLi Zefan2008-07-261-11/+13
| | | | | | | | | | - use kstrdup() instead of kmalloc() + memcpy() - return NULL if allocating ->mnt_devname failed - mnt_devname should be const Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] kill altrootAl Viro2008-07-261-7/+1
| | | | | | long overdue... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Use WARN() in fs/Arjan van de Ven2008-07-261-2/+1
| | | | | | | | | Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* LSM/SELinux: show LSM mount options in /proc/mountsEric Paris2008-07-141-3/+11
| | | | | | | | | | | This patch causes SELinux mount options to show up in /proc/mounts. As with other code in the area seq_put errors are ignored. Other LSM's will not have their mount options displayed until they fill in their own security_sb_show_options() function. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: James Morris <jmorris@namei.org>
* fs: replace remaining __FUNCTION__ occurrencesHarvey Harrison2008-04-301-2/+2
| | | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vfs: remove lives_below_in_same_fs()Jan Blunck2008-04-291-12/+1
| | | | | | | | | | | | Remove lives_below_in_same_fs() since is_subdir() from fs/dcache.c is providing the same functionality. Signed-off-by: Jan Blunck <jblunck@suse.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* quota: remove superfluous DQUOT_OFF() in fs/namespace.cJan Kara2008-04-281-2/+0
| | | | | | | | | We don't need to turn quotas off before remounting root ro, because do_remount_sb() already handles this. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] restore sane ->umount_begin() APIAl Viro2008-04-251-4/+5
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [patch 7/7] vfs: mountinfo: show dominating group idMiklos Szeredi2008-04-231-2/+7
| | | | | | | | Show peer group ID of nearest dominating group that has intersection with the mount's namespace. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [patch 6/7] vfs: mountinfo: add /proc/<pid>/mountinfoRam Pai2008-04-231-23/+96
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [mszeredi@suse.cz] rewrite and split big patch into managable chunks /proc/mounts in its current form lacks important information: - propagation state - root of mount for bind mounts - the st_dev value used within the filesystem - identifier for each mount and it's parent It also suffers from the following problems: - not easily extendable - ambiguity of mountpoints within a chrooted environment - doesn't distinguish between filesystem dependent and independent options - doesn't distinguish between per mount and per super block options This patch introduces /proc/<pid>/mountinfo which attempts to address all these deficiencies. Code shared between /proc/<pid>/mounts and /proc/<pid>/mountinfo is extracted into separate functions. Thanks to Al Viro for the help in getting the design right. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>