summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'ceph-for-4.9-rc9' of git://github.com/ceph/ceph-clientLinus Torvalds2016-12-091-10/+14
|\ | | | | | | | | | | | | | | | | | | | | | | Pull ceph fix from Ilya Dryomov: "A fix for an issue with ->d_revalidate() in ceph, causing frequent kernel crashes. Marked for stable - it goes back to 4.6, but started popping up only in 4.8" * tag 'ceph-for-4.9-rc9' of git://github.com/ceph/ceph-client: ceph: don't set req->r_locked_dir in ceph_d_revalidate
| * ceph: don't set req->r_locked_dir in ceph_d_revalidateJeff Layton2016-12-081-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function sets req->r_locked_dir which is supposed to indicate to ceph_fill_trace that the parent's i_rwsem is locked for write. Unfortunately, there is no guarantee that the dir will be locked when d_revalidate is called, so we really don't want ceph_fill_trace to do any dcache manipulation from this context. Clear req->r_locked_dir since it's clearly not safe to do that. What we really want to know with d_revalidate is whether the dentry still points to the same inode. ceph_fill_trace installs a pointer to the inode in req->r_target_inode, so we can just compare that to d_inode(dentry) to see if it's the same one after the lookup. Also, since we aren't generally interested in the parent here, we can switch to using a GETATTR to hint that to the MDS, which also means that we only need to reserve one cap. Finally, just remove the d_unhashed check. That's really outside the purview of a filesystem's d_revalidate. If the thing became unhashed while we're checking it, then that's up to the VFS to handle anyway. Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry") Link: http://tracker.ceph.com/issues/18041 Reported-by: Donatas Abraitis <donatas.abraitis@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | fuse: fix clearing suid, sgid for chown()Miklos Szeredi2016-12-061-5/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | Basically, the pjdfstests set the ownership of a file to 06555, and then chowns it (as root) to a new uid/gid. Prior to commit a09f99eddef4 ("fuse: fix killing s[ug]id in setattr"), fuse would send down a setattr with both the uid/gid change and a new mode. Now, it just sends down the uid/gid change. Technically this is NOTABUG, since POSIX doesn't _require_ that we clear these bits for a privileged process, but Linux (wisely) has done that and I think we don't want to change that behavior here. This is caused by the use of should_remove_suid(), which will always return 0 when the process has CAP_FSETID. In fact we really don't need to be calling should_remove_suid() at all, since we've already been indicated that we should remove the suid, we just don't want to use a (very) stale mode for that. This patch should fix the above as well as simplify the logic. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: a09f99eddef4 ("fuse: fix killing s[ug]id in setattr") Cc: <stable@vger.kernel.org> Reviewed-by: Jeff Layton <jlayton@redhat.com>
* Merge branch 'overlayfs-linus' of ↵Linus Torvalds2016-12-011-3/+3
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fix from Miklos Szeredi: "This fixes a regression introduced in 4.8" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix d_real() for stacked fs
| * ovl: fix d_real() for stacked fsMiklos Szeredi2016-11-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handling of recursion in d_real() is completely broken. Recursion is only done in the 'inode != NULL' case. But when opening the file we have 'inode == NULL' hence d_real() will return an overlay dentry. This won't work since overlayfs doesn't define its own file operations, so all file ops will fail. Fix by doing the recursion first and the check against the inode second. Bash script to reproduce the issue written by Quentin: - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - tmpdir=$(mktemp -d) pushd ${tmpdir} mkdir -p {upper,lower,work} echo -n 'rocks' > lower/ksplice mount -t overlay level_zero upper -o lowerdir=lower,upperdir=upper,workdir=work cat upper/ksplice tmpdir2=$(mktemp -d) pushd ${tmpdir2} mkdir -p {upper,work} mount -t overlay level_one upper -o lowerdir=${tmpdir}/upper,upperdir=upper,workdir=work ls -l upper/ksplice cat upper/ksplice - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 2d902671ce1c ("vfs: merge .d_select_inode() into .d_real()") Cc: <stable@vger.kernel.org> # v4.8+
* | isofs: add KERN_CONT to printing of ER recordsMike Rapoport2016-11-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | The ER records are printed without explicit log level presuming line continuation until "\n". After the commit 4bcc595ccd8 (printk: reinstate KERN_CONT for printing continuation lines), the ER records are printed a character per line. Adding KERN_CONT to appropriate printk statements restores the printout behavior. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | CIFS: iterate over posix acl xattr entry correctly in ACL_to_cifs_posix()Eryu Guan2016-11-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2211d5ba5c6c ("posix_acl: xattr representation cleanups") removes the typedefs and the zero-length a_entries array in struct posix_acl_xattr_header, and uses bare struct posix_acl_xattr_header and struct posix_acl_xattr_entry directly. But it failed to iterate over posix acl slots when converting posix acls to CIFS format, which results in several test failures in xfstests (generic/053 generic/105) when testing against a samba v1 server, starting from v4.9-rc1 kernel. e.g. [root@localhost xfstests]# diff -u tests/generic/105.out /root/xfstests/results//generic/105.out.bad --- tests/generic/105.out 2016-09-19 16:33:28.577962575 +0800 +++ /root/xfstests/results//generic/105.out.bad 2016-10-22 15:41:15.201931110 +0800 @@ -1,3 +1,4 @@ QA output created by 105 -rw-r--r-- root +setfacl: subdir: Invalid argument -rw-r--r-- root Fix it by introducing a new "ace" var, like what cifs_copy_posix_acl() does, and iterating posix acl xattr entries over it in the for loop. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
* | Call echo service immediately after socket reconnectSachin Prabhu2016-11-281-7/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect long after socket reconnect") changes the behaviour of the SMB2 echo service and causes it to renegotiate after a socket reconnect. However under default settings, the echo service could take up to 120 seconds to be scheduled. The patch forces the echo service to be called immediately resulting a negotiate call being made immediately on reconnect. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
* | CIFS: Fix BUG() in calc_seckey()Sachin Prabhu2016-11-281-3/+8
| | | | | | | | | | | | | | | | | | | | | | Andy Lutromirski's new virtually mapped kernel stack allocations moves kernel stacks the vmalloc area. This triggers the bug kernel BUG at ./include/linux/scatterlist.h:140! at calc_seckey()->sg_init() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
* | fix default_file_splice_read()Al Viro2016-11-261-1/+2
| | | | | | | | | | | | | | | | | | | | Botched calculation of number of pages. As the result, we were dropping pieces when doing splice to pipe from e.g. 9p. Reported-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2016-11-234-13/+35
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Anna Schumaker: "Most of these fix regressions or races, but there is one patch for stable that Arnd sent me Stable bugfix: - Hide array-bounds warning Bugfixes: - Keep a reference on lock states while checking - Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state - Don't call close if the open stateid has already been cleared - Fix CLOSE rases with OPEN - Fix a regression in DELEGRETURN" * tag 'nfs-for-4.9-4' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.x: hide array-bounds warning NFSv4.1: Keep a reference on lock states while checking NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_state NFSv4: Don't call close if the open stateid has already been cleared NFSv4: Fix CLOSE races with OPEN NFSv4.1: Fix a regression in DELEGRETURN
| * | NFSv4.x: hide array-bounds warningArnd Bergmann2016-11-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A correct bugfix introduced a harmless warning that shows up with gcc-7: fs/nfs/callback.c: In function 'nfs_callback_up': fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds] What happens here is that the 'minorversion == 0' check tells the compiler that we assume minorversion can be something other than 0, but when CONFIG_NFS_V4_1 is disabled that would be invalid and result in an out-of-bounds access. The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this really can't happen, which makes the code slightly smaller and also avoids the warning. The bugfix that introduced the warning is marked for stable backports, we want this one backported to the same releases. Fixes: 98b0f80c2396 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net") Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4.1: Keep a reference on lock states while checkingBenjamin Coddington2016-11-211-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | While walking the list of lock_states, keep a reference on each nfs4_lock_state to be checked, otherwise the lock state could be removed while the check performs TEST_STATEID and possible FREE_STATEID. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4.1: Handle NFS4ERR_OLD_STATEID in nfs4_reclaim_open_stateBenjamin Coddington2016-11-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we're doing TEST_STATEID in nfs4_reclaim_open_state(), we can have a NFS4ERR_OLD_STATEID returned from nfs41_open_expired() . Instead of marking state recovery as failed, mark the state for recovery again. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4: Don't call close if the open stateid has already been clearedTrond Myklebust2016-11-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Ensure we test to see if the open stateid is actually set, before we send a CLOSE. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4: Fix CLOSE races with OPENTrond Myklebust2016-11-182-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the reply to a successful CLOSE call races with an OPEN to the same file, we can end up scribbling over the stateid that represents the new open state. The race looks like: Client Server ====== ====== CLOSE stateid A on file "foo" CLOSE stateid A, return stateid C OPEN file "foo" OPEN "foo", return stateid B Receive reply to OPEN Reset open state for "foo" Associate stateid B to "foo" Receive CLOSE for A Reset open state for "foo" Replace stateid B with C The fix is to examine the argument of the CLOSE, and check for a match with the current stateid "other" field. If the two do not match, then the above race occurred, and we should just ignore the CLOSE. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | NFSv4.1: Fix a regression in DELEGRETURNTrond Myklebust2016-11-181-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | We don't want to call nfs4_free_revoked_stateid() in the case where the delegreturn was successful. Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | Merge tag 'ext4_for_stable' of ↵Linus Torvalds2016-11-194-36/+51
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "A security fix (so a maliciously corrupted file system image won't panic the kernel) and some fixes for CONFIG_VMAP_STACK" * tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: sanity check the block and cluster size at mount time fscrypto: don't use on-stack buffer for key derivation fscrypto: don't use on-stack buffer for filename encryption
| * | | ext4: sanity check the block and cluster size at mount timeTheodore Ts'o2016-11-192-1/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | | fscrypto: don't use on-stack buffer for key derivationEric Biggers2016-11-191-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the new (in 4.9) option to use a virtually-mapped stack (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for the scatterlist crypto API because they may not be directly mappable to struct page. get_crypt_info() was using a stack buffer to hold the output from the encryption operation used to derive the per-file key. Fix it by using a heap buffer. This bug could most easily be observed in a CONFIG_DEBUG_SG kernel because this allowed the BUG in sg_set_buf() to be triggered. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | fscrypto: don't use on-stack buffer for filename encryptionEric Biggers2016-11-191-32/+21
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the new (in 4.9) option to use a virtually-mapped stack (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for the scatterlist crypto API because they may not be directly mappable to struct page. For short filenames, fname_encrypt() was encrypting a stack buffer holding the padded filename. Fix it by encrypting the filename in-place in the output buffer, thereby making the temporary buffer unnecessary. This bug could most easily be observed in a CONFIG_DEBUG_SG kernel because this allowed the BUG in sg_set_buf() to be triggered. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2016-11-171-8/+14
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of regression fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix iov_iter_advance() for ITER_PIPE xattr: Fix setting security xattrs on sockfs
| * | | xattr: Fix setting security xattrs on sockfsAndreas Gruenbacher2016-11-171-8/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IOP_XATTR flag is set on sockfs because sockfs supports getting the "system.sockprotoname" xattr. Since commit 6c6ef9f2, this flag is checked for setxattr support as well. This is wrong on sockfs because security xattr support there is supposed to be provided by security_inode_setsecurity. The smack security module relies on socket labels (xattrs). Fix this by adding a security xattr handler on sockfs that returns -EAGAIN, and by checking for -EAGAIN in setxattr. We cannot simply check for -EOPNOTSUPP in setxattr because there are filesystems that neither have direct security xattr support nor support via security_inode_setsecurity. A more proper fix might be to move the call to security_inode_setsecurity into sockfs, but it's not clear to me if that is safe: we would end up calling security_inode_post_setxattr after that as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge tag 'for-linus-4.9-rc5-ofs-1' of ↵Linus Torvalds2016-11-171-0/+2
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs fix from Mike Marshall: "orangefs: add .owner to debugfs file_operations Without ".owner = THIS_MODULE" it is possible to crash the kernel by unloading the Orangefs module while someone is reading debugfs files" * tag 'for-linus-4.9-rc5-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: add .owner to debugfs file_operations
| * | | orangefs: add .owner to debugfs file_operationsMike Marshall2016-11-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without ".owner = THIS_MODULE" it is possible to crash the kernel by unloading the Orangefs module while someone is reading debugfs files. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2016-11-164-1/+14
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "A regression fix and bug fix bound for stable" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: fix fuse_write_end() if zero bytes were copied fuse: fix root dentry initialization
| * | | | fuse: fix fuse_write_end() if zero bytes were copiedMiklos Szeredi2016-11-151-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If pos is at the beginning of a page and copied is zero then page is not zeroed but is marked uptodate. Fix by skipping everything except unlock/put of page if zero bytes were copied. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: 6b12c1b37e55 ("fuse: Implement write_begin/write_end callbacks") Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
| * | | | fuse: fix root dentry initializationMiklos Szeredi2016-10-183-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing dentry initialization to root dentry. Fixes: f75fdf22b0a8 ("fuse: don't use ->d_time") Reported-by: Andreas Reis <andreas.reis@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
* | | | | Merge branch 'akpm' (patches from Andrew)Linus Torvalds2016-11-112-1/+4
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge misc fixes from Andrew Morton: "15 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: lib/stackdepot: export save/fetch stack for drivers mm: kmemleak: scan .data.ro_after_init memcg: prevent memcg caches to be both OFF_SLAB & OBJFREELIST_SLAB coredump: fix unfreezable coredumping task mm/filemap: don't allow partially uptodate page for pipes mm/hugetlb: fix huge page reservation leak in private mapping error paths ocfs2: fix not enough credit panic Revert "console: don't prefer first registered if DT specifies stdout-path" mm: hwpoison: fix thp split handling in memory_failure() swapfile: fix memory corruption via malformed swapfile mm/cma.c: check the max limit for cma allocation scripts/bloat-o-meter: fix SIGPIPE shmem: fix pageflags after swapping DMA32 object mm, frontswap: make sure allocated frontswap map is assigned mm: remove extra newline from allocation stall warning
| * | | | | coredump: fix unfreezable coredumping taskAndrey Ryabinin2016-11-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It could be not possible to freeze coredumping task when it waits for 'core_state->startup' completion, because threads are frozen in get_signal() before they got a chance to complete 'core_state->startup'. Inability to freeze a task during suspend will cause suspend to fail. Also CRIU uses cgroup freezer during dump operation. So with an unfreezable task the CRIU dump will fail because it waits for a transition from 'FREEZING' to 'FROZEN' state which will never happen. Use freezer_do_not_count() to tell freezer to ignore coredumping task while it waits for core_state->startup completion. Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | ocfs2: fix not enough credit panicJunxiao Bi2016-11-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following panic was caught when run ocfs2 disconfig single test (block size 512 and cluster size 8192). ocfs2_journal_dirty() return -ENOSPC, that means credits were used up. The total credit should include 3 times of "num_dx_leaves" from ocfs2_dx_dir_rebalance(), because 2 times will be consumed in ocfs2_dx_dir_transfer_leaf() and 1 time will be consumed in ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() -> ocfs2_dx_dir_format_cluster(). But only two times is included in ocfs2_dx_dir_rebalance_credits(), fix it. This can cause read-only fs(v4.1+) or panic for mainline linux depending on mount option. ------------[ cut here ]------------ kernel BUG at fs/ocfs2/journal.c:775! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport acpi_cpufreq i2c_piix4 i2c_core pcspkr ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod CPU: 2 PID: 10601 Comm: dd Not tainted 4.1.12-71.el6uek.bug24939243.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016 task: ffff8800b6de6200 ti: ffff8800a7d48000 task.ti: ffff8800a7d48000 RIP: ocfs2_journal_dirty+0xa7/0xb0 [ocfs2] RSP: 0018:ffff8800a7d4b6d8 EFLAGS: 00010286 RAX: 00000000ffffffe4 RBX: 00000000814d0a9c RCX: 00000000000004f9 RDX: ffffffffa008e990 RSI: ffffffffa008f1ee RDI: ffff8800622b6460 RBP: ffff8800a7d4b6f8 R08: ffffffffa008f288 R09: ffff8800622b6460 R10: 0000000000000000 R11: 0000000000000282 R12: 0000000002c8421e R13: ffff88006d0cad00 R14: ffff880092beef60 R15: 0000000000000070 FS: 00007f9b83e92700(0000) GS:ffff8800be880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb2c0d1a000 CR3: 0000000008f80000 CR4: 00000000000406e0 Call Trace: ocfs2_dx_dir_transfer_leaf+0x159/0x1a0 [ocfs2] ocfs2_dx_dir_rebalance+0xd9b/0xea0 [ocfs2] ocfs2_find_dir_space_dx+0xd3/0x300 [ocfs2] ocfs2_prepare_dx_dir_for_insert+0x219/0x450 [ocfs2] ocfs2_prepare_dir_for_insert+0x1d6/0x580 [ocfs2] ocfs2_mknod+0x5a2/0x1400 [ocfs2] ocfs2_create+0x73/0x180 [ocfs2] vfs_create+0xd8/0x100 lookup_open+0x185/0x1c0 do_last+0x36d/0x780 path_openat+0x92/0x470 do_filp_open+0x4a/0xa0 do_sys_open+0x11a/0x230 SyS_open+0x1e/0x20 system_call_fastpath+0x12/0x71 Code: 1d 3f 29 09 00 48 85 db 74 1f 48 8b 03 0f 1f 80 00 00 00 00 48 8b 7b 08 48 83 c3 10 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 eb eb 90 <0f> 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 RIP ocfs2_journal_dirty+0xa7/0xb0 [ocfs2] ---[ end trace 91ac5312a6ee1288 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled Link: http://lkml.kernel.org/r/1478248135-31963-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@versity.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2016-11-113-105/+109
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS fixes from Al Viro: "Christoph's and Jan's aio fixes, fixup for generic_file_splice_read (removal of pointless detritus that actually breaks it when used for gfs2 ->splice_read()) and fixup for generic_file_read_iter() interaction with ITER_PIPE destinations." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: splice: remove detritus from generic_file_splice_read() mm/filemap: don't allow partially uptodate page for pipes aio: fix freeze protection of aio writes fs: remove aio_run_iocb fs: remove the never implemented aio_fsync file operation aio: hold an extra file reference over AIO read/write operations
| * | | | | | splice: remove detritus from generic_file_splice_read()Al Viro2016-11-101-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i_size check is a leftover from the horrors that used to play with the page cache in that function. With the switch to ->read_iter(), it's neither needed nor correct - for gfs2 it ends up being buggy, since i_size is not guaranteed to be correct until later (inside ->read_iter()). Spotted-by: Abhi Das <adas@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | aio: fix freeze protection of aio writesJan Kara2016-10-301-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we dropped freeze protection of aio writes just after IO was submitted. Thus aio write could be in flight while the filesystem was frozen and that could result in unexpected situation like aio completion wanting to convert extent type on frozen filesystem. Testcase from Dmitry triggering this is like: for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done & fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \ --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite Fix the problem by dropping freeze protection only once IO is completed in aio_complete(). Reported-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> [hch: forward ported on top of various VFS and aio changes] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | fs: remove aio_run_iocbChristoph Hellwig2016-10-301-88/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the ABI iocb structure to aio_setup_rw and let it handle the non-vectored I/O case as well. With that and a new helper for the AIO return value handling we can now define new aio_read and aio_write helpers that implement reads and writes in a self-contained way without duplicating too much code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | fs: remove the never implemented aio_fsync file operationChristoph Hellwig2016-10-302-16/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | aio: hold an extra file reference over AIO read/write operationsChristoph Hellwig2016-10-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise we might dereference an already freed file and/or inode when aio_complete is called before we return from the read_iter or write_iter method. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | | Merge tag 'ceph-for-4.9-rc5' of git://github.com/ceph/ceph-clientLinus Torvalds2016-11-111-1/+0
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Ceph fixes from Ilya Dryomov: "Ceph's ->read_iter() implementation is incompatible with the new generic_file_splice_read() code that went into -rc1. Switch to the less efficient default_file_splice_read() for now; the proper fix is being held for 4.10. We also have a fix for a 4.8 regression and a trival libceph fixup" * tag 'ceph-for-4.9-rc5' of git://github.com/ceph/ceph-client: libceph: initialize last_linger_id with a large integer libceph: fix legacy layout decode with pool 0 ceph: use default file splice read callback
| * | | | | | | ceph: use default file splice read callbackYan, Zheng2016-11-101-1/+0
| | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Splice read/write implementation changed recently. When using generic_file_splice_read(), iov_iter with type == ITER_PIPE is passed to filesystem's read_iter callback. But ceph_sync_read() can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter expects pages from page cache). Fixing ceph_sync_read() requires a big patch. So use default splice read callback for now. Signed-off-by: Yan, Zheng <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | | | | | | Merge tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2016-11-114-7/+12
|\ \ \ \ \ \ \ | | |_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Anna Schumaker: "Most of these fix regressions in 4.9, and none are going to stable this time around. Bugfixes: - Trim extra slashes in v4 nfs_paths to fix tools that use this - Fix a -Wmaybe-uninitialized warnings - Fix suspicious RCU usages - Fix Oops when mounting multiple servers at once - Suppress a false-positive pNFS error - Fix a DMAR failure in NFS over RDMA" * tag 'nfs-for-4.9-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: xprtrdma: Fix DMAR failure in frwr_op_map() after reconnect fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use() NFS: Don't print a pNFS error if we aren't using pNFS NFS: Ignore connections that have cl_rpcclient uninitialized SUNRPC: Fix suspicious RCU usage NFSv4.1: work around -Wmaybe-uninitialized warning NFS: Trim extra slash in v4 nfs_path
| * | | | | | fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()Shuah Khan2016-11-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following warn: fs/nfs/nfs4session.c: In function ‘nfs4_slot_seqid_in_use’: fs/nfs/nfs4session.c:203:54: warning: ‘cur_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (nfs4_slot_get_seqid(tbl, slotid, &cur_seq) == 0 && ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ cur_seq == seq_nr && test_bit(slotid, tbl->used_slots)) ~~~~~~~~~~~~~~~~~ Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | | NFS: Don't print a pNFS error if we aren't using pNFSAnna Schumaker2016-11-071-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to check for a valid layout type id before verifying pNFS flags as an indicator for if we are using pNFS. This changed in 3132e49ece with the introduction of multiple layout types, since now we are passing an array of ids instead of just one. Since then, users have been seeing a KERN_ERR printk show up whenever mounting NFS v4 without pNFS. This patch restores the original behavior of exiting set_pnfs_layoutdriver() early if we aren't using pNFS. Fixes 3132e49ece ("pnfs: track multiple layout types in fsinfo structure") Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | | NFS: Ignore connections that have cl_rpcclient uninitializedPetr Vandrovec2016-11-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cl_rpcclient starts as ERR_PTR(-EINVAL), and connections like that are floating freely through the system. Most places check whether pointer is valid before dereferencing it, but newly added code in nfs_match_client does not. Which causes crashes when more than one NFS mount point is present. Signed-off-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | | NFSv4.1: work around -Wmaybe-uninitialized warningArnd Bergmann2016-10-241-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A bugfix introduced a harmless gcc warning in nfs4_slot_seqid_in_use if we enable -Wmaybe-uninitialized again: fs/nfs/nfs4session.c:203:54: error: 'cur_seq' may be used uninitialized in this function [-Werror=maybe-uninitialized] gcc is not smart enough to conclude that the IS_ERR/PTR_ERR pair results in a nonzero return value here. Using PTR_ERR_OR_ZERO() instead makes this clear to the compiler. The warning originally did not appear in v4.8 as it was globally disabled, but the bugfix that introduced the warning got backported to stable kernels which again enable it, and this is now the only warning in the v4.7 builds. Fixes: e09c978aae5b ("NFSv4.1: Fix Oopsable condition in server callback races") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * | | | | | NFS: Trim extra slash in v4 nfs_pathBenjamin Coddington2016-10-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A NFSv4 mount of a subdirectory will show an extra slash (as in 'server://path') in proc's mountinfo which will not match the device name and path. This can cause problems for programs searching for the mount. Fix this by checking for a leading slash in the dentry path, if so trim away any trailing slashes in the device name. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | | | | | Merge tag 'xfs-fixes-for-linus-4.9-rc5' of ↵Linus Torvalds2016-11-111-12/+5
|\ \ \ \ \ \ \ | |_|_|_|/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fix from Dave Chinner: "This is a fix for an unmount hang (regression) when the filesystem is shutdown. It was supposed to go to you for -rc3, but I accidentally tagged the commit prior to it in that pullreq. Summary: - fix for aborting deferred transactions on filesystem shutdown" * tag 'xfs-fixes-for-linus-4.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: defer should abort intent items if the trans roll fails
| * | | | | | xfs: defer should abort intent items if the trans roll failsDarrick J. Wong2016-10-241-12/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the deferred ops transaction roll fails, we need to abort the intent items if we haven't already logged a done item for it, regardless of whether or not the deferred ops has had a transaction committed. Dave found this while running generic/388. Move the tracepoint to make it easier to track object lifetimes. Reported-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | | | | | | Merge tag 'for-linus-4.9-rc4-ofs-1' of ↵Linus Torvalds2016-11-092-85/+68
|\ \ \ \ \ \ \ | |_|_|/ / / / |/| | | | | / | | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs fix from Mike Marshall: "We recently refactored the Orangefs debugfs code. The refactor seemed to trigger dan.carpenter@oracle.com's static tester to find a possible double-free in the code. While designing the fix we saw a condition under which the buffer being freed could also be overflowed. We also realized how to rebuild the related debugfs file's "contents" (a string) without deleting and re-creating the file. This fix should eliminate the possible double-free, the potential overflow and improve code readability" * tag 'for-linus-4.9-rc4-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: clean up debugfs
| * | | | | orangefs: clean up debugfsMike Marshall2016-11-072-85/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We recently refactored the Orangefs debugfs code. The refactor seemed to trigger dan.carpenter@oracle.com's static tester to find a possible double-free in the code. While designing the fix we saw a condition under which the buffer being freed could also be overflowed. We also realized how to rebuild the related debugfs file's "contents" (a string) without deleting and re-creating the file. This fix should eliminate the possible double-free, the potential overflow and improve code readability. Signed-off-by: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Martin Brandenburg <martin@omnibond.com>
* | | | | | Merge tag 'nfsd-4.9-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2016-11-042-19/+24
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd bugfixes from Bruce Fields: "Fixes for some recent regressions including fallout from the vmalloc'd stack change (after which we can no longer encrypt stuff on the stack)" * tag 'nfsd-4.9-1' of git://linux-nfs.org/~bfields/linux: nfsd: Fix general protection fault in release_lock_stateid() svcrdma: backchannel cannot share a page for send and rcv buffers sunrpc: fix some missing rq_rbuffer assignments sunrpc: don't pass on-stack memory to sg_set_buf nfsd: move blocked lock handling under a dedicated spinlock