summaryrefslogtreecommitdiffstats
path: root/block/blk-softirq.c
Commit message (Collapse)AuthorAgeFilesLines
* smp: Remove wait argument from __smp_call_function_single()Frederic Weisbecker2014-02-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main point of calling __smp_call_function_single() is to send an IPI in a pure asynchronous way. By embedding a csd in an object, a caller can send the IPI without waiting for a previous one to complete as is required by smp_call_function_single() for example. As such, sending this kind of IPI can be safe even when irqs are disabled. This flexibility comes at the expense of the caller who then needs to synchronize the csd lifecycle by himself and make sure that IPIs on a single csd are serialized. This is how __smp_call_function_single() works when wait = 0 and this usecase is relevant. Now there don't seem to be any usecase with wait = 1 that can't be covered by smp_call_function_single() instead, which is safer. Lets look at the two possible scenario: 1) The user calls __smp_call_function_single(wait = 1) on a csd embedded in an object. It looks like a nice and convenient pattern at the first sight because we can then retrieve the object from the IPI handler easily. But actually it is a waste of memory space in the object since the csd can be allocated from the stack by smp_call_function_single(wait = 1) and the object can be passed an the IPI argument. Besides that, embedding the csd in an object is more error prone because the caller must take care of the serialization of the IPIs for this csd. 2) The user calls __smp_call_function_single(wait = 1) on a csd that is allocated on the stack. It's ok but smp_call_function_single() can do it as well and it already takes care of the allocation on the stack. Again it's more simple and less error prone. Therefore, using the underscore prepend API version with wait = 1 is a bad pattern and a sign that the caller can do safer and more simple. There was a single user of that which has just been converted. So lets remove this option to discourage further users. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* block: Stop abusing rq->csd.list in blk-softirqJan Kara2014-02-241-6/+11
| | | | | | | | | | | | | | | | Abusing rq->csd.list for a list of requests to complete is rather ugly. We use rq->queuelist instead which is much cleaner. It is safe because queuelist is used by the block layer only for requests waiting to be submitted to a device. Thus it is unused when irq reports the request IO is finished. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* kernel: remove CONFIG_USE_GENERIC_SMP_HELPERSChristoph Hellwig2013-11-151-2/+2
| | | | | | | | | | | We've switched over every architecture that supports SMP to it, so remove the new useless config variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block: Replace __get_cpu_var usesChristoph Lameter2013-11-081-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. At the end of the patch set all uses of __get_cpu_var have been removed so the macro is removed too. The patch set includes passes over all arches as well. Once these operations are used throughout then specialized macros can be defined in non -x86 arches as well in order to optimize per cpu access by f.e. using a global register that may be set to the per cpu base. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to this_cpu_inc(y) Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: delete __cpuinit usage from all block filesPaul Gortmaker2013-07-141-3/+3
| | | | | | | | | | | | | | | | | | | | | The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. This removes all the drivers/block uses of the __cpuinit macros from all C files. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* sched, block: Unify cache detectionPeter Zijlstra2012-01-271-8/+8
| | | | | | | | | | | The block layer has some code trying to determine if two CPUs share a cache, the scheduler has a similar function. Expose the function used by the scheduler and make the block layer use it, thereby removing the block layers usage of CONFIG_SCHED* and topology bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jens Axboe <axboe@kernel.dk> Link: http://lkml.kernel.org/r/1327579450.2446.95.camel@twins
* block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_requestTao Ma2011-09-141-1/+1
| | | | | | | | | | | In __blk_complete_request, we check both QUEUE_FLAG_SAME_COMP and req->cpu to decide whether we should use req->cpu. Actually the user can also select the complete cpu by either setting BIO_CPU_AFFINE or by calling bio_set_completion_cpu. Current solution makes these 2 ways don't work any more. So we'd better just check req->cpu. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: improve rq_affinity placementShaohua Li2011-08-111-3/+13
| | | | | | | | | | | | | | | | | | | | | This patch reverts commit 35ae66e0a09ab70ed(block: Make rq_affinity = 1 work as expected). The purpose is to avoid an unnecessary IPI. Let's take an example. My test box has cpu 0-7, one socket. Say request is added from CPU 1, blk_complete_request() occurs at CPU 7. Without the reverted patch, softirq will be done at CPU 7. With it, an IPI will be directed to CPU 0, and softirq will be done at CPU 0. In this case, doing softirq at CPU 0 and CPU 7 have no difference from cache sharing point view and we can avoid an ipi if doing it in CPU 7. An immediate concern is this is just like QUEUE_FLAG_SAME_FORCE, but actually not. blk_complete_request() is running in interrupt handler, and currently I/O controller doesn't support multiple interrupts (I checked several LSI cards and AHCI), so only one CPU can run blk_complete_request(). This is still quite different as QUEUE_FLAG_SAME_FORCE. Since only one CPU runs softirq, the only difference with below patch is softirq not always runs at the first CPU of a group. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: Make rq_affinity = 1 work as expectedTao Ma2011-08-051-5/+3
| | | | | | | | | | | | | | | | | | | | | Commit 5757a6d76c introduced a new rq_affinity = 2 so as to make the request completed in the __make_request cpu. But it makes the old rq_affinity = 1 not work any more. The root cause is that if the 'cpu' and 'req->cpu' is in the same group and cpu != req->cpu, ccpu will be the same as group_cpu, so the completion will be excuted in the 'cpu' not 'group_cpu'. This patch fix problem by simpling removing group_cpu and the codes are more explicit now. If ccpu == cpu, we complete in cpu, otherwise we raise_blk_irq to ccpu. Cc: Christoph Hellwig <hch@infradead.org> Cc: Roland Dreier <roland@purestorage.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Reviewed-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* block: strict rq_affinityDan Williams2011-07-231-4/+7
| | | | | | | | | | | | | | | | | | | | | Some systems benefit from completions always being steered to the strict requester cpu rather than the looser "per-socket" steering that blk_cpu_to_group() attempts by default. This is because the first CPU in the group mask ends up being completely overloaded with work, while the others (including the original submitter) has power left to spare. Allow the strict mode to be set by writing '2' to the sysfs control file. This is identical to the scheme used for the nomerges file, where '2' is a more aggressive setting than just being turned on. echo 2 > /sys/block/<bdev>/queue/rq_affinity Cc: Christoph Hellwig <hch@infradead.org> Cc: Roland Dreier <roland@purestorage.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* generic-ipi: remove CSD_FLAG_WAITPeter Zijlstra2009-02-251-1/+1
| | | | | | | | | | | | | | Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework the code so that we can use CSD_FLAG_LOCK for both purposes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* block: make blk_softirq_init() staticRoel Kluin2008-12-291-1/+1
| | | | | | | Sparse asked whether these could be static. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: add fault injection mechanism for faking request timeoutsJens Axboe2008-10-091-0/+2
| | | | | | | | Only works for the generic request timer handling. Allows one to sporadically ignore request completions, thus exercising the timeout handling. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: unify request timeout handlingJens Axboe2008-10-091-12/+18
| | | | | | | | | | | | | Right now SCSI and others do their own command timeout handling. Move those bits to the block layer. Instead of having a timer per command, we try to be a bit more clever and simply have one per-queue. This avoids the overhead of having to tear down and setup a timer for each command, so it will result in a lot less timer fiddling. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: add support for IO CPU affinityJens Axboe2008-10-091-31/+95
| | | | | | | | | | | | | | | | | | This patch adds support for controlling the IO completion CPU of either all requests on a queue, or on a per-request basis. We export a sysfs variable (rq_affinity) which, if set, migrates completions of requests to the CPU that originally submitted it. A bio helper (bio_set_completion_cpu()) is also added, so that queuers can ask for completion on that specific CPU. In testing, this has been show to cut the system time by as much as 20-40% on synthetic workloads where CPU affinity is desired. This requires a little help from the architecture, so it'll only work as designed for archs that are using the new generic smp helper infrastructure. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* block: split softirq handling into blk-softirq.cJens Axboe2008-10-091-0/+103
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>