memcg: also test for skip accounting at the page allocation level

The memory used to hold the memcg arrays is currently accounted to the current memcg. That creates a problem, because this memory can only be freed after the last user is gone. The only way to know which user is the last is to hook into freeing time, but the kmallocs still in flight prevent that freeing from ever happening. I therefore believe it is simply easier to account this memory as global overhead.

This patch (of 2):

Disabling accounting is only relevant for some specific memcg internal allocations. For that reason we initially had no such check in memcg_kmem_newpage_charge, since direct calls to the page allocator that are marked with GFP_KMEMCG only happen outside the memcg core. We are mostly concerned with cache allocations, and by having this test in memcg_kmem_get_cache we are already able to relay the allocation to the root cache and bypass the memcg caches altogether.

There is one exception, though: the SLUB allocator does not create large order caches, but instead services large kmallocs directly from the page allocator. Therefore, the following sequence, when backed by the SLUB allocator:

	memcg_stop_kmem_account();
	kmalloc(<large_number>)
	memcg_resume_kmem_account();

would effectively ignore the fact that we should skip accounting, since it drives us directly to this function without passing through the cache selector memcg_kmem_get_cache. Such large allocations are extremely rare but can happen, for instance, for the cache arrays.

This was never a problem in practice, because we were not skipping accounting for the cache arrays. All the allocations we were skipping were fairly small. However, the fact that we were not skipping those allocations is a problem and can prevent the memcgs from going away. As we fix that, we need to make sure that the fix also works with the SLUB allocator.

Signed-off-by: Glauber Costa <glommer@openvz.org>
Reported-by: Michal Hocko <mhocko@suze.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Glauber Costa <glommer@gmail.com> 2013-07-08 16:00:00 -0700
committer: Linus Torvalds <torvalds@linux-foundation.org> 2013-07-09 10:33:21 -0700
commit: 6d42c232bd1e77288b2660153299b7d12a5c8e15 (patch)
tree: 80ead75d5c1f5a569ac476dec17c7150e1b303c6 /mm
parent: d157a55815ffff48caec311dfb543ce8a79e283e (diff)
download: linux-0-day-6d42c232bd1e77288b2660153299b7d12a5c8e15.tar.gz
linux-0-day-6d42c232bd1e77288b2660153299b7d12a5c8e15.tar.xz
1 files changed, 28 insertions, 0 deletions
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2b7cd24d4cdaf..06a595fd64004 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3637,6 +3637,34 @@ __memcg_kmem_newpage_charge(gfp_t gfp, struct mem_cgroup **_memcg, int order)
 	int ret;
 
 	*_memcg = NULL;
+
+	/*
+	 * Disabling accounting is only relevant for some specific memcg
+	 * internal allocations. Therefore we would initially not have such
+	 * check here, since direct calls to the page allocator that are marked
+	 * with GFP_KMEMCG only happen outside memcg core. We are mostly
+	 * concerned with cache allocations, and by having this test at
+	 * memcg_kmem_get_cache, we are already able to relay the allocation to
+	 * the root cache and bypass the memcg cache altogether.
+	 *
+	 * There is one exception, though: the SLUB allocator does not create
+	 * large order caches, but rather service large kmallocs directly from
+	 * the page allocator. Therefore, the following sequence when backed by
+	 * the SLUB allocator:
+	 *
+	 * 	memcg_stop_kmem_account();
+	 * 	kmalloc(<large_number>)
+	 * 	memcg_resume_kmem_account();
+	 *
+	 * would effectively ignore the fact that we should skip accounting,
+	 * since it will drive us directly to this function without passing
+	 * through the cache selector memcg_kmem_get_cache. Such large
+	 * allocations are extremely rare but can happen, for instance, for the
+	 * cache arrays. We bring this test here.
+	 */
+	if (!current->mm || current->memcg_kmem_skip_account)
+		return true;
+
 	memcg = try_get_mem_cgroup_from_mm(current->mm);
 
 	/*
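
The control flow described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every `model_*` name, the `MODEL_*` constants, and the `check_skip` switch are hypothetical and exist purely to show why a large allocation, which bypasses the cache selector, still needs the skip test at the page-charge level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of this is the kernel's definition. */
#define MODEL_PAGE_SIZE      4096u
/* Assumed threshold above which the modeled SLUB goes to the page allocator. */
#define MODEL_MAX_CACHE_SIZE (2u * MODEL_PAGE_SIZE)

struct model_task {
	bool   has_mm;            /* stands in for current->mm != NULL */
	int    kmem_skip_account; /* stands in for current->memcg_kmem_skip_account */
	size_t charged_bytes;     /* bytes charged to the task's memcg in this model */
};

static void model_stop_kmem_account(struct model_task *t)
{
	t->kmem_skip_account++;
}

static void model_resume_kmem_account(struct model_task *t)
{
	assert(t->kmem_skip_account > 0);
	t->kmem_skip_account--;
}

/* Cache selector: true means "use the accounted per-memcg cache". */
static bool model_use_memcg_cache(const struct model_task *t)
{
	return t->has_mm && !t->kmem_skip_account;
}

/*
 * Page-level charge. check_skip == true models the test added by this
 * patch; check_skip == false models the behaviour before it.
 */
static void model_newpage_charge(struct model_task *t, unsigned int order,
				 bool check_skip)
{
	if (!t->has_mm)
		return;
	if (check_skip && t->kmem_skip_account)
		return; /* accounting is disabled: leave the pages uncharged */
	t->charged_bytes += (size_t)MODEL_PAGE_SIZE << order;
}

static void model_kmalloc(struct model_task *t, size_t size, bool check_skip)
{
	if (size > MODEL_MAX_CACHE_SIZE) {
		/* Large request: straight to the page level, no cache selector. */
		unsigned int order = 0;

		while (((size_t)MODEL_PAGE_SIZE << order) < size)
			order++;
		model_newpage_charge(t, order, check_skip);
		return;
	}
	/* Small request: the cache selector decides whether to account it. */
	if (model_use_memcg_cache(t))
		t->charged_bytes += size;
}

/* Runs the stop / kmalloc / resume sequence and reports what was charged. */
static size_t model_charged_during_skip(size_t size, bool check_skip)
{
	struct model_task t = { .has_mm = true };

	model_stop_kmem_account(&t);
	model_kmalloc(&t, size, check_skip);
	model_resume_kmem_account(&t);
	return t.charged_bytes;
}
```

In this model, `model_charged_during_skip(64, false)` charges nothing because the cache selector already honours the skip flag, while `model_charged_during_skip(64 * 1024, false)` still charges the full 64 KiB. With `check_skip` set to `true`, the large request is left uncharged as well, which is the behaviour the patch aims for.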