
Commit cc65a5d

Muchun Song authored and jfvogel committed
mm: memcontrol: use obj_cgroup APIs to charge the LRU pages
We will reuse the obj_cgroup APIs to charge the LRU pages. Finally, page->memcg_data will have 2 different meanings:

  - For slab pages, page->memcg_data points to an object cgroups vector.
  - For kmem pages (excluding slab pages) and LRU pages, page->memcg_data points to an object cgroup.

In this patch, we reuse the obj_cgroup APIs to charge LRU pages. As a result, long-lived page cache objects can no longer pin the original memory cgroup in memory.

At the same time we also change the rules of page and objcg or memcg binding stability. The new rules are as follows.

For a page, any of the following ensures page and objcg binding stability:

  - the page lock
  - LRU isolation
  - lock_page_memcg()
  - exclusive reference

Based on the stable binding of page and objcg, for a page any of the following ensures page and memcg binding stability:

  - objcg_lock
  - cgroup_mutex
  - the lruvec lock
  - the split queue lock (THP pages only)

If the caller only wants to ensure that the page counters of the memcg are updated correctly, the binding stability of page and objcg is sufficient.

Signed-off-by: Muchun Song <[email protected]>
Reviewed-by: Michal Koutný <[email protected]>
Acked-by: Roman Gushchin <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Orabug: 37405594

Conflicts:
        include/linux/memcontrol.h
        mm/memcontrol.c

(Due to the presence of the following commits in UEK-8:
  i.   commit becacb0 ("mm: memcg: add folio_memcg_check()")
  ii.  commit 074e3e2 ("memcg: convert get_obj_cgroup_from_page to get_obj_cgroup_from_folio")
  iii. commit 1b1e134 ("mm: memcg: introduce memcontrol-v1.c"))

Added code in the lruvec reparenting logic to account for the multi-gen LRU being used in UEK-8.

Signed-off-by: Imran Khan <[email protected]>
Reviewed-by: Kamalesh Babulal <[email protected]>
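The heart of the series is one level of indirection: an LRU folio no longer points at its mem_cgroup directly, so a dying cgroup can be reparented by swapping a single objcg->memcg pointer instead of rewriting every charged folio. A minimal sketch of that indirection, with simplified names (the struct layout and sketch_folio_memcg() below are illustrative, not the kernel's exact definitions):

/* Sketch only: real obj_cgroup has a refcount, list linkage, etc. */
struct obj_cgroup {
        struct mem_cgroup __rcu *memcg; /* atomically swapped to the parent on offline */
};

static struct mem_cgroup *sketch_folio_memcg(struct folio *folio)
{
        /* the low bits of memcg_data are flag bits, as in the real code */
        struct obj_cgroup *objcg;

        objcg = (void *)(folio->memcg_data & ~OBJEXTS_FLAGS_MASK);
        return objcg ? rcu_dereference(objcg->memcg) : NULL;
}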
1 parent 66953c1 commit cc65a5d

File tree

5 files changed: +335 −171 lines

include/linux/memcontrol.h

Lines changed: 34 additions & 63 deletions
@@ -395,8 +395,6 @@ enum objext_flags {
 
 #ifdef CONFIG_MEMCG
 
-static inline bool folio_memcg_kmem(struct folio *folio);
-
 /*
  * After the initialization objcg->memcg is always pointing at
  * a valid memcg, but can be atomically swapped to the parent memcg.
@@ -411,43 +409,19 @@ static inline struct mem_cgroup *obj_cgroup_memcg(struct obj_cgroup *objcg)
 }
 
 /*
- * __folio_memcg - Get the memory cgroup associated with a non-kmem folio
- * @folio: Pointer to the folio.
- *
- * Returns a pointer to the memory cgroup associated with the folio,
- * or NULL. This function assumes that the folio is known to have a
- * proper memory cgroup pointer. It's not safe to call this function
- * against some type of folios, e.g. slab folios or ex-slab folios or
- * kmem folios.
- */
-static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
-{
-        unsigned long memcg_data = folio->memcg_data;
-
-        VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
-        VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
-        VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_KMEM, folio);
-
-        return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
-}
-
-/*
- * __folio_objcg - get the object cgroup associated with a kmem folio.
+ * folio_objcg - get the object cgroup associated with a folio.
  * @folio: Pointer to the folio.
  *
  * Returns a pointer to the object cgroup associated with the folio,
  * or NULL. This function assumes that the folio is known to have a
- * proper object cgroup pointer. It's not safe to call this function
- * against some type of folios, e.g. slab folios or ex-slab folios or
- * LRU folios.
+ * proper object cgroup pointer.
  */
-static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
+static inline struct obj_cgroup *folio_objcg(struct folio *folio)
 {
         unsigned long memcg_data = folio->memcg_data;
 
         VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
         VM_BUG_ON_FOLIO(memcg_data & MEMCG_DATA_OBJEXTS, folio);
-        VM_BUG_ON_FOLIO(!(memcg_data & MEMCG_DATA_KMEM), folio);
 
         return (struct obj_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 }
@@ -461,23 +435,34 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio)
  * proper memory cgroup pointer. It's not safe to call this function
  * against some type of folios, e.g. slab folios or ex-slab folios.
  *
- * For a non-kmem folio any of the following ensures folio and memcg binding
- * stability:
+ * For a folio any of the following ensures folio and objcg binding stability:
  *
  * - the folio lock
  * - LRU isolation
  * - folio_memcg_lock()
  * - exclusive reference
  * - mem_cgroup_trylock_pages()
  *
+ * Based on the stable binding of folio and objcg, for a folio any of the
+ * following ensures folio and memcg binding stability:
+ *
+ * - objcg_lock
+ * - cgroup_mutex
+ * - the lruvec lock
+ * - the split queue lock (only THP page)
+ *
+ * If the caller only want to ensure that the page counters of memcg are
+ * updated correctly, ensure that the binding stability of folio and objcg
+ * is sufficient.
+ *
  * Note: The caller should hold an rcu read lock to protect memcg associated
  * with a folio from being released.
  */
 static inline struct mem_cgroup *folio_memcg(struct folio *folio)
 {
-        if (folio_memcg_kmem(folio))
-                return obj_cgroup_memcg(__folio_objcg(folio));
-        return __folio_memcg(folio);
+        struct obj_cgroup *objcg = folio_objcg(folio);
+
+        return objcg ? obj_cgroup_memcg(objcg) : NULL;
 }
 
 /*
@@ -488,9 +473,7 @@ static inline struct mem_cgroup *folio_memcg(struct folio *folio)
  */
 static inline bool folio_memcg_charged(struct folio *folio)
 {
-        if (folio_memcg_kmem(folio))
-                return __folio_objcg(folio) != NULL;
-        return __folio_memcg(folio) != NULL;
+        return folio_objcg(folio) != NULL;
 }
 
 /*
@@ -503,6 +486,8 @@ static inline bool folio_memcg_charged(struct folio *folio)
  * folio is known to have a proper memory cgroup pointer. It's not safe
  * to call this function against some type of pages, e.g. slab pages or
  * ex-slab pages.
+ *
+ * The page and objcg or memcg binding rules can refer to folio_memcg().
  */
 static inline struct mem_cgroup *get_mem_cgroup_from_folio(struct folio *folio)
 {
@@ -533,23 +518,20 @@ static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
  *
  * Return: A pointer to the memory cgroup associated with the folio,
  * or NULL.
+ *
+ * The folio and objcg or memcg binding rules can refer to folio_memcg().
  */
 static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
 {
         unsigned long memcg_data = READ_ONCE(folio->memcg_data);
+        struct obj_cgroup *objcg;
 
         VM_BUG_ON_FOLIO(folio_test_slab(folio), folio);
-
-        if (memcg_data & MEMCG_DATA_KMEM) {
-                struct obj_cgroup *objcg;
-
-                objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
-                return obj_cgroup_memcg(objcg);
-        }
-
         WARN_ON_ONCE(!rcu_read_lock_held());
 
-        return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
+        objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
+
+        return objcg ? obj_cgroup_memcg(objcg) : NULL;
 }
 
 /*
@@ -562,17 +544,10 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio)
  * has an associated memory cgroup pointer or an object cgroups vector or
  * an object cgroup.
  *
- * For a non-kmem folio any of the following ensures folio and memcg binding
- * stability:
+ * The page and objcg or memcg binding rules can refer to page_memcg().
  *
- * - the folio lock
- * - LRU isolation
- * - lock_folio_memcg()
- * - exclusive reference
- * - mem_cgroup_trylock_pages()
- *
- * For a kmem folio a caller should hold an rcu read lock to protect memcg
- * associated with a kmem folio from being released.
+ * A caller should hold an rcu read lock to protect memcg associated with a
+ * page from being released.
  */
 static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
 {
@@ -581,18 +556,14 @@ static inline struct mem_cgroup *folio_memcg_check(struct folio *folio)
          * for slabs, READ_ONCE() should be used here.
          */
         unsigned long memcg_data = READ_ONCE(folio->memcg_data);
+        struct obj_cgroup *objcg;
 
         if (memcg_data & MEMCG_DATA_OBJEXTS)
                 return NULL;
 
-        if (memcg_data & MEMCG_DATA_KMEM) {
-                struct obj_cgroup *objcg;
-
-                objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
-                return obj_cgroup_memcg(objcg);
-        }
+        objcg = (void *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
 
-        return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
+        return objcg ? obj_cgroup_memcg(objcg) : NULL;
 }
 
 static inline struct mem_cgroup *page_memcg_check(struct page *page)
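With these changes every lookup in the header funnels through folio_objcg(). A hypothetical caller, sketched below to show the new contract (sketch_account_folio() and the statistic chosen are not from the patch), holds rcu_read_lock() across the lookup because the objcg-to-memcg hop can be retargeted by reparenting at any time; per the commit message, objcg binding stability alone is enough when only the memcg page counters are being updated:

static void sketch_account_folio(struct folio *folio)
{
        struct mem_cgroup *memcg;

        rcu_read_lock();
        memcg = folio_memcg(folio);     /* objcg-based lookup from above */
        if (memcg)
                mod_memcg_state(memcg, NR_FILE_PAGES, 1);       /* illustrative update */
        rcu_read_unlock();
}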

mm/huge_memory.c

Lines changed: 34 additions & 0 deletions
@@ -1040,6 +1040,7 @@ pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 }
 
 #ifdef CONFIG_MEMCG
+
 static inline struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
                                         struct deferred_split *queue)
 {
@@ -1056,6 +1057,39 @@ static inline struct deferred_split *folio_memcg_split_queue(struct folio *folio
 
         return memcg ? &memcg->deferred_split_queue : NULL;
 }
+
+static void thp_sq_reparent_lock(struct mem_cgroup *src, struct mem_cgroup *dst)
+{
+        spin_lock(&src->deferred_split_queue.split_queue_lock);
+        spin_lock_nested(&dst->deferred_split_queue.split_queue_lock,
+                         SINGLE_DEPTH_NESTING);
+}
+
+static void thp_sq_reparent_relocate(struct mem_cgroup *src, struct mem_cgroup *dst)
+{
+        int nid;
+        struct deferred_split *src_queue, *dst_queue;
+
+        src_queue = &src->deferred_split_queue;
+        dst_queue = &dst->deferred_split_queue;
+
+        if (!src_queue->split_queue_len)
+                return;
+
+        list_splice_tail_init(&src_queue->split_queue, &dst_queue->split_queue);
+        dst_queue->split_queue_len += src_queue->split_queue_len;
+        src_queue->split_queue_len = 0;
+
+        for_each_node(nid)
+                set_shrinker_bit(dst, nid, deferred_split_shrinker->id);
+}
+
+static void thp_sq_reparent_unlock(struct mem_cgroup *src, struct mem_cgroup *dst)
+{
+        spin_unlock(&dst->deferred_split_queue.split_queue_lock);
+        spin_unlock(&src->deferred_split_queue.split_queue_lock);
+}
+DEFINE_MEMCG_REPARENT_OPS(thp_sq);
 #else
 static inline struct mem_cgroup *folio_split_queue_memcg(struct folio *folio,
                                         struct deferred_split *queue)
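DEFINE_MEMCG_REPARENT_OPS() is defined elsewhere in the series, not in this hunk. Judging from the three thp_sq_reparent_* callbacks it consumes, a plausible expansion, offered purely as a reading aid with assumed structure and field names, bundles them into an ops table that the generic objcg-reparenting path calls around the pointer swap:

/* Assumed shape of the ops table; not taken verbatim from the patch. */
struct memcg_reparent_ops {
        void (*lock)(struct mem_cgroup *src, struct mem_cgroup *dst);
        void (*relocate)(struct mem_cgroup *src, struct mem_cgroup *dst);
        void (*unlock)(struct mem_cgroup *src, struct mem_cgroup *dst);
};

#define DEFINE_MEMCG_REPARENT_OPS(name)                                 \
        const struct memcg_reparent_ops name##_reparent_ops = {        \
                .lock     = name##_reparent_lock,                       \
                .relocate = name##_reparent_relocate,                   \
                .unlock   = name##_reparent_unlock,                     \
        }

The relocate step itself only splices the list and transfers the length count; holding both split_queue_locks (with SINGLE_DEPTH_NESTING on the destination to keep lockdep happy) makes the move atomic with respect to concurrent deferred-split scanning.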

mm/memcontrol-v1.c

Lines changed: 42 additions & 3 deletions
@@ -845,12 +845,14 @@ static int mem_cgroup_move_account(struct folio *folio,
          */
         smp_mb();
 
-        css_get(&to->css);
-        css_put(&from->css);
+        rcu_read_lock();
+        obj_cgroup_get(rcu_dereference(to->objcg));
+        obj_cgroup_put(rcu_dereference(from->objcg));
+        rcu_read_unlock();
 
         /* Warning should never happen, so don't worry about refcount non-0 */
         WARN_ON_ONCE(folio_unqueue_deferred_split(folio));
-        folio->memcg_data = (unsigned long)to;
+        folio->memcg_data = (unsigned long)rcu_access_pointer(to->objcg);
 
         __folio_memcg_unlock(from);
 
@@ -1382,6 +1384,43 @@ static void mem_cgroup_move_charge(void)
         walk_page_range(mc.mm, 0, ULONG_MAX, &charge_walk_ops, NULL);
         mmap_read_unlock(mc.mm);
         atomic_dec(&mc.from->moving_account);
+
+
+        /*
+         * Moving its pages to another memcg is finished. Wait for already
+         * started RCU-only updates to finish to make sure that the caller
+         * of lock_page_memcg() can unlock the correct move_lock. The
+         * possible bad scenario would like:
+         *
+         * CPU0:                        CPU1:
+         * mem_cgroup_move_charge()
+         *     walk_page_range()
+         *
+         *                              lock_page_memcg(page)
+         *                              memcg = folio_memcg()
+         *                              spin_lock_irqsave(&memcg->move_lock)
+         *                              memcg->move_lock_task = current
+         *
+         * atomic_dec(&mc.from->moving_account)
+         *
+         * mem_cgroup_css_offline()
+         *     memcg_offline_kmem()
+         *         memcg_reparent_objcgs() <== reparented
+         *
+         *                              unlock_page_memcg(page)
+         *                              memcg = folio_memcg() <== memcg has been changed
+         *                              if (memcg->move_lock_task == current) <== false
+         *                                  spin_unlock_irqrestore(&memcg->move_lock)
+         *
+         * Once mem_cgroup_move_charge() returns (it means that the cgroup_mutex
+         * would be released soon), the page can be reparented to its parent
+         * memcg. When the unlock_page_memcg() is called for the page, we will
+         * miss unlock the move_lock. So using synchronize_rcu to wait for
+         * already started RCU-only updates to finish before this function
+         * returns (mem_cgroup_move_charge() and mem_cgroup_css_offline() are
+         * serialized by cgroup_mutex).
+         */
+        synchronize_rcu();
 }
 
 void memcg1_move_task(void)
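The long comment added above describes a standard RCU publish/wait protocol: mem_cgroup_move_charge() must not return until every lock_page_memcg() critical section that could have sampled the old binding has finished. A condensed sketch of the two sides, with hypothetical helper names (sketch_reader()/sketch_writer() are not in the patch):

static void sketch_reader(struct folio *folio)
{
        struct mem_cgroup *memcg;

        rcu_read_lock();
        memcg = folio_memcg(folio);     /* may still see the old binding */
        /* ... take and later release memcg->move_lock ... */
        rcu_read_unlock();
}

static void sketch_writer(void)
{
        /* charge moving is done; folios may be reparented from here on */
        synchronize_rcu();      /* returns only after every sketch_reader()
                                 * that saw the old binding has left its RCU
                                 * section, i.e. has already unlocked the old
                                 * memcg's move_lock */
}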
