This repository was archived by the owner on Oct 31, 2024. It is now read-only.

Commit 5cd56f6

firelzrd authored and xanmod committed
mm/vmscan: Add sysctl knobs for protecting the working set [le9uo-1.5]
The kernel does not provide a way to protect the working set under memory pressure. Userspace needs a certain amount of anonymous and clean file pages for normal operation; first of all, it needs a cache of shared libraries and executable binaries. If the amount of clean file pages falls below a certain level, thrashing and even livelock can occur.

This patch provides sysctl knobs for protecting the working set (anonymous and clean file pages) under memory pressure.

== Multi-Gen LRU compatibility ==

le9uo 1.3 and above comes with long-awaited Multi-Gen LRU (MGLRU, or lru_gen) compatibility. It provides the same working set protection features for MGLRU as for the traditional LRU.

Please be aware of an MGLRU-specific limitation. As of the latest Linux kernel (version 6.7.5 at the time of writing), MGLRU does not honor the vm.swappiness sysctl knob the way it was initially designed to. Almost regardless of the value of vm.swappiness (as long as it is greater than 0), it seems to evict whatever it finds first. This behavior comes from the design and implementation of MGLRU's page scanner, and it makes MGLRU start thrashing much earlier and more easily than the traditional LRU.

MGLRU instead takes a temporal approach called min_ttl, but this design has another problem: it is much more difficult to estimate each system's optimal effective value than with the traditional LRU plus le9's spatial approach, and when the value is outside the effective range, it easily results in either premature invocation of the OOM killer or thrashing.

le9uo does not fix this issue, but mitigates it greatly, so that these limitations of MGLRU's design and implementation are no longer a problem.

[1] https://github.com/firelzrd/le9uo/blob/main/le9uo_patches/stable/0001-linux6.6-le9uo-1.5.patch

Signed-off-by: Alexandre Frade <kernel@xanmod.org>
1 parent ea0ae0b commit 5cd56f6

6 files changed

Lines changed: 329 additions & 7 deletions


Documentation/admin-guide/sysctl/vm.rst

Lines changed: 72 additions & 0 deletions
@@ -25,6 +25,9 @@ files can be found in mm/swap.c.
 Currently, these files are in /proc/sys/vm:
 
 - admin_reserve_kbytes
+- anon_min_ratio
+- clean_low_ratio
+- clean_min_ratio
 - compact_memory
 - compaction_proactiveness
 - compact_unevictable_allowed
@@ -106,6 +109,67 @@ On x86_64 this is about 128MB.
 Changing this takes effect whenever an application requests memory.
 
 
+anon_min_ratio
+==============
+
+This knob provides *hard* protection of anonymous pages. The anonymous pages
+on the current node won't be reclaimed under any conditions when their amount
+is below vm.anon_min_ratio.
+
+This knob may be used to prevent excessive swap thrashing when anonymous
+memory is low (for example, when memory is going to be overfilled by
+compressed data of zram module).
+
+Setting this value too high (close to 100) can result in inability to
+swap and can lead to early OOM under memory pressure.
+
+The unit of measurement is the percentage of the total memory of the node.
+
+The default value is 15.
+
+
+clean_low_ratio
+================
+
+This knob provides *best-effort* protection of clean file pages. The file pages
+on the current node won't be reclaimed under memory pressure when the amount of
+clean file pages is below vm.clean_low_ratio *unless* we threaten to OOM.
+
+Protection of clean file pages using this knob may be used when swapping is
+still possible to
+  - prevent disk I/O thrashing under memory pressure;
+  - improve performance in disk cache-bound tasks under memory pressure.
+
+Setting it to a high value may result in a early eviction of anonymous pages
+into the swap space by attempting to hold the protected amount of clean file
+pages in memory.
+
+The unit of measurement is the percentage of the total memory of the node.
+
+The default value is 0.
+
+
+clean_min_ratio
+================
+
+This knob provides *hard* protection of clean file pages. The file pages on the
+current node won't be reclaimed under memory pressure when the amount of clean
+file pages is below vm.clean_min_ratio.
+
+Hard protection of clean file pages using this knob may be used to
+  - prevent disk I/O thrashing under memory pressure even with no free swap space;
+  - improve performance in disk cache-bound tasks under memory pressure;
+  - avoid high latency and prevent livelock in near-OOM conditions.
+
+Setting it to a high value may result in a early out-of-memory condition due to
+the inability to reclaim the protected amount of clean file pages when other
+types of pages cannot be reclaimed.
+
+The unit of measurement is the percentage of the total memory of the node.
+
+The default value is 15.
+
+
 compact_memory
 ==============
 
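
As a rough illustration of the three ratio knobs documented in the hunk above, the sketch below converts a ratio into a page count and classifies the protection state of a node. It is only a reading of the documentation text: the helper names, the enum, and the example page counts are hypothetical, and the actual reclaim logic belongs to the mm/vmscan.c part of the patch, which is not shown on this page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical protection levels derived from the documentation wording. */
enum protection { PROT_NONE, PROT_BEST_EFFORT, PROT_HARD };

/* Convert a ratio (percent of the node's total memory) into pages. */
static unsigned long long ratio_to_pages(unsigned long long node_total_pages,
					 unsigned int ratio_percent)
{
	assert(ratio_percent <= 100);	/* the sysctl range is 0..100 */
	return node_total_pages * ratio_percent / 100;
}

/* Anonymous pages are hard-protected while below vm.anon_min_ratio. */
static bool anon_is_protected(unsigned long long node_total_pages,
			      unsigned long long anon_pages,
			      unsigned int anon_min_ratio)
{
	return anon_pages < ratio_to_pages(node_total_pages, anon_min_ratio);
}

/*
 * Clean file pages: hard protection below vm.clean_min_ratio,
 * best-effort protection below vm.clean_low_ratio, none otherwise.
 * A ratio of 0 yields a threshold of 0 pages, so it never protects.
 */
static enum protection clean_file_protection(unsigned long long node_total_pages,
					      unsigned long long clean_file_pages,
					      unsigned int clean_low_ratio,
					      unsigned int clean_min_ratio)
{
	if (clean_file_pages < ratio_to_pages(node_total_pages, clean_min_ratio))
		return PROT_HARD;
	if (clean_file_pages < ratio_to_pages(node_total_pages, clean_low_ratio))
		return PROT_BEST_EFFORT;
	return PROT_NONE;
}
```

For example, on a hypothetical node of 1,000,000 pages with the default vm.anon_min_ratio of 15, anonymous pages would count as protected while fewer than 150,000 of them remain.
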
@@ -910,6 +974,14 @@ be 133 (x + 2x = 200, 2x = 133.33).
 At 0, the kernel will not initiate swap until the amount of free and
 file-backed pages is less than the high watermark in a zone.
 
+This knob has no effect if the amount of clean file pages on the current
+node is below vm.clean_low_ratio or vm.clean_min_ratio. In this case,
+only anonymous pages can be reclaimed.
+
+If the number of anonymous pages on the current node is below
+vm.anon_min_ratio, then only file pages can be reclaimed with
+any vm.swappiness value.
+
 
 unprivileged_userfaultfd
 ========================
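
The interaction with vm.swappiness described in the hunk above can be summarized as a small decision function. This is a sketch of the documented behavior only, not code from the patch: the enum, the function name, and the boolean inputs are hypothetical, and the case where both page types are protected is not described in the text, so returning SCAN_NONE there is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of which page types reclaim may consider. */
enum scan_target { SCAN_BOTH, SCAN_ANON_ONLY, SCAN_FILE_ONLY, SCAN_NONE };

/*
 * anon_protected:  anonymous pages are below vm.anon_min_ratio
 * clean_protected: clean file pages are below vm.clean_low_ratio
 *                  or vm.clean_min_ratio
 */
static enum scan_target pick_scan_target(unsigned int swappiness,
					 bool anon_protected,
					 bool clean_protected)
{
	assert(swappiness <= 200);	/* upper bound of vm.swappiness */

	if (anon_protected && clean_protected)
		return SCAN_NONE;	/* assumption: not covered by the text */
	if (clean_protected)
		return SCAN_ANON_ONLY;	/* swappiness has no effect here */
	if (anon_protected)
		return SCAN_FILE_ONLY;	/* holds for any swappiness value */
	return SCAN_BOTH;		/* swappiness balances the two as usual */
}
```

Only in the last branch does the swappiness value influence the balance between anonymous and file pages; in the protected cases it is ignored, which matches the wording "with any vm.swappiness value".
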

include/linux/mm.h

Lines changed: 8 additions & 0 deletions
@@ -195,6 +195,14 @@ static inline void __mm_zero_struct_page(struct page *page)
 
 extern int sysctl_max_map_count;
 
+extern bool sysctl_workingset_protection;
+extern u8 sysctl_anon_min_ratio;
+extern u8 sysctl_clean_low_ratio;
+extern u8 sysctl_clean_min_ratio;
+int vm_workingset_protection_update_handler(
+	struct ctl_table *table, int write,
+	void __user *buffer, size_t *lenp, loff_t *ppos);
+
 extern unsigned long sysctl_user_reserve_kbytes;
 extern unsigned long sysctl_admin_reserve_kbytes;
 
kernel/sysctl.c

Lines changed: 34 additions & 0 deletions
@@ -2227,6 +2227,40 @@ static struct ctl_table vm_table[] = {
 		.extra1 = SYSCTL_ZERO,
 	},
 #endif
+	{
+		.procname = "workingset_protection",
+		.data = &sysctl_workingset_protection,
+		.maxlen = sizeof(bool),
+		.mode = 0644,
+		.proc_handler = &proc_dobool,
+	},
+	{
+		.procname = "anon_min_ratio",
+		.data = &sysctl_anon_min_ratio,
+		.maxlen = sizeof(u8),
+		.mode = 0644,
+		.proc_handler = &vm_workingset_protection_update_handler,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE_HUNDRED,
+	},
+	{
+		.procname = "clean_low_ratio",
+		.data = &sysctl_clean_low_ratio,
+		.maxlen = sizeof(u8),
+		.mode = 0644,
+		.proc_handler = &vm_workingset_protection_update_handler,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE_HUNDRED,
+	},
+	{
+		.procname = "clean_min_ratio",
+		.data = &sysctl_clean_min_ratio,
+		.maxlen = sizeof(u8),
+		.mode = 0644,
+		.proc_handler = &vm_workingset_protection_update_handler,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE_HUNDRED,
+	},
 	{
 		.procname = "user_reserve_kbytes",
 		.data = &sysctl_user_reserve_kbytes,
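
The three ratio entries above are u8 values bounded by .extra1 = SYSCTL_ZERO and .extra2 = SYSCTL_ONE_HUNDRED. A userspace tool that writes these knobs might pre-validate input the same way; the sketch below is a hypothetical helper (parse_ratio is not part of the patch), and it assumes the kernel handler rejects values outside 0..100, which the handler body, not shown on this page, would have to confirm.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical userspace check mirroring the 0..100 bounds of the
 * ratio sysctls. Returns 0 and stores the value on success, or
 * -EINVAL if the text is not a decimal integer in range.
 */
static int parse_ratio(const char *text, unsigned char *out)
{
	char *end;
	long value;

	assert(text != NULL && out != NULL);

	errno = 0;
	value = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return -EINVAL;		/* overflow or no digits at all */

	if (*end == '\n')		/* tolerate one trailing newline */
		end++;
	if (*end != '\0')
		return -EINVAL;		/* trailing garbage */

	if (value < 0 || value > 100)
		return -EINVAL;		/* outside the sysctl range */

	*out = (unsigned char)value;
	return 0;
}
```

A caller would validate the string first and only then write it to the corresponding file under /proc/sys/vm, all three of which are registered with mode 0644.
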

mm/Kconfig

Lines changed: 63 additions & 0 deletions
@@ -486,6 +486,69 @@ config ARCH_WANT_OPTIMIZE_DAX_VMEMMAP
 config ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
 	bool
 
+config ANON_MIN_RATIO
+	int "Default value for vm.anon_min_ratio"
+	depends on SYSCTL
+	range 0 100
+	default 15
+	help
+	  This option sets the default value for vm.anon_min_ratio sysctl knob.
+
+	  The vm.anon_min_ratio sysctl knob provides *hard* protection of
+	  anonymous pages. The anonymous pages on the current node won't be
+	  reclaimed under any conditions when their amount is below
+	  vm.anon_min_ratio. This knob may be used to prevent excessive swap
+	  thrashing when anonymous memory is low (for example, when memory is
+	  going to be overfilled by compressed data of zram module).
+
+	  Setting this value too high (close to MemTotal) can result in
+	  inability to swap and can lead to early OOM under memory pressure.
+
+config CLEAN_LOW_RATIO
+	int "Default value for vm.clean_low_ratio"
+	depends on SYSCTL
+	range 0 100
+	default 0
+	help
+	  This option sets the default value for vm.clean_low_ratio sysctl knob.
+
+	  The vm.clean_low_ratio sysctl knob provides *best-effort*
+	  protection of clean file pages. The file pages on the current node
+	  won't be reclaimed under memory pressure when the amount of clean file
+	  pages is below vm.clean_low_ratio *unless* we threaten to OOM.
+	  Protection of clean file pages using this knob may be used when
+	  swapping is still possible to
+	    - prevent disk I/O thrashing under memory pressure;
+	    - improve performance in disk cache-bound tasks under memory
+	      pressure.
+
+	  Setting it to a high value may result in a early eviction of anonymous
+	  pages into the swap space by attempting to hold the protected amount
+	  of clean file pages in memory.
+
+config CLEAN_MIN_RATIO
+	int "Default value for vm.clean_min_ratio"
+	depends on SYSCTL
+	range 0 100
+	default 15
+	help
+	  This option sets the default value for vm.clean_min_ratio sysctl knob.
+
+	  The vm.clean_min_ratio sysctl knob provides *hard* protection of
+	  clean file pages. The file pages on the current node won't be
+	  reclaimed under memory pressure when the amount of clean file pages is
+	  below vm.clean_min_ratio. Hard protection of clean file pages using
+	  this knob may be used to
+	    - prevent disk I/O thrashing under memory pressure even with no free
+	      swap space;
+	    - improve performance in disk cache-bound tasks under memory
+	      pressure;
+	    - avoid high latency and prevent livelock in near-OOM conditions.
+
+	  Setting it to a high value may result in a early out-of-memory condition
+	  due to the inability to reclaim the protected amount of clean file pages
+	  when other types of pages cannot be reclaimed.
+
 config HAVE_MEMBLOCK_PHYS_MAP
 	bool
 
mm/mm_init.c

Lines changed: 1 addition & 0 deletions
@@ -2749,6 +2749,7 @@ static void __init mem_init_print_info(void)
 		, K(totalhigh_pages())
 #endif
 		);
+	printk(KERN_INFO "le9 Unofficial (le9uo) working set protection 1.5 by Masahito Suzuki (forked from hakavlad's original le9 patch)");
 }
 
 /*
