Skip to content

Commit 2492268

Browse files
Christoph LameterLinus Torvalds
authored andcommitted
SLUB: change error reporting format to follow lockdep loosely
Changes the error reporting format to loosely follow lockdep. If data corruption is detected then we generate the following lines: ============================================ BUG <slab-cache>: <problem> -------------------------------------------- INFO: <more information> [possibly multiple times] <object dump> FIX <slab-cache>: <remedial action> This also adds some more intelligence to the data corruption detection. Its now capable of figuring out the start and end. Add a comment on how to configure SLUB so that a production system may continue to operate even though occasional slab corruption occur through a misbehaving kernel component. See "Emergency operations" in Documentation/vm/slub.txt. [[email protected]: build fix] Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
1 parent 8e1f936 commit 2492268

File tree

2 files changed

+243
-171
lines changed

2 files changed

+243
-171
lines changed

Documentation/vm/slub.txt

Lines changed: 89 additions & 48 deletions
Original file line numberDiff line numberDiff line change
@@ -127,13 +127,20 @@ SLUB Debug output
127127

128128
Here is a sample of slub debug output:
129129

130-
*** SLUB kmalloc-8: Redzone Active@0xc90f6d20 slab 0xc528c530 offset=3360 flags=0x400000c3 inuse=61 freelist=0xc90f6d58
131-
Bytes b4 0xc90f6d10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
132-
Object 0xc90f6d20: 31 30 31 39 2e 30 30 35 1019.005
133-
Redzone 0xc90f6d28: 00 cc cc cc .
134-
FreePointer 0xc90f6d2c -> 0xc90f6d58
135-
Last alloc: get_modalias+0x61/0xf5 jiffies_ago=53 cpu=1 pid=554
136-
Filler 0xc90f6d50: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
130+
====================================================================
131+
BUG kmalloc-8: Redzone overwritten
132+
--------------------------------------------------------------------
133+
134+
INFO: 0xc90f6d28-0xc90f6d2b. First byte 0x00 instead of 0xcc
135+
INFO: Slab 0xc528c530 flags=0x400000c3 inuse=61 fp=0xc90f6d58
136+
INFO: Object 0xc90f6d20 @offset=3360 fp=0xc90f6d58
137+
INFO: Allocated in get_modalias+0x61/0xf5 age=53 cpu=1 pid=554
138+
139+
Bytes b4 0xc90f6d10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
140+
Object 0xc90f6d20: 31 30 31 39 2e 30 30 35 1019.005
141+
Redzone 0xc90f6d28: 00 cc cc cc .
142+
Padding 0xc90f6d50: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
143+
137144
[<c010523d>] dump_trace+0x63/0x1eb
138145
[<c01053df>] show_trace_log_lvl+0x1a/0x2f
139146
[<c010601d>] show_trace+0x12/0x14
@@ -155,74 +162,108 @@ Filler 0xc90f6d50: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
155162
[<c0104112>] sysenter_past_esp+0x5f/0x99
156163
[<b7f7b410>] 0xb7f7b410
157164
=======================
158-
@@@ SLUB kmalloc-8: Restoring redzone (0xcc) from 0xc90f6d28-0xc90f6d2b
159165

166+
FIX kmalloc-8: Restoring Redzone 0xc90f6d28-0xc90f6d2b=0xcc
160167

168+
If SLUB encounters a corrupted object (full detection requires the kernel
169+
to be booted with slub_debug) then the following output will be dumped
170+
into the syslog:
161171

162-
If SLUB encounters a corrupted object then it will perform the following
163-
actions:
164-
165-
1. Isolation and report of the issue
172+
1. Description of the problem encountered
166173

167174
This will be a message in the system log starting with
168175

169-
*** SLUB <slab cache affected>: <What went wrong>@<object address>
170-
offset=<offset of object into slab> flags=<slabflags>
171-
inuse=<objects in use in this slab> freelist=<first free object in slab>
176+
===============================================
177+
BUG <slab cache affected>: <What went wrong>
178+
-----------------------------------------------
172179

173-
2. Report on how the problem was dealt with in order to ensure the continued
174-
operation of the system.
180+
INFO: <corruption start>-<corruption_end> <more info>
181+
INFO: Slab <address> <slab information>
182+
INFO: Object <address> <object information>
183+
INFO: Allocated in <kernel function> age=<jiffies since alloc> cpu=<allocated by
184+
cpu> pid=<pid of the process>
185+
INFO: Freed in <kernel function> age=<jiffies since free> cpu=<freed by cpu>
186+
pid=<pid of the process>
175187

176-
These are messages in the system log beginning with
177-
178-
@@@ SLUB <slab cache affected>: <corrective action taken>
188+
(Object allocation / free information is only available if SLAB_STORE_USER is
189+
set for the slab. slub_debug sets that option)
179190

191+
2. The object contents if an object was involved.
180192

181-
In the above sample SLUB found that the Redzone of an active object has
182-
been overwritten. Here a string of 8 characters was written into a slab that
183-
has the length of 8 characters. However, a 8 character string needs a
184-
terminating 0. That zero has overwritten the first byte of the Redzone field.
185-
After reporting the details of the issue encountered the @@@ SLUB message
186-
tell us that SLUB has restored the redzone to its proper value and then
187-
system operations continue.
188-
189-
Various types of lines can follow the @@@ SLUB line:
193+
Various types of lines can follow the BUG SLUB line:
190194

191195
Bytes b4 <address> : <bytes>
192-
Show a few bytes before the object where the problem was detected.
196+
Shows a few bytes before the object where the problem was detected.
193197
Can be useful if the corruption does not stop with the start of the
194198
object.
195199

196200
Object <address> : <bytes>
197201
The bytes of the object. If the object is inactive then the bytes
198-
typically contain poisoning values. Any non-poison value shows a
202+
typically contain poison values. Any non-poison value shows a
199203
corruption by a write after free.
200204

201205
Redzone <address> : <bytes>
202-
The redzone following the object. The redzone is used to detect
206+
The Redzone following the object. The Redzone is used to detect
203207
writes after the object. All bytes should always have the same
204208
value. If there is any deviation then it is due to a write after
205209
the object boundary.
206210

207-
Freepointer
208-
The pointer to the next free object in the slab. May become
209-
corrupted if overwriting continues after the red zone.
210-
211-
Last alloc:
212-
Last free:
213-
Shows the address from which the object was allocated/freed last.
214-
We note the pid, the time and the CPU that did so. This is usually
215-
the most useful information to figure out where things went wrong.
216-
Here get_modalias() did an kmalloc(8) instead of a kmalloc(9).
211+
(Redzone information is only available if SLAB_RED_ZONE is set.
212+
slub_debug sets that option)
217213

218-
Filler <address> : <bytes>
214+
Padding <address> : <bytes>
219215
Unused data to fill up the space in order to get the next object
220216
properly aligned. In the debug case we make sure that there are
221-
at least 4 bytes of filler. This allow for the detection of writes
217+
at least 4 bytes of padding. This allows the detection of writes
222218
before the object.
223219

224-
Following the filler will be a stackdump. That stackdump describes the
225-
location where the error was detected. The cause of the corruption is more
226-
likely to be found by looking at the information about the last alloc / free.
220+
3. A stackdump
221+
222+
The stackdump describes the location where the error was detected. The cause
223+
of the corruption is may be more likely found by looking at the function that
224+
allocated or freed the object.
225+
226+
4. Report on how the problem was dealt with in order to ensure the continued
227+
operation of the system.
228+
229+
These are messages in the system log beginning with
230+
231+
FIX <slab cache affected>: <corrective action taken>
232+
233+
In the above sample SLUB found that the Redzone of an active object has
234+
been overwritten. Here a string of 8 characters was written into a slab that
235+
has the length of 8 characters. However, a 8 character string needs a
236+
terminating 0. That zero has overwritten the first byte of the Redzone field.
237+
After reporting the details of the issue encountered the FIX SLUB message
238+
tell us that SLUB has restored the Redzone to its proper value and then
239+
system operations continue.
240+
241+
Emergency operations:
242+
---------------------
243+
244+
Minimal debugging (sanity checks alone) can be enabled by booting with
245+
246+
slub_debug=F
247+
248+
This will be generally be enough to enable the resiliency features of slub
249+
which will keep the system running even if a bad kernel component will
250+
keep corrupting objects. This may be important for production systems.
251+
Performance will be impacted by the sanity checks and there will be a
252+
continual stream of error messages to the syslog but no additional memory
253+
will be used (unlike full debugging).
254+
255+
No guarantees. The kernel component still needs to be fixed. Performance
256+
may be optimized further by locating the slab that experiences corruption
257+
and enabling debugging only for that cache
258+
259+
I.e.
260+
261+
slub_debug=F,dentry
262+
263+
If the corruption occurs by writing after the end of the object then it
264+
may be advisable to enable a Redzone to avoid corrupting the beginning
265+
of other objects.
266+
267+
slub_debug=FZ,dentry
227268

228-
Christoph Lameter, <[email protected]>, May 23, 2007
269+
Christoph Lameter, <[email protected]>, May 30, 2007

0 commit comments

Comments
 (0)