Skip to content

bpf: Support multi-attach for freplace programs #61

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

kernel-patches-bot
Copy link

Pull request for series with
subject: bpf: Support multi-attach for freplace programs
version: 5
url: https://patchwork.ozlabs.org/project/netdev/list/?series=201797

@kernel-patches-bot
Copy link
Author

kernel-patches-bot and others added 9 commits September 15, 2020 17:59
In preparation for moving code around, change a bunch of references to
env->log (and the verbose() logging helper) to use bpf_log() and a direct
pointer to struct bpf_verifier_log. While we're touching the function
signature, mark the 'prog' argument to bpf_check_type_match() as const.

Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
for the log struct so we can re-use the code with logging disabled.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
 include/linux/bpf.h          |    2 +-
 include/linux/bpf_verifier.h |    5 +++-
 kernel/bpf/btf.c             |    6 +++--
 kernel/bpf/verifier.c        |   48 +++++++++++++++++++++---------------------
 4 files changed, 31 insertions(+), 30 deletions(-)
The check_attach_btf_id() function really does three things:

1. It performs a bunch of checks on the program to ensure that the
   attachment is valid.

2. It stores a bunch of state about the attachment being requested in
   the verifier environment and struct bpf_prog objects.

3. It allocates a trampoline for the attachment.

This patch splits out (1.) and (3.) into separate functions in preparation
for reusing them when the actual attachment is happening (in the
raw_tracepoint_open syscall operation), which will allow tracing programs
to have multiple (compatible) attachments.

No functional change is intended with this patch.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
 include/linux/bpf.h          |    7 +
 include/linux/bpf_verifier.h |    9 ++
 kernel/bpf/trampoline.c      |   20 ++++
 kernel/bpf/verifier.c        |  197 ++++++++++++++++++++++++------------------
 4 files changed, 149 insertions(+), 84 deletions(-)
In preparation for allowing multiple attachments of freplace programs, move
the references to the target program and trampoline into the
bpf_tracing_link structure when that is created. To do this atomically,
introduce a new mutex in prog->aux to protect writing to the two pointers
to target prog and trampoline, and rename the members to make it clear that
they are related.

With this change, it is no longer possible to attach the same tracing
program multiple times (detaching in-between), since the reference from the
tracing program to the target disappears on the first attach. However,
since the next patch will let the caller supply an attach target, that will
also make it possible to attach to the same place multiple times.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
 include/linux/bpf.h     |   15 +++++++++------
 kernel/bpf/btf.c        |    6 +++---
 kernel/bpf/core.c       |    9 ++++++---
 kernel/bpf/syscall.c    |   46 ++++++++++++++++++++++++++++++++++++++++------
 kernel/bpf/trampoline.c |   12 ++++--------
 kernel/bpf/verifier.c   |    9 +++++----
 6 files changed, 67 insertions(+), 30 deletions(-)
This enables support for attaching freplace programs to multiple attach
points. It does this by amending the UAPI for bpf_link_Create with a target
btf ID that can be used to supply the new attachment point along with the
target program fd. The target must be compatible with the target that was
supplied at program load time.

The implementation reuses the checks that were factored out of
check_attach_btf_id() to ensure compatibility between the BTF types of the
old and new attachment. If these match, a new bpf_tracing_link will be
created for the new attach target, allowing multiple attachments to
co-exist simultaneously.

The code could theoretically support multiple-attach of other types of
tracing programs as well, but since I don't have a use case for any of
those, there is no API support for doing so.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
 include/linux/bpf.h            |    2 +
 include/uapi/linux/bpf.h       |    2 +
 kernel/bpf/syscall.c           |   92 ++++++++++++++++++++++++++++++++++------
 kernel/bpf/verifier.c          |    9 ++++
 tools/include/uapi/linux/bpf.h |    2 +
 5 files changed, 94 insertions(+), 13 deletions(-)
Eelco reported we can't properly access arguments if the tracing
program is attached to extension program.

Having following program:

  SEC("classifier/test_pkt_md_access")
  int test_pkt_md_access(struct __sk_buff *skb)

with its extension:

  SEC("freplace/test_pkt_md_access")
  int test_pkt_md_access_new(struct __sk_buff *skb)

and tracing that extension with:

  SEC("fentry/test_pkt_md_access_new")
  int BPF_PROG(fentry, struct sk_buff *skb)

It's not possible to access skb argument in the fentry program,
with following error from verifier:

  ; int BPF_PROG(fentry, struct sk_buff *skb)
  0: (79) r1 = *(u64 *)(r1 +0)
  invalid bpf_context access off=0 size=8

The problem is that btf_ctx_access gets the context type for the
traced program, which is in this case the extension.

But when we trace extension program, we want to get the context
type of the program that the extension is attached to, so we can
access the argument properly in the trace program.

This version of the patch is tweaked slightly from Jiri's original one,
since the refactoring in the previous patches means we have to get the
target prog type from the new variable in prog->aux instead of directly
from the target prog.

Reported-by: Eelco Chaudron <[email protected]>
Suggested-by: Jiri Olsa <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
 kernel/bpf/btf.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
This adds support for supplying a target btf ID for the bpf_link_create()
operation, and adds a new bpf_program__attach_freplace() high-level API for
attaching freplace functions with a target.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
 tools/lib/bpf/bpf.c      |    1 +
 tools/lib/bpf/bpf.h      |    3 ++-
 tools/lib/bpf/libbpf.c   |   24 ++++++++++++++++++------
 tools/lib/bpf/libbpf.h   |    3 +++
 tools/lib/bpf/libbpf.map |    1 +
 5 files changed, 25 insertions(+), 7 deletions(-)
This adds a selftest for attaching an freplace program to multiple targets
simultaneously.

Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
 .../selftests/bpf/prog_tests/fexit_bpf2bpf.c       |  171 ++++++++++++++++----
 .../selftests/bpf/progs/freplace_get_constant.c    |   15 ++
 2 files changed, 154 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/freplace_get_constant.c
Adding test that setup following program:

  SEC("classifier/test_pkt_md_access")
  int test_pkt_md_access(struct __sk_buff *skb)

with its extension:

  SEC("freplace/test_pkt_md_access")
  int test_pkt_md_access_new(struct __sk_buff *skb)

and tracing that extension with:

  SEC("fentry/test_pkt_md_access_new")
  int BPF_PROG(fentry, struct sk_buff *skb)

The test verifies that the tracing program can
dereference skb argument properly.

Signed-off-by: Jiri Olsa <[email protected]>
Signed-off-by: Toke Høiland-Jørgensen <[email protected]>
---
 tools/testing/selftests/bpf/prog_tests/trace_ext.c |  113 ++++++++++++++++++++
 tools/testing/selftests/bpf/progs/test_trace_ext.c |   18 +++
 .../selftests/bpf/progs/test_trace_ext_tracing.c   |   25 ++++
 3 files changed, 156 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/trace_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_trace_ext.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_trace_ext_tracing.c
@kernel-patches-bot
Copy link
Author

Master branch: c64779e
series: https://patchwork.ozlabs.org/project/netdev/list/?series=201797
version: 5

patch https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ applied successfully
patch https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ applied successfully
Pull request is NOT updated. Failed to apply https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/, error message was:
Cmd('git') failed due to: exit code(1)
cmdline: git apply -3 /tmp/tmpwtau5dmo
stderr: 'error: patch failed: kernel/bpf/core.c:98
Falling back to three-way merge...
Applied patch to 'kernel/bpf/core.c' with conflicts.
U kernel/bpf/core.c'

@kernel-patches-bot
Copy link
Author

At least one diff in series https://patchwork.ozlabs.org/project/netdev/list/?series=201797 expired. Closing PR.

kernel-patches-bot pushed a commit that referenced this pull request Jul 29, 2021
The histogram logic was allowing events with char * pointers to be used as
normal strings. But it was easy to crash the kernel with:

 # echo 'hist:keys=filename' > events/syscalls/sys_enter_openat/trigger

And open some files, and boom!

 BUG: unable to handle page fault for address: 00007f2ced0c3280
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1173fa067 P4D 1173fa067 PUD 1171b6067 PMD 1171dd067 PTE 0
 Oops: 0000 [#1] PREEMPT SMP
 CPU: 6 PID: 1810 Comm: cat Not tainted 5.13.0-rc5-test+ #61
 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
 RIP: 0010:strlen+0x0/0x20
 Code: f6 82 80 2a 0b a9 20 74 11 0f b6 50 01 48 83 c0 01 f6 82 80 2a 0b
a9 20 75 ef c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 <80> 3f 00 74
10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3

 RSP: 0018:ffffbdbf81567b50 EFLAGS: 00010246
 RAX: 0000000000000003 RBX: ffff93815cdb3800 RCX: ffff9382401a22d0
 RDX: 0000000000000100 RSI: 0000000000000000 RDI: 00007f2ced0c3280
 RBP: 0000000000000100 R08: ffff9382409ff074 R09: ffffbdbf81567c98
 R10: ffff9382409ff074 R11: 0000000000000000 R12: ffff9382409ff074
 R13: 0000000000000001 R14: ffff93815a744f00 R15: 00007f2ced0c3280
 FS:  00007f2ced0f8580(0000) GS:ffff93825a800000(0000)
knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f2ced0c3280 CR3: 0000000107069005 CR4: 00000000001706e0
 Call Trace:
  event_hist_trigger+0x463/0x5f0
  ? find_held_lock+0x32/0x90
  ? sched_clock_cpu+0xe/0xd0
  ? lock_release+0x155/0x440
  ? kernel_init_free_pages+0x6d/0x90
  ? preempt_count_sub+0x9b/0xd0
  ? kernel_init_free_pages+0x6d/0x90
  ? get_page_from_freelist+0x12c4/0x1680
  ? __rb_reserve_next+0xe5/0x460
  ? ring_buffer_lock_reserve+0x12a/0x3f0
  event_triggers_call+0x52/0xe0
  ftrace_syscall_enter+0x264/0x2c0
  syscall_trace_enter.constprop.0+0x1ee/0x210
  do_syscall_64+0x1c/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Where it triggered a fault on strlen(key) where key was the filename.

The reason is that filename is a char * to user space, and the histogram
code just blindly dereferenced it, with obvious bad results.

I originally tried to use strncpy_from_user/kernel_nofault() but found
that there's other places that its dereferenced and not worth the effort.

Just do not allow "char *" to act like strings.

Link: https://lkml.kernel.org/r/[email protected]

Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: [email protected]
Acked-by: Namhyung Kim <[email protected]>
Acked-by: Tom Zanussi <[email protected]>
Fixes: 79e577c ("tracing: Support string type key properly")
Fixes: 5967bd5 ("tracing: Let filter_assign_type() detect FILTER_PTR_STRING")
Signed-off-by: Steven Rostedt (VMware) <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Jan 28, 2023
Current imc-pmu code triggers a WARNING with CONFIG_DEBUG_ATOMIC_SLEEP
and CONFIG_PROVE_LOCKING enabled, while running a thread_imc event.

Command to trigger the warning:
  # perf stat -e thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/ sleep 5

   Performance counter stats for 'sleep 5':

                   0      thread_imc/CPM_CS_FROM_L4_MEM_X_DPTEG/

         5.002117947 seconds time elapsed

         0.000131000 seconds user
         0.001063000 seconds sys

Below is snippet of the warning in dmesg:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2869, name: perf-exec
  preempt_count: 2, expected: 0
  4 locks held by perf-exec/2869:
   #0: c00000004325c540 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: bprm_execve+0x64/0xa90
   #1: c00000004325c5d8 (&sig->exec_update_lock){++++}-{3:3}, at: begin_new_exec+0x460/0xef0
   #2: c0000003fa99d4e0 (&cpuctx_lock){-...}-{2:2}, at: perf_event_exec+0x290/0x510
   #3: c000000017ab8418 (&ctx->lock){....}-{2:2}, at: perf_event_exec+0x29c/0x510
  irq event stamp: 4806
  hardirqs last  enabled at (4805): [<c000000000f65b94>] _raw_spin_unlock_irqrestore+0x94/0xd0
  hardirqs last disabled at (4806): [<c0000000003fae44>] perf_event_exec+0x394/0x510
  softirqs last  enabled at (0): [<c00000000013c404>] copy_process+0xc34/0x1ff0
  softirqs last disabled at (0): [<0000000000000000>] 0x0
  CPU: 36 PID: 2869 Comm: perf-exec Not tainted 6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  Call Trace:
    dump_stack_lvl+0x98/0xe0 (unreliable)
    __might_resched+0x2f8/0x310
    __mutex_lock+0x6c/0x13f0
    thread_imc_event_add+0xf4/0x1b0
    event_sched_in+0xe0/0x210
    merge_sched_in+0x1f0/0x600
    visit_groups_merge.isra.92.constprop.166+0x2bc/0x6c0
    ctx_flexible_sched_in+0xcc/0x140
    ctx_sched_in+0x20c/0x2a0
    ctx_resched+0x104/0x1c0
    perf_event_exec+0x340/0x510
    begin_new_exec+0x730/0xef0
    load_elf_binary+0x3f8/0x1e10
  ...
  do not call blocking ops when !TASK_RUNNING; state=2001 set at [<00000000fd63e7cf>] do_nanosleep+0x60/0x1a0
  WARNING: CPU: 36 PID: 2869 at kernel/sched/core.c:9912 __might_sleep+0x9c/0xb0
  CPU: 36 PID: 2869 Comm: sleep Tainted: G        W          6.2.0-rc2-00011-g1247637727f2 #61
  Hardware name: 8375-42A POWER9 0x4e1202 opal:v7.0-16-g9b85f7d961 PowerNV
  NIP:  c000000000194a1c LR: c000000000194a18 CTR: c000000000a78670
  REGS: c00000004d2134e0 TRAP: 0700   Tainted: G        W           (6.2.0-rc2-00011-g1247637727f2)
  MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48002824  XER: 00000000
  CFAR: c00000000013fb64 IRQMASK: 1

The above warning triggered because the current imc-pmu code uses mutex
lock in interrupt disabled sections. The function mutex_lock()
internally calls __might_resched(), which will check if IRQs are
disabled and in case IRQs are disabled, it will trigger the warning.

Fix the issue by changing the mutex lock to spinlock.

Fixes: 8f95faa ("powerpc/powernv: Detect and create IMC device")
Reported-by: Michael Petlan <[email protected]>
Reported-by: Peter Zijlstra <[email protected]>
Signed-off-by: Kajol Jain <[email protected]>
[mpe: Fix comments, trim oops in change log, add reported-by tags]
Signed-off-by: Michael Ellerman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
kernel-patches-bot pushed a commit that referenced this pull request Mar 17, 2023
Currently, test_progs outputs all stdout/stderr as it runs, and when it
is done, prints a summary.

It is non-trivial for tooling to parse that output and extract meaningful
information from it.

This change adds a new option, `--json-summary`/`-J` that let the caller
specify a file where `test_progs{,-no_alu32}` can write a summary of the
run in a json format that can later be parsed by tooling.

Currently, it creates a summary section with successes/skipped/failures
followed by a list of failed tests and subtests.

A test contains the following fields:
- name: the name of the test
- number: the number of the test
- message: the log message that was printed by the test.
- failed: A boolean indicating whether the test failed or not. Currently
we only output failed tests, but in the future, successful tests could
be added.
- subtests: A list of subtests associated with this test.

A subtest contains the following fields:
- name: same as above
- number: sanme as above
- message: the log message that was printed by the subtest.
- failed: same as above but for the subtest

An example run and json content below:
```
$ sudo ./test_progs -a $(grep -v '^#' ./DENYLIST.aarch64 | awk '{print
$1","}' | tr -d '\n') -j -J /tmp/test_progs.json
$ jq < /tmp/test_progs.json | head -n 30
{
  "success": 29,
  "success_subtest": 23,
  "skipped": 3,
  "failed": 28,
  "results": [
    {
      "name": "bpf_cookie",
      "number": 10,
      "message": "test_bpf_cookie:PASS:skel_open 0 nsec\n",
      "failed": true,
      "subtests": [
        {
          "name": "multi_kprobe_link_api",
          "number": 2,
          "message": "kprobe_multi_link_api_subtest:PASS:load_kallsyms 0
nsec\nlibbpf: extern 'bpf_testmod_fentry_test1' (strong): not
resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed
to load BPF skeleton 'kprobe_multi':
-3\nkprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected
error: -3\n",
          "failed": true
        },
        {
          "name": "multi_kprobe_attach_api",
          "number": 3,
          "message": "libbpf: extern 'bpf_testmod_fentry_test1'
(strong): not resolved\nlibbpf: failed to load object
'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi':
-3\nkprobe_multi_attach_api_subtest:FAIL:fentry_raw_skel_load unexpected
error: -3\n",
          "failed": true
        },
        {
          "name": "lsm",
          "number": 8,
          "message": "lsm_subtest:PASS:lsm.link_create 0
nsec\nlsm_subtest:FAIL:stack_mprotect unexpected stack_mprotect: actual
0 != expected -1\n",
          "failed": true
        }
```

The file can then be used to print a summary of the test run and list of
failing tests/subtests:

```
$ jq -r < /tmp/test_progs.json '"Success:
\(.success)/\(.success_subtest), Skipped: \(.skipped), Failed:
\(.failed)"'

Success: 29/23, Skipped: 3, Failed: 28
$ jq -r < /tmp/test_progs.json '.results | map([
    if .failed then "#\(.number) \(.name)" else empty end,
    (
        . as {name: $tname, number: $tnum} | .subtests | map(
            if .failed then "#\($tnum)/\(.number) \($tname)/\(.name)"
else empty end
        )
    )
]) | flatten | .[]' | head -n 20
 #10 bpf_cookie
 #10/2 bpf_cookie/multi_kprobe_link_api
 #10/3 bpf_cookie/multi_kprobe_attach_api
 #10/8 bpf_cookie/lsm
 #15 bpf_mod_race
 #15/1 bpf_mod_race/ksym (used_btfs UAF)
 #15/2 bpf_mod_race/kfunc (kfunc_btf_tab UAF)
 #36 cgroup_hierarchical_stats
 #61 deny_namespace
 #61/1 deny_namespace/unpriv_userns_create_no_bpf
 #73 fexit_stress
 #83 get_func_ip_test
 #99 kfunc_dynptr_param
 #99/1 kfunc_dynptr_param/dynptr_data_null
 #99/4 kfunc_dynptr_param/dynptr_data_null
 #100 kprobe_multi_bench_attach
 #100/1 kprobe_multi_bench_attach/kernel
 #100/2 kprobe_multi_bench_attach/modules
 #101 kprobe_multi_test
 #101/1 kprobe_multi_test/skel_api
```

Signed-off-by: Manu Bretelle <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Mar 17, 2023
Currently, test_progs outputs all stdout/stderr as it runs, and when it
is done, prints a summary.

It is non-trivial for tooling to parse that output and extract meaningful
information from it.

This change adds a new option, `--json-summary`/`-J` that let the caller
specify a file where `test_progs{,-no_alu32}` can write a summary of the
run in a json format that can later be parsed by tooling.

Currently, it creates a summary section with successes/skipped/failures
followed by a list of failed tests and subtests.

A test contains the following fields:
- name: the name of the test
- number: the number of the test
- message: the log message that was printed by the test.
- failed: A boolean indicating whether the test failed or not. Currently
we only output failed tests, but in the future, successful tests could
be added.
- subtests: A list of subtests associated with this test.

A subtest contains the following fields:
- name: same as above
- number: sanme as above
- message: the log message that was printed by the subtest.
- failed: same as above but for the subtest

An example run and json content below:
```
$ sudo ./test_progs -a $(grep -v '^#' ./DENYLIST.aarch64 | awk '{print
$1","}' | tr -d '\n') -j -J /tmp/test_progs.json
$ jq < /tmp/test_progs.json | head -n 30
{
  "success": 29,
  "success_subtest": 23,
  "skipped": 3,
  "failed": 28,
  "results": [
    {
      "name": "bpf_cookie",
      "number": 10,
      "message": "test_bpf_cookie:PASS:skel_open 0 nsec\n",
      "failed": true,
      "subtests": [
        {
          "name": "multi_kprobe_link_api",
          "number": 2,
          "message": "kprobe_multi_link_api_subtest:PASS:load_kallsyms 0
nsec\nlibbpf: extern 'bpf_testmod_fentry_test1' (strong): not
resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed
to load BPF skeleton 'kprobe_multi':
-3\nkprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected
error: -3\n",
          "failed": true
        },
        {
          "name": "multi_kprobe_attach_api",
          "number": 3,
          "message": "libbpf: extern 'bpf_testmod_fentry_test1'
(strong): not resolved\nlibbpf: failed to load object
'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi':
-3\nkprobe_multi_attach_api_subtest:FAIL:fentry_raw_skel_load unexpected
error: -3\n",
          "failed": true
        },
        {
          "name": "lsm",
          "number": 8,
          "message": "lsm_subtest:PASS:lsm.link_create 0
nsec\nlsm_subtest:FAIL:stack_mprotect unexpected stack_mprotect: actual
0 != expected -1\n",
          "failed": true
        }
```

The file can then be used to print a summary of the test run and list of
failing tests/subtests:

```
$ jq -r < /tmp/test_progs.json '"Success:
\(.success)/\(.success_subtest), Skipped: \(.skipped), Failed:
\(.failed)"'

Success: 29/23, Skipped: 3, Failed: 28
$ jq -r < /tmp/test_progs.json '.results | map([
    if .failed then "#\(.number) \(.name)" else empty end,
    (
        . as {name: $tname, number: $tnum} | .subtests | map(
            if .failed then "#\($tnum)/\(.number) \($tname)/\(.name)"
else empty end
        )
    )
]) | flatten | .[]' | head -n 20
 #10 bpf_cookie
 #10/2 bpf_cookie/multi_kprobe_link_api
 #10/3 bpf_cookie/multi_kprobe_attach_api
 #10/8 bpf_cookie/lsm
 #15 bpf_mod_race
 #15/1 bpf_mod_race/ksym (used_btfs UAF)
 #15/2 bpf_mod_race/kfunc (kfunc_btf_tab UAF)
 #36 cgroup_hierarchical_stats
 #61 deny_namespace
 #61/1 deny_namespace/unpriv_userns_create_no_bpf
 #73 fexit_stress
 #83 get_func_ip_test
 #99 kfunc_dynptr_param
 #99/1 kfunc_dynptr_param/dynptr_data_null
 #99/4 kfunc_dynptr_param/dynptr_data_null
 #100 kprobe_multi_bench_attach
 #100/1 kprobe_multi_bench_attach/kernel
 #100/2 kprobe_multi_bench_attach/modules
 #101 kprobe_multi_test
 #101/1 kprobe_multi_test/skel_api
```

Signed-off-by: Manu Bretelle <[email protected]>
kernel-patches-bot pushed a commit that referenced this pull request Mar 17, 2023
Currently, test_progs outputs all stdout/stderr as it runs, and when it
is done, prints a summary.

It is non-trivial for tooling to parse that output and extract meaningful
information from it.

This change adds a new option, `--json-summary`/`-J` that let the caller
specify a file where `test_progs{,-no_alu32}` can write a summary of the
run in a json format that can later be parsed by tooling.

Currently, it creates a summary section with successes/skipped/failures
followed by a list of failed tests and subtests.

A test contains the following fields:
- name: the name of the test
- number: the number of the test
- message: the log message that was printed by the test.
- failed: A boolean indicating whether the test failed or not. Currently
we only output failed tests, but in the future, successful tests could
be added.
- subtests: A list of subtests associated with this test.

A subtest contains the following fields:
- name: same as above
- number: sanme as above
- message: the log message that was printed by the subtest.
- failed: same as above but for the subtest

An example run and json content below:
```
$ sudo ./test_progs -a $(grep -v '^#' ./DENYLIST.aarch64 | awk '{print
$1","}' | tr -d '\n') -j -J /tmp/test_progs.json
$ jq < /tmp/test_progs.json | head -n 30
{
  "success": 29,
  "success_subtest": 23,
  "skipped": 3,
  "failed": 28,
  "results": [
    {
      "name": "bpf_cookie",
      "number": 10,
      "message": "test_bpf_cookie:PASS:skel_open 0 nsec\n",
      "failed": true,
      "subtests": [
        {
          "name": "multi_kprobe_link_api",
          "number": 2,
          "message": "kprobe_multi_link_api_subtest:PASS:load_kallsyms 0 nsec\nlibbpf: extern 'bpf_testmod_fentry_test1' (strong): not resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi': -3\nkprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3\n",
          "failed": true
        },
        {
          "name": "multi_kprobe_attach_api",
          "number": 3,
          "message": "libbpf: extern 'bpf_testmod_fentry_test1' (strong): not resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi': -3\nkprobe_multi_attach_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3\n",
          "failed": true
        },
        {
          "name": "lsm",
          "number": 8,
          "message": "lsm_subtest:PASS:lsm.link_create 0 nsec\nlsm_subtest:FAIL:stack_mprotect unexpected stack_mprotect: actual 0 != expected -1\n",
          "failed": true
        }
```

The file can then be used to print a summary of the test run and list of
failing tests/subtests:

```
$ jq -r < /tmp/test_progs.json '"Success: \(.success)/\(.success_subtest), Skipped: \(.skipped), Failed: \(.failed)"'

Success: 29/23, Skipped: 3, Failed: 28
$ jq -r < /tmp/test_progs.json '.results | map([
    if .failed then "#\(.number) \(.name)" else empty end,
    (
        . as {name: $tname, number: $tnum} | .subtests | map(
            if .failed then "#\($tnum)/\(.number) \($tname)/\(.name)" else empty end
        )
    )
]) | flatten | .[]' | head -n 20
 #10 bpf_cookie
 #10/2 bpf_cookie/multi_kprobe_link_api
 #10/3 bpf_cookie/multi_kprobe_attach_api
 #10/8 bpf_cookie/lsm
 #15 bpf_mod_race
 #15/1 bpf_mod_race/ksym (used_btfs UAF)
 #15/2 bpf_mod_race/kfunc (kfunc_btf_tab UAF)
 #36 cgroup_hierarchical_stats
 #61 deny_namespace
 #61/1 deny_namespace/unpriv_userns_create_no_bpf
 #73 fexit_stress
 #83 get_func_ip_test
 #99 kfunc_dynptr_param
 #99/1 kfunc_dynptr_param/dynptr_data_null
 #99/4 kfunc_dynptr_param/dynptr_data_null
 #100 kprobe_multi_bench_attach
 #100/1 kprobe_multi_bench_attach/kernel
 #100/2 kprobe_multi_bench_attach/modules
 #101 kprobe_multi_test
 #101/1 kprobe_multi_test/skel_api
```

Signed-off-by: Manu Bretelle <[email protected]>
Signed-off-by: Andrii Nakryiko <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 23, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 23, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 23, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 23, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 24, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 24, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 24, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 24, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 24, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 24, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 24, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 24, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
kuba-moo pushed a commit to linux-netdev/testing-bpf-ci that referenced this pull request Jun 25, 2025
… runtime

8021q(vlan_device_event) will add VLAN 0 when enabling the device, and
remove it on disabling it if NETIF_F_HW_VLAN_CTAG_FILTER set.
However, if changing filter feature during netdev runtime,
null-ptr-unref[1] or bug_on[2] will be triggered by unregister_vlan_dev()
for refcount imbalance.

[1]
BUG: KASAN: null-ptr-deref in unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
Write of size 8 at addr 0000000000000000 by task ip/382
CPU: 2 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#60 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <TASK>
dump_stack_lvl (lib/dump_stack.c:123)
kasan_report (mm/kasan/report.c:636)
unregister_vlan_dev (net/8021q/vlan.h:90 net/8021q/vlan.c:110)
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)

[2]
kernel BUG at net/8021q/vlan.c:99!
Oops: invalid opcode: 0000 [kernel-patches#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 382 Comm: ip Not tainted 6.16.0-rc3 kernel-patches#61 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:unregister_vlan_dev (net/8021q/vlan.c:99 (discriminator 1))
RSP: 0018:ffff88810badf310 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810da84000 RCX: ffffffffb47ceb9a
RDX: dffffc0000000000 RSI: 0000000000000008 RDI: ffff88810e8b43c8
RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff6cefe80
R10: ffffffffb677f407 R11: ffff88810badf3c0 R12: ffff88810e8b4000
R13: 0000000000000000 R14: ffff88810642a5c0 R15: 000000000000017e
FS:  00007f1ff68c20c0(0000) GS:ffff888163a24000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1ff5dad240 CR3: 0000000107e56000 CR4: 00000000000006f0
Call Trace:
 <TASK>
rtnl_dellink (net/core/rtnetlink.c:3511 net/core/rtnetlink.c:3553)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6945)
netlink_rcv_skb (net/netlink/af_netlink.c:2535)
netlink_unicast (net/netlink/af_netlink.c:1314 net/netlink/af_netlink.c:1339)
netlink_sendmsg (net/netlink/af_netlink.c:1883)
____sys_sendmsg (net/socket.c:712 net/socket.c:727 net/socket.c:2566)
___sys_sendmsg (net/socket.c:2622)
__sys_sendmsg (net/socket.c:2652)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall_64.c:94)

Root cause is as below:
step1: add vlan0 for real_dev, such as bond, team.
register_vlan_dev
    vlan_vid_add(real_dev,htons(ETH_P_8021Q),0) //refcnt=1
step2: disable vlan filter feature and enable real_dev
step3: change filter from 0 to 1
vlan_device_event
    vlan_filter_push_vids
    	ndo_vlan_rx_add_vid //No refcnt added to real_dev vlan0
step4: real_dev down
vlan_device_event
    vlan_vid_del(dev, htons(ETH_P_8021Q), 0); //refcnt=0
        vlan_info_rcu_free //free vlan0
step5: real_dev up
vlan_device_event
    vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
        vlan_info_alloc //alloc new empty vid0. refcnt=1
step6: delete vlan0
unregister_vlan_dev
    BUG_ON(!vlan_info); //will trigger it if step5 was not executed
    vlan_group_set_device
        array = vg->vlan_devices_arrays
	//null-ptr-ref will be triggered after step5

E.g. the following sequence can reproduce null-ptr-ref

$ ip link add bond0 type bond mode 0
$ ip link add link bond0 name vlan0 type vlan id 0 protocol 802.1q
$ ethtool -K bond0 rx-vlan-filter off
$ ifconfig bond0 up
$ ethtool -K bond0 rx-vlan-filter on
$ ifconfig bond0 down
$ ifconfig bond0 up
$ ip link del vlan0

Fix it by correctly modifying vid0 refcnt of real_dev when changing
the NETIF_F_HW_VLAN_CTAG_FILTER flag during runtime.

Fixes: ad1afb0 ("vlan_dev: VLAN 0 should be treated as "no vlan tag" (802.1p packet)")
Reported-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=a8b046e462915c65b10b
Signed-off-by: Dong Chenchen <[email protected]>
Signed-off-by: NipaLocal <nipa@local>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants