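I: Background
1. The story
I've been busy delivering takeout lately, and it feels like ages since I last wrote anything. Today's post is another production crash. An old friend reached out on WeChat and asked me to help work out why his program crashed. He had a dump in hand, so we can go straight to the analysis.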
II: Crash Analysis
1. Why did it crash
Double-click the dump file to open it in WinDbg and you get an overview of the crash, like this:
- Executable search path is:
- Windows 10 Version 17763 MP (48 procs) Free x64
- Product: Server, suite: TerminalServer DataCenter SingleUserTS
- Edition build lab: 17763.1.amd64fre.rs5_release.180914-1434
- Debug session time: Fri Oct 31 17:38:42.000 2025 (UTC + 8:00)
- System Uptime: 14 days 2:42:29.643
- Process Uptime: 0 days 0:00:58.000
- ................................................................
- .......................................
- Loading unloaded module list
- .
- This dump file has an exception of interest stored in it.
- The stored exception information can be accessed via .ecxr.
- (5a74.6250): Unknown exception - code c0000374 (first/second chance not available)
- For analysis of this file, run !analyze -v
- ntdll!NtWaitForMultipleObjects+0x14:
- 00007ffe`57baf0e4 c3 ret
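The output shows exception code c0000374 (STATUS_HEAP_CORRUPTION), which means the NT heap is corrupted. That narrows the search down right away.
2. Why is the NT heap corrupted
So why is the NT heap corrupted? Use .ecxr to switch to the exception context, then k to look at the call stack at the moment of the crash.
- 0:032> .ecxr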
- 0:032> k
- *** Stack trace for last set context - .thread/.cxr resets it
- # Child-SP RetAddr Call Site
- 00 000000b4`8503ede0 00007ffe`57c0b313 ntdll!RtlReportFatalFailure+0x9
- 01 000000b4`8503ee30 00007ffe`57c13b9e ntdll!RtlReportCriticalFailure+0x97
- 02 000000b4`8503ef20 00007ffe`57c13eaa ntdll!RtlpHeapHandleError+0x12
- 03 000000b4`8503ef50 00007ffe`57bae109 ntdll!RtlpHpHeapHandleError+0x7a
- 04 000000b4`8503ef80 00007ffe`57bbbb0e ntdll!RtlpLogHeapFailure+0x45
- 05 000000b4`8503efb0 00007ffe`17d17b3f ntdll!RtlFreeHeap+0x9d3ce
- 06 000000b4`8503f050 00007ffe`541392af AcLayers!NS_FaultTolerantHeap::APIHook_RtlFreeHeap+0x41f
- 07 000000b4`8503f0b0 00007ffe`3773b17e KERNELBASE!LocalFree+0x2f
- 08 000000b4`8503f0f0 00007ffe`37661d12 mscorlib_ni+0x58b17e
- 09 000000b4`8503f1a0 00007ffd`e49fe127 mscorlib_ni!System.Runtime.InteropServices.Marshal.FreeHGlobal+0x22 [f:\dd\ndp\clr\src\BCL\system\runtime\interopservices\marshal.cs @ 1212]
- ...
- 0:032> !clrstack
- OS Thread Id: 0x6250 (32)
- Child SP IP Call Site
- 000000b48503f118 00007ffe57baf0e4 [InlinedCallFrame: 000000b48503f118] Microsoft.Win32.Win32Native.LocalFree(IntPtr)
- 000000b48503f118 00007ffe3773b17e [InlinedCallFrame: 000000b48503f118] Microsoft.Win32.Win32Native.LocalFree(IntPtr)
- 000000b48503f0f0 00007ffe3773b17e DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr)
- 000000b48503f1a0 00007ffe37661d12 System.Runtime.InteropServices.Marshal.FreeHGlobal(IntPtr) [f:\dd\ndp\clr\src\BCL\system\runtime\interopservices\marshal.cs @ 1212]
- 000000b48503f1e0 00007ffde49fe127 b.B+A.MoveNext()
- 000000b48503f240 00007ffe376b3423 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs @ 954]
- 000000b48503f310 00007ffe376b32b4 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:\dd\ndp\clr\src\BCL\system\threading\executioncontext.cs @ 902]
- ...
- 000000b48503f5c0 00007ffde49fb04e DomainBoundILStubClass.IL_STUB_ReversePInvoke(Int32, Int32, Int64)
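The output shows clearly that b.B+A.MoveNext called FreeHGlobal, and that call triggered the NT heap failure. With enough experience, seeing FreeHGlobal on a stack like this should make you suspect a double free, which is a classic problem.
3. What is a double free
A double free means releasing the same heap block twice. Windows' RtlFreeHeap checks for this in its own logic and treats it as a fatal error. Next you may want to know the address of the block. !heap -s shows it:
- 0:032> !heap -s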
- ************************************************************************************************************************
- NT HEAP STATS BELOW
- ************************************************************************************************************************
- Details:
- Heap address: 0000028c75bb0000
- Error address: 0000028c786018a0
- Error type: HEAP_FAILURE_BLOCK_NOT_BUSY
- Details: The caller performed an operation (such as a free
- or a size check) that is illegal on a free block.
- Follow-up: Check the error's stack trace to find the culprit.
- Stack trace:
- Stack trace at 0x00007ffe57c72848
- 00007ffe57bae109: ntdll!RtlpLogHeapFailure+0x45
- 00007ffe57bbbb0e: ntdll!RtlFreeHeap+0x9d3ce
- 00007ffe17d17b3f: AcLayers!NS_FaultTolerantHeap::APIHook_RtlFreeHeap+0x41f
- 00007ffe541392af: KERNELBASE!LocalFree+0x2f
- 00007ffe3773b17e: mscorlib_ni+0x58b17e
- 00007ffe37661d12: mscorlib_ni!System.Runtime.InteropServices.Marshal.FreeHGlobal+0x22
- 00007ffde49fe127: +0xe49fe127
- LFH Key : 0x765363a7204cf973
- Termination on corruption : ENABLED
- Heap Flags Reserv Commit Virt Free List UCR Virt Lock Fast
- (k) (k) (k) (k) length blocks cont. heap
- -------------------------------------------------------------------------------------
- 0000028c75bb0000 00000002 17920 9256 16364 2120 214 5 1 a LFH
- External fragmentation 23 % (214 free blocks)
- 0000028c75b40000 00008000 64 4 64 2 1 1 0 0
- 0000028c75de0000 00001002 2636 132 1080 20 5 2 0 0 LFH
- 0000028c76190000 00001002 4680 2268 3124 1420 40 3 0 0 LFH
- External fragmentation 62 % (40 free blocks)
- 0000028c76130000 00001002 2636 472 1080 5 27 2 0 0 LFH
- 0000028c767f0000 00041002 60 8 60 5 1 1 0 0
- 0000028c77020000 00041002 60 16 60 2 2 1 0 0
- -------------------------------------------------------------------------------------
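In the output, Error address: 0000028c786018a0 is the address of the block, and Heap address: 0000028c75bb0000 is the heap it belongs to. Running !heap -x 0000028c786018a0 to inspect the block shows that it really is already free.
- 0:032> !heap -x 0000028c786018a0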
- Entry User Heap Segment Size PrevSize Unused Flags
- -------------------------------------------------------------------------------------------------------------
- 0000028c786018a0 0000028c786018b0 0000028c75bb0000 0000028c785c80d0 e0 - 0 LFH;free
At this point the cause of the crash is fully understood. The next step is to trace it back to the offending code.
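To make the HEAP_FAILURE_BLOCK_NOT_BUSY idea concrete, here is a minimal toy model in C#. It is only a sketch: ToyHeap, its addresses and its exception messages are invented for illustration and are not how the NT heap is actually implemented. It just tracks a busy/free flag per block and rejects a free on a block that is already free.

```csharp
using System;
using System.Collections.Generic;

// Toy model only (hypothetical type): one busy/free flag per block,
// loosely mirroring the check behind HEAP_FAILURE_BLOCK_NOT_BUSY.
public sealed class ToyHeap
{
    private readonly Dictionary<long, bool> _busy = new Dictionary<long, bool>();
    private long _next = 0x1000; // made-up starting address

    public long Alloc()
    {
        long address = _next;
        _next += 0x10;
        _busy[address] = true; // block is now busy
        return address;
    }

    public void Free(long address)
    {
        if (!_busy.TryGetValue(address, out bool busy))
            throw new InvalidOperationException("unknown block");
        if (!busy)
            throw new InvalidOperationException("BLOCK_NOT_BUSY: block is already free");
        _busy[address] = false; // first free succeeds
    }
}

public static class ToyHeapDemo
{
    // Returns true when the second free of the same block is rejected.
    public static bool SecondFreeIsRejected()
    {
        var heap = new ToyHeap();
        long block = heap.Alloc();
        heap.Free(block);
        try
        {
            heap.Free(block); // same block again: the double free
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SecondFreeIsRejected());
    }
}
```

The real heap does not throw a managed exception. As the dump shows, with termination on corruption enabled it reports the failure and ends the process.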
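4. Where is the problematic code
As some readers will already have noticed, the problem is in b.B+A.MoveNext(). Judging by the names, the project is obfuscated, which is a bit funny... reading it takes some squinting. Screenshot below:
From IntPtr intPtr = Interlocked.Exchange(ref b.A, IntPtr.Zero); in the decompiled code, the pointer is taken from b.A, a class-level field. It looks as though several methods operate on this field without coordinating properly. To confirm, I went through the source again, and sure enough that was the case. Screenshot below:
With that, the truth is out. I asked my friend to change the source so that access to this variable is properly controlled.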
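One common way to keep such a field under control is to route every release through a single method that atomically swaps the pointer out before freeing it. The sketch below is an assumption about what a fix could look like, not the friend's actual code: NativeBuffer, Release and RaceToRelease are hypothetical names, and the obfuscated b.A field is not reproduced here.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical owner type: the only place that may free the pointer.
public sealed class NativeBuffer
{
    private IntPtr _ptr;

    public NativeBuffer(int size)
    {
        _ptr = Marshal.AllocHGlobal(size);
    }

    // Returns true only for the one caller that actually freed the memory.
    public bool Release()
    {
        // Atomically take the pointer and leave IntPtr.Zero behind,
        // so any later or concurrent caller sees nothing left to free.
        IntPtr p = Interlocked.Exchange(ref _ptr, IntPtr.Zero);
        if (p == IntPtr.Zero)
            return false;
        Marshal.FreeHGlobal(p);
        return true;
    }
}

public static class NativeBufferDemo
{
    // Lets several callers race to release one buffer and counts real frees.
    public static int RaceToRelease(int callers)
    {
        var buffer = new NativeBuffer(64);
        int freed = 0;
        Parallel.For(0, callers, _ =>
        {
            if (buffer.Release())
                Interlocked.Increment(ref freed);
        });
        return freed;
    }

    public static void Main()
    {
        Console.WriteLine(RaceToRelease(8));
    }
}
```

The swap only protects callers that go through Release. If another method copies the raw IntPtr into its own variable and frees that copy, the double free comes back, so every code path touching the pointer has to follow the same rule. Wrapping the pointer in a SafeHandle-derived type is another option worth considering.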
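III: Summary
This production incident was a fairly classic double free. If you have never run into one, you may take a few detours, but for old hands like us, spotting one or two telltale signs is enough to crack it.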