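PS: If you repost this article, please credit the source. All rights reserved.
PS: This reflects only *my own* understanding. If it conflicts with your principles or views, please bear with me rather than flame.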
Environment
None
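Preface
While integrating and customizing the llama.cpp project, I did a lot of work and ran into many problems. Most of them were solved quickly, and a few took some time. Two related problems, however, left the deepest impression on me. I wrote this article to sort out and investigate their root causes. While writing it, I also prepared and submitted a related PR to llama.cpp (https://github.com/ggml-org/llama.cpp/pull/17653).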
Consider the following code:

```cpp
try {
    {
        // ... ...
        if (!std::filesystem::exists("/bugreports")) {
            // ... ...
        }
    }
    {
        std::filesystem::directory_iterator dir_it("/", std::filesystem::directory_options::skip_permission_denied);
        for (const auto & entry : dir_it) {
            // ... ...
        }
        // ... ...
    }
    return ;
}
catch (const std::exception& e){
    printf("exception: %s\n", e.what());
    return ;
}
catch(...){
    printf("Fatal Error, Unkown exception\n");
    return ;
}
```
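The environment was the same every time (same software and hardware), and only the build conditions changed. Even so, each of the three code paths ran in one build or another: the normal return and both catch blocks. That gave me a real headache. Here is the output of the two catch blocks:

```text
exception: filesystem error: in posix_stat: failed to determine attributes for the specified path: Permission denied ["/bugreports"]
```

```text
Fatal Error, Unkown exception
```

These three code paths correspond to the following questions: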
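- Why does the same code on the same device take three different paths under different conditions? In particular, when does it run normally, and when does it throw?
- When do std::filesystem::exists and std::filesystem::directory_iterator throw?
- When std::filesystem::exists or std::filesystem::directory_iterator throws, why is the exception caught in different places? In other words, why is it sometimes not caught as a filesystem error?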
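Below we analyze these questions one by one, using std::filesystem::exists as the example.
Preliminary analysis
Why does the same code on the same device run normally or fail depending on the build conditions?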
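In my case, testing showed that two configurations in build.gradle produce different results: {compileSdk = 34, minSdk = 26, ndk = 26} and {compileSdk = 34, minSdk = 34, ndk = 26}. With minSdk = 26 the code throws, and with minSdk = 34 it runs normally.
From this testing we can make a guess that is very likely correct. The NDK version is the same, so the C++ standard library implementation is also the same. The difference must therefore come from the build settings themselves, which make the POSIX API that accesses /bugreports return different results.
As for the lower-level reason the POSIX API behaves differently, I am not familiar enough with Android's system internals, so I stopped investigating there. Maybe another time.
Next, let's look at how the C++ standard library implements std::filesystem::exists, to see where the exception comes from.
When does std::filesystem::exists throw?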
First, check https://en.cppreference.com/w/cpp/filesystem/exists.html, which declares it as:

```cpp
bool exists( std::filesystem::file_status s ) noexcept;                        // (1) (since C++17)
bool exists( const std::filesystem::path& p );                                 // (2) (since C++17)
bool exists( const std::filesystem::path& p, std::error_code& ec ) noexcept;   // (3) (since C++17)
/*
Exceptions
Any overload not marked noexcept may throw std::bad_alloc if memory allocation fails.
2) Throws std::filesystem::filesystem_error on underlying OS API errors, constructed with p as the first path argument and the OS error code as the error code argument.
*/
```
Our code above uses overload (2). That overload throws whenever the underlying OS API fails, so the behavior we saw conforms to the standard.
Now let's look at how libc++ actually implements exists:

```cpp
inline _LIBCPP_HIDE_FROM_ABI bool exists(const path& __p) { return exists(__status(__p)); }

_LIBCPP_EXPORTED_FROM_ABI file_status __status(const path&, error_code* __ec = nullptr);

file_status __status(const path& p, error_code* ec) { return detail::posix_stat(p, ec); }

inline file_status posix_stat(path const& p, error_code* ec) {
  StatT path_stat;
  return posix_stat(p, path_stat, ec);
}

inline file_status posix_stat(path const& p, StatT& path_stat, error_code* ec) {
  error_code m_ec;
  if (detail::stat(p.c_str(), &path_stat) == -1)
    m_ec = detail::capture_errno();
  return create_file_status(m_ec, p, path_stat, ec);
}

namespace detail {
using ::stat; //<sys/stat.h>
} // end namespace detail

inline file_status create_file_status(error_code& m_ec, path const& p, const StatT& path_stat, error_code* ec) {
  if (ec)
    *ec = m_ec;
  if (m_ec && (m_ec.value() == ENOENT || m_ec.value() == ENOTDIR)) {
    return file_status(file_type::not_found);
  } else if (m_ec) {
    ErrorHandler<void> err("posix_stat", ec, &p);
    err.report(m_ec, "failed to determine attributes for the specified path");
    return file_status(file_type::none);
  }
  // ... ... other code
}
```
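The root cause of the exception from exists() is therefore that detail::stat (that is, ::stat) failed with Permission denied (EACCES). That errno is neither ENOENT nor ENOTDIR, so create_file_status reports it through ErrorHandler. The throwing overload passes no error_code* (ec == nullptr), so the report is raised as std::filesystem::filesystem_error.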
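To see the same decision without the standard library in the way, the sketch below calls ::stat directly and classifies errno the way create_file_status does above. It is a simplified, POSIX-only illustration. `StatOutcome` and `classify_stat` are hypothetical names.

```cpp
#include <cerrno>
#include <sys/stat.h>

enum class StatOutcome { Found, NotFound, Error };

// Simplified mirror of the decision in libc++'s create_file_status:
// ENOENT/ENOTDIR mean "not found"; any other errno is a real error, which the
// throwing std::filesystem overloads turn into filesystem_error.
StatOutcome classify_stat(const char* path, int* err_out) {
    struct stat st;
    if (::stat(path, &st) == 0) {
        *err_out = 0;
        return StatOutcome::Found;
    }
    const int e = errno;  // capture immediately, like detail::capture_errno()
    *err_out = e;
    if (e == ENOENT || e == ENOTDIR) return StatOutcome::NotFound;
    return StatOutcome::Error;  // e.g. EACCES, as reported for "/bugreports"
}
```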
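Why is std::filesystem::filesystem_error caught in different places?
I went through the whole build process again with the minimal test code above and found the following:
- On Android, when the code above lives in a .so built with -Wl,--version-script, the vtable and typeinfo symbols end up not exported.
- When the same example is built on x86 with -Wl,--version-script enabled, the vtable and typeinfo symbols are still exported by default.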
These observations left me puzzled. I examined the compiler, the linker, the compile and link flags, and the symbols. In the end I found something odd:

```text
# readelf -sW build/libnativelib.so|grep fs10filesystem16filesystem_errorE
# The .so below catches the exception in catch (const std::exception& e); nm -CD also lists the fs10filesystem16filesystem_errorE symbols
    12: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _ZTINSt6__ndk14__fs10filesystem16filesystem_errorE
    18: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _ZTVNSt6__ndk14__fs10filesystem16filesystem_errorE
   235: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _ZTINSt6__ndk14__fs10filesystem16filesystem_errorE
   241: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _ZTVNSt6__ndk14__fs10filesystem16filesystem_errorE
# The .so below only catches the exception in catch(...); nm -CD lists no fs10filesystem16filesystem_errorE symbols
   393: 0000000000036340    24 OBJECT  LOCAL  DEFAULT   17 _ZTINSt6__ndk14__fs10filesystem16filesystem_errorE
   395: 0000000000036318    40 OBJECT  LOCAL  DEFAULT   17 _ZTVNSt6__ndk14__fs10filesystem16filesystem_errorE
   410: 000000000000ad5a    47 OBJECT  LOCAL  DEFAULT   11 _ZTSNSt6__ndk14__fs10filesystem16filesystem_errorE
```
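In the working .so, the typeinfo/vtable symbols are GLOBAL and undefined (UND), so their definitions are expected to come from libc++.so or libstdc++.so at load time. In the failing .so, the same symbols are LOCAL and already defined inside the .so itself.
After some research, I found that the difference comes from ANDROID_STL, which defaults to c++_static in CMake (https://developer.android.com/ndk/guides/cpp-support?hl=zh-cn#selecting_a_c_runtime). With c++_static, the C++ standard library is linked statically into my .so, so these definitions become local to it. Switching to c++_shared fixed the inconsistent catch behavior.
I also tried keeping c++_static while manually making the typeinfo/vtable symbols exported dynamic symbols resolved from libc++.so or libstdc++.so. With that change, the exception was also caught correctly.
So far we have only located the cause. We have not explained why only catch(...) catches the exception when nm -CD shows no typeinfo/vtable symbols for fs10filesystem16filesystem_errorE. To answer that, we need a first look at how C++ exceptions are implemented, which is what the next part does.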
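A brief analysis of how C++ exceptions are implemented
Exception implementations can differ between ABIs. To stay close to my scenario and keep debugging easy, the analysis below is based on clang on x64 (Itanium C++ ABI).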
First, let's look at the assembly generated when we throw an exception:

```cpp
#include <stdexcept>

extern "C" __attribute__((visibility("default"))) void pp()
{
    throw std::runtime_error("test_exception");
}
```

```asm
   0x00007ffff7f9a380 <+0>:  push   %rbp
   0x00007ffff7f9a381 <+1>:  mov    %rsp,%rbp
   0x00007ffff7f9a384 <+4>:  sub    $0x20,%rsp
   0x00007ffff7f9a388 <+8>:  mov    $0x10,%edi
=> 0x00007ffff7f9a38d <+13>: call   0x7ffff7fb48e0 <__cxa_allocate_exception>
   0x00007ffff7f9a392 <+18>: mov    %rax,%rdi
   0x00007ffff7f9a395 <+21>: mov    %rdi,%rax
   0x00007ffff7f9a398 <+24>: mov    %rax,-0x18(%rbp)
   0x00007ffff7f9a39c <+28>: lea    -0x902d(%rip),%rsi        # 0x7ffff7f91376
   0x00007ffff7f9a3a3 <+35>: call   0x7ffff7fb5e80 <_ZNSt13runtime_errorC2EPKc>
   0x00007ffff7f9a3a8 <+40>: jmp    0x7ffff7f9a3ad <pp()+45>
   0x00007ffff7f9a3ad <+45>: mov    -0x18(%rbp),%rdi
   0x00007ffff7f9a3b1 <+49>: lea    0x1d158(%rip),%rsi        # 0x7ffff7fb7510 <_ZTISt13runtime_error>
   0x00007ffff7f9a3b8 <+56>: lea    0xb1(%rip),%rdx        # 0x7ffff7f9a470 <_ZNSt15underflow_errorD2Ev>
   0x00007ffff7f9a3bf <+63>: call   0x7ffff7fb4b00 <__cxa_throw>
   0x00007ffff7f9a3c4 <+68>: mov    -0x18(%rbp),%rdi
   0x00007ffff7f9a3c8 <+72>: mov    %rax,%rcx
   0x00007ffff7f9a3cb <+75>: mov    %edx,%eax
   0x00007ffff7f9a3cd <+77>: mov    %rcx,-0x8(%rbp)
   0x00007ffff7f9a3d1 <+81>: mov    %eax,-0xc(%rbp)
   0x00007ffff7f9a3d4 <+84>: call   0x7ffff7fb49c0 <__cxa_free_exception>
   0x00007ffff7f9a3d9 <+89>: mov    -0x8(%rbp),%rdi
   0x00007ffff7f9a3dd <+93>: call   0x7ffff7fb6160 <_Unwind_Resume@plt>
```
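The code works in three steps:

- It first calls __cxa_allocate_exception to get memory for the exception object. The C++ runtime manages this memory; in libc++abi it comes from malloc, with an emergency buffer as a fallback. It is not taken from the stack frames that are about to be unwound.
- It then constructs the std::runtime_error object in that memory. The direct call to the constructor _ZNSt13runtime_errorC2EPKc is the equivalent of a placement new.
- Finally, it calls __cxa_throw to start unwinding the stack and searching for a handler.

The instructions after __cxa_throw (from +68 on) form a landing pad that is reached only if the constructor itself throws. That pad frees the exception memory with __cxa_free_exception and continues unwinding with _Unwind_Resume. cppreference describes how the standard specifies this process (https://en.cppreference.com/w/cpp/language/throw.html).
Next, let's read the source of __cxa_throw to see how libc++abi implements unwinding.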
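The same sequence can be written by hand with the Itanium ABI entry points declared in <cxxabi.h>, which GCC/Clang provide on Linux and Android. This sketch only makes the disassembly concrete; it is not something to use in real code. `throw_like_compiler` and `destroy_runtime_error` are hypothetical names.

```cpp
#include <cxxabi.h>   // abi::__cxa_allocate_exception / __cxa_free_exception / __cxa_throw
#include <new>
#include <stdexcept>
#include <typeinfo>

// Destructor thunk handed to __cxa_throw; the runtime calls it when the
// exception object is finally released.
static void destroy_runtime_error(void* obj) {
    static_cast<std::runtime_error*>(obj)->~runtime_error();
}

// Hand-written equivalent of `throw std::runtime_error(msg);`, following the
// call sequence in the disassembly above. For illustration only.
[[noreturn]] void throw_like_compiler(const char* msg) {
    // 1. Memory for the exception object, managed by the C++ runtime.
    void* mem = abi::__cxa_allocate_exception(sizeof(std::runtime_error));
    // 2. Construct the object in place. If the constructor throws, release the
    //    memory, like the landing pad that calls __cxa_free_exception.
    try {
        new (mem) std::runtime_error(msg);
    } catch (...) {
        abi::__cxa_free_exception(mem);
        throw;
    }
    // 3. Start unwinding with the object, its type_info and its destructor.
    abi::__cxa_throw(mem, const_cast<std::type_info*>(&typeid(std::runtime_error)),
                     destroy_runtime_error);
}
```

The type_info pointer passed in step 3 is what the catch clauses are later matched against, which is why it matters for the problem in this article.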
libcxxabi\src\cxa_exception.cpp:

```cpp
void
__cxa_throw(void *thrown_object, std::type_info *tinfo, void (_LIBCXXABI_DTOR_FUNC *dest)(void *)) {
  __cxa_eh_globals *globals = __cxa_get_globals();
  __cxa_exception* exception_header = cxa_exception_from_thrown_object(thrown_object);

  exception_header->unexpectedHandler = std::get_unexpected();
  exception_header->terminateHandler = std::get_terminate();
  exception_header->exceptionType = tinfo;
  exception_header->exceptionDestructor = dest;
  setOurExceptionClass(&exception_header->unwindHeader);
  exception_header->referenceCount = 1; // This is a newly allocated exception, no need for thread safety.
  globals->uncaughtExceptions += 1; // Not atomically, since globals are thread-local

  exception_header->unwindHeader.exception_cleanup = exception_cleanup_func;

#if __has_feature(address_sanitizer)
  // Inform the ASan runtime that now might be a good time to clean stuff up.
  __asan_handle_no_return();
#endif

#ifdef __USING_SJLJ_EXCEPTIONS__
  _Unwind_SjLj_RaiseException(&exception_header->unwindHeader);
#else
  _Unwind_RaiseException(&exception_header->unwindHeader);
#endif
  // This only happens when there is no handler, or some unexpected unwinding
  // error happens.
  failed_throw(exception_header);
}
```
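The function takes three parameters:

- the std::runtime_error object just constructed,
- the exception object's typeinfo,
- std::runtime_error's destructor.

It records them in the exception header and then starts unwinding with whichever exception model is in use. One detail deserves attention: exceptionType is clearly related to the problem in this article. If the corresponding typeinfo is not exported, this exception may well fail to match a handler elsewhere.
One more detail. The common exception models today fall roughly into three groups: SJLJ (setjmp/longjmp), DWARF-based table-driven unwinding, and SEH (Windows). Linux-like systems use the DWARF-based model.
Following this execution flow, let's look at _Unwind_RaiseException next.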
libunwind\src\UnwindLevel1.c:

```c
/// Called by __cxa_throw.  Only returns if there is a fatal error.
_LIBUNWIND_EXPORT _Unwind_Reason_Code
_Unwind_RaiseException(_Unwind_Exception *exception_object) {
  _LIBUNWIND_TRACE_API("_Unwind_RaiseException(ex_obj=%p)",
                       static_cast<void *>(exception_object));
  unw_context_t uc;
  unw_cursor_t cursor;
  __unw_getcontext(&uc);

  // This field for is for compatibility with GCC to say this isn't a forced
  // unwind. EHABI #7.2
  exception_object->unwinder_cache.reserved1 = 0;

  // phase 1: the search phase
  _Unwind_Reason_Code phase1 = unwind_phase1(&uc, &cursor, exception_object);
  if (phase1 != _URC_NO_REASON)
    return phase1;

  // phase 2: the clean up phase
  return unwind_phase2(&uc, &cursor, exception_object, false);
}
```
Unwinding happens in two phases, phase 1 and phase 2. According to the comments, these are the search phase and the cleanup phase. Let's first see what unwind_phase1 does.
libunwind\src\UnwindLevel1.c:

```c
static _Unwind_Reason_Code
unwind_phase1(unw_context_t *uc, unw_cursor_t *cursor, _Unwind_Exception *exception_object) {
  __unw_init_local(cursor, uc);

  // Walk each frame looking for a place to stop.
  while (true) {
    // Ask libunwind to get next frame (skip over first which is
    // _Unwind_RaiseException).
    int stepResult = __unw_step(cursor);
    // ... ...

    // See if frame has code to run (has personality routine).
    unw_proc_info_t frameInfo;
    unw_word_t sp;
    if (__unw_get_proc_info(cursor, &frameInfo) != UNW_ESUCCESS) {
      // ... ...
    }
    // ... ...

    // If there is a personality routine, ask it if it will want to stop at
    // this frame.
    if (frameInfo.handler != 0) {
      _Unwind_Personality_Fn p =
          (_Unwind_Personality_Fn)(uintptr_t)(frameInfo.handler);
      _LIBUNWIND_TRACE_UNWINDING(
          "unwind_phase1(ex_ojb=%p): calling personality function %p",
          (void *)exception_object, (void *)(uintptr_t)p);
      _Unwind_Reason_Code personalityResult =
          (*p)(1, _UA_SEARCH_PHASE, exception_object->exception_class,
               exception_object, (struct _Unwind_Context *)(cursor));
      switch (personalityResult) {
      case _URC_HANDLER_FOUND:
        // found a catch clause or locals that need destructing in this frame
        // stop search and remember stack pointer at the frame
        __unw_get_reg(cursor, UNW_REG_SP, &sp);
        exception_object->private_2 = (uintptr_t)sp;
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase1(ex_ojb=%p): _URC_HANDLER_FOUND",
            (void *)exception_object);
        return _URC_NO_REASON;

      case _URC_CONTINUE_UNWIND:
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase1(ex_ojb=%p): _URC_CONTINUE_UNWIND",
            (void *)exception_object);
        // continue unwinding
        break;

      default:
        // something went wrong
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase1(ex_ojb=%p): _URC_FATAL_PHASE1_ERROR",
            (void *)exception_object);
        return _URC_FATAL_PHASE1_ERROR;
      }
    }
  }
  return _URC_NO_REASON;
}
```
```c
static _Unwind_Reason_Code
unwind_phase2(unw_context_t *uc, unw_cursor_t *cursor, _Unwind_Exception *exception_object) {
  __unw_init_local(cursor, uc);

  _LIBUNWIND_TRACE_UNWINDING("unwind_phase2(ex_ojb=%p)",
                             (void *)exception_object);

  // uc is initialized by __unw_getcontext in the parent frame. The first stack
  // frame walked is unwind_phase2.
  unsigned framesWalked = 1;
  // Walk each frame until we reach where search phase said to stop.
  while (true) {

    // Ask libunwind to get next frame (skip over first which is
    // _Unwind_RaiseException).
    int stepResult = __unw_step(cursor);
    // ... ...

    // Get info about this frame.
    unw_word_t sp;
    unw_proc_info_t frameInfo;
    __unw_get_reg(cursor, UNW_REG_SP, &sp);
    if (__unw_get_proc_info(cursor, &frameInfo) != UNW_ESUCCESS) {
      // ... ...
    }
    // ... ...

    ++framesWalked;
    // If there is a personality routine, tell it we are unwinding.
    if (frameInfo.handler != 0) {
      _Unwind_Personality_Fn p =
          (_Unwind_Personality_Fn)(uintptr_t)(frameInfo.handler);
      _Unwind_Action action = _UA_CLEANUP_PHASE;
      if (sp == exception_object->private_2) {
        // Tell personality this was the frame it marked in phase 1.
        action = (_Unwind_Action)(_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME);
      }
      _Unwind_Reason_Code personalityResult =
          (*p)(1, action, exception_object->exception_class, exception_object,
               (struct _Unwind_Context *)(cursor));
      switch (personalityResult) {
      case _URC_CONTINUE_UNWIND:
        // Continue unwinding
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase2(ex_ojb=%p): _URC_CONTINUE_UNWIND",
            (void *)exception_object);
        if (sp == exception_object->private_2) {
          // Phase 1 said we would stop at this frame, but we did not...
          _LIBUNWIND_ABORT("during phase1 personality function said it would "
                           "stop here, but now in phase2 it did not stop here");
        }
        break;
      case _URC_INSTALL_CONTEXT:
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase2(ex_ojb=%p): _URC_INSTALL_CONTEXT",
            (void *)exception_object);
        // Personality routine says to transfer control to landing pad.
        // We may get control back if landing pad calls _Unwind_Resume().
        if (_LIBUNWIND_TRACING_UNWINDING) {
          unw_word_t pc;
          __unw_get_reg(cursor, UNW_REG_IP, &pc);
          __unw_get_reg(cursor, UNW_REG_SP, &sp);
          _LIBUNWIND_TRACE_UNWINDING("unwind_phase2(ex_ojb=%p): re-entering "
                                     "user code with ip=0x%" PRIxPTR
                                     ", sp=0x%" PRIxPTR,
                                     (void *)exception_object, pc, sp);
        }

        __unw_phase2_resume(cursor, framesWalked);
        // __unw_phase2_resume() only returns if there was an error.
        return _URC_FATAL_PHASE2_ERROR;
      default:
        // Personality routine returned an unknown result code.
        _LIBUNWIND_DEBUG_LOG("personality function returned unknown result %d",
                             personalityResult);
        return _URC_FATAL_PHASE2_ERROR;
      }
    }
  }

  // Clean up phase did not resume at the frame that the search phase
  // said it would...
  return _URC_FATAL_PHASE2_ERROR;
}
```
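The logic here is straightforward. For each frame, the code gets the frame information, casts frameInfo.handler to an _Unwind_Personality_Fn, and calls that personality routine. There are two cases:
- unwind_phase1 calls it with action = _UA_SEARCH_PHASE, meaning the personality routine is only searching for a catch block. When it finds a handler, it returns _URC_HANDLER_FOUND, and the unwinder stores that frame's stack pointer in exception_object->private_2 for use in phase 2. When it returns _URC_CONTINUE_UNWIND, the search moves on to the next frame.
- unwind_phase2 walks the stack again with action = _UA_CLEANUP_PHASE. For the frame where sp == exception_object->private_2, the action becomes (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME). The personality routine sets up the landing pad and returns _URC_INSTALL_CONTEXT, and __unw_phase2_resume transfers control there. The landing pad is the catch block in the handler frame, or a cleanup in an intermediate frame. A cleanup landing pad calls _Unwind_Resume when it is done, so unwinding continues.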
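The two-phase walk can be modeled with a toy stack to make the control flow explicit. This is a deliberately simplified sketch. `ToyFrame`, `search_phase` and `cleanup_phase` are hypothetical, and catch matching is plain string equality. The real runtime compares type_info objects through can_catch, which also handles base classes.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified frame description (not libunwind's types).
// Frames are ordered innermost first: stack[0] is the frame that threw.
struct ToyFrame {
    std::string name;
    std::vector<std::string> catches;  // caught type names; "..." stands for catch(...)
    bool has_cleanup;                  // e.g. local objects with destructors
};

// Phase 1 (search): walk outward without changing anything and report the
// first frame whose catch list accepts the thrown type.
std::optional<std::size_t> search_phase(const std::vector<ToyFrame>& stack,
                                        const std::string& thrown) {
    for (std::size_t i = 0; i < stack.size(); ++i) {
        for (const std::string& c : stack[i].catches) {
            if (c == "..." || c == thrown) return i;
        }
    }
    return std::nullopt;  // no handler anywhere: the real runtime ends in std::terminate
}

// Phase 2 (cleanup): walk outward again, running the cleanups of the frames
// being unwound, and stop at the handler frame found in phase 1.
std::vector<std::string> cleanup_phase(const std::vector<ToyFrame>& stack,
                                       std::size_t handler) {
    std::vector<std::string> trace;
    for (std::size_t i = 0; i < handler; ++i) {
        if (stack[i].has_cleanup) trace.push_back("cleanup:" + stack[i].name);
    }
    trace.push_back("catch:" + stack[handler].name);
    return trace;
}
```

Separating the phases means nothing is destroyed until a handler is known to exist. This is why an uncaught exception can terminate the program with the stack still intact for debugging.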
In addition, __unw_init_local does something very important here: through setInfoBasedOnIPRegister, it locates the unwind information (.eh_frame) for the current IP. The excerpt below is libunwind's bare-metal branch, which uses the linker-provided __eh_frame_start/__eh_frame_end. On Linux/Android, findUnwindSections instead walks the loaded modules with dl_iterate_phdr to find the .eh_frame_hdr/.eh_frame of the module that contains the IP. A brief look at the code flow:

```cpp
inline bool LocalAddressSpace::findUnwindSections(pint_t targetAddr,
                                                  UnwindInfoSections &info) {
  // ... ...
  info.dso_base = 0;
  // Bare metal is statically linked, so no need to ask the dynamic loader
  info.dwarf_section_length = (size_t)(&__eh_frame_end - &__eh_frame_start);
  info.dwarf_section = (uintptr_t)(&__eh_frame_start);
  // ... ...
}

template <typename A, typename R>
void UnwindCursor<A, R>::setInfoBasedOnIPRegister(bool isReturnAddress) {
  // ... ...
  // Ask address space object to find unwind sections for this pc.
  UnwindInfoSections sects;
  if (_addressSpace.findUnwindSections(pc, sects))
  // ... ...
}

// template <typename A, typename R>
// int UnwindCursor<A, R>::step() {
//   // ... ...
//   this->setInfoBasedOnIPRegister(true);
//   // ... ...
// }

_LIBUNWIND_HIDDEN int __unw_init_local(unw_cursor_t *cursor,
                                       unw_context_t *context) {
  // ... ...
  // Use "placement new" to allocate UnwindCursor in the cursor buffer.
  new (reinterpret_cast<UnwindCursor<LocalAddressSpace, REGISTER_KIND> *>(cursor))
      UnwindCursor<LocalAddressSpace, REGISTER_KIND>(
          context, LocalAddressSpace::sThisAddressSpace);
#undef REGISTER_KIND
  AbstractUnwindCursor *co = (AbstractUnwindCursor *)cursor;
  co->setInfoBasedOnIPRegister();

  return UNW_ESUCCESS;
}
```
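The _Unwind_Personality_Fn called here is defined by the Itanium C++ ABI; see the specification at https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html#cxx-throw. It is the language-specific part of unwinding and handles the C++-specific logic. In gcc/clang, the C++ personality routine is called __gxx_personality_v0. Let's go straight to its source.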
libcxxabi\src\cxa_personality.cpp:

```cpp
#if !defined(_LIBCXXABI_ARM_EHABI)
#if defined(__SEH__) && !defined(__USING_SJLJ_EXCEPTIONS__)
static _Unwind_Reason_Code __gxx_personality_imp
#else
_LIBCXXABI_FUNC_VIS _Unwind_Reason_Code
#ifdef __USING_SJLJ_EXCEPTIONS__
__gxx_personality_sj0
#elif defined(__MVS__)
__zos_cxx_personality_v2
#else
__gxx_personality_v0
#endif
#endif
                    (int version, _Unwind_Action actions, uint64_t exceptionClass,
                     _Unwind_Exception* unwind_exception, _Unwind_Context* context)
{
    if (version != 1 || unwind_exception == 0 || context == 0)
        return _URC_FATAL_PHASE1_ERROR;

    bool native_exception = (exceptionClass & get_vendor_and_language) ==
                            (kOurExceptionClass & get_vendor_and_language);
    scan_results results;
    // Process a catch handler for a native exception first.
    if (actions == (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME) &&
        native_exception) {
        // Reload the results from the phase 1 cache.
        __cxa_exception* exception_header =
            (__cxa_exception*)(unwind_exception + 1) - 1;
        results.ttypeIndex = exception_header->handlerSwitchValue;
        results.actionRecord = exception_header->actionRecord;
        results.languageSpecificData = exception_header->languageSpecificData;
        results.landingPad =
            reinterpret_cast<uintptr_t>(exception_header->catchTemp);
        results.adjustedPtr = exception_header->adjustedPtr;

        // Jump to the handler.
        set_registers(unwind_exception, context, results);
        // Cache base for calculating the address of ttype in
        // __cxa_call_unexpected.
        if (results.ttypeIndex < 0) {
#if defined(_AIX)
            exception_header->catchTemp = (void *)_Unwind_GetDataRelBase(context);
#else
            exception_header->catchTemp = 0;
#endif
        }
        return _URC_INSTALL_CONTEXT;
    }

    // In other cases we need to scan LSDA.
    scan_eh_tab(results, actions, native_exception, unwind_exception, context);
    if (results.reason == _URC_CONTINUE_UNWIND ||
        results.reason == _URC_FATAL_PHASE1_ERROR)
        return results.reason;

    if (actions & _UA_SEARCH_PHASE)
    {
        // Phase 1 search:  All we're looking for in phase 1 is a handler that
        //   halts unwinding
        assert(results.reason == _URC_HANDLER_FOUND);
        if (native_exception) {
            // For a native exception, cache the LSDA result.
            __cxa_exception* exc = (__cxa_exception*)(unwind_exception + 1) - 1;
            exc->handlerSwitchValue = static_cast<int>(results.ttypeIndex);
            exc->actionRecord = results.actionRecord;
            exc->languageSpecificData = results.languageSpecificData;
            exc->catchTemp = reinterpret_cast<void*>(results.landingPad);
            exc->adjustedPtr = results.adjustedPtr;
        }
        return _URC_HANDLER_FOUND;
    }

    assert(actions & _UA_CLEANUP_PHASE);
    assert(results.reason == _URC_HANDLER_FOUND);
    set_registers(unwind_exception, context, results);
    // Cache base for calculating the address of ttype in __cxa_call_unexpected.
    if (results.ttypeIndex < 0) {
        __cxa_exception* exception_header =
              (__cxa_exception*)(unwind_exception + 1) - 1;
#if defined(_AIX)
        exception_header->catchTemp = (void *)_Unwind_GetDataRelBase(context);
#else
        exception_header->catchTemp = 0;
#endif
    }
    return _URC_INSTALL_CONTEXT;
}
```
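Taken as a whole, this routine is called from both phase 1 and phase 2:
- Phase 1, action = _UA_SEARCH_PHASE: it calls scan_eh_tab to look for a catch block. If there is no handler in this frame, it returns _URC_CONTINUE_UNWIND. If there is one, it returns _URC_HANDLER_FOUND, and for a native exception it caches the result in the __cxa_exception header.
- Phase 2, action = (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME): for a native exception, it reloads the cached phase 1 result, sets up the catch block via set_registers, and returns _URC_INSTALL_CONTEXT. __unw_phase2_resume then runs the catch block.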
scan_eh_tab is the core of this implementation: it is where the search for a handler and the type matching actually happen. Its source:

```cpp
static void scan_eh_tab(scan_results &results, _Unwind_Action actions,
                        bool native_exception,
                        _Unwind_Exception *unwind_exception,
                        _Unwind_Context *context) {
    // Initialize results to found nothing but an error
    results.ttypeIndex = 0;
    results.actionRecord = 0;
    results.languageSpecificData = 0;
    results.landingPad = 0;
    results.adjustedPtr = 0;
    results.reason = _URC_FATAL_PHASE1_ERROR;
    // Check for consistent actions
    // ... ...
    // Start scan by getting exception table address.
    const uint8_t *lsda = (const uint8_t *)_Unwind_GetLanguageSpecificData(context);
    if (lsda == 0)
    {
        // There is no exception table
        results.reason = _URC_CONTINUE_UNWIND;
        return;
    }
    results.languageSpecificData = lsda;
#if defined(_AIX)
    uintptr_t base = _Unwind_GetDataRelBase(context);
#else
    uintptr_t base = 0;
#endif
    // Get the current instruction pointer and offset it before next
    // instruction in the current frame which threw the exception.
    uintptr_t ip = _Unwind_GetIP(context) - 1;
    // Get beginning current frame's code (as defined by the
    // emitted dwarf code)
    uintptr_t funcStart = _Unwind_GetRegionStart(context);
#ifdef __USING_SJLJ_EXCEPTIONS__
    if (ip == uintptr_t(-1))
    {
        // no action
        results.reason = _URC_CONTINUE_UNWIND;
        return;
    }
    else if (ip == 0)
        call_terminate(native_exception, unwind_exception);
    // ip is 1-based index into call site table
#else  // !__USING_SJLJ_EXCEPTIONS__
    uintptr_t ipOffset = ip - funcStart;
#endif // !defined(_USING_SLJL_EXCEPTIONS__)
    const uint8_t* classInfo = NULL;
    // Note: See JITDwarfEmitter::EmitExceptionTable(...) for corresponding
    //       dwarf emission
    // Parse LSDA header.
    uint8_t lpStartEncoding = *lsda++;
    const uint8_t* lpStart =
        (const uint8_t*)readEncodedPointer(&lsda, lpStartEncoding, base);
    if (lpStart == 0)
        lpStart = (const uint8_t*)funcStart;
    uint8_t ttypeEncoding = *lsda++;
    if (ttypeEncoding != DW_EH_PE_omit)
    {
        // Calculate type info locations in emitted dwarf code which
        // were flagged by type info arguments to llvm.eh.selector
        // intrinsic
        uintptr_t classInfoOffset = readULEB128(&lsda);
        classInfo = lsda + classInfoOffset;
    }
    // Walk call-site table looking for range that
    // includes current PC.
    uint8_t callSiteEncoding = *lsda++;
#ifdef __USING_SJLJ_EXCEPTIONS__
    (void)callSiteEncoding;  // When using SjLj exceptions, callSiteEncoding is never used
#endif
    uint32_t callSiteTableLength = static_cast<uint32_t>(readULEB128(&lsda));
    const uint8_t* callSiteTableStart = lsda;
    const uint8_t* callSiteTableEnd = callSiteTableStart + callSiteTableLength;
    const uint8_t* actionTableStart = callSiteTableEnd;
    const uint8_t* callSitePtr = callSiteTableStart;
    while (callSitePtr < callSiteTableEnd)
    {
        // There is one entry per call site.
#ifndef __USING_SJLJ_EXCEPTIONS__
        // The call sites are non-overlapping in [start, start+length)
        // The call sites are ordered in increasing value of start
        uintptr_t start = readEncodedPointer(&callSitePtr, callSiteEncoding);
        uintptr_t length = readEncodedPointer(&callSitePtr, callSiteEncoding);
        uintptr_t landingPad = readEncodedPointer(&callSitePtr, callSiteEncoding);
        uintptr_t actionEntry = readULEB128(&callSitePtr);
        if ((start <= ipOffset) && (ipOffset < (start + length)))
#else  // __USING_SJLJ_EXCEPTIONS__
        // ip is 1-based index into this table
        uintptr_t landingPad = readULEB128(&callSitePtr);
        uintptr_t actionEntry = readULEB128(&callSitePtr);
        if (--ip == 0)
#endif // __USING_SJLJ_EXCEPTIONS__
        {
            // Found the call site containing ip.
#ifndef __USING_SJLJ_EXCEPTIONS__
            if (landingPad == 0)
            {
                // No handler here
                results.reason = _URC_CONTINUE_UNWIND;
                return;
            }
            landingPad = (uintptr_t)lpStart + landingPad;
#else  // __USING_SJLJ_EXCEPTIONS__
            ++landingPad;
#endif // __USING_SJLJ_EXCEPTIONS__
            results.landingPad = landingPad;
            if (actionEntry == 0)
            {
                // Found a cleanup
                results.reason = actions & _UA_SEARCH_PHASE
                                     ? _URC_CONTINUE_UNWIND
                                     : _URC_HANDLER_FOUND;
                return;
            }
            // Convert 1-based byte offset into
            const uint8_t* action = actionTableStart + (actionEntry - 1);
            bool hasCleanup = false;
            // Scan action entries until you find a matching handler, cleanup, or the end of action list
            while (true)
            {
                const uint8_t* actionRecord = action;
                int64_t ttypeIndex = readSLEB128(&action);
                if (ttypeIndex > 0)
                {
                    // Found a catch, does it actually catch?
                    // First check for catch (...)
                    const __shim_type_info* catchType =
                        get_shim_type_info(static_cast<uint64_t>(ttypeIndex),
                                           classInfo, ttypeEncoding,
                                           native_exception, unwind_exception,
                                           base);
                    if (catchType == 0)
                    {
                        // Found catch (...) catches everything, including
                        // foreign exceptions. This is search phase, cleanup
                        // phase with foreign exception, or forced unwinding.
                        assert(actions & (_UA_SEARCH_PHASE | _UA_HANDLER_FRAME |
                                          _UA_FORCE_UNWIND));
                        results.ttypeIndex = ttypeIndex;
                        results.actionRecord = actionRecord;
                        results.adjustedPtr =
                            get_thrown_object_ptr(unwind_exception);
                        results.reason = _URC_HANDLER_FOUND;
                        return;
                    }
                    // Else this is a catch (T) clause and will never
                    //    catch a foreign exception
                    else if (native_exception)
                    {
                        __cxa_exception* exception_header = (__cxa_exception*)(unwind_exception+1) - 1;
                        void* adjustedPtr = get_thrown_object_ptr(unwind_exception);
                        const __shim_type_info* excpType =
                            static_cast<const __shim_type_info*>(exception_header->exceptionType);
                        if (adjustedPtr == 0 || excpType == 0)
                        {
                            // Something very bad happened
                            call_terminate(native_exception, unwind_exception);
                        }
                        if (catchType->can_catch(excpType, adjustedPtr))
                        {
                            // Found a matching handler. This is either search
                            // phase or forced unwinding.
                            assert(actions &
                                   (_UA_SEARCH_PHASE | _UA_FORCE_UNWIND));
                            results.ttypeIndex = ttypeIndex;
                            results.actionRecord = actionRecord;
                            results.adjustedPtr = adjustedPtr;
                            results.reason = _URC_HANDLER_FOUND;
                            return;
                        }
                    }
                    // Scan next action ...
                }
                else if (ttypeIndex < 0)
                {
                    // Found an exception specification.
                    if (actions & _UA_FORCE_UNWIND) {
                        // Skip if forced unwinding.
                    } else if (native_exception) {
                        // Does the exception spec catch this native exception?
                        __cxa_exception* exception_header = (__cxa_exception*)(unwind_exception+1) - 1;
                        void* adjustedPtr = get_thrown_object_ptr(unwind_exception);
                        const __shim_type_info* excpType =
                            static_cast<const __shim_type_info*>(exception_header->exceptionType);
                        if (adjustedPtr == 0 || excpType == 0)
                        {
                            // Something very bad happened
                            call_terminate(native_exception, unwind_exception);
                        }
                        if (exception_spec_can_catch(ttypeIndex, classInfo,
                                                     ttypeEncoding, excpType,
                                                     adjustedPtr,
                                                     unwind_exception, base))
                        {
                            // Native exception caught by exception
                            // specification.
                            assert(actions & _UA_SEARCH_PHASE);
                            results.ttypeIndex = ttypeIndex;
                            results.actionRecord = actionRecord;
                            results.adjustedPtr = adjustedPtr;
                            results.reason = _URC_HANDLER_FOUND;
                            return;
                        }
                    } else {
                        // foreign exception caught by exception spec
                        results.ttypeIndex = ttypeIndex;
                        results.actionRecord = actionRecord;
                        results.adjustedPtr =
                            get_thrown_object_ptr(unwind_exception);
                        results.reason = _URC_HANDLER_FOUND;
                        return;
                    }
                    // Scan next action ...
                } else {
                    hasCleanup = true;
                }
                const uint8_t* temp = action;
                int64_t actionOffset = readSLEB128(&temp);
                if (actionOffset == 0)
                {
                    // End of action list. If this is phase 2 and we have found
                    // a cleanup (ttypeIndex=0), return _URC_HANDLER_FOUND;
                    // otherwise return _URC_CONTINUE_UNWIND.
                    results.reason = hasCleanup && actions & _UA_CLEANUP_PHASE
                                         ? _URC_HANDLER_FOUND
                                         : _URC_CONTINUE_UNWIND;
                    return;
                }
                // Go to next action
                action += actionOffset;
            }  // there is no break out of this loop, only return
        }
#ifndef __USING_SJLJ_EXCEPTIONS__
        else if (ipOffset < start)
        {
            // There is no call site for this ip
            // Something bad has happened.  We should never get here.
            // Possible stack corruption.
            call_terminate(native_exception, unwind_exception);
        }
#endif // !__USING_SJLJ_EXCEPTIONS__
    }  // there might be some tricky cases which break out of this loop

    // It is possible that no eh table entry specify how to handle
    // this exception. By spec, terminate it immediately.
    call_terminate(native_exception, unwind_exception);
}
```
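The core of scan_eh_tab is to fetch the LSDA via _Unwind_GetLanguageSpecificData (the data lives in the .gcc_except_table section). It then matches the thrown exception, passed in through the context, against the LSDA:

- If a catch clause matches, the routine records its landing pad and returns so the handler can run.
- If no clause in this frame matches, scan_eh_tab returns _URC_CONTINUE_UNWIND and the unwinder moves on to the next frame.
- call_terminate (std::terminate) is reached directly only when the IP is not covered by any call-site entry, which indicates a corrupt table.
- If no frame at all has a handler, _Unwind_RaiseException returns to __cxa_throw, and its failed_throw ends in std::terminate.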
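The call-site part of that logic can be isolated into a small function. The sketch below works on an already-decoded table and follows the same decisions as the non-SjLj path of scan_eh_tab:

- no landing pad: continue unwinding,
- actionEntry == 0: cleanup only,
- an IP before the current entry, or past the end: terminate.

`CallSite`, `SiteResult` and `lookup_call_site` are hypothetical names. Real entries are encoded in .gcc_except_table and decoded with readEncodedPointer/readULEB128.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical decoded call-site entry (DWARF / zero-cost model).
struct CallSite {
    std::uintptr_t start;        // offset from the function start
    std::uintptr_t length;       // covers [start, start + length)
    std::uintptr_t landing_pad;  // offset from lpStart; 0 = no landing pad
    std::uint64_t action_entry;  // 1-based offset into the action table; 0 = cleanup only
};

enum class SiteResult { ContinueUnwind, Cleanup, HasActions, Terminate };

// ip_offset corresponds to (IP - 1) - funcStart in scan_eh_tab.
// The table is assumed sorted by start, as the LSDA guarantees.
SiteResult lookup_call_site(const std::vector<CallSite>& table,
                            std::uintptr_t ip_offset, const CallSite** hit) {
    *hit = nullptr;
    for (const CallSite& cs : table) {
        if (cs.start <= ip_offset && ip_offset < cs.start + cs.length) {
            if (cs.landing_pad == 0) return SiteResult::ContinueUnwind;  // nothing to run here
            *hit = &cs;
            return cs.action_entry == 0 ? SiteResult::Cleanup : SiteResult::HasActions;
        }
        if (ip_offset < cs.start) return SiteResult::Terminate;  // gap: no entry for this IP
    }
    return SiteResult::Terminate;  // fell off the end of the table
}
```

In scan_eh_tab, a Cleanup result maps to _URC_CONTINUE_UNWIND during the search phase and to _URC_HANDLER_FOUND during the cleanup phase. A HasActions result is where the action-table walk and the type_info comparison begin.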
Parsing the LSDA is what locates the catch block, so we need a rough picture of its layout:

```text
/*
Exception Handling Table Layout:

+-----------------+--------+
| lpStartEncoding | (char) |
+---------+-------+--------+---------------+-----------------------+
| lpStart | (encoded with lpStartEncoding) | defaults to funcStart |
+---------+-----+--------+-----------------+---------------+-------+
| ttypeEncoding | (char) | Encoding of the type_info table |
+---------------+-+------+----+----------------------------+----------------+
| classInfoOffset | (ULEB128) | Offset to type_info table, defaults to null |
+-----------------++--------+-+----------------------------+----------------+
| callSiteEncoding | (char) | Encoding for Call Site Table |
+------------------+--+-----+-----+------------------------+--------------------------+
| callSiteTableLength | (ULEB128) | Call Site Table length, used to find Action table |
+---------------------+-----------+---------------------------------------------------+
+---------------------+-----------+------------------------------------------------+
| Beginning of Call Site Table            The current ip is a 1-based index into  |
| ...                                     this table.  Or it is -1 meaning no     |
|                                         action is needed.  Or it is 0 meaning   |
|                                         terminate.                              |
| +-------------+---------------------------------+------------------------------+ |
| | landingPad  | (ULEB128)                       | offset relative to lpStart   | |
| | actionEntry | (ULEB128)                       | Action Table Index 1-based   | |
| |             |                                 | actionEntry == 0 -> cleanup  | |
| +-------------+---------------------------------+------------------------------+ |
| ...                                                                              |
+----------------------------------------------------------------------------------+
+---------------------------------------------------------------------+
| Beginning of Action Table       ttypeIndex == 0 : cleanup           |
| ...                             ttypeIndex  > 0 : catch             |
|                                 ttypeIndex  < 0 : exception spec    |
| +--------------+-----------+--------------------------------------+ |
| | ttypeIndex   | (SLEB128) | Index into type_info Table (1-based) | |
| | actionOffset | (SLEB128) | Offset into next Action Table entry  | |
| +--------------+-----------+--------------------------------------+ |
| ...                                                                 |
+---------------------------------------------------------------------+-----------------+
| type_info Table, but classInfoOffset does *not* point here!                           |
| +----------------+------------------------------------------------+-----------------+ |
| | Nth type_info* | Encoded with ttypeEncoding, 0 means catch(...) | ttypeIndex == N | |
| +----------------+------------------------------------------------+-----------------+ |
| ...                                                                                   |
| +----------------+------------------------------------------------+-----------------+ |
| | 1st type_info* | Encoded with ttypeEncoding, 0 means catch(...) | ttypeIndex == 1 | |
| +----------------+------------------------------------------------+-----------------+ |
| +---------------------------------------+-----------+------------------------------+ |
| | 1st ttypeIndex for 1st exception spec | (ULEB128) | classInfoOffset points here! | |
| | ...                                   | (ULEB128) |                              | |
| | Mth ttypeIndex for 1st exception spec | (ULEB128) |                              | |
| | 0                                     | (ULEB128) |                              | |
| +---------------------------------------+------------------------------------------+ |
| ...                                                                                   |
| +---------------------------------------+------------------------------------------+ |
| | 0                                     | (ULEB128) | throw()                      | |
| +---------------------------------------+------------------------------------------+ |
| ...                                                                                   |
| +---------------------------------------+------------------------------------------+ |
| | 1st ttypeIndex for Nth exception spec | (ULEB128) |                              | |
| | ...                                   | (ULEB128) |                              | |
| | Mth ttypeIndex for Nth exception spec | (ULEB128) |                              | |
| | 0                                     | (ULEB128) |                              | |
| +---------------------------------------+------------------------------------------+ |
+---------------------------------------------------------------------------------------+
*/
```
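From this diagram we can see that the core of LSDA handling works like this:
walk the Call Site Table to find the record covering the current call site and take its actionEntry (a 1-based index into the Action Table).
Then read the ttypeIndex from that Action Table entry, and use it to look up the type_info Table and check whether the thrown exception object matches the catch clause's type.
If it matches, the search stops there. If not, follow the action chain (via actionOffset) to the next Action Table entry and repeat until the chain ends.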
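To make this walk concrete, here is a minimal sketch in C++. It is not libcxxabi's scan_eh_tab. The names read_sleb128, CallSite, Match, find_handler and example_main_selector are hypothetical.

The sketch also rests on several simplifying assumptions:
- The Call Site Table is already decoded.
- The type_info Table is modeled as a plain vector indexed by ttypeIndex.
- Only exact type matches are checked. The real runtime also considers base classes, pointer conversions and exception specifications.

The example LSDA bytes are invented to mirror the main() analyzed in the next section, with catch(std::exception&) at ttypeIndex 2 and catch(...) at ttypeIndex 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <typeinfo>
#include <vector>

// Decode a signed LEB128 value at p and advance p past it
// (the Action Table fields are SLEB128-encoded).
int64_t read_sleb128(const uint8_t*& p) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *p++;
        result |= uint64_t(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40))
        result |= ~uint64_t(0) << shift;  // sign-extend negative values
    return static_cast<int64_t>(result);
}

// One Call Site Table record, already decoded (a real LSDA stores these fields encoded).
struct CallSite {
    uintptr_t start, length, landing_pad;
    uint64_t action_entry;  // 0 = cleanup only, else 1-based byte offset into the Action Table
};

struct Match {
    uintptr_t landing_pad;  // 0 = nothing to run in this frame, keep unwinding
    int64_t selector;       // ttypeIndex of the matching catch clause, 0 = cleanup
};

// Call Site Table -> Action Table -> type_info Table, following the action chain.
// Simplification (assumption): only exact type_info equality is tested.
Match find_handler(uintptr_t ip, const std::vector<CallSite>& call_sites,
                   const std::vector<uint8_t>& action_table,
                   const std::vector<const std::type_info*>& type_table,  // [i] holds ttypeIndex i + 1
                   const std::type_info& thrown) {
    for (const CallSite& cs : call_sites) {
        if (ip < cs.start || ip >= cs.start + cs.length) continue;
        if (cs.action_entry == 0) return {cs.landing_pad, 0};  // cleanup only
        const uint8_t* action = action_table.data() + (cs.action_entry - 1);
        for (;;) {
            int64_t ttype_index = read_sleb128(action);
            const uint8_t* offset_field = action;  // actionOffset is relative to its own position
            int64_t action_offset = read_sleb128(action);
            if (ttype_index > 0) {
                assert(static_cast<std::size_t>(ttype_index) <= type_table.size());
                const std::type_info* catch_type = type_table[ttype_index - 1];
                if (catch_type == nullptr || *catch_type == thrown)  // nullptr = catch(...)
                    return {cs.landing_pad, ttype_index};
            }
            if (action_offset == 0) break;  // end of the action chain
            action = offset_field + action_offset;
        }
        return {0, 0};  // no clause matched in this frame
    }
    return {0, 0};  // ip not covered by any call site
}

// Hypothetical LSDA for a main() with catch(std::exception&) followed by catch(...).
int64_t example_main_selector(const std::type_info& thrown) {
    const std::vector<const std::type_info*> types = {nullptr, &typeid(std::exception)};  // ttypeIndex 1, 2
    const std::vector<uint8_t> actions = {0x02, 0x01,   // ttypeIndex 2, next entry 1 byte after this field
                                          0x01, 0x00};  // ttypeIndex 1 (catch(...)), end of chain
    const std::vector<CallSite> sites = {{0x10, 0x08, 0x40, 1}};  // the call site of p()
    return find_handler(0x12, sites, actions, types, thrown).selector;
}
```

Under these assumptions, example_main_selector(typeid(std::exception)) yields 2. example_main_selector(typeid(int)) fails the first entry, follows the chain to catch(...) and yields 1.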
Analysis of why the exceptions in this article are caught differently
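Based on the analysis above, the problem in this article must lie in the LSDA's Action Table and type_info Table. The test program is:
- #include <cstdio>
- #include <exception>
- void p();  // defined outside this executable (called through p@plt); throws the exception under test
- int main(int argc, char* argv[])
- {
-     try{
-         p();
-     }
-     catch(const std::exception& e){
-         printf("std::exception: %s\n", e.what());
-     }
-     catch(...){
-         printf("unkown exception\n");
-     }
-     return 0;
- }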
The disassembly of main() is:
- # objdump -d --disassemble=main ./build/test
- # In this build the std::exception is caught normally
- 0000000000001a70 <main>:
- 1a70: 55 push %rbp
- 1a71: 48 89 e5 mov %rsp,%rbp
- 1a74: 48 83 ec 30 sub $0x30,%rsp
- 1a78: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
- 1a7f: 89 7d f8 mov %edi,-0x8(%rbp)
- 1a82: 48 89 75 f0 mov %rsi,-0x10(%rbp)
- 1a86: e8 35 01 00 00 call 1bc0 <p@plt>
- 1a8b: e9 00 00 00 00 jmp 1a90 <main+0x20>
- 1a90: e9 51 00 00 00 jmp 1ae6 <main+0x76>
- 1a95: 48 89 c1 mov %rax,%rcx
- 1a98: 89 d0 mov %edx,%eax
- 1a9a: 48 89 4d e8 mov %rcx,-0x18(%rbp)
- 1a9e: 89 45 e4 mov %eax,-0x1c(%rbp)
- 1aa1: 8b 45 e4 mov -0x1c(%rbp),%eax
- 1aa4: b9 02 00 00 00 mov $0x2,%ecx
- 1aa9: 39 c8 cmp %ecx,%eax
- 1aab: 0f 85 3d 00 00 00 jne 1aee <main+0x7e>
- 1ab1: 48 8b 7d e8 mov -0x18(%rbp),%rdi
- 1ab5: e8 16 01 00 00 call 1bd0 <__cxa_begin_catch@plt>
- 1aba: 48 89 45 d8 mov %rax,-0x28(%rbp)
- 1abe: 48 8b 7d d8 mov -0x28(%rbp),%rdi
- 1ac2: 48 8b 07 mov (%rdi),%rax
- 1ac5: 48 8b 40 10 mov 0x10(%rax),%rax
- 1ac9: ff d0 call *%rax
- 1acb: 48 89 c6 mov %rax,%rsi
- 1ace: 48 8d 3d a1 ed ff ff lea -0x125f(%rip),%rdi # 876 <_IO_stdin_used+0x16>
- 1ad5: 31 c0 xor %eax,%eax
- 1ad7: e8 04 01 00 00 call 1be0 <printf@plt>
- 1adc: e9 00 00 00 00 jmp 1ae1 <main+0x71>
- 1ae1: e8 0a 01 00 00 call 1bf0 <__cxa_end_catch@plt>
- 1ae6: 31 c0 xor %eax,%eax
- 1ae8: 48 83 c4 30 add $0x30,%rsp
- 1aec: 5d pop %rbp
- 1aed: c3 ret
- 1aee: 48 8b 7d e8 mov -0x18(%rbp),%rdi
- 1af2: e8 d9 00 00 00 call 1bd0 <__cxa_begin_catch@plt>
- 1af7: 48 8d 3d 66 ed ff ff lea -0x129a(%rip),%rdi # 864 <_IO_stdin_used+0x4>
- 1afe: 31 c0 xor %eax,%eax
- 1b00: e8 db 00 00 00 call 1be0 <printf@plt>
- 1b05: e9 00 00 00 00 jmp 1b0a <main+0x9a>
- 1b0a: e8 e1 00 00 00 call 1bf0 <__cxa_end_catch@plt>
- 1b0f: e9 d2 ff ff ff jmp 1ae6 <main+0x76>
- 1b14: 48 89 c1 mov %rax,%rcx
- 1b17: 89 d0 mov %edx,%eax
- 1b19: 48 89 4d e8 mov %rcx,-0x18(%rbp)
- 1b1d: 89 45 e4 mov %eax,-0x1c(%rbp)
- 1b20: e8 cb 00 00 00 call 1bf0 <__cxa_end_catch@plt>
- 1b25: e9 00 00 00 00 jmp 1b2a <main+0xba>
- 1b2a: e9 1b 00 00 00 jmp 1b4a <main+0xda>
- 1b2f: 48 89 c1 mov %rax,%rcx
- 1b32: 89 d0 mov %edx,%eax
- 1b34: 48 89 4d e8 mov %rcx,-0x18(%rbp)
- 1b38: 89 45 e4 mov %eax,-0x1c(%rbp)
- 1b3b: e8 b0 00 00 00 call 1bf0 <__cxa_end_catch@plt>
- 1b40: e9 00 00 00 00 jmp 1b45 <main+0xd5>
- 1b45: e9 00 00 00 00 jmp 1b4a <main+0xda>
- 1b4a: 48 8b 7d e8 mov -0x18(%rbp),%rdi
- 1b4e: e8 ad 00 00 00 call 1c00 <_Unwind_Resume@plt>
- 1b53: 48 89 c7 mov %rax,%rdi
- 1b56: e8 05 00 00 00 call 1b60 <__clang_call_terminate>
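When the exception is caught normally, eax holds 2 at the cmp %ecx,%eax instruction, so execution falls through into the catch(std::exception&) handler. When it is caught abnormally, eax holds 1 at the same instruction, so the jne is taken and control goes to the catch(...) handler.

This means that, in the abnormal case, the action entry finally selected in scan_eh_tab is the one for which get_shim_type_info returns 0, i.e. catch(...). (Note: the first lookup did find a type, but it did not match. Following the action chain, the next entry matched catch(...).)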
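The link between the selector and the action chain can be illustrated with a toy model. It assumes what the text above describes: the two action entries for main() carry ttypeIndex 2 (std::exception) and ttypeIndex 1 (catch(...), whose type_info pointer is 0). The functions selector_for and handler_for_selector are hypothetical and not part of any runtime.

```cpp
#include <cassert>
#include <string>

// Toy model (assumption) of main()'s action chain: entry 1 has ttypeIndex 2
// (std::exception), entry 2 has ttypeIndex 1 with a null type_info (catch(...)).
// The selector handed to the landing pad is the ttypeIndex of the first matching entry.
int selector_for(bool std_exception_clause_matches) {
    struct Action {
        int ttype_index;
        bool catch_all;  // true when the type_info pointer is 0
    };
    const Action chain[] = {{2, false}, {1, true}};
    for (const Action& a : chain) {
        if (a.catch_all || std_exception_clause_matches)
            return a.ttype_index;
    }
    return 0;  // unreachable here: catch(...) ends the chain
}

// What main()'s landing pad does with the selector: mov $0x2,%ecx; cmp %ecx,%eax; jne 1aee.
std::string handler_for_selector(int selector) {
    assert(selector > 0 && "0 would mean no catch clause was selected");
    if (selector == 2)
        return "catch(std::exception&)";  // falls through to the std::exception block
    return "catch(...)";                  // jne taken: jumps to the catch(...) block at 0x1aee
}
```

In this model, selector_for(true) returns 2, which handler_for_selector maps to the catch(std::exception&) block (the normal case). selector_for(false) returns 1, which leads to catch(...) (the abnormal case).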
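The above is our hypothesis. To confirm it, we rebuilt libcxx/libcxxabi in debug mode and rebuilt the test program against them. Inside scan_eh_tab we then obtained the key result shown in the figure below: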
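This shows that our different build configurations left the C++ runtime unable to dynamic_cast between the two class types. The match therefore failed and control entered the catch(...) block. Interested readers can trace the underlying implementation of dynamic_cast, which is the following function:
- extern "C" void *
- __dynamic_cast(const void *static_ptr, const __class_type_info *static_type,
-                const __class_type_info *dst_type,
-                std::ptrdiff_t src2dst_offset)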
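In other words, the root cause is __class_type_info under static versus shared linking. The definitions are identical, but there are two distinct copies of the symbol: one in libc++.so and one in libuser.so, at different addresses. In that situation the cast cannot be performed. This is reasonable behavior, since the runtime identifies a type by the address of its type information rather than by its definition.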
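The following sketch illustrates why two identical definitions can still fail to match. TypeDesc is a hypothetical stand-in for a type information object; it is not the real std::type_info layout. The two constants play the roles of the copies in libc++.so and libuser.so.

The assumption, consistent with the analysis above, is that the runtime decides type identity by address, as same_type_by_address does. same_type_by_name shows a name-based alternative that would tolerate duplicate copies.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a type information object (NOT the real std::type_info layout).
struct TypeDesc {
    const char* name;
};

// Two copies with identical contents, standing in for the copy in libc++.so
// and the one that ended up inside libuser.so.
const TypeDesc copy_in_libcxx_so = {"N10__cxxabiv117__class_type_infoE"};
const TypeDesc copy_in_libuser_so = {"N10__cxxabiv117__class_type_infoE"};

// Identity by address: assumes every type has exactly one descriptor in the process.
bool same_type_by_address(const TypeDesc* a, const TypeDesc* b) {
    return a == b;
}

// Identity by name: tolerates duplicate copies of the same descriptor.
bool same_type_by_name(const TypeDesc* a, const TypeDesc* b) {
    assert(a != nullptr && b != nullptr);
    return a == b || std::strcmp(a->name, b->name) == 0;
}
```

With these definitions, same_type_by_address reports the two copies as different types even though their contents are identical. same_type_by_name reports them as equal.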
Afterword
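In summary, the content above answers the following two questions:
- Why is an exception thrown at all? Probably because the build configuration causes the underlying Android system to control certain APIs differently.
- Why, with all the symbols present, does the exception take a different catch path? The key is that the typeinfo objects cannot be dynamic_cast.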
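This investigation deepened my understanding of stl_static/stl_shared, of how C++ is implemented under the hood, and of the internal structure of compilers such as GCC and Clang.
Having worked through the LLVM sources this time, it will also be much quicker and easier to dig into C++ internals next time.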
References
- https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html#cxx-throw
- https://en.cppreference.com/w/cpp/filesystem/exists.html
- https://developer.android.com/ndk/guides/cpp-support?hl=zh-cn#selecting_a_c_runtime
- https://en.cppreference.com/w/cpp/language/throw.html
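For tips, subscriptions, bookmarks, bananas or coins, please follow the WeChat official account (攻城狮的搬砖之路). PS: Please respect original work; if you don't like it, please don't flame.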
PS: If you have any questions, please leave a comment; I will reply as soon as I see it.
Source: submitted and published by a 程序园 user; in case of infringement, please contact the site administrator for removal.