Sunday, October 29, 2023

Reconstruct Call Flow from SIGSEGV

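Accessing an invalid address raises the signal SIGSEGV. The steps to reconstruct the call flow are:

  1. Add a SIGSEGV signal handler to the program that dumps process information, the CPU registers, and the stack.
  2. Compile with the "-g" option, and keep the unstripped object files or executable for later use.
  3. When the program hits SIGSEGV, retrieve the dump.
  4. Analyze the dump to find the "code addresses".
  5. Use "addr2line" to translate each code address into a source file name and line number, or, when there is no debug information, use "nm" to get the function name.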
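Steps 1 and 3 could be sketched as shown below. This is a minimal sketch that assumes Linux with glibc.

- The register dump assumes 32-bit ARM, where uc_mcontext exposes the arm_* fields of struct sigcontext. On other architectures the sketch prints only si_addr and the memory map.
- The names install_segv_dumper, dump_word and STACK_WORDS are hypothetical.
- Inside the handler it uses only write(), open(), read() and close(), which are async-signal-safe.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

#define STACK_WORDS 64 /* hypothetical: number of words above SP to dump */

/* Write n bytes to stderr, retrying after partial writes. */
static void put(const char *s, size_t n)
{
    while (n > 0) {
        ssize_t w = write(STDERR_FILENO, s, n);
        if (w <= 0)
            return;
        s += w;
        n -= (size_t)w;
    }
}

/* Format v as "0x" plus 2*sizeof(long) hex digits into buf (no stdio). */
static size_t fmt_hex(char *buf, unsigned long v)
{
    static const char digits[] = "0123456789abcdef";
    size_t n = 2 + 2 * sizeof v;

    buf[0] = '0';
    buf[1] = 'x';
    for (size_t i = n; i > 2; i--, v >>= 4)
        buf[i - 1] = digits[v & 0xf];
    buf[n] = '\0';
    return n;
}

/* Print "name=0x...\n" using only async-signal-safe calls. */
static void dump_word(const char *name, unsigned long v)
{
    char hex[2 + 2 * sizeof(unsigned long) + 1];
    size_t len = 0, hexlen;

    while (name[len] != '\0')
        len++;
    hexlen = fmt_hex(hex, v);
    put(name, len);
    put("=", 1);
    put(hex, hexlen);
    put("\n", 1);
}

static void segv_handler(int sig, siginfo_t *info, void *ucontext)
{
    const ucontext_t *uc = ucontext;
    char buf[256];
    ssize_t n;
    int fd;

    (void)sig;
    dump_word("si_addr", (unsigned long)info->si_addr);
#if defined(__arm__)
    /* 32-bit ARM: uc_mcontext has the same fields as struct sigcontext. */
    dump_word("pc", uc->uc_mcontext.arm_pc);
    dump_word("lr", uc->uc_mcontext.arm_lr);
    dump_word("sp", uc->uc_mcontext.arm_sp);
    const unsigned long *sp = (const unsigned long *)uc->uc_mcontext.arm_sp;
    for (int i = 0; i < STACK_WORDS; i++)
        dump_word("stack", sp[i]);
#else
    (void)uc; /* register layout is architecture-specific; not dumped here */
#endif
    /* The memory map tells which file each dumped address belongs to. */
    fd = open("/proc/self/maps", O_RDONLY);
    if (fd >= 0) {
        while ((n = read(fd, buf, sizeof buf)) > 0)
            put(buf, (size_t)n);
        close(fd);
    }
}

/* SA_RESETHAND restores SIG_DFL on entry, so after the handler returns from a
 * hardware fault, the faulting instruction runs again and the default action
 * terminates the process. */
int install_segv_dumper(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Installing with SA_RESETHAND means the process still dies with its normal default action (and core dump, if enabled) after the dump is written. If the fault can be a stack overflow, the handler also needs SA_ONSTACK and an alternate stack from sigaltstack(2), because it cannot run on the exhausted stack.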

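When sigaction() installs the SIGSEGV handler function "sa_sigaction", the handler's third argument points to data of type ucontext_t (defined in ucontext.h). Its mcontext_t member is processor-specific and contains the CPU registers, among other things. It is defined in sigcontext.h; on ARM it is: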

struct sigcontext {
    unsigned long trap_no;
    unsigned long error_code;
    unsigned long oldmask;
    unsigned long arm_r0;
    unsigned long arm_r1;
    unsigned long arm_r2;
    unsigned long arm_r3;
    unsigned long arm_r4;
    unsigned long arm_r5;
    unsigned long arm_r6;
    unsigned long arm_r7;
    unsigned long arm_r8;
    unsigned long arm_r9;
    unsigned long arm_r10;
    unsigned long arm_fp; // frame pointer (used as such only if optimization is disabled)
    unsigned long arm_ip; // temp workspace
    unsigned long arm_sp; // top of stack; used to walk back the call flow
    unsigned long arm_lr; // link address/workspace: return address (in the caller)
    unsigned long arm_pc; // program counter: code address at the time of SIGSEGV
    unsigned long arm_cpsr;
    unsigned long fault_address; // fault data address
};

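PC is the program counter at the time of the crash. Looking it up in /proc/[pid]/maps tells you which file it falls in.

If it is the executable, "addr2line -ife <unstripped executable> <PC>" gives the source file name and line number.

If it is a shared object (.so file), you can only find out which symbol it falls in, unless the library was compiled with "-g":

- "arm-linux-nm -lnDS <.so file>" lists each symbol's offset and size.
- PC minus the address where the .so file was loaded gives the offset.
- Looking that offset up in the list tells you which function it is.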
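The PC-to-offset step can be sketched as a small maps parser. maps_addr_to_file_offset and find_addr_in_maps are hypothetical helpers. The sketch computes addr - start + file offset. It assumes, as is typical but not guaranteed, that the executable segment's virtual addresses in the ELF file match its file offsets. Under that assumption, the result can be passed to addr2line -e <.so file> or compared with nm output.

```c
#include <stdio.h>
#include <string.h>

/* Parse one /proc/[pid]/maps line ("start-end perms offset dev inode [path]").
 * If addr lies inside this mapping and the mapping is executable, store the
 * offset of addr within the mapped file in *off and return 1; else return 0. */
int maps_addr_to_file_offset(const char *line, unsigned long addr, unsigned long *off)
{
    unsigned long start, end, file_off;
    char perms[5];

    if (sscanf(line, "%lx-%lx %4s %lx", &start, &end, perms, &file_off) != 4)
        return 0;
    if (addr < start || addr >= end || perms[2] != 'x')
        return 0;
    *off = addr - start + file_off;
    return 1;
}

/* Scan a whole maps file. On success copy the mapping's path (empty for
 * anonymous or [vdso]-style mappings) into path, which must have pathsz > 0. */
int find_addr_in_maps(FILE *maps, unsigned long addr, unsigned long *off,
                      char *path, size_t pathsz)
{
    char line[512];

    while (fgets(line, sizeof line, maps) != NULL) {
        if (!maps_addr_to_file_offset(line, addr, off))
            continue;
        const char *p = strchr(line, '/'); /* only the path column contains '/' */
        if (p == NULL)
            p = "";
        snprintf(path, pathsz, "%.*s", (int)strcspn(p, "\n"), p);
        return 1;
    }
    return 0;
}
```

For example, take a hypothetical mapping line "b6e00000-b6f20000 r-xp 00000000 ... /lib/libc.so.6". A PC of 0xb6e12344 then becomes offset 0x12344 in libc.so.6.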

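LR holds the address of the instruction following the call. Resolving it the same way as PC tells you the function and position of the call site. The call just before that address is the call into the function containing PC. This assumes the faulting function has not yet called anything itself; once it has, LR may point back into the faulting function.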
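When resolving LR, looking up LR itself can land on the source line after the call. A common trick is to look up one byte before the return address instead, as in this sketch. call_site_lookup_addr is a hypothetical name. The sketch assumes the caller reached the callee with a BL/BLX instruction, and that bit 0 of LR is set when the caller runs in Thumb state.

```c
/* Return an address to pass to addr2line/nm when resolving the call site that
 * produced lr. Bit 0 marks a Thumb-state caller, so clear it first. Then step
 * back one byte, which lands inside the call instruction itself. */
unsigned long call_site_lookup_addr(unsigned long lr)
{
    return (lr & ~1UL) - 1;
}
```

Subtracting 1 instead of 4 keeps this correct for both 4-byte calls (ARM BL, Thumb-2 BL) and 2-byte Thumb BLX <reg> calls, because the resulting address always stays inside the call instruction.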

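Next, walk back from SP to recover more link addresses. Some tips:

  • Outside Thumb mode, ARM instructions are 32-bit aligned, so only addresses that are multiples of 4 need to be considered.
  • Only values that fall inside an executable file listed in maps can be return addresses.
  • Use addr2line to find the possible callers of the function containing LR, and keep working backward.
  • "arm-linux-nm -nS <unstripped executable>" lists the symbols, but inline functions do not appear. In that case, look up the address range of the symbols from that source file.
  • PC and LR may both be inside libraries.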
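The first two tips amount to a filter over the dumped stack words. The sketch below keeps only 4-byte-aligned values that fall inside an executable mapping. struct exec_range and scan_return_candidates are hypothetical, and the ranges are assumed to come from the r-xp lines of the dumped maps. Each survivor is only a candidate, since stale data on the stack can also pass the filter, so it still has to be checked with addr2line.

```c
#include <stddef.h>

/* Hypothetical type: one executable (r-xp) range taken from /proc/[pid]/maps. */
struct exec_range {
    unsigned long start;
    unsigned long end; /* exclusive */
};

static int in_exec_range(unsigned long v, const struct exec_range *r, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (v >= r[i].start && v < r[i].end)
            return 1;
    return 0;
}

/* Copy to out[] the stack words that could be ARM-mode return addresses:
 * multiples of 4 inside an executable mapping. Returns how many were kept.
 * Thumb return addresses (bit 0 set) are skipped by this filter. */
size_t scan_return_candidates(const unsigned long *stack, size_t nwords,
                              const struct exec_range *r, size_t nr,
                              unsigned long *out, size_t max_out)
{
    size_t n = 0;

    for (size_t i = 0; i < nwords && n < max_out; i++) {
        unsigned long v = stack[i];
        if ((v & 3UL) == 0 && in_exec_range(v, r, nr))
            out[n++] = v;
    }
    return n;
}
```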

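Possible further improvements

  • Have a program parse the log data and rebuild the call flow automatically.
  • Locate and recover function arguments and local variables.
  • Make the log data loadable by GDB.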

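Note: the "sa_sigaction" function invoked for the signal runs in the context of a signal handler, so only a restricted set of functions may be called from it. See the section "Async-signal-safe functions" in "man 7 signal"; newer man-pages move this list to signal-safety(7).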

Friday, October 27, 2023

syscall sigaction()

The sigaction() system call changes the action taken when a specific signal is received.

#include <signal.h>
int sigaction(
    int signum, // signal number; any signal except SIGKILL and SIGSTOP
    const struct sigaction *act, // if not NULL, the new action to install
    struct sigaction *oldact); // if not NULL, receives the previous action

struct sigaction {
    void     (*sa_handler)(int signum);
    void     (*sa_sigaction)(int signum, siginfo_t *, void *ucontext);
    sigset_t   sa_mask;
    int        sa_flags;
    void     (*sa_restorer)(void);
};
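  • sa_handler and sa_sigaction specify the action. sa_sigaction is used if sa_flags includes SA_SIGINFO, otherwise sa_handler is used. On some architectures sa_handler and sa_sigaction are a union, so set only one of them.
  • sa_handler can be SIG_DFL for the default action, SIG_IGN to ignore the signal, or a pointer to a signal-handling function that receives only the signal number.
  • sa_sigaction receives 3 arguments: signum is the signal number, siginfo_t is a structure with further information about the signal, and ucontext points to a ucontext_t holding the interrupted context.
  • sa_mask specifies the signals to block while the handler runs. In addition, the signal that triggered the handler is also blocked, unless the SA_NODEFER flag is used.
  • sa_flags specifies a set of flags which modify the behavior of the signal. It is formed by the bitwise OR of zero or more of the following:
    • SA_NOCLDSTOP: if signum is SIGCHLD, do not receive notification when child processes stop or resume.
    • SA_NOCLDWAIT: if signum is SIGCHLD, do not turn children into zombies when they terminate.
    • SA_NODEFER: do not add the signal to the thread's signal mask while the handler runs.
    • SA_ONSTACK: run the handler on an alternate signal stack set up with sigaltstack(2).
    • SA_RESETHAND: restore the default action on entry to the handler.
    • SA_RESTART: make certain system calls restartable across signals.
    • SA_RESTORER: not for application use; the C library uses it to indicate that sa_restorer holds a trampoline address.
    • SA_SIGINFO: the handler takes three arguments (use sa_sigaction instead of sa_handler).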
  • The sa_restorer field is not intended for application use. (POSIX does not specify a sa_restorer field.) Some further details of the purpose of this field can be found in sigreturn(2).
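Here is a minimal sketch of these fields in use, assuming Linux with glibc. It does the following:

- installs a one-argument sa_handler for SIGUSR1;
- blocks SIGUSR2 through sa_mask while the handler runs;
- sets SA_RESTART;
- keeps oldact so the previous action can be restored.

install_usr1, restore_usr1 and on_usr1 are hypothetical names.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t usr1_count;

/* One-argument handler: used because SA_SIGINFO is not set in sa_flags. */
static void on_usr1(int signum)
{
    (void)signum;
    usr1_count++;
}

/* Install on_usr1 for SIGUSR1. SIGUSR2 is added to sa_mask, so it stays blocked
 * while the handler runs; SA_RESTART restarts interrupted system calls.
 * The previous action is stored in *old so it can be restored later. */
int install_usr1(struct sigaction *old)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGUSR2);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, old);
}

/* Reinstall the action saved by install_usr1(). */
int restore_usr1(const struct sigaction *old)
{
    return sigaction(SIGUSR1, old, NULL);
}
```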

void (*sa_sigaction)(int sig, siginfo_t *info, void *ucontext); takes three arguments. sig is the signal number, and siginfo_t is the following data structure:

siginfo_t {
    int      si_signo;     /* Signal number */
    int      si_errno;     /* An errno value; generally unused on Linux */
    int      si_code;      /* Signal code */
    int      si_trapno;    /* Trap number that caused hardware-generated signal
                              (unused on most architectures) */
    pid_t    si_pid;       /* Sending process ID */
    uid_t    si_uid;       /* Real user ID of sending process */
    int      si_status;    /* Exit value or signal */
    clock_t  si_utime;     /* User time consumed */
    clock_t  si_stime;     /* System time consumed */
    sigval_t si_value;     /* Signal value */
    int      si_int;       /* POSIX.1b signal */
    void    *si_ptr;       /* POSIX.1b signal */
    int      si_overrun;   /* Timer overrun count; POSIX.1b timers */
    int      si_timerid;   /* Timer ID; POSIX.1b timers */
    void    *si_addr;      /* Memory location which caused fault */
    long     si_band;      /* Band event (was int in glibc 2.3.2 and earlier) */
    int      si_fd;        /* File descriptor */
    short    si_addr_lsb;  /* Least significant bit of address (Linux 2.6.32+) */
    void    *si_lower;     /* Lower bound when address violation occurred (Linux 3.19+) */
    void    *si_upper;     /* Upper bound when address violation occurred (Linux 3.19+) */
    int      si_pkey;      /* Protection key on PTE that caused fault (Linux 4.6+) */
    void    *si_call_addr; /* Address of system call instruction (Linux 3.5+) */
    int      si_syscall;   /* Number of attempted system call (Linux 3.5+) */
    unsigned int si_arch;  /* Architecture of attempted system call (Linux 3.5+) */
}
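With SA_SIGINFO set, the three-argument form receives a siginfo_t. The sketch below uses the hypothetical names on_info and send_self_usr2. It sends SIGUSR2 to the current process with kill(), which fills in si_code = SI_USER and the sender's si_pid. It relies on POSIX delivering a self-sent, unblocked signal before kill() returns in a single-threaded process.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t seen_signo, seen_code, seen_pid;

/* Three-argument handler: selected because SA_SIGINFO is set. */
static void on_info(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    seen_signo = sig;
    seen_code = info->si_code;
    seen_pid = (sig_atomic_t)info->si_pid;
}

/* Install on_info for SIGUSR2, then send SIGUSR2 to ourselves with kill(). */
int send_self_usr2(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_info;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGUSR2, &sa, NULL) == -1)
        return -1;
    return kill(getpid(), SIGUSR2);
}
```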

ucontext actually points to a ucontext_t structure (why is it passed as void *?). It contains signal context information that was saved on the user-space stack by the kernel; for details, see sigreturn(2). Further information about the ucontext_t structure can be found in getcontext(3).

sigaction, rt_sigaction - examine and change a signal action

References

  1. man sigaction
  2. man sigreturn

Saturday, October 21, 2023

adjtimex()

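adjtimex() is a Linux-specific system call for tuning the system clock. It uses David L. Mills' clock adjustment algorithm (see RFC 5905). Parameters are passed in, and results returned, through a struct timex *. ntp_adjtime() is essentially the same call with different names for the mode constants (MOD_* instead of ADJ_*), and is the preferred interface for NTP daemons.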

#include <sys/timex.h>
int adjtimex(struct timex *buf);
int ntp_adjtime(struct timex *buf);

struct timex {
    int  modes;      /* Mode selector */
    long offset;     /* Time offset; nanoseconds (if STA_NANO is set) or microseconds */
    long freq;       /* Frequency offset; see NOTES for units */
    long maxerror;   /* Maximum error (microseconds) */
    long esterror;   /* Estimated error (microseconds) */
    int  status;     /* Clock command/status */
    long constant;   /* PLL time constant */
    long precision;  /* Clock precision
                        (microseconds, read-only) */
    long tolerance;  /* Clock frequency tolerance (read-only);
                        see NOTES for units */
    struct timeval time;
                     /* Current time (read-only, except for
                        ADJ_SETOFFSET); upon return, time.tv_usec
                        contains nanoseconds, if STA_NANO status
                        flag is set, otherwise microseconds */
    long tick;       /* Microseconds between clock ticks */
    long ppsfreq;    /* PPS (pulse per second) frequency
                        (read-only); see NOTES for units */
    long jitter;     /* PPS jitter (read-only); nanoseconds, if
                        STA_NANO status flag is set, otherwise
                        microseconds */
    int  shift;      /* PPS interval duration
                        (seconds, read-only) */
    long stabil;     /* PPS stability (read-only);
                        see NOTES for units */
    long jitcnt;     /* PPS count of jitter limit exceeded
                        events (read-only) */
    long calcnt;     /* PPS count of calibration intervals
                        (read-only) */
    long errcnt;     /* PPS count of calibration errors
                        (read-only) */
    long stbcnt;     /* PPS count of stability limit exceeded
                        events (read-only) */
    int tai;         /* TAI offset, as set by previous ADJ_TAI
                        operation (seconds, read-only,
                        since Linux 2.6.26) */
    /* Further padding bytes to allow for future expansion */
};

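modes determines which parameters are set. It is a bitwise OR of the following:

  • ADJ_OFFSET: set the time offset from offset. Since Linux 2.6.26 the value is clamped to ±0.5 s; before that, an out-of-range value failed with EINVAL.
  • ADJ_FREQUENCY: set the frequency offset from freq. Since Linux 2.6.26 the value is clamped to ±32768000; before that, an out-of-range value failed with EINVAL.
  • ADJ_MAXERROR: set the maximum time error from maxerror.
  • ADJ_ESTERROR: set the estimated time error from esterror.
  • ADJ_STATUS: set the clock status bits from status, as follows: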
    • The buf.status field is a bit mask that is used to set and/or retrieve status bits associated with the NTP implementation. Some bits in the mask are both readable and settable, while others are read-only. Attempts to set read-only status bits are silently ignored.
      • STA_PLL (read-write): Enable phase-locked loop (PLL) updates via ADJ_OFFSET.
      • STA_PPSFREQ (read-write): Enable PPS (pulse-per-second) frequency discipline.
      • STA_PPSTIME (read-write): Enable PPS time discipline.
      • STA_FLL (read-write): Select frequency-locked loop (FLL) mode.
      • STA_INS (read-write): Insert a leap second after the last second of the UTC day, thus extending the last minute of the day by one second. Leap-second insertion will occur each day, so long as this flag remains set.
      • STA_DEL (read-write): Delete a leap second at the last second of the UTC day. Leap second deletion will occur each day, so long as this flag remains set.
      • STA_UNSYNC (read-write): Clock unsynchronized.
      • STA_FREQHOLD (read-write): Hold frequency. Normally, adjustments made via ADJ_OFFSET result in dampened frequency adjustments also being made. So a single call corrects the current offset, but as offsets in the same direction are made repeatedly, the small frequency adjustments will accumulate to fix the long-term skew. This flag prevents the small frequency adjustment from being made when correcting for an ADJ_OFFSET value.
      • STA_PPSSIGNAL (read-only): A valid PPS (pulse-per-second) signal is present.
      • STA_PPSJITTER (read-only): PPS signal jitter exceeded.
      • STA_PPSWANDER (read-only): PPS signal wander exceeded.
      • STA_PPSERROR (read-only): PPS signal calibration error.
      • STA_CLOCKERR (read-only): Clock hardware fault.
      • STA_NANO (read-only; since Linux 2.6.26): Resolution (0 = microsecond, 1 = nanoseconds). Set via ADJ_NANO, cleared via ADJ_MICRO.
      • STA_MODE (since Linux 2.6.26): Mode (0 = Phase Locked Loop, 1 = Frequency Locked Loop).
      • STA_CLK (read-only; since Linux 2.6.26): Clock source (0 = A, 1 = B); currently unused.
  • ADJ_TIMECONST: set the PLL time constant from constant. If the STA_NANO status flag (see above) is clear, the kernel adds 4 to this value.
  • ADJ_SETOFFSET (Linux 2.6.39+): add time to the current time. If buf.modes also includes the ADJ_NANO flag, then buf.time.tv_usec is interpreted as a nanosecond value; otherwise it is interpreted as microseconds.
  • ADJ_MICRO, ADJ_NANO (Linux 2.6.26+): select microsecond or nanosecond resolution, respectively. The two cannot be used together.
  • ADJ_TAI (Linux 2.6.26+): set the TAI (International Atomic Time) offset from constant. ADJ_TAI should not be used in conjunction with ADJ_TIMECONST, since the latter mode also employs the buf.constant field. For a complete explanation of TAI and the difference between TAI and UTC, see BIPM ⟨http://www.bipm.org/en/bipm/tai/tai.html⟩
  • ADJ_TICK: set the tick value from tick.

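In addition, modes can take one of the following multi-bit values, in which case no other bits may be set:

  • ADJ_OFFSET_SINGLESHOT (includes the ADJ_OFFSET bit): the traditional adjtime() behavior. offset is taken in µs, and the kernel applies it gradually, at most MAX_TICKADJ at a time. On return, offset holds the amount that was still unadjusted before the call.
  • ADJ_OFFSET_SS_READ (Linux 2.6.28+): offset returns how much of an earlier ADJ_OFFSET_SINGLESHOT adjustment remains to be applied.

Unprivileged users may only use 0 or ADJ_OFFSET_SS_READ for modes; everything else requires superuser privilege.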
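A read-only query with ADJ_OFFSET_SS_READ, which unprivileged users may issue, might look like this. pending_slew_usec is a hypothetical helper. The sketch assumes that glibc's <sys/timex.h> defines ADJ_OFFSET_SS_READ. A container or seccomp policy may still reject the call, in which case the helper returns -1.

```c
#include <sys/timex.h>

/* Ask how much of an earlier ADJ_OFFSET_SINGLESHOT (adjtime-style) correction
 * is still outstanding. Returns the clock state from adjtimex(), or -1 with
 * errno set; on success *usec receives the remaining offset in microseconds. */
int pending_slew_usec(long *usec)
{
    struct timex tx = { .modes = ADJ_OFFSET_SS_READ };
    int state = adjtimex(&tx);

    if (state != -1)
        *usec = tx.offset;
    return state;
}
```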

Return value: on success, one of the clock states below is returned; on failure, -1 is returned and errno is set.

  • TIME_OK: Clock synchronized, no leap second adjustment pending.
  • TIME_INS: Indicates that a leap second will be added at the end of the UTC day.
  • TIME_DEL: Indicates that a leap second will be deleted at the end of the UTC day.
  • TIME_OOP: Insertion of a leap second is in progress.
  • TIME_WAIT: A leap-second insertion or deletion has been completed. This value will be returned until the next ADJ_STATUS operation clears the STA_INS and STA_DEL flags.
  • TIME_ERROR (or TIME_BAD): The system clock is not synchronized to a reliable server. This value is returned when any of the following holds true:
    • Either STA_UNSYNC or STA_CLOCKERR is set.
    • STA_PPSSIGNAL is clear and either STA_PPSFREQ or STA_PPSTIME is set.
    • STA_PPSTIME and STA_PPSJITTER are both set.
    • STA_PPSFREQ is set and either STA_PPSWANDER or STA_PPSJITTER is set.
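To print the return value, a small decoder plus a modes = 0 query is enough. modes = 0 only reads the current parameters, so it needs no privilege. clock_state_name and query_clock are hypothetical names. TIME_BAD has the same value as TIME_ERROR, so it is not a separate case.

```c
#include <sys/timex.h>

/* Map adjtimex()'s return value to the state names listed above. */
const char *clock_state_name(int state)
{
    switch (state) {
    case TIME_OK:    return "TIME_OK";
    case TIME_INS:   return "TIME_INS";
    case TIME_DEL:   return "TIME_DEL";
    case TIME_OOP:   return "TIME_OOP";
    case TIME_WAIT:  return "TIME_WAIT";
    case TIME_ERROR: return "TIME_ERROR";
    default:         return "unknown";
    }
}

/* Read-only query (modes = 0). On success, *unsync is set to 1 if the
 * STA_UNSYNC status bit is set, else 0. Returns the state, or -1 with errno set. */
int query_clock(int *unsync)
{
    struct timex tx = { .modes = 0 };
    int state = adjtimex(&tx);

    if (state != -1)
        *unsync = (tx.status & STA_UNSYNC) != 0;
    return state;
}
```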

Note: since Linux 3.4, the call operates asynchronously, and the return value usually will not reflect a state change caused by the call itself.

Possible errno values on failure:

  • EFAULT: buf does not point to writable memory.
  • EINVAL (kernels before Linux 2.6.26): An attempt was made to set buf.freq to a value outside the range (-33554432, +33554432).
  • EINVAL (kernels before Linux 2.6.26): An attempt was made to set buf.offset to a value outside the permitted range. In kernels before Linux 2.0, the permitted range was (-131072, +131072). From Linux 2.0 onwards, the permitted range was (-512000, +512000).
  • EINVAL: An attempt was made to set buf.status to a value other than those listed above.
  • EINVAL: An attempt was made to set buf.tick to a value outside the range 900000/HZ to 1100000/HZ, where HZ is the system timer interrupt frequency.
  • EPERM: buf.modes is neither 0 nor ADJ_OFFSET_SS_READ, and the caller does not have sufficient privilege. Under Linux, the CAP_SYS_TIME capability is required.

NOTES

In struct timex, freq, ppsfreq, and stabil are ppm (parts per million) with a 16-bit fractional part. This means that a value of 1 in one of those fields actually means 2^-16 ppm, and 2^16 = 65536 is 1 ppm. This is the case for both input values (in the case of freq) and output values.

The leap-second processing triggered by STA_INS and STA_DEL is done by the kernel in timer context. Thus, it will take one tick into the second for the leap second to be inserted or deleted.
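The 16-bit fixed-point format described in NOTES can be converted with two helpers. These are a sketch with hypothetical names. The clamp to ±32768000 (that is, ±500 ppm) mirrors the ADJ_FREQUENCY behavior described for Linux 2.6.26 and later.

```c
#define PPM_SCALE 65536.0    /* 2^16: the raw value that means 1 ppm */
#define FREQ_LIMIT 32768000L /* +/-500 ppm, the ADJ_FREQUENCY clamp */

/* Convert ppm to a raw struct timex freq value, rounding half away from zero
 * and clamping to +/-FREQ_LIMIT before the conversion to long. */
long ppm_to_timex_freq(double ppm)
{
    double raw = ppm * PPM_SCALE;

    if (raw > (double)FREQ_LIMIT)
        return FREQ_LIMIT;
    if (raw < -(double)FREQ_LIMIT)
        return -FREQ_LIMIT;
    return (long)(raw >= 0 ? raw + 0.5 : raw - 0.5);
}

/* Convert a raw freq, ppsfreq or stabil value back to ppm. */
double timex_freq_to_ppm(long raw)
{
    return (double)raw / PPM_SCALE;
}
```

For example, 1 ppm becomes 65536, and a raw freq of 32768000 reads back as 500 ppm.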

SEE ALSO

settimeofday(2), adjtime(3), ntp_gettime(3), capabilities(7), time(7), adjtimex(8), hwclock(8)

NTP "Kernel Application Program Interface" ⟨http://www.slac.stanford.edu/comp/unix/package/rtems/src/ssrlApps/ntpNanoclock/api.htm⟩

#define ADJ_OFFSET              0x0001  /* time offset */
#define ADJ_FREQUENCY           0x0002  /* frequency offset */
#define ADJ_MAXERROR            0x0004  /* maximum time error */
#define ADJ_ESTERROR            0x0008  /* estimated time error */
#define ADJ_STATUS              0x0010  /* clock status */
#define ADJ_TIMECONST           0x0020  /* pll time constant */
#define ADJ_TICK                0x4000  /* tick value */
#define ADJ_OFFSET_SINGLESHOT   0x8001  /* old-fashioned adjtime */

asmlinkage long sys_adjtimex(struct timex __user *txc_p)
{
        struct timex txc;               /* Local copy of parameter */
        int ret;

        /* Copy the user data space into the kernel copy
         * structure. But bear in mind that the structures
         * may change
         */
        if(copy_from_user(&txc, txc_p, sizeof(struct timex)))
                return -EFAULT;
        ret = do_adjtimex(&txc);
        return copy_to_user(txc_p, &txc, sizeof(struct timex)) ? -EFAULT : ret;
}

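References

  1. man adjtimex
  2. Linux kernel source code
  • When the software clock is kept in sync with an external time source through the adjtimex system call, the kernel can update the hardware clock every 11 minutes.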

Friday, October 20, 2023

Linux sleep

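sleep (suspend execution for a period of time):

  • System call nanosleep(const struct timespec *reltime, struct timespec *rem):
    • Suspends the calling thread for the given time, or ends early when a signal is received.
    • If it ends early, it returns -1 with errno set to EINTR, and, if rem is not NULL, stores the remaining time in rem.
    • Which signals will it receive? Blocking signals: signalfd(2), sigprocmask(2), signal()
  • System call clock_nanosleep(): like nanosleep(), but lets you choose which clock measures the interval, and ....
  • Library function sleep(): returns the remaining time.
  • Library function usleep()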
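Because nanosleep() reports the unslept time in rem, a caller that must sleep for the full interval can simply retry with rem after EINTR. sleep_full is a hypothetical helper. This sketch assumes relative sleeps are acceptable, so each retry may accumulate a little drift.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <time.h>

/* Sleep for the whole interval even if signal handlers interrupt nanosleep().
 * On EINTR, rem holds the unslept time, so sleep again for just that part.
 * Returns 0 on success, or -1 on any other error (e.g. EINVAL for a bad nsec). */
int sleep_full(time_t sec, long nsec)
{
    struct timespec req = { .tv_sec = sec, .tv_nsec = nsec };
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}
```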

SIP header Via

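Every SIP message must include a Via header, compact form v. The originating UAC and every proxy along the way each stack their own Via on top, holding the address they sent from. Taken in order, these form the path the response travels back. Format: sent-protocol sent-by [ ;branch= branch ][ ; parameters ...] s...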
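As an illustration of the format above, the sketch below extracts the branch parameter from a single Via header value. via_branch is a hypothetical helper. It assumes no whitespace around ';' or '=' and a lower-case parameter name. It also ignores quoted strings and multiple comma-separated Via values.

```c
#include <string.h>

/* Copy the value of the ";branch=" parameter of one Via header value into out
 * (truncated to outsz - 1 characters). Returns 1 if found, 0 otherwise. */
int via_branch(const char *via, char *out, size_t outsz)
{
    static const char key[] = ";branch=";
    const char *p = strstr(via, key);

    if (p == NULL || outsz == 0)
        return 0;
    p += sizeof key - 1;
    size_t n = strcspn(p, ";, \t\r\n"); /* value ends at the next separator */
    if (n >= outsz)
        n = outsz - 1;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```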