Race Condition

Zero, One, Else

November 29, 2009 · Leave a Comment

Sometimes, there are only three numbers that matter.

Stepping from Zero to One is when a componet is being introduced. Stepping from One to Zero is when a component is being elliminated. From these systems engineering processes, the latter is preferred, but the former is the reality. Still, these two processes are usually well defined, and are driven by functionality.

The Else is the realm of non-functional requirements, the gate through which scalability problems enter the scene. Even if Else is only 4. Tomorrow it may grow to 57.

The Else is where concurency becomes a problem. The Else is where one should decide which way to scale – up or sideways. The Else is when no one can speak for sure without referring to historical data. The Else is the 2 am call when all is down and nothing works because memory was upgraded, and cache miss ratio is in the sky

→ Leave a CommentCategories: programming
Tagged:

Stack traces on Linux/x86

October 16, 2009 · Leave a Comment

While reading stackoverflow I recalled a little snippet I wrote 4 years ago.

This snippet allows the programmer to get a stack trace in C environment on Linux/x86 whenever a memory access problem happens. x86_64 is not supported yet.


/* */
#include <stdio.h>
#include <stdlib.h>

#include <signal.h>

#include <execinfo.h>

#include <unistd.h>

#define __USE_GNU
#include <ucontext.h>

typedef void signal_specific_handler(int signum, siginfo_t *pinfo, void *arg) ;

struct sigaction_pack
{
 struct sigaction         current ;
 struct sigaction         original ;
 signal_specific_handler *pspecific ;
} ;

/* we are instrumenting the first 32 signals. We are NOT instrumenting the real-time signals */
static struct sigaction_pack g_pbs_signal_actions[32] ;
static size_t max_pbs_size = 32 ;

static void* g_a_activation_record_list[64] ;
static size_t max_activation_record_count = 64 ;
static size_t activation_record_count ;

/* This is a global variable set at program start time.  It marks the  highest used stack address.  */
extern void *__libc_stack_end;

/*------------------------------------------------------------------------------
 *
 *   A short refresh on the x86 assembly
 *   -----------------------------------
 *
 *    %eip -> current instruction
 *
 *    %esp -> top of the stack (the used element)
 *
 *   This is the stack layout we see with every stack frame.
 *
 *           +-----------------+
 *   %esp -> | local variable  |
 *           | local variable  |
 *           +-----------------+
 *   %ebp -> | %ebp last frame |-+
 *           |                 | |
 *           | return address  | |
 *           |(old instruction | |
 *           |   pointer )     | |
 *           +-----------------+ |
 *           | parameters 1..  | |
 *      +----| parameters 2... |--+
 *      |    | parameters---   |
 *      |    +-----------------+
 *      |    | local variable  |
 *      |    | local variable  |
 *      |    +-----------------+
 *      + -> | %ebp last frame----....
 *           |                 |
 *           | return address  |
 *           +-----------------+
 *
 *
 */

/**
 * Activation record layout
 * Used to parse stack contents while unwinding the stack.
 */
struct activation_record_layout
{
 struct activation_record_layout * next;
 void *return_address;
};

/**
 * Unwind the stack at any location.
 * This function works like the backtrace function provided by the GNU libc,
 * with the only difference that this function doesn't unwind _current_ stack.
 * Instead, it takes values of the relevant registers (EIP, EBP and ESP) as
 * parameters.
 * Thus, this function is suited for being called from a signal handler.
 */
int
rhyno_backtrace (
   void **activation_frame_array,
   int    max_size,
   void  *eip,
   void  *ebp,
   void  *esp)
{
 /* used to parse the stack */
 struct activation_record_layout *current;

 /* keeps track on visited stack trace */
 int count = 0;

 /* save the EIP into the frames list */
 activation_frame_array[count++] = eip ;

 /* Initialize the frame list with EBP */
 current = (struct activation_record_layout *) ebp;

 while (count < max_size)
 {
 /* If we reach past stack pointer, or end of stack segment,
 * we have problem (possible memory overrun.)
 */
 if ((void *) current < esp || (void *) current > __libc_stack_end)
 break;

 /* Save the return address of frame i into the frame list */
 activation_frame_array[count++] = current->return_address;

 /* Advance the frame list pointer */
 current = current->next;
 }

 /* How many frames we had collected */
 return count;
}

void sa_segv_specific_hanlder(int signum, siginfo_t *pinfo, void *arg)
{
 /*
 Example for SEGV...
 */
 printf("  Segmentation violation: Reason: %d: %s\n",
 pinfo->si_code,
 pinfo->si_code == SEGV_MAPERR ?
 "Address not mapped" : pinfo->si_code == SEGV_ACCERR ?
 "Invalid permissions" : "?????") ;
 printf("    Offending address: %p\n", pinfo->si_addr ) ;
}

void sa_bluescreen_handler(int signum, siginfo_t *pinfo, void *arg)
{

 /*
 siginfo_t {
 int      si_signo;  / * Signal number * /
 int      si_errno;  / * An errno value * /
 int      si_code;   / * Signal code * /
 pid_t    si_pid;    / * Sending process ID * /
 uid_t    si_uid;    / * Real user ID of sending process * /
 int      si_status; / * Exit value or signal * /
 clock_t  si_utime;  / * User time consumed * /
 clock_t  si_stime;  / * System time consumed * /
 sigval_t si_value;  / * Signal value * /
 int      si_int;    / * POSIX.1b signal * /
 void *   si_ptr;    / * POSIX.1b signal * /
 void *   si_addr;   / * Memory location which caused fault * /
 int      si_band;   / * Band event * /
 int      si_fd;     / * File descriptor * /
 }
 */
 printf("Got deadly signal %d, following information (%p) is available:\n",
 signum,pinfo ) ;

 printf(" Signal %d (code %d, errno %d) send by process %d (uid %d)\n",
 pinfo->si_signo,
 pinfo->si_code,
 pinfo->si_errno,
 pinfo->si_pid,
 pinfo->si_uid ) ;

 /* Call message specific handler.
 For example, the segmentation fault handler will
 print the offending address
 */
 if (g_pbs_signal_actions[signum].pspecific != NULL)
 {
 (*g_pbs_signal_actions[signum].pspecific)(signum,pinfo,arg) ;
 }

 if ( arg == NULL )
 {
 printf("No register/stack information is available\n") ;
 }
 else
 {
 printf("Important CPU Registers:\n" ) ;
 printf("CS:  %012p DS:  %012p  SS:  %012p \n",
 ((ucontext_t*)(arg))->uc_mcontext.gregs[REG_CS],
 ((ucontext_t*)(arg))->uc_mcontext.gregs[REG_DS],
 ((ucontext_t*)(arg))->uc_mcontext.gregs[REG_SS]
 ) ;

 printf("EIP: %012p EBP: %012p  ESP: %012p\n",
 ((ucontext_t*)(arg))->uc_mcontext.gregs[REG_EIP],
 ((ucontext_t*)(arg))->uc_mcontext.gregs[REG_EBP],
 ((ucontext_t*)(arg))->uc_mcontext.gregs[REG_ESP]
 ) ;

 printf("\n\nStack Trace:\n") ;

 /*------------------------------------------------------------------------------
 * Collect activation records in format similar to that of
 * glibc's `backtrace', but use the foreign stack.
 */
 activation_record_count = rhyno_backtrace(
 g_a_activation_record_list,
 max_activation_record_count,
 (void*)(((ucontext_t*)(arg))->uc_mcontext.gregs[REG_EIP]),
 (void*)(((ucontext_t*)(arg))->uc_mcontext.gregs[REG_EBP]),
 (void*)(((ucontext_t*)(arg))->uc_mcontext.gregs[REG_ESP])
 ) ;

 /*------------------------------------------------------------------------------
 * Use internal glibc reporting mechanism to convert
 * activation records to symbolic names (requires -rdynamic linker flag)
 */
 backtrace_symbols_fd(g_a_activation_record_list,activation_record_count,1) ;

 } /* register information */

 /* propagate the signal to the original handler */
 if ( g_pbs_signal_actions[signum].original &&
 (g_pbs_signal_actions[signum].original.sa_sigaction != NULL ))
 {
 printf("\n\nPropagating the signal %d to the original handler\n",
 signum) ;
 g_pbs_signal_actions[signum].original.sa_sigaction(signum,pinfo,arg)  ;
 }
 else
 {
 printf("No original handler is registered for singal %d, continuing\n",
 signum ) ;
 }
}

int register_signal_handlers()
{
 int signum ;
 int result = -1 ;

 for ( signum = 0 ; signum < max_pbs_size ; ++signum)
 {
 /* zero the signal action */
 memset(&g_pbs_signal_actions[signum].current,0,sizeof (struct sigaction)) ;

 /* register the generic handler */
 g_pbs_signal_actions[signum].current.sa_sigaction = sa_bluescreen_handler ;

 /* turn on the flag that tells the runtime to invoke the extended signal handler,
 see man sigaction for details */
 g_pbs_signal_actions[signum].current.sa_flags |= SA_SIGINFO ;

 /* Register the signal handler, store the original signal handler for propagation */
 result = sigaction(signum,&g_pbs_signal_actions[signum].current,
 &g_pbs_signal_actions[signum].original) ;
 /* Report result */
 printf ("-- Registered signal handler for signal %d, result %d, handler %p original handler %p \n", signum,result,
 g_pbs_signal_actions[signum].current.sa_sigaction,
 g_pbs_signal_actions[signum].original.sa_sigaction) ;
 }

 g_pbs_signal_actions[SIGSEGV].pspecific = sa_segv_specific_hanlder ;

 return result ;
}

/* Function prototype. Here we will do THINGS */
int malicious_code(int) ;

int
main (int argc, char* argv[], char* envp[])
{
 int sleep_interval = 300 ;
 /* Very important - register the handlers */
 if ( argc > 1 )
 register_signal_handlers() ;

 sleep(sleep_interval) ;
 malicious_code(10) ;

 return 0 ;
}

int malicious_code(int arg)
{
 if ( ! arg )
 {
 int * garbage_pointer = (int*)441231324;
 return *((int*)(garbage_pointer)) ;
 }
 else
 {
 return malicious_code(--arg) ;
 }
}

→ Leave a CommentCategories: programming

Data Leak Prevention and Birthday Attacks

September 25, 2009 · Leave a Comment

Data Leak Prevention is set of technologies used to monitor information that is leaving corporate network, and taking actions in case sensitive information “leaves the building”. Detection of sensitive information is done usually using pattern matching with checksums (e.g. credit card numbers, which have quite a bit of redundant information for validation), content proximity heuristics (e.g. US social security numbers, which don’t have any redundant information for validation, and without taking content into equation can be recognized as Canadian driver license numbers), digital fingerprints and the combination of the above.

Birthday Attack is a way to meddle with digital signatures to greatly reduce key strength. This attack is based on (and takes its name from) the Birthday Paradox. It resorts to slightly tweaking (by inserting commas, reordering words and using synonyms) two documents – one which is considered to be correct, and one which is not – until both have same digital signature. Once digital signatures collide, one can use the “good” document (e.g. request for a day off) to generate digital signature, and then use “bad” document (e.g. approval for fund transfer) as legitimate. The Birthday Attack reduces key depth by two, effectively turning 64bit digital signatures int 32bit.

Returning to the DLP problem. Usually, corporations have some sort of legal banners, footers, headers and sidebars, which comprise all kinds of disclaimers, terms, conditions and threats. As these are usually not considered sensitive, good fingerprinting schemas will contain “black lists” for “sensitive” fingerprints, along with “white lists” for insensitive fingerprints.

If the potential “extruder” – the person who wants to get the information out from the organization is willing to get away with it – can slightly influence the white lists, he can use the Birthday Attack to mask sensitive bits as legalese, which provide plenty of opportunity for meddling by being inhumanely long and unreadable. Once fingerprint for the sensitive information collide with fingerprint for the legalese, the “extruder” can safely circumvent DLP system.

The DLP system, in order to prevent this kind of attack, should do at least one of following:

  • Disallow meddling with white lists (keep lawyers on track)
  • Use hash functions with sufficient key depth, so that keys will be cryptographically resistant to BA. For example, 2^64 is pretty strong key depth; this suggests that  hash function with key depth of 128 will be used.
  • Combine fingerprints with context information. Seeing one fingerprint from a banner in the middle of neutral text should set some bits on

→ Leave a CommentCategories: programming

Autonomous Software, circa 1997

September 11, 2009 · 1 Comment

Take your software, put it in a box, throw it out of window.

Sometimes, throwing the software out of the window is not act of frustration. Sometimes, it is just a part of the requirements. In particular, if you put inside the box some tricky hardware, and throw it hard enough, it may reach other planets.

However, one must make sure the software works while being thrown out of window, to reach the final destination.

I am referring, of course, to the Pathfinder project. The Authoritative Account on the infamous software failure is a must read for software engineers.

→ 1 CommentCategories: programming

Art of Leadership

September 10, 2009 · Leave a Comment

In the Santa Cruz mountains, they would say: “Ben Zander is full of awesome”.

Ben Zander is famous conductor.  I have to admit, I did not know he also writes books. Not until we went to a book store, where my wife found  little yellow book, called “The Art of Possibility” by Ben and Rosamund Zander.

“I have to read this. Ben Zander is a great writer”, she said.

Ben Zander is great conductor. And a great writer. And a great teacher.  The little yellow book has changed the way I tell people what has to be done.

Ben starts by telling, that “everything” is “invented.” Our perception is subjective, and, in some form, “invented.” By changing the perception, we can change our reactions and our actions, this is the preach.

As experienced teacher, Ben sets a homework – exercises for changing our perception. These are not easy, not at all. These excercizes work, though. Slowly, more and more possibilities unfold.

As the rule of 10,000 tells us,  it would take 10,000 exercises to master this kind of emotion control. Ben does not leave us at this kind of emotion control, but rather goes on and explains, how a leader can unfold possibilities to these whom he leads.

I really glad my wife gave me this book.

→ Leave a CommentCategories: Uncategorized

Reflecting on the poor performance on Saturday

June 4, 2009 · Leave a Comment

The reason behind my poor performance was more mental than physical. I crashed by pushing the envelope too much, which was caused by anxiety, which was caused by lack of confidence.

Next race I will stick to the game plan and will

Dr. Cohn writes in his article on mental abilities for motocross racers about mental tune-up before the race.  He highlights the following points to cover:

  • confidence
  • focus
  • ability to perform naturally and instinctively
  • relax and try less when you want to win badly
  • handle errors
  • mental prep

While these suggestions are more for a motocross racer, I find all of these very important for XC racing.

→ Leave a CommentCategories: Uncategorized

Sour race at CCCX #8

June 1, 2009 · Leave a Comment

5/30, really bad race day. I arrived to the race venue slightly late. After registration I started warmup with a few little sprints on pavement, then moved on to the singletrack. Lots of sand, and really fast course.

Standing in line before the start, I stretched legs, did a few squats and made sure the bike is in right gear for the fast start.  Juniors started at 9.03, we went off one minute later.

Just before the singletrack I was 4th in position, but I sprinted into 2nd. Then we started gaining speed, passing slow Juniors. The course was flat, I was not working too hard, remembering to pace myself. The first 5 miles the race went rather well. I followed Wade from the team, and this was fun.

Then I overbaked a turn, and fell really hard on my right hip.  After a few minutes of pain and misery, I crawled back to the bike, and started pedalling. I was passed by almost everyone, except for the slowest Juniors. I was actually riding  with left leg only, the right leg refused to act.

At the lap I was asked whether I need a medic. I said “yes”, then I thought  and said, “no, dammit, I am going on for 2 laps.” Well, maybe thinking was not involved there.

The second was horribly painful, but I sort of passed a few kids and women :) At the lap gate I decided to go for another lap.

Then my bike decided that I need a rest. The next thing I knew was that I was standing in a side of a trail, trying to pull the chain out of the rear wheel. Somehow the chain has been pushed in between the cassette and the hub flange, twisted ninety degrees and stuck there. After five minutes of trying to pull the chain the old way, I unfastened the chain (yay to PowerLinks) to two parts, pulled as hard as I can while standing on the frame, and bingo – the chain was free. I fastened the chain into whole (yay again to PowerLinks),  and resumed the race.

The universe was made of pain.

When I was closing to the lap, I heard some people talking behind my back. I was sure these were faster riders that lapped me, so I pulled to the side.

- “Are you racing? You are so slow…” one of those riders asked.

- “Yes” I muttered,

- “Don’t pay attention to us, go ahead! This is the last climb, it’s all downhill from here! Relax, and push” – the rider cheered me.

Push push push push push, damn.  When I passed the lap gate, I almost fell from the bike.

That was one painful race.

→ Leave a CommentCategories: Racing

The Grand Unified Theorem of Reverse Engineering

May 26, 2009 · Leave a Comment

The Grand Unified Theorem of Software Engineering is not a theorem, actually. It’s a joke, with a grain of truth in it. It states that “You can solve every problem with another level of abstraction, except for the problem of too many levels of abstraction.” 

The Grand Unified Theorem of Reverse Engineering is not a theorem either. This joke refers to the above mock theorem, by stating that “You can find answer to any question by breaking another level of abstraction, except for the problem of having no useable abstraction to start with”

Some would say that the former refers to Math, while the latter refers to Science. Some other would say exactly the opposite.

Yosemite Half Dome at sunset

My daily job includes a fair share of researching implementations of ad-hoc defined network protocols. This process is tedious and sometimes frustrating, yet sometimes it brings its “Aha!” moments. Sometimes I like to compare reverse engineering with hiking in the mountains. Lots of striding in even pace, intermixed with moments of excitement. Just like the natural wonders protect themselves from visitors by steep trails and annoying weather, so do software designs hide behind mundane details. Just like in hiking, the excitement of seeing the monolith of architecture, when it reveals itself, ,  

Modern computer architectures make learning protocols both easier and harder. Java, Python or .NET programs have very rich runtime environment, which brings structure and metainformation to start with. On the other hand, “native” or “unmanaged” code do not have the same structure? or has it? 

While working with native code, it’s very imporatant to undestand the assumptions that were taken when the code was written. Is the code multi-platform? In this case, chances are the software was written in the “stratified” way: specifics of different platforms are abstracted out into common APIs. Is the code network specific? In this case, probably it will use Berkeley Sockets API or its equivalent. 

Short list of tools I have to use routinely:

  • Wireshark, the protocol analyzer
  • IDA Pro, the interactive disassembler 
  • dumpbin, which is actually a binary process visualizer
  • VMWare, which is a great way to set up an environment

Some other tools I use infrequently:

  • SCAPY, the interactive network protocol client
  • OllyDbg, the debugger

In general, the process of reversing application consists of trial and error, in understanding the processes and implementation.  This is where VMWare Snapshots become very useful feature. 

Ilfak Guilfanov, the author of IDA interactive disassembler, has a blog: www.hexblog.com.

→ Leave a CommentCategories: Uncategorized

Unicode Byte Order Detection

May 13, 2009 · Leave a Comment

Once in a while I run into the following problem: give a stream of bytes, which are know to represent Unicode characters, what is the correct byte order?

Assuming that the Unicode byte stream is valid Unicode stream, we can clearly see that if character Ci is in certain byte order, then the next character, Ci+1 will be in same encoding as well, unless a byte order mark (BOM) specifies the byte order for following bytes.  Moreover, assuming the Unicode byte stream represents character stream, there’s certain probability that adjacent characters belong to same word.

Same words are usually written in same language, which implies same Unicode code page. In other words: in human text, number of different code pages will be low.

This is important property, which we are going to exploit.

This number of different code pages clearly depends on the document: for a math text in English, some words will be in English, some in Greek and some will be special symbols. On the other hand, a biology text in Romanian will include Romanian words intermixed with biology terms in Latin (probably, in italic).

The location of codepage bytes is governed by the byte order used by the Unicode byte stream. By determining, which bytes are used for codepages, we can detect the byte order. 

The algorithm is easy:

  1. take a sampling buffer, _buf  from the stream. The size of _buf should be sufficiently large for detection, but not too large so to slow the algorithm down. 64 bytes sounds like a good guess.
  2. For a first pass: 
    1. Allocate 2 hash sets: LO_BYTE and HI_BYTE
    2. Iterate over the _buf in 2 byte offset increments (stored in offset). Insert _buf[offset] to HI_BYTE and _buf[offset+1] to LO_BYTE 
  3. For a second pass:
    1. Allocate 2 hash sets: LO_NIBBLE and HI_NIBBLE
    2. Iterate over the _buf in 4 byte offset offset. Insert (_buf[offset]&0xFF)<<8|_buf[offset+1]&0xFF to HI_NIBBLE and _buf[offset+2]&0xFF)<<8|_buf[offset+3]&0xFF to LO_NIBBLE 
  4. Chose the hash set best with lowest number of different entries nbest.
    1. if nbest > EXPECTED_NUMBER_OF_LANGUAGES  probably the encoding is UTF8
    2. otherwise, if best is LO_BYTE, the encoding is UTF-16LE 
    3. otherwise, if best is HI_BYTE, the encoding is UTF-16BE
    4. otherwise, if best is HI_NIBBLE, the encoding is UTF-32BE
    5. otherwise, if best is LO_NIBBLE, the encoding is UTF-32LE
    6. otherwise, check your implementation, it probably contains a bug.

Of course, if the BOM (byte order mark) is present, the whole issue of detecting the encoding becomes moot.

→ Leave a CommentCategories: programming
Tagged: ,

Sea Otter Classic

April 24, 2009 · 1 Comment

This weekend I competed at Sea Otter Classic Cross Country race.

I did not have the time to pre-ride the course. This was too bad, because there were lots of sections I held down speed, just becase I was not sure what’s coming next.   Before the race I did 35 minute warmup, which did wonders to my legs. 

The course was sandy, with numerous sections of foot deep sand. 48039-164-007f

We started at the Laguna Seca motor-racing track, where the whole field just accelerated up.  It was great fun just being in a wave of 50 riders, swaying from side to side as the track turned.

I was feeling good on climbs, although was constantly bugged by a mechanical – my chain kept sliding between the cassette and the spokes. I had to stop 5 times to pull the chain out, losing a few minutes in total. 

In the beginning I tried to see how aggressively I can corner, and apparently I can corner with more gusto than technique. After the first crash, I decided to stay within the “safe” limits. 

Another crash happened on the “technical descent section” where I was totally scared by the ruts.

After these two, I rode rather steadily.

→ 1 CommentCategories: Uncategorized