Tools to find similarity of various common procedures across compilers and versions

Prof. Eran Yahav | Computer Science


Information and Computer Science

The Technology

We address the problem of finding similar procedures in stripped binaries (executable machine-code artifacts). Previous solutions cannot find similarity between binaries compiled with different compilers (a compiler translates programmer-written high-level source code into machine code), or between binaries that vary because of code patching. Previous approaches that do apply to this problem yield low accuracy and a high number of false matches (i.e., two procedures are flagged as similar although they are not). This is mostly due to their syntactic approach (i.e., looking at the code's form and not its meaning).

The novel technology is a computer-implemented method of estimating the similarity of binary records that comprise executable code. The method:

  • converts a first binary record and a second binary record to a first intermediate representation (IR) and a second IR, respectively;
  • decomposes each of the first IR and the second IR into a plurality of strands, which are partial dependent chains of program instructions;
  • calculates, for each strand of the first IR, a probability score of having an equivalent counterpart in the second IR, by comparing that strand to one or more strands of the second IR;
  • adjusts the probability score of each strand according to a significance value calculated for that strand;
  • calculates a similarity score, defining the functional similarity between the first IR and the second IR, by aggregating the adjusted probability scores of the strands.
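The method described above can be illustrated with a minimal Python sketch. It is not the inventors' implementation, and everything in it is an assumption made for illustration: the toy IR (tuples of destination, operation, operands), the function names, the use of backward data-dependence slices as strands, a textual comparison of renamed strands as a stand-in for a real equivalence check, and a significance weight that simply discounts strands seen often in a background corpus.

```python
from difflib import SequenceMatcher

# Toy IR (hypothetical): each instruction is (dest, op, operands).

def strands(block):
    """Split a block into strands, here taken to be backward data-dependence slices."""
    uncovered = set(range(len(block)))
    result = []
    while uncovered:
        last = max(uncovered)
        uncovered.discard(last)
        needed = set(block[last][2])      # values the slice still has to define
        picked = [last]
        for i in range(last - 1, -1, -1):
            dest, _, operands = block[i]
            if dest in needed:
                picked.append(i)
                uncovered.discard(i)
                needed.discard(dest)
                needed.update(operands)
        result.append([block[i] for i in reversed(picked)])
    return result

def canonical(strand):
    """Rename registers and inputs by order of first use, so naming differences vanish."""
    names = {}
    def rename(x):
        if x.lstrip("-").isdigit():       # keep integer constants as they are
            return x
        return names.setdefault(x, f"v{len(names)}")
    out = []
    for dest, op, operands in strand:
        renamed = tuple(rename(o) for o in operands)
        out.append((rename(dest), op, renamed))
    return tuple(out)

def match_probability(a, b):
    """Stand-in score in [0, 1]; a real system would test semantic equivalence."""
    return SequenceMatcher(None, canonical(a), canonical(b)).ratio()

def similarity(query, target, background=()):
    """Aggregate significance-weighted best-match scores of the query's strands."""
    target_strands = strands(target)
    seen = [canonical(s) for blk in background for s in strands(blk)]
    total = weight_sum = 0.0
    for s in strands(query):
        p = max((match_probability(s, t) for t in target_strands), default=0.0)
        w = 1.0 / (1 + seen.count(canonical(s)))   # common strands count for less
        total += w * p
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

# The same computation with different register names and instruction order.
block_a = [
    ("r1", "load", ("a",)),
    ("r2", "mul", ("r1", "4")),
    ("r3", "add", ("r2", "b")),
    ("r4", "load", ("c",)),
    ("r5", "xor", ("r4", "r4")),
]
block_b = [
    ("t9", "load", ("c",)),
    ("t1", "load", ("a",)),
    ("t8", "xor", ("t9", "t9")),
    ("t2", "mul", ("t1", "4")),
    ("t3", "add", ("t2", "b")),
]
unrelated = [
    ("x", "load", ("p",)),
    ("y", "sub", ("x", "1")),
    ("z", "store", ("y", "q")),
]

print(similarity(block_a, block_b))     # strands match after renaming
print(similarity(block_a, unrelated))   # lower score
```

Because comparison happens per strand rather than per instruction sequence, reordering independent instructions or choosing different registers does not lower the score in this sketch, which is the kind of variation that differs between compilers.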


Advantages

  • Higher accuracy than existing solutions

Applications and Opportunities

  • Finding vulnerable code in binaries of unknown origins
  • Finding code clones to allow for code re-use
  • Finding plagiarism in source code
Business Development Contacts
Ofer Shneyour
Director of Business Development, ICT