Code ranking

Researcher:
Prof. Eran Yahav | Computer Science

Categories:

Information and Computer Science

The Technology

Developers facing a programming task often look for readily available code snippets to incorporate into their projects. Such segments may be taken from publicly available projects, in particular open source libraries and repositories, as well as from the user’s organizational repositories or from code the user has previously written or used. Reuse of this kind has several benefits. First, it may save the user a significant amount of programming time, especially when the user lacks the required expertise, for example in programming for a specific graphics environment. Second, the code may already have been used many times by developers across different organizations and disciplines, so it has been tested repeatedly and its correctness is well established, which saves long debugging periods.
The already tremendous amount of code available on the Internet grows daily: code hosting sites may host millions of repositories containing tens of millions of source files. Search engines, both general purpose and dedicated, are used to find pieces of code that answer a user’s specific need. Many of these engines merely check simple textual correspondence with the search query and rank the results by the level of that correspondence, the recency of the code segments, and similar criteria.
The novel technology is a method, computerized apparatus and computer program product for providing code segments in response to a query. Using at least one hardware processor, the method receives a multiplicity of code segments together with metadata related to them, and analyzes each segment in three ways: a semantic analysis yields a first rank, a structural analysis yields a second rank, and an analysis of the metadata associated with the segment yields a third rank. The three ranks are combined into a total rank associated with the code segment. When a query is received, it is matched against each of the code segments to identify matching segments, and the matching segments are provided in accordance with their total ranks.
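The flow described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the three analyzers, the weighted-sum combination, the term-containment matcher, and every name used here (CodeSegment, semantic_rank, structural_rank, metadata_rank, the "uses" metadata key) are assumptions chosen only to make the pipeline concrete.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class CodeSegment:
    name: str
    source: str
    metadata: Dict[str, float] = field(default_factory=dict)


RankFn = Callable[[CodeSegment], float]


# Placeholder analyzers (assumptions): each maps a segment to a score in [0, 1].
def semantic_rank(seg: CodeSegment) -> float:
    # Toy proxy for semantic analysis: does the segment handle errors?
    return 1.0 if "try:" in seg.source else 0.0


def structural_rank(seg: CodeSegment) -> float:
    # Toy proxy for structural analysis: shorter segments score higher.
    line_count = seg.source.count("\n") + 1
    return 1.0 / line_count


def metadata_rank(seg: CodeSegment) -> float:
    # Toy proxy for metadata analysis: a hypothetical "uses" counter, capped at 100.
    return min(seg.metadata.get("uses", 0.0), 100.0) / 100.0


DEFAULT_ANALYZERS: Tuple[RankFn, ...] = (semantic_rank, structural_rank, metadata_rank)


def total_rank(seg: CodeSegment,
               analyzers: Sequence[RankFn] = DEFAULT_ANALYZERS,
               weights: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    # Combine the three ranks; a weighted sum is one possible combination rule.
    return sum(w * analyze(seg) for analyze, w in zip(analyzers, weights))


def build_index(segments: Sequence[CodeSegment]) -> List[Tuple[CodeSegment, float]]:
    # Ranks do not depend on the query, so they can be computed once up front.
    return [(seg, total_rank(seg)) for seg in segments]


def matches(query: str, seg: CodeSegment) -> bool:
    # Assumed matcher: every query term must appear in the segment's source.
    text = seg.source.lower()
    return all(term in text for term in query.lower().split())


def search(query: str, index: Sequence[Tuple[CodeSegment, float]]) -> List[CodeSegment]:
    # Keep only matching segments, then order them by descending total rank.
    hits = [(seg, rank) for seg, rank in index if matches(query, seg)]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [seg for seg, _ in hits]
```

In this sketch the total rank is a property of the segment alone, so the query only filters and orders the precomputed results. A real system would replace the toy analyzers with actual semantic, structural and metadata analyses.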

Advantages

  • Efficient usage of existing code repositories; efficient code retrieval and reuse; increased usability

Applications and Opportunities

  • Computerized search, handling code embedded in still images and/or video frames
Business Development Contacts
Shikma Litmanovitz
Director of Business Development, Physical Science