High-Performance Chemical Database Searching
NextMove Software's Arthor technology (named after Merlin's apprentice) pushes the performance limits of chemical database search on current computer hardware. Building on NextMove Software's Patsy chemical pattern matching engine, Arthor easily outperforms current chemical cartridges and scales to the hundreds of millions of compounds found in next-generation chemical databases.
Traditional chemical database search engines rely on successful fingerprint screening to achieve high-performance substructure search. As a result, relatively broad queries, which fingerprints screen poorly, perform significantly worse, and this degrades both average and worst-case search times. By tackling the computationally intensive SMARTS matching phase of a search, Arthor dramatically improves worst-case (and therefore average) search times, achieving the real-time performance bounds required by interactive users.
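The dependence on screening described above can be illustrated with a minimal Python sketch. It is not Arthor's or any cartridge's implementation: fingerprints are modelled as plain integers used as bit sets, and the function name `screen` and the toy data are hypothetical. A database entry survives the screen only if it contains every bit set in the query fingerprint; survivors would then go on to the expensive atom-by-atom matching phase, which is not shown.

```python
def screen(query_fp: int, db_fps: list[int]) -> list[int]:
    """Return indices of entries whose fingerprint contains every query bit.

    Only these candidates need the expensive atom-by-atom match.
    """
    return [i for i, fp in enumerate(db_fps) if fp & query_fp == query_fp]


# Toy 4-bit fingerprints (hypothetical data, for illustration only).
db_fps = [0b1011, 0b0110, 0b1111, 0b0001, 0b1010]

specific_query = 0b1010  # two bits set: the screen rejects more entries
broad_query = 0b0010     # one bit set: almost everything passes

specific_candidates = screen(specific_query, db_fps)  # [0, 2, 4]
broad_candidates = screen(broad_query, db_fps)        # [0, 1, 2, 4]
```

The broad query sets fewer bits, so fewer entries are rejected and more of the database reaches the matching phase. This is why the speed of that phase dominates worst-case search times.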
Similarity searches using fingerprint-based Tanimoto scores typically rely on a popcount-sorted index to bound and improve search times. Unfortunately, the popular search bounds described by Swamidass and Baldi (2007) are only effective for denser path-based fingerprints. Sparser circular fingerprints (e.g. ECFP) see little or no benefit from these bounds, so other techniques are required to improve search speed. Arthor uses on-the-fly code generation to create query-specific machine instructions, and a linear-time sort algorithm to rank and page results. Databases containing hundreds of millions of hits can be interactively queried in real time.
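The popcount bound mentioned above can be sketched as follows. This is a minimal illustration of the general idea, not Arthor's method (which the text says relies on code generation instead). It uses the fact that the Tanimoto score of two fingerprints with a and b bits set cannot exceed min(a, b) / max(a, b), so for a threshold t only fingerprints with t*a <= b <= a/t need to be scored. Fingerprints are modelled as Python integers, and all function names and data are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def popcount(fp: int) -> int:
    """Number of bits set in the fingerprint."""
    return bin(fp).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto score: shared bits divided by bits set in either fingerprint."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0


def popcount_bounds(query_bits: int, threshold: float) -> tuple[int, int]:
    """Range of popcounts that can still reach the threshold."""
    return math.ceil(query_bits * threshold), math.floor(query_bits / threshold)


def search(query: int, index: list[int], counts: list[int], threshold: float):
    """Score only the slice of the popcount-sorted index inside the bounds.

    Returns the hits and the number of fingerprints actually scored.
    """
    lo, hi = popcount_bounds(popcount(query), threshold)
    start, end = bisect_left(counts, lo), bisect_right(counts, hi)
    hits = [fp for fp in index[start:end] if tanimoto(query, fp) >= threshold]
    return hits, end - start


# Hypothetical toy database, sorted by popcount once at index-build time.
fps = [0b1, 0b11110000, 0b11100000, 0b11111111, 0b1111111111, 0b00001111]
index = sorted(fps, key=popcount)
counts = [popcount(fp) for fp in index]

hits, scored = search(0b11110000, index, counts, threshold=0.5)
```

With this toy data, four of the six fingerprints fall inside the popcount range and three of them are hits. The bound only prunes entries whose popcount lies outside that range, so its benefit depends on how the popcounts in the database are distributed relative to the query.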