To set up SIFTER on your computer, see the README in the top-level directory; to build the family files, see the README in the scripts folder. A hypothetical test family is included there so you can see what the files look like and how the output is organized. You can also use the Python scripts to build your own families from Pfam, or use the family files provided below to see how SIFTER performs.
Download the newest version of the code here: [SIFTER_2.0]
We encourage you to use this code in your own research. If you publish results using SIFTER, please cite the most recent SIFTER paper (2010).
The previous version of SIFTER, described in the ICML paper posted above, is also available here for its additional files and for reference: [SIFTER_1.1]
Three of the gold-standard data sets referenced in the most recent SIFTER paper are available here. Each includes the SIFTER files and the phylogenies used, and works with SIFTER 2.0. See the README files for more specific details.
Data set 1: AMP/adenosine deaminase data set. Used in both the ICML paper and the most recent paper.
Data set 2: Sulfotransferase data set. Used in the most recent SIFTER paper to verify that the truncation approximation gives results similar to the exact computation of posterior probabilities.
Data set 3: Hundred-family data set. Contains 100 families from Pfam 24.0, none of which has more than eight candidate functions. Reconciliation was not performed on this data set.
We plan to release gold-standard data sets for the Nudix family and the fungal protein data set referenced in the most recent paper. Their release depends on two additional papers being submitted; in the meantime, please email me and I may be able to share these data sets with you.