There is some disagreement about the methodology used to group neighboring piRNAs into clusters. Some methods use a moving window and require a minimum number of piRNA occurrences within the window before calling a cluster; other methods use a support vector machine (SVM) to identify the groupings.
The position and composition of the clusters stored in this database may differ from those in other databases and software because we classify clusters with the moving-window approach, as did the first paper to describe these RNAs. Any such difference should therefore be attributed to the methodology used to construct this database.
Cluster Criteria
Information about clusters is essential to piRNA research today, mainly because clusters are the regions of highest piRNA density in the genome and are the transcribed regions that give rise to mature piRNAs. This information is therefore needed in any large piRNA database, such as piRNAdb.
We use the methodology of Lau et al. (2006) to group piRNAs into clusters. This methodology is based on a moving window that works as follows:
I) All unique piRNA alignments are sorted by genomic position.
II) A 20 kb window is created and the number of piRNA alignments in this region is counted.
If there are at least 7 piRNAs inside this window:
III) The window is moved 1 kb to the right and the count is repeated.
This step is repeated in a loop for as long as at least 7 piRNAs are found inside the window. When the number of piRNAs falls below 7, the window is closed and its characteristics, such as start, end, strand and chromosome, are stored in piRNAdb.
IV) A new 20 kb window is then created at the end of the last cluster. This process continues until the end of the chromosome.
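The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used to build piRNAdb. The function name `find_clusters` and its parameters are hypothetical. The sketch also makes assumptions that the description leaves open: the input is the list of alignment start positions for a single chromosome and strand, the first window starts at the first alignment, a window with fewer than 7 piRNAs is simply moved 1 kb to the right, and a cluster is reported from the start of its first dense window to the end of its last dense window.

```python
from bisect import bisect_left

WINDOW = 20_000   # window size in bp (from the text)
STEP = 1_000      # slide step in bp (from the text)
MIN_PIRNAS = 7    # minimum piRNA alignments per window (from the text)


def find_clusters(positions, window=WINDOW, step=STEP, min_pirnas=MIN_PIRNAS):
    """Return a list of (start, end) clusters for one chromosome and strand.

    `positions` holds the start coordinate of each unique piRNA alignment.
    """
    pos = sorted(positions)                      # step I
    if not pos:
        return []

    def count(start):
        # Number of alignments in the half-open window [start, start + window).
        return bisect_left(pos, start + window) - bisect_left(pos, start)

    clusters = []
    start = pos[0]        # assumption: first window opens at the first alignment
    last = pos[-1]
    while start <= last:
        if count(start) >= min_pirnas:           # step II
            cluster_start = start
            end = start + window
            # Step III: keep sliding right while the window is still dense.
            while count(start + step) >= min_pirnas:
                start += step
                end = start + window
            clusters.append((cluster_start, end))
            start = end                          # step IV: restart at cluster end
        else:
            start += step                        # assumption: slide past sparse regions
    return clusters
```

For example, under these assumptions `find_clusters([100, 200, 300, 400, 500, 600, 700])` returns `[(100, 20100)]`: the first window holds 7 alignments, and the window shifted by 1 kb holds none, so the cluster is closed at once.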
Although this methodology dates from 2006, it is still used for piRNAs today. For instance, two of the best-known programs for predicting piRNA clusters, proTRAC (Rosenkranz & Zischler, 2012) and PILFER (Ray & Pandey, 2017), use the same methodology, but with different window parameters and with piRNA expression used to give a higher score and more importance to each cluster found.
Therefore, we are constantly evaluating the existing methodologies for defining piRNA clusters, both biological and in silico. One of our objectives is to provide more reliable information to our users and researchers, which leads us to adopt more advanced methodologies; this in turn leads to frequent updates, and users need to be attentive to them. We are also planning to use the proTRAC software to build our database of piRNA clusters, or to use the genomic coordinates already provided on its website.
Lau NC, Seto AG, Kim J, Kuramochi-Miyagawa S, Nakano T, Bartel DP, Kingston RE. Characterization of the piRNA complex from rat testes. Science. 2006;313(5785):363–367.
Ray R, Pandey P. piRNA analysis framework from small RNA-Seq data by a novel cluster prediction tool - PILFER. Genomics. 2017 Dec 19. pii: S0888-7543(17)30153-2.
Rosenkranz D, Zischler H. proTRAC - a software for probabilistic piRNA cluster detection, visualization and analysis. BMC Bioinformatics. 2012;13:5.