Datasets are collections of data about piRNAs, with information ranging from nucleotide sequences to the methodologies used to find them.
Today we have 16 datasets from 6 organisms, published as supplementary data of their respective papers. This information can be downloaded in the "Downloads" section of this database.
Dataset Selection Criteria
The gold-standard methodology for collecting piRNAs is immunoprecipitation of PIWI proteins followed by sequencing (IP-Seq). In this approach, the RNAs in the cell are removed, leaving only the short ones that are bound to a protein of the PIWI family. The bound RNA is then separated from the protein and sequenced. Thus, when we look for datasets containing piRNAs to include in piRNAdb, we focus primarily on PIWI IP-Seq.
However, there are other methods that support the detection of these short RNAs, such as periodate treatment of the samples, which removes all short RNAs that lack methylation at the 2' oxygen of the 3' end (for instance, microRNAs). Despite the marked increase in the proportion of piRNAs, many miRNAs and other short RNAs still remain in the sample. Nevertheless, sequences from SmallRNA-Seq that were filtered to retain only piRNAs, for example by running a piRNA predictor, were re-evaluated, and we now accept this kind of data in piRNAdb.
Finally, we acknowledge that piRNAdb currently includes and displays relatively few datasets, but we want to make clear that one of our objectives is the quality of our stored data, not the quantity. Other databases today use sequences from SmallRNA-Seq without any extra filtering step, just to hold more sequences. This is a serious problem and will lead to errors for researchers and users who rely on those databases without being aware of this major pitfall.
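
The selection criteria described in this section can be summarized as a small decision rule. The sketch below is only an illustration of that rule, not piRNAdb's actual code: the `DatasetInfo` record, its field names, and the method labels are hypothetical. It also assumes, as one reading of the text above, that periodate treatment alone does not qualify a SmallRNA-Seq dataset unless a piRNA-specific filtering step was also applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetInfo:
    """Hypothetical description of a candidate dataset (not a piRNAdb schema)."""
    method: str                      # e.g. "PIWI IP-Seq" or "SmallRNA-Seq"
    periodate_treated: bool = False  # informational only in this sketch
    pirna_filtered: bool = False     # an extra piRNA-specific filter was applied


def meets_selection_criteria(ds: DatasetInfo) -> bool:
    """Return True if the dataset would qualify under the criteria above."""
    method = ds.method.strip().lower()
    if method == "piwi ip-seq":
        # Gold standard: accepted as-is.
        return True
    if method == "smallrna-seq":
        # Accepted only with an extra piRNA filtering step (assumption:
        # periodate treatment alone is not treated as sufficient).
        return ds.pirna_filtered
    # Any other method is rejected in this sketch.
    return False


candidates = [
    DatasetInfo("PIWI IP-Seq"),
    DatasetInfo("SmallRNA-Seq", periodate_treated=True),
    DatasetInfo("SmallRNA-Seq", pirna_filtered=True),
]
accepted = [ds for ds in candidates if meets_selection_criteria(ds)]
```

Here `accepted` keeps the IP-Seq dataset and the filtered SmallRNA-Seq dataset, while the SmallRNA-Seq dataset without a piRNA-specific filter is left out.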