Molecular characterization of national cocoa collection from the leading traditional growing areas in Ecuador

Ecuador is the leading producer and exporter of fine cocoa, and some of its plantations are over 80 years old and preserve distinctive aroma and flavor characteristics. The research objective was to screen the genetic variability of a collection of National cocoa from Ecuador's leading traditional cocoa growing areas, denominated as Centennial National Cocoa Plants (CCNC). This germplasm collection of 243 accessions was analyzed with 20 microsatellite (SSR) markers. DNA genotyping was highly informative, generating a total of 109 SSR alleles with an average of 5.5 alleles per locus. Duplicate accessions accounted for only 0.8% of the collection. The average genetic diversity was 0.447, and the polymorphic information content (PIC) was 0.414, indicating high genetic diversity. The clustering, principal coordinate, and population assignment analyses revealed that the samples fall into two subpopulations (GN and GM), differentiated by their level of heterozygosity, with a fixation index (Fst) of 0.105. The results showed that microsatellite markers and statistical tools provide useful information for managing and conserving the genetic variability of the CCNC collection.


Introduction
Theobroma cacao L. is a fruit tree belonging to the genus Theobroma, in the family Malvaceae 1. It is a diploid and allogamous species with a high degree of genetic diversity in its segregating populations 2,3. Cocoa is an important crop that grows in tropical conditions, mainly in warm to humid areas of Asia, Africa, and the Americas. It is considered one of the world's most lucrative and widely traded products due to its organoleptic attributes 4,5.
Cocoa has traditionally been classified, on the basis of morphological traits, into Criollo, Forastero, and Trinitario, the last being a hybrid of the first two 6. More recent molecular data 7 led to a new classification into 10 genetic populations: Marañón, Curaray, Criollo, Iquitos, Nanay, Contamana, Amelonado, Purús, Nacional (hereafter National), and Guyana.
In Ecuador, the first plantations of the National cocoa variety date back to the 1600s 8 and were located along the shores of the Guayas River. Until the beginning of the 20th century, National cocoa was the only type of cocoa cultivated in Ecuador. Trees over 100 years old from that period still exist; they are called Centennial National Cocoa and retain the characteristics of fine flavor and aroma cocoa. Between 1916 and 1919, foreign genetic material was introduced to conserve the crop and reduce the diseases that affected the trees. As a result, this type of cocoa disappeared from the production area and was replaced by hybrid materials, which nowadays present high genetic variability 9.

INIAP and the Tenguel Aroma Cocoa Center (CCAT) in Ecuador, seeking to rescue and preserve these native National trees, established a collection of Centennial trees for study and utilization. Many plants were collected and preserved in ex situ collections in their leading cocoa germplasm banks.
Microsatellite markers, or simple sequence repeats (SSRs), are often used to characterize collected cocoa germplasm. SSRs are the most commonly used markers in studies of plant genetic diversity, assignment of individuals to their population of origin, and determination of population structure, because they are highly polymorphic and codominant, providing more genetic information than other types of markers 10. A wide variety of cocoa-specific microsatellite markers with previously described sequences is available 11-14, and several have been applied to National cocoa 8,15-19.
This study is part of a broader investigation into preserving Ecuador's National Centenary cocoa collection. The genetic variability of the samples from trees over 100 years old is unknown. These trees supposedly preserve their purity (homozygosity) and the distinctive characteristics of fine flavor and aroma cocoa. The present study therefore aimed to molecularly characterize a collection of Centenary National cocoa trees from Ecuador's main traditional cocoa growing areas, using a panel of 20 SSR markers.

Biological material
A total of 243 plant samples were collected from the National Centenary cocoa collection (CCNC) of the South Coast Experimental Station of INIAP, located in the Yaguachi canton of the Guayas province. Each sample was coded according to the origin of the tree from which it was taken: "M" for samples from trees from the province of Manabí and "Lr" for samples from trees from the province of Los Rios.

DNA genotyping
DNA was extracted from cocoa leaf tissue, quantified by spectrophotometry, and stored at -20°C 20,21. For SSR analysis, 20 cocoa-specific genomic microsatellite markers were used. The forward primers were labeled with fluorescent dyes (M13 tailing), and the PCR products were separated by vertical electrophoresis on LICOR-4300 equipment 22. The allelic profile obtained was visualized in the SAGA-GT-SSR™ program (LI-COR Biosciences), where genotyping was performed, yielding a genotypic matrix that lists the allele sizes of each sample for each marker.

Identification of representative accessions
Duplicate samples were identified by pairwise comparisons among the 243 samples, based on the alleles reported in their allelic profiles, using the Microsatellite Toolkit program 23. From the refined genotypic matrix, samples with a high level of homozygosity (≥80%) were identified, that is, samples presenting a single allele for at least 16 of the 20 microsatellite markers used. These samples are treated as representative: at this level of homozygosity they are considered pure, and they are assumed to retain the characteristics of fine flavor and aroma National cocoa.
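The two screening rules above can be sketched in a few lines of Python. This is a minimal illustration, not the Microsatellite Toolkit procedure itself: the sample identifiers, locus names, and allele sizes are hypothetical toy data, genotypes are assumed to be complete (no missing calls), and a duplicate is assumed to mean an exact match at every locus.

```python
from itertools import combinations

# From the text: a single allele at >= 16 of 20 markers, i.e. >= 80% homozygosity.
HOMOZYGOSITY_THRESHOLD = 0.80


def canonical(profile):
    """Sort the two alleles at each locus so (150, 148) equals (148, 150)."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}


def homozygosity(profile):
    """Fraction of loci at which both alleles are identical."""
    calls = list(profile.values())
    return sum(a == b for a, b in calls) / len(calls)


def find_duplicates(genotypes):
    """Pairs of samples whose multilocus profiles match at every locus."""
    canon = {name: canonical(p) for name, p in genotypes.items()}
    return [(a, b) for a, b in combinations(canon, 2) if canon[a] == canon[b]]


def representative_samples(genotypes, threshold=HOMOZYGOSITY_THRESHOLD):
    """Samples whose homozygosity meets or exceeds the threshold."""
    return [n for n, p in genotypes.items() if homozygosity(p) >= threshold]


# Hypothetical toy data: 3 samples x 5 loci (the study used 243 x 20).
toy = {
    "M-001": {"ssr1": (148, 148), "ssr2": (201, 201), "ssr3": (95, 95),
              "ssr4": (110, 110), "ssr5": (300, 304)},
    "M-002": {"ssr1": (148, 148), "ssr2": (201, 201), "ssr3": (95, 95),
              "ssr4": (110, 110), "ssr5": (304, 300)},
    "Lr-001": {"ssr1": (148, 150), "ssr2": (201, 205), "ssr3": (95, 95),
               "ssr4": (110, 112), "ssr5": (300, 304)},
}

duplicates = find_duplicates(toy)
representatives = representative_samples(toy)
```

In the toy data, the first two samples differ only in the order in which the alleles at one locus are written, so they are flagged as duplicates; both are homozygous at 4 of 5 loci and therefore meet the 80% threshold.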

Genetic diversity analysis
The entire population was analyzed using the PowerMarker v3.25 program 24. Several statistical parameters were determined, such as the effective number of alleles, allelic frequencies, genotypic frequencies, observed heterozygosity (Ho), expected heterozygosity (He), and the polymorphic information content (PIC) 25. Using the same program, a bootstrap analysis of 999 permutations and 100 repetitions was performed. These data were used in the PHYLIP 3.67 program to generate a consensus tree. Bayesian clustering analysis using Structure v.2.3.4 software was applied to determine the population structure, with K values from 1 to 6, a burn-in period of 50000, and 50000 Markov Chain Monte Carlo (MCMC) iterations, with 10 simulations. The Structure Harvester software was also used to establish the maximum value of ∆K 26. The pairwise distances were represented in a Principal Coordinate Analysis (PCoA). With the subpopulations formed, an analysis of molecular variance (AMOVA) was performed, and the F statistics Fis (intrapopulation inbreeding index), Fit (total inbreeding index), and Fst (fixation index) were established using the software GenAlex v6.5 27.
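As an illustration of the per-locus parameters named above, the following Python sketch computes Ho, He, and PIC from diploid genotype calls. It assumes the textbook definitions, He = 1 - Σ p_i² and PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², with no small-sample correction, so values may differ slightly from those reported by PowerMarker. The function name and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(calls):
    """Diversity statistics for one locus.

    calls: list of (allele_a, allele_b) tuples, one per diploid sample.
    """
    n = len(calls)
    counts = Counter(allele for call in calls for allele in call)
    freqs = [c / (2 * n) for c in counts.values()]

    # Observed heterozygosity: share of samples carrying two different alleles.
    ho = sum(a != b for a, b in calls) / n
    # Expected heterozygosity (gene diversity), uncorrected for sample size.
    he = 1.0 - sum(p * p for p in freqs)
    # Polymorphic information content: He minus the cross term over allele pairs.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {"alleles": len(counts), "Ho": ho, "He": he, "PIC": pic}


# Hypothetical toy locus with four samples and two alleles (148 and 150).
stats = locus_stats([(148, 148), (148, 150), (150, 150), (148, 148)])
```

Averaging these per-locus values over all markers would give collection-level means of the kind reported in this study.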

Identification of duplicates
In the pairwise comparison based on the SSR multilocus profile, only two duplicate accessions were identified within the CCNC collection. These two samples shared 38 alleles. This duplication represents 0.8% of the genotyped samples from the CCNC collection.

Identification of representative samples
Thirteen samples were identified as representative within the CCNC collection; these are the samples that presented high levels of homozygosity (≥80%).

Population structure analysis
The population structure simulation gave a maximum ∆K at K = 2; that is, the population was divided into two main clusters or subpopulations (Fig. 1). Based on the Q index, 173 samples were assigned to one of these two subpopulations with a high level of probability (Q index 0.9-1): 99 samples belonged to the subpopulation shown in green and 74 samples to the subpopulation shown in red. Of the 99 samples grouped in the green subpopulation, 86 presented a homozygosity level of less than 80%, and the remaining 13 presented a high level of homozygosity (≥80%), which is why this group was called GN (National Group). The subpopulation shown in red, made up of the 74 samples, was called GM, referring to the subpopulation of hybrid samples.
This population organization was also observed in the PCoA analysis (Fig. 2). The GN subpopulation is more homogeneous, since its samples present a high genetic similarity; the GM group is much more diverse, showing greater genetic divergence among its samples.
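The assignment rule described above (membership coefficient Q of at least 0.9 in one cluster) can be sketched as follows. This is only an illustration of the thresholding step: the function name, sample labels, and Q values are hypothetical, and the Q matrix is assumed to come from a clustering run at K = 2.

```python
def assign_by_q(q_matrix, threshold=0.9):
    """Split samples into confidently assigned and admixed.

    q_matrix: dict mapping sample name -> tuple of membership
              coefficients, one per cluster (summing to 1).
    Returns (assigned, admixed), where assigned maps each sample to
    the index of its cluster and admixed lists the remaining samples.
    """
    assigned, admixed = {}, []
    for sample, q in q_matrix.items():
        best = max(range(len(q)), key=lambda k: q[k])
        if q[best] >= threshold:
            assigned[sample] = best
        else:
            admixed.append(sample)
    return assigned, admixed


# Hypothetical membership coefficients for three samples at K = 2.
toy_q = {
    "s1": (0.97, 0.03),  # confidently in cluster 0
    "s2": (0.12, 0.88),  # below the threshold, treated as admixed
    "s3": (0.05, 0.95),  # confidently in cluster 1
}

assigned, admixed = assign_by_q(toy_q)
```

Applied to the study's data, the two confidently assigned groups correspond to the 99 GN and 74 GM samples, 173 in total.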

Genetic diversity analysis
A total of 109 alleles were identified for the entire population, with a mean of 5.5 alleles per locus. The mean genetic diversity (He) was 0.447, the observed heterozygosity (Ho) was 0.331, and the polymorphic information content (PIC) was 0.414. The GN subpopulation presented a mean genetic diversity of 29.5%, and the GM subpopulation of 55.4%. Thirty-six alleles exclusive to the GM subpopulation were also identified. The results of the AMOVA and the F statistics for the two subpopulations indicated a variability of 70% (Fis=0.178) within the subpopulations and a variability of 30% (Fst=0.105) between the subpopulations.
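Two of the figures above can be cross-checked with simple arithmetic. The sketch below assumes Wright's standard identity linking the three F statistics, (1 - Fit) = (1 - Fis)(1 - Fst). The resulting Fit is an illustration derived from the two reported indices, not a value reported in this study, and the function name is hypothetical.

```python
def total_inbreeding(fis, fst):
    """Fit implied by Wright's identity (1 - Fit) = (1 - Fis) * (1 - Fst)."""
    return 1.0 - (1.0 - fis) * (1.0 - fst)


# Values reported in the text.
FIS = 0.178
FST = 0.105
TOTAL_ALLELES = 109
N_LOCI = 20

# Mean number of alleles per locus: 109 / 20 = 5.45, reported as 5.5.
mean_alleles = TOTAL_ALLELES / N_LOCI

# Implied total inbreeding index: 1 - 0.822 * 0.895, approximately 0.264.
fit = total_inbreeding(FIS, FST)
```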

Discussion
Molecular markers have proven adequate for characterizing genetic variability in T. cacao 17,18,28-33. In the present study, samples were taken from the National Centenarian cocoa collection (CCNC), which is made up of 260 accessions collected from farms where trees with characteristics of National cocoa were found and whose ages were 75-100 years, located in the northern area of the province of Manabí and in the province of Los Rios in Ecuador.
Overall, the CCNC collection presented considerable genetic diversity, taking into account both the level of He and the fact that only one case of duplication was found among the accessions. This corresponds to 0.8%, much lower than the figures reported in other studies, such as 19.6% of duplicates 28, 9.1% 29, and 12.9% 34.
The genetic diversity of the CCNC was shown to be structured into two groups or subpopulations: GN and GM. Similar results were reported in a previous study 15, in which clustering and a dendrogram identified two clusters or subpopulations in a population of National cocoa obtained from plantations that were 80 to 100 years old. These two subpopulations differed significantly in their average level of heterozygosity: the first is more homogeneous, since it only includes samples of National cocoa with a low level of heterozygosity (5%), and the second presents a high level of heterozygosity (44%) because it is more heterogeneous and includes National cocoa trees of Venezuelan and hybrid types.
The results are also similar to those of other researchers, who showed that Ecuadorian cocoa is divided into two subpopulations 16: one with a low level of heterozygosity (5%), considered representative of National cocoa, and one with a higher level of heterozygosity (32%), considered representative of modern National cocoa and made up of hybrid samples. Previous researchers established that a large part of modern National cocoa cultivation corresponds to the so-called National Trinitarian complex 35, formed from the introduction of Trinitarian-type Venezuelan cocoa at the end of the 19th century and beginning of the 20th century and its subsequent gene flow with the National Centennial cocoa population. It was also established that the high homozygosity of several Ecuadorian cocoa samples may be due to the self-compatibility of the oldest National cocoa from Ecuador 8 and that the compatibility variation of modern National cocoa is also due to the introgression of the Trinitario cocoa genome. It can be inferred that the GN subpopulation groups samples of the purest National type, which present less gene introgression from other cocoa varieties. The GM subpopulation, on the other hand, includes the accessions with the highest level of heterozygosity. The heterozygosity of the GN subpopulation is 29.5%, which is considerably higher than the values published by other researchers 15,16; this is because only 13 samples belonging to this subpopulation presented a high level of homozygosity. The remaining 86 genotypes did not present the same level of homozygosity but were nevertheless genetically distinct from the samples of the GM subpopulation. This genetic differentiation is why the two subpopulations appear separate from one another in the multivariate PCoA analysis.
The genetic diversity of the GM subpopulation was also evidenced by the presence of the 36 exclusive alleles. It is inferred that the level of heterozygosity of this subpopulation could result from the introgression of genes from various cocoa varieties, not only from Trinitarian cocoa 8. However, further studies are needed to establish the origin of this genetic diversity. The results obtained in this research are valuable for continuing characterization studies of Ecuadorian cocoa crops, since they allow the identification of National cocoa trees with a high level of homozygosity. In addition, the results make it possible to establish genetic improvement programs to recover the characteristics of fine flavor and aroma cocoa that distinguish Ecuadorian National cocoa worldwide. Improving the quality of this type of cocoa will allow it to be reactivated and commercialized in the international market, giving a new impetus to the country's economy.

Conclusions
We showed the efficiency of the panel of SSR markers used here in the genetic assignment of accessions. The CCNC collection presented high genetic diversity with little duplication. This genetic diversity was structured into two groups. The GN subpopulation includes 99 samples, of which 13 presented homozygosity ≥ 80%; hence they are considered samples of native National cocoa that still conserve their characteristics of fine flavor and aroma cocoa. The GM subpopulation includes 74 samples with high diversity and 36 exclusive alleles, probably due to the introgression of genes from other cocoa varieties.

Figure 2. PCoA graph of the 173 accessions assigned to one of the two subpopulations (first coordinate = 18.1% of the total, and the second = 6.1%).