Biogeosciences, 13, 4135–4149, 4/bg-13-4135-2016 Author(s) 2016. CC Attribution 3.0 License.Global analysis of gene expression dynamics within the marinemicrobial community during the VAHINE mesocosmexperiment in the southwest PacificUlrike Pfreundt1 , Dina Spungin2 , Sophie Bonnet3,4 , Ilana Berman-Frank2 , and Wolfgang R. Hess11 Institutefor Biology III, University of Freiburg, Freiburg, Germanyand Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel3 Aix Marseille Université, CNRS/INSU, Université de Toulon, IRD, Mediterranean Institute of Oceanography(MIO), Marseille, France4 Institut de Recherche pour le Développement, AMU/ CNRS/INSU, Université de Toulon, Mediterranean Instituteof Oceanography (MIO) Nouméa, New Caledonia, France2 MinaCorrespondence to: Wolfgang R. Hess ([email protected])Received: 5 November 2015 – Published in Biogeosciences Discuss.: 23 February 2016Revised: 21 June 2016 – Accepted: 4 July 2016 – Published: 19 July 2016Abstract. Microbial gene expression was followed for 23days within a mesocosm (M1) isolating 50 m3 of seawaterand in the surrounding waters in the Nouméa lagoon, NewCaledonia, in the southwest Pacific as part of the VAriability of vertical and tropHIc transfer of diazotroph derived Nin the south wEst Pacific (VAHINE) experiment. The aim ofVAHINE was to examine the fate of diazotroph-derived nitrogen (DDN) in a low-nutrient, low-chlorophyll ecosystem.On day 4 of the experiment, the mesocosm was fertilizedwith phosphate. In the lagoon, gene expression was dominated by the cyanobacterium Synechococcus, closely followed by Alphaproteobacteria. In contrast, drastic changesin the microbial community composition and transcriptionalactivity were triggered within the mesocosm within the first4 days, with transcription bursts from different heterotrophicbacteria in rapid succession. The microbial composition andactivity of the surrounding lagoon ecosystem appeared morestable, although following similar temporal trends as in M1.We detected significant gene expression from Chromerida inM1, as well as the Nouméa lagoon, suggesting these photoautotrophic alveolates were present in substantial numbers in the open water. Other groups contributing substantially to the metatranscriptome were affiliated with marineEuryarchaeota Candidatus Thalassoarchaea (inside and outside) and Myoviridae bacteriophages likely infecting Synechococcus, specifically inside M1. High transcript abun-dances for ammonium transporters and glutamine synthetasein many different taxa (e.g., Pelagibacteraceae, Synechococcus, Prochlorococcus, and Rhodobacteraceae) was consistent with the known preference of most bacteria for thisnitrogen source. In contrast, Alteromonadaceae highly expressed urease genes; Rhodobacteraceae and Prochlorococcus showed some urease expression, too. Nitrate reductase transcripts were detected on day 10 very prominentlyin Synechococcus and in Halomonadaceae. Alkaline phosphatase was expressed prominently only between days 12and 23 in different organisms, suggesting that the microbialcommunity was not limited by phosphate, even before thefertilization on day 4, whereas the post-fertilization community was.We observed high expression of the Synechococcus sqdBgene, only transiently lowered following phosphate fertilization. SqdB encodes UDP-sulfoquinovose synthase, possiblyenabling marine picocyanobacteria to minimize their phosphorus requirements by substitution of phospholipids withsulphur-containing glycerolipids. This result suggests a linkbetween sqdB expression and phosphate availability in situ.Gene expression of diazotrophic cyanobacteria wasmainly attributed to Trichodesmium and Richelia intracellularis (diatom–diazotroph association) in the Nouméa lagoonand initially in M1. UCYN-A (Candidatus Atelocyanobacterium) transcripts were the third most abundant and de-Published by Copernicus Publications on behalf of the European Geosciences Union.

4136U. Pfreundt et al.: Global analysis of gene expression dynamicsclined both inside and outside after day 4, consistent with16S- and nifH-based analyses. Transcripts related to the Epithemia turgida endosymbiont and Cyanothece ATCC 51142increased during the second half of the experiment.1we present the community-wide gene expression based onmetatranscriptomic data from one representative mesocosm(M1). Throughout the course of the experiment (23 days), wesampled water from both M1 and the surrounding Nouméalagoon every second day from the surface (1 m) and inferredthe metatranscriptomes for the plankton fraction ( 1 mm).Introduction2In the study of natural marine microbial populations, it isof fundamental interest to identify the biota these populations consist of and to elucidate their transcriptional activities in response to biotic or abiotic changes in the environment. Metatranscriptomics gives insight into these processesat high functional and taxonomic resolution, as shown, e.g.,in the analysis of a wide range of marine microbial populations (Frias-Lopez et al., 2008; Ganesh et al., 2015; Giffordet al., 2014; Hewson et al., 2010; Hilton et al., 2015; Jones etal., 2015; Moran et al., 2013; Pfreundt et al., 2014; Poretskyet al., 2009; Shi et al., 2009; Steglich et al., 2015; Wemheueret al., 2015). Here, we report the results of a metatranscriptome analysis from the VAriability of vertical and tropHIctransfer of diazotroph derived N in the south wEst Pacific(VAHINE) mesocosm experiment, whose overarching objective was to examine the fate of diazotroph-derived nitrogen(DDN) in a low-nutrient, low-chlorophyll (LNLC) ecosystem (Bonnet et al., 2016b). In this experiment, three largescale ( 50 m3 ) mesocosms were deployed enclosing ambient oligotrophic water from the Nouméa (New Caledonia)lagoon in situ. To alleviate any potential phosphate limitation and stimulate the growth of diazotrophs, the mesocosmswere fertilized on day 4 with 0.8 µmol KH2 PO4 as a source ofdissolved inorganic phosphorus (DIP). The mesocosms weresampled daily for 23 days and analyzed for carbon, nitrogen,and phosphorus pools and fluxes (Berthelot et al., 2015), thediazotroph community composition on the basis of nifH tagsequencing (Turk-Kubo et al., 2015), N2 fixation dynamics,and the fate of DDN in the ecosystem (Berthelot et al., 2015;Bonnet et al., 2016a; Knapp et al., 2015). Furthermore, thecomposition, succession, and productivity of the autotrophicand heterotrophic communities were studied (Leblanc et al.,2016; Pfreundt et al., 2016; Van Wambeke et al., 2016). During days 15–23 of the VAHINE experiment, N2 fixation ratesincreased dramatically, reaching 60 nmol N L 1 d 1 (Bonnet et al., 2016a), which are among the highest rates reported for marine waters (Bonnet et al., 2016a; Luo et al.,2012). Based on the analysis of nifH sequences, N2 -fixingcyanobacteria of the UCYN-C type were suggested to dominate the diazotroph community in the mesocosms at this time(Turk-Kubo et al., 2015). Evidence from 15 N isotope labeling analyses indicated that the dominant source of nitrogenfueling export production shifted from subsurface nitrate assimilated prior to the start of the 23-day experiment to N2fixation by the end (Knapp et al., 2015). To link these data tothe actual specific activities of different microbial taxa, hereBiogeosciences, 13, 4135–4149, 20162.1MethodsSampling, preparation of RNA and sequencinglibrariesSamples were collected in January 2013 every other day at07:00 LT from mesocosm 1 (hereafter called M1) and fromthe Nouméa lagoon (outside the mesocosms) in 10 L carboysusing a Teflon pump connected to PVC tubing. To ensurequick processing of samples, the carboys were immediatelytransferred to the inland laboratory setup on Amédée Island,located 1 nautical mile off the mesocosms. Samples for RNAwere prefiltered through a 1 mm mesh to keep out larger eukaryotes and then filtered on 0.45 µm polyethersulfone filters (Pall Supor). These filters were immediately immersedin RNA resuspension buffer (10 mM NaAc pH 5.2, 200 mMD( )-sucrose, 100 mM NaCl, 5 mM EDTA) and snap frozenin liquid nitrogen. Tubes with filters were vortexed, thenagitated in a Precellys bead beater (Peqlab, Erlangen, Germany) 2 15 s each at 6500 rpm after adding 0.25 mL glassbeads (0.10–0.25 mm, Retsch, Frimley, UK) and 1 mL PGTX(39.6 g phenole, 6.9 mL glycerol, 0.1 g 8-hydroxyquinoline,0.58 g EDTA, 0.8 g NaAc, 9.5 g guanidine thiocyanate, 4.6 gguanidine hydrochloride, H2 O to 100 mL; Pinto et al., 2009).We isolated RNA for metatranscriptomics and DNA for 16Stag-based community analysis (Pfreundt et al., 2016) fromthe same samples by adding 0.7 mL chloroform, vigorousshaking, incubation at 24 C for 10 min, and subsequentphase separation by centrifugation. RNA and DNA was retained in the aqueous phase, precipitated together and storedat 80 C for further use.The samples were treated by TurboDNase (Ambion,Darmstadt, Germany), purified with RNA Clean & Concentrator columns (Zymo Research, Irvine, USA), followed byRibozero (Illumina Inc., USA) treatment for the depletionof ribosomal RNAs. To remove the high amounts of tRNAfrom the rRNA depleted samples, these were purified furtherusing the Agencourt RNAClean XP kit (Beckman CoulterGenomics). Then, first-strand cDNA synthesis was primedwith an N6 randomized primer. After fragmentation, Illumina TruSeq sequencing adapters were ligated in a strandspecific manner to the 5’ and 3’ ends of the cDNA fragments,allowing the strand-specific PCR amplification of the cDNAwith a proof-reading enzyme in 17–20 cycles, dependingon yields. To secure that the origin of each sequence couldbe tracked after sequencing, hexameric TruSeq barcode sequences were used as part of the 3’ TruSeq

U. Pfreundt et al.: Global analysis of gene expression dynamicsadapters. The cDNA samples were purified with the Agencourt AMPure XP kit (Beckman Coulter Genomics), quality controlled by capillary electrophoresis and sequenced bya commercial vendor (Vertis Biotechnologie AG, Germany)on an Illumina NextSeq 500 system using the paired-end(2 150 bp) setup. All raw reads can be downloaded fromthe NCBI Sequence Read Archive under the BioProject accession number PRJNA304389.2.2Pretreatment and de-novo assembly ofmetatranscriptomic dataRaw paired-end Illumina data in FASTQ format were pretreated as follows (read pairs were treated together in allsteps to not produce singletons): adapters were removed andeach read trimmed to a minimum Phred score of 20 using cutadapt. This left 386 010 015 pairs of good-quality raw readsfor the 22 samples. Ribosomal RNA reads were removed using SortMeRNA (Kopylova et al., 2012). The resulting nonrRNA reads (corresponding to a total of 155 022 426 pairsof raw reads binned from all samples) were used as inputfor de-novo transcript assembly with Trinity (Haas et al.,2013) using digital normalization prior to assembly to evenout kmer coverage and reduce the amount of input data. Remarkably, data reduction by digital normalization was only 35 %, hinting at a high complexity of the data set. Thiscomplexity is not surprising regarding that the sample poolcontained transcripts from 3 weeks of experiment in two locations (mesocosm vs. lagoon), yet it also means that therewill be a relatively large number of transcripts with very lowsequencing coverage. This study thus misses the very raretranscripts in the analyzed community.The transcript assembly led to 5 594 171 transcript contigswith an N50 of 285 nt, a median contig length of 264 nt, andan average of 326 nt. Transcript abundance estimation andnormalization was done using scripts included in the Trinity package. Align and estimate abundance.plused bowtie (Langmead, 2010) to align all reads againstall transcript contigs in paired-end mode, then ran RSEM(Li and Dewey, 2011) to estimate expected counts, TPM,and FPKM values for each transcript in each sample. Onlypaired-end read support was taken into account. The scriptAbundance estimates to was modified slightly to create a matrix with RSEM expected countsand a TMM-normalized (trimmed mean of M-values normalization method) TPM matrix (the original script usesFPKM here) using the R package edgeR. The latter matrixwas used to discard transcript contigs with very low support (maxTPM 0.25 and meanTPM 0.02). The remaining 3 844 358 transcript contigs were classified usingthe Diamond tool (Buchfink et al., 2015) with a blastX-likedatabase search (BLOSUM62 scoring matrix, maximum evalue 0.001, minimum identity 10 %, minimum bit score 50)against the NCBI nonredundant protein database from October 2015. Normalized TPM values for each transcript added as a weight to the query ID in the Diamond tabular output samplewise with a custom script, creating oneDiamond table per sample which served as input to Megan5.11.3 (Huson and Weber, 2013). Megan is an interactive toolused here to explore the distribution of blast hits within theNCBI taxonomy and KEGG hierarchy. The parameters usedto import the diamond output into Megan were minimum evalue 0.01, minimum bit score 30, LCA 5 % (the transcriptwill be assigned to the last common ancestor of all hits witha bit score within 5 % of the best hit), minimum complexity0.3.During manual analysis of the top 100 transcript contigsaccording to their mean expression over all samples, wefound 9 transcripts to be residual ribosomal RNA or internal transcribed spacer. These contigs were removed from thecount and TPM matrices for all multivariate statistics analyses. Absence of these rRNA transcripts in the Diamond output was checked and verified.2.3Sample clustering and multivariate analysisThe matrix with expected counts for each transcriptcontig (see Sect. 2.2) was used as input for differential expression (DE) analysis with edgeR (Robinson etal., 2010) as implemented in the Trinity package scriptrun DE for the set of samples taken fromM1 and the Nouméa lagoon, respectively. The edgeR package can compute a DE analysis without true replicates by using a user-defined dispersion value, in this case 0.1. We areaware that significance values are highly dependent on thechosen dispersion, and thus only considered transcripts withat least a 4-fold expression change for further DE analysis.The script analyze diff was used (parameters -P 1e-3 -C 2) to extract those transcripts that were at least4-fold differentially expressed at a significance of 0.001 inany of the pairwise sample comparisons, followed by hierarchical clustering of samples and differentially expressedtranscripts depending on normalized expression values(log2 (TPM 1)). The resulting clustering dendrogram wascut using define clusters by cutting tree.plat 20 % of the tree height, producing subclusters of similarlyresponding transcripts.Nonmetric multidimensional scaling (NMDS) was performed in R on the transposed matrix containing all3 844 358 transcript contigs and their respective TMMnormalized TPM values. First, the matrix values were standardized to raw totals (sample totals) with the decostandfunction of the vegan package (Oksanen et al., 2015). Then,metaMDS was used for calculation of Bray–Curtis dissimilarity and the unconstrained ordination.2.4Analysis of specific transcriptsA list with genes of interest was created using the IntegratedMicrobial Genomes (IMG) system (Markowitz et al., 2015).Biogeosciences, 13, 4135–4149, 2016

4138U. Pfreundt et al.: Global analysis of gene expression dynamicsFirst, 147 genomes close to bacteria and archaea found inthe samples (based on 16S rRNA sequences; Pfreundt et al.,2016) were selected using “find genomes”. Then, the “findgenes” tool was used to find a gene of interest (for example nifH) in all the preselected genomes and the resultinggenes added to the “gene cart”. This was done for all genesof interest and the full gene list including 50 nt upstream ofeach gene (possible 5’ UTR) downloaded in FASTA format.Usearch local (Edgar, 2010) was used to find all transcripts mapping to any of the genes in the list with a minimum query coverage and minimum identity of 60 %.From the full Diamond output (Sect. 2.2), all matchingtranscripts together with their taxonomic and functional assignment were extracted and false positives discarded (i.e.,transcripts that mapped to the list of specific genes but had adifferent Diamond hit). The top hit for each transcript wasextracted, and the protein classifications manually curatedto yield one common description per function (from different annotations for the same protein in different genomes).The TMM-normalized TPM counts were added to each transcript classification, as well as the full taxonomic lineagefrom NCBI taxonomy. These taxonomic lineages were curated manually to align taxonomic levels per entry.The table was imported into R, all counts per samplesummed up for each combination of protein and family-leveltaxa, and a matrix created with samples as row names andcombined protein and family description as column names.Heat maps were created separately for each protein group(e.g., rhodopsins or sulfolipid biosynthesis proteins), scalingall values to the group maximum.3Results and discussionThe metatranscriptomic data were analyzed following thestrategy outlined in Fig. 1. We obtained taxonomic assignments for 37 % of all assembled transcript contigs. This reflects the fact that the genes of complex marine microbialcommunities, especially from infrequently sampled oceanregimes like the southwest Pacific, are still insufficiently covered by current databases. The data with taxonomic assignments thus give an overview about the gene expression processes during this mesocosm experiment. With this study, weaimed at identifying global differences in expression patternsbetween the mesocosm and the lagoon, as well as betweenthe different sampling time points within the mesocosm. Wefurther explored the expression of marker genes for N and Pmetabolism, and light utilization in the different taxonomicgroups.Biogeosciences, 13, 4135–4149, 20163.1Transcripts cluster into distinct groups with similarexpression patterns over time in M1 and the lagoonGene expression changes roughly followed the timeline,within both M1 and the Nouméa lagoon, with some exceptions (Fig. 2). For the lagoon, samples from day 20 and 23clustered together, the samples from day 10 to 18 formed amid-time cluster, and those from day 2 to 8 an early cluster(Fig. 2b). In M1, the samples from day 6 to 10 and day 12to 20 clustered together (Fig. 2a). Deviating from the timeline, the sample from day 2 was placed close to day 20, day23 was separated from the late cluster, and day 4, exhibitinga prominent subcluster of transcripts upregulated only thatday, was the furthest apart from all other samples (Fig. 2a,black brackets). Closer inspection of this subcluster containing several hundred different transcripts identified 80 % ofthem as Rhodobacteraceae transcripts, correlating well witha 5-fold increase of Rhodobacteraceae 16S tags (from 2.5 to12.5 % of the 16S community) and a leap in bacterial production between T2 and T4 (Pfreundt et al., 2016). The observed transcripts were broadly distributed across metabolicpathways, reflecting a general increase of Rhodobacteraceaegene expression on day 4. The aberrant clustering of the twoearliest samples in M1 (before the DIP spike) and the tightclustering of those following the DIP spike (day 6–10) suggest an impact of the confinement within the mesocosm andof phosphate supplementation on gene expression.Unconstrained ordination using nonmetric multidimensional scaling (NMDS) confirmed the similar temporal distribution of samples from the Nouméa lagoon and M1 (Fig. 3).Yet, the samples from M1 showed a much higher varianceand were more dispersed than those from the lagoon (Fig. 3).Thus, the gene expression profiles within the mesocosm weremore diverse than in the lagoon waters. The comparison ofthe whole data set against the KEGG database (Kanehisa etal., 2014) showed a major difference between M1 and the lagoon samples only in the category energy metabolism, andits subcategories photosynthesis and antenna proteins. Thesecategories comprised 22–36, 8–16, and 2.7–7.5 % in the lagoon, respectively, and were in M1 (excluding day 23) constantly below 22, 7, and 4 %, respectively (the SupplementFigs. S1 and S2). This lower contribution of energy-relatedfunctions in M1 was detectable already at the earliest timepoint (day 2). Furthermore, diverging dynamics in the microbial community composition and transcriptional activitywere triggered in M1 already within the first 48 h (beforeday 2 was sampled), indicated by the large distance betweenM1 and lagoon samples on day 2 (Fig. 3). The early timingof this effect already on day 2 suggests a rapid remodelingof the microbial community’s gene expression upon confinement within the mesocosm. In addition, the DIP spike on theevening of day 4 subsequently triggered distinct ecologicalsuccessions in M1. The patterns we observed here are closeto the three temporal phases defined for the VAHINE experiment based on biogeochemical flux measurements (

U. Pfreundt et al.: Global analysis of gene expression dynamics413911 mesocosm samples11 lagoon samplesSubclusters oftranscripts with similarexpresison changesRNA-Seq paired-endsequencing readsClusterDE transcriptsAdapter removalquality trimmingminimum length 20 ntDifferentialexpression (DE)analysis on eachmatrix with edgeRrRNA removal withSortMeRNA22 datasets withnon-ribosomal readsBinning of all nonribososmal reads fromall samplesSeparation into M1and lagoon matricesPaired-end readmapping with bowtieTranscript abundanceestimation withRSEMFASTA file withsequences of GOIs from147 relevant genomesFind all transcriptsmapping to GOIswith usearchRaw count matrix(fragments pertranscript per sample)TMM-normalizedTPM count matrixTrinity De-novotranscript assemblyTrinitytranscriptsClusteredHeatmap(Figure 2)Taxonomicand KEGGFiguresClassification withDiamondTaxonomic andKEGG affilitationfor each transcriptTaxonomic andfunctional classificationof specific transcriptsList of specifictranscriptsIntegration ofTPM count dataand classificationsFunctionHeatmapsFigure 1. Flowchart describing the major steps in the bioinformatics workflow. Preprocessing of RNA-Seq reads was done separately for eachdata set, leading to 22 data sets of nonribosomal paired-end reads. These were binned and used as input for de-novo assembly of transcripts.The nonribosomal reads were mapped back onto the assembled transcripts with bowtie (Langmead, 2010) to infer each transcripts abundancein each sample using RSEM (Li and Dewey, 2011). Raw abundances were used for differential expression (DE) analysis and cluster analysiswith edgeR on the M1 and lagoon count matrices separately, to find transcripts which changed significantly over time. To enable directin-between sample comparison of transcript abundances, raw abundances were converted to TPM (transcripts per kilobase million) andTMM normalized (trimmed mean of M values) in RSEM, creating the final count matrix used for all figures showing transcript abundances.Classifications for these transcripts were generated using Diamond (Buchfink et al., 2015) against the RefSeq protein database. Further, amanually curated list with specific genes involved in N and P metabolism, as well as light utilization (genes of interest, GOIs) was used toextract the corresponding transcripts, but final classifications were inferred from the Diamond output. This information was used to producethe integrated function-per-taxon heat et al., 2016a) and nifH amplicon analysis (Turk-Kuboet al., 2015). These were defined as follows (Bonnet et al.,2016a). Days 1–4 (phase P0): before the DIP fertilization,P deplete. Days 5–14 (P1): P availability, dominance ofdiatom–diazotroph associations. Days 15–23 (P2): decreasing P availability, slightly higher temperature, increasing N2fixation and a dominance of UCYN-C diazotrophs in themesocosms and increase of primary production (PP) insideand outside the mesocosms.In the following sections, we refer to P0, P1, or P2 to describe trends and changes in gene expression when 23.2.1Succession of gene expression inside mesocosm 1and in the Nouméa lagoonActive taxonomic groups differ between M1 andthe lagoonThe most striking difference between M1 and Nouméa lagoon samples was the 2- to 3-fold dominance of Oscillatoriophycideae transcripts over all other taxa in the lagoon over the full time of the experiment, but not in M1(Figs. S3a, S4a). Gene expression within the Oscillatoriophycideae was mostly attributed to Synechococcus, with asubstantial share of transcript reads in M1 and the lagooncoming from cyanobacteria closely related to Synechococcus CC9605, a strain representative of clade II within thepicophytoplankton subcluster 5.1A (Dufresne et al., 2008)and Synechococcus RS9916, a representative of clade IXwithin picophytoplankton subcluster 5.1B (Scanlan et al.,Biogeosciences, 13, 4135–4149, 2016

4140U. Pfreundt et al.: Global analysis of gene expression dynamicsMesocosm 1(b)NMDS ordination of all samplesNouméa lagoon0.4(a)423 106820218 16212 1446820 231812 16T8T410 14T100.2Median-centeredT12T12 T14T16T16T10T8T4 T14 T18T6T2T20T23T230.0 0.2MDS2T6T2T18Figure 2. Heat map showing the expression (median-centeredlog2 (TPM 1)) of all significantly differentially expressed transcripts in all samples taken from mesocosm M1 (a) and outside (b).Clustering of samples and transcripts was done using Euclideandistance measures followed by average agglomerative clustering(hclust(method average)). Note that in (a) samples T2 and T4cluster far away from the other samples. These were taken beforethe phosphate spike. In M1, T4 is distinguished by a large clusterof genes upregulated at that time point, most of which belong to theRhodobacteraceae family. The general clustering along the timelineis evident inside and outside of M1.Figure 3. NMDS ordination of samples on the basis of TPM counts(transcripts per million sequenced transcripts). Outside samples areblue, samples from M1 are orange. Note that samples from M1 aremore dispersed in the plot, thus transcription profiles are more diverse than outside. This might be due to the DIP fertilization creating a distinct ecological succession in M1.2009; Figs. S3d, S4d). Clade II Synechococcus is typicalfor oligotrophic tropical or subtropical waters offshore orat the continent shelf, between 30 N and 30 S (Scanlan etal., 2009). Contrary to the Nouméa lagoon, Oscillatoriophycideae transcripts were lower than transcripts from Alphaand Gammaproteobacteria in M1 during phases P0 and P1and only gained a similar level as in the lagoon in P2. We detected substantially higher gene expression from viruses inM1 compared to the Nouméa lagoon (Fig. 4). These were assigned mainly to Myoviridae such as S.SM2, S.SSM7, andother cyanophages of the T4-like group, which, based ontheir known host association (Frank et al., 2013; Ma et al.,2014), suggest a viral component acting on the Synechococcus fraction in the mesocosm. A burst of cyanophages mighthave contributed to the observed low numbers and activity of Synechococcus in M1 compared to the lagoon duringP0 and P1 (Fig. 4). The recovery of Synechococcus populations in M1 during P2 mirrors the increase in the energyand photosynthesis-related functional categories (Figs. S1,S2) and in Synechococcus 16S tag abundance and cell counts(Leblanc et al., 2016; Pfreundt et al., 2016).Owing to the initial decay of Synechococcus in M1,Alphaproteobacteria, mainly Rhodobacteraceae, SAR11,and SAR116, dominated the metatranscriptome during P0(Fig. 4). Gammaproteobacterial transcripts increased at thebeginning of P1, reaching similar levels as those of Alphaproteobacteria, and dropped again towards the end ofP1, when the Synechococcus population started recovering(Fig. S3c). This suggests that the predominant Gammaproteobacteria profited from the organic matter released duringbacterial decay. During P2, characterized by an abundant andvery active Synechococcus population, Alphaproteobacteriagene expression increased again. Among these, only SAR11transcripts decreased by about 75 %.Unexpectedly, over 3 weeks, the temporal pattern ofSAR11 transcription appeared tightly coordinated with thatof SAR86 Gammaproteobacteria (Figs. S3, S4). We testedpairwise correlations of alpha- and gammaproteobacterialgroups and found that SAR11 and SAR86 transcript accumulation were highly correlated in M1 and the Nouméalagoon (Fig. S6, Pearson correlation: 0.88/0.96, Spearmanrank correlation: 0.80/0.98 for M1 and lagoon, respectively).This matches recent observations in both coastal and pelagicecosystems for coupling of SAR11 and SAR86 gene expression throughout a diel cycle, suggesting specific biologicalinteractions between these two groups (Aylward et al., 2015).The fact that we now see this correlation over 3 weeks intwo replicate experiments (M1 and Nouméa lagoon sampling) strengthens this hypothesis substantially. On the otherhand, transcriptional activity was decoupled from 16S-basedabundance estimates for both clades (Pfreundt et al., 2016).Decoupling of specific activity and cell abundance has beennoted before for SAR11, with specific activity being lowerthan cell abundance in the North Pacific (Hunt et al., 2013).In microcosm experiments, proteorhodopsin transcripts increased under continuous light while gene abundance decreased (Lami et al., 2009). No such information is availablefor SAR86 in the literature, and the reasons for this decoupling remain elusive. We found no hint in the transcriptionalprofile that could explain the burst

using the Agencourt RNAClean XP kit (Beckman Coulter Genomics). Then, first-strand cDNA synthesis was primed with an N6 randomized primer. After fragmentation, Illu-mina TruSeq sequencing adapters were ligated in a strand-specific manner to the 5’ and 3’ ends of the cDNA fragments,