Background
Globally, approximately 296 million individuals suffered from chronic HBV infection in 2019, resulting in over 820,000 deaths1. Upon infecting the liver cells of the host, HBV not only generates covalently closed circular DNA (cccDNA) within the cells' nuclei but also integrates its double-stranded linear DNA (dslDNA) into the human genome2. The presence of integrated HBV contributes to the instability of the chromosomal structure in patients and allows for the transcription and translation of viral proteins, including HBsAg and truncated HBx3,4. Research has demonstrated that integration is the primary contributor to the presence of HBsAg in CHB patients with HBeAg negative and low HBV load5,6. The clearance or transcriptional silencing of integrated HBV DNA is crucial for achieving a functional cure in patients with CHB and reducing the risk of HCC. However, the regulation mechanism of the transcription of the integrated HBV DNA has not yet been fully elucidated.
Although some researchers have tried to simulate HBV DNA integration through DNA insertion techniques, these methods significantly differ from natural HBV DNA integration. Among cell models with integrated HBV DNA, Hep3B and Huh-1 have only a few integration sites, while PLC/PRF/5, the first discovered cell line integrated with HBV DNA7, possesses a large number of integration sites, making it the most classical model for studying HBV DNA integration. Furthermore, PLC/PRF/5 cells, unlike primary liver cells infected with HBV, do not exhibit HBV replication. This characteristic enables researchers to eliminate interference from cccDNA. However, the lack of comprehensive understanding of the complex HBV DNA integration patterns and transcription characteristics within PLC/PRF/5 cells limited the research on HBV DNA integration based on the PLC/PRF/5 cell line.
The availability of various high-throughput sequencing technologies and publicly accessible data in PLC/PRF/5 cells facilitate combined analysis of multi-omics data when investigating the transcriptional regulation mechanism of HBV DNA integration. In our previous study, we analyzed the HBV DNA integration sites in the PLC/PRF/5 cell line using long-read DNA sequencing technology8 and constructed a map of integrated HBV DNA in PLC/PRF/5. In this study, we aim to investigate the transcriptional activity of each integrated HBV DNA and explore the transcriptional regulatory mechanisms of integrated HBV DNA by combining DNA long-read sequencing, RNA long-read sequencing, WGBS, ChIP-seq, and ATAC-seq. The findings from this study have the potential to enhance our understanding of integrated HBV in PLC/PRF/5 cells and provide a theoretical foundation for the future development of drugs targeting HBV DNA integration.
Materials and Methods
Cell line and Cell culture
PLC/PRF/5 cells were purchased from the American Type Culture Collection (Manassas, VA, USA). Cells were cultured in Dulbecco's Modified Eagle's Medium (DMEM) supplemented with 10% fetal bovine serum (Gibco, Life Technologies, Carlsbad, CA, USA) in a humidified incubator maintained at 37°C and 5% CO2.
 
RNA extraction and cDNA preparation
Total RNA was extracted from the tissue using TRIzol reagent (Takara, Kyoto, Japan). The purity of RNA was tested using the Nano Photometer spectrophotometer (IMPLEN, Westlake Village, USA). cDNA libraries were constructed from 1 μg of total RNA of PLC/PRF/5 cells using a cDNA-PCR Sequencing Kit (SQK-PCS109) according to the manufacturer's protocol. Briefly, reverse transcriptase was used to enrich full-length cDNAs and add defined PCR adapters to both ends of the first-strand cDNA, followed by 14 cycles of cDNA PCR using LongAmp Taq DNA polymerase (New England Biolabs, Ipswich, MA, USA) with an 8-minute elongation time. The PCR products were then subjected to ONT adaptor ligation using T4 DNA ligase (New England Biolabs, Ipswich, MA, USA). Agencourt XP beads (Beckman Coulter, Brea, USA) were used for DNA purification. The final cDNA libraries were loaded onto FLO-MIN109 flow cells and analyzed on a PromethION platform at Biomarker Technology Company (Beijing, China).
 
Oxford Nanopore Technologies long read processing
The data analysis methods are consistent with previous literature9. Briefly, raw reads were first filtered with a minimum average read quality score of 7 and a minimum read length of 500 bp. Ribosomal RNA was discarded after mapping to the rRNA database. Next, full-length, non-chimeric (FLNC) transcripts were identified by searching for the primer at both ends of the reads. Clusters of FLNC transcripts were obtained after mapping to the reference genome with minimap2 (V2.24)10, and consensus isoforms were generated after polishing within each cluster by pinfish. The raw FASTQ data was subjected to quality control and trimming using the built-in quality control tool of Porechop, with default parameters. Low-quality sequences and adapter contamination were removed from the data. The quality of ONT sequencing reads was evaluated using Nanoplot V 1.33.0 11, with default parameters. FLNC transcripts were aligned to 2× genotype A (PLC/PRF/5 cells) HBV genome and human genome reference hg19 using minimap2. Transcripts aligned to both HBV and human genomes were considered to originate from integrated HBV DNA. Finally, the identified HBV transcripts were visualized using Integrative Genomics Viewer software, version 2.11.412.
 
Processing of WGBS data
Raw sequencing FASTQ files were assessed for quality, adapter content, and duplication rates with FastQC v0.11.7, then trimmed using trim_galore. Data alignment was performed using Bismark (V0.24.0)13 to align the sequencing data to the human hg19 and HBV reference genomes, generating SAM files with default parameters. For methylation analysis, the bismark_methylation_extractor was used to generate a report on global genomic cytosine methylation. Then, the GlobalMethLev function in the viewBS (V0.1.11)14 was used to compare the overall methylation levels of the integrated HBV DNA with the host genome in PLC/PRF/5 cells. The MethOneRegion function in viewBS was used to visualize the methylation levels of the integrated HBV genome. The GlobalMethLev function in viewBS was used to calculate the average genomic methylation levels of different regions around the integrated sites, and the R software (V4.2.2) pheatmap was used to visualize the methylation results.
 
Processing of ATAC-seq data
Raw sequencing FASTQ files were assessed for quality, adapter content, and duplication rates with FastQC v0.11.7, trimmed using trim_galore, and aligned with bowtie2 (v2.6.0)15 (parameters: -N 1 -X 2000) to the HBV and human genomes. PCR duplicates bias was removed using the samtools markdup. ATAC-seq data were assessed for quality control using the ATACseqQC16, and the factorFootprints function was used to detect and visualize the signal changes around the motif regions where transcription factors bind. MACS217 software was used for peak calling with parameters set as ‘--shift -100 --extsize 200’. BAM files were converted to BigWig format, with key parameters ‘--binSize 50 --normalizeUsing RPM’, by deeptools18. The deeptools plotProfile tool was used to calculate ATAC-seq signal and visualize the chromatin accessibility levels. The HINT-ATAC tool was used to predict potential transcription factor information in PLC/PRF/5 cells.
 
Processing of ChIP-seq data
Raw sequencing FASTQ files were assessed for quality, adapter content, and duplication rates with FastQC v0.11.7, trimmed using trim_galore, and aligned with bowtie2 (default parameters) to the HBV and human genomes to generate BAM files. The markdup function in Samtools was used to remove PCR-amplified duplicate sequences from the BAM files. Next, chimeric reads were extracted using Samtools and aligned to the integrated HBV DNA using bowtie2 with the ‘--no-softclip’ parameter to calculate the number of chimeric reads from different integration sites. Chimeric read counts were normalized using RPM, and differences between different samples were compared.
 
Comparison of histone modification levels between groups
The GSE113879 dataset WIG files were downloaded, representing the histone modification levels of each position of HBV DNA normalized to RPM. A custom script was used to correct the start site to the EcoRI site. Next, the WIG file was converted to BigWig format using the UCSC wigToBigWig software. The average signal values of the replicate experiments were calculated using the deeptools bigwigCompare function with the ‘—operation mean’ parameter. Deeptools bigwigCompare function was further used to calculate fold enrichment over the input of averaged HBV RPM, setting the ‘--operation’ parameter set to log2 ratio. Calculation of histone modification levels near the integration site: The binary BigWig files of the GSM6341171, GSM6341172, and GSM6341173 datasets were downloaded from the GEO database. The deeptools plotProfile function was used to calculate the histone modification levels in the region near the integration site.
 
Real-time Reverse Transcription (RT)-PCR
Real-time RT-PCR was performed as described previously19. β-Actin was used as the reference gene to determine gene expression. The primers used for real-time RT-PCR are listed in Table S1.
 
Western blot analysis
Western blot analysis was performed as previously described19. Briefly, the lysed cell supernatant was run on an SDS-PAGE and blotted with antibodies. The antibodies used in western blot are anti-HBs (ab9193, Abcam, Cambridge, UK), anti-GAPDH (AP7873b, Abcepta, Suzhou, China), and anti-β-tubulin (AM1020b, Abgent, CA, USA).
 
Quantification of HBsAg
The supernatant of the cultured cells was harvested at 48 h post-treatment. Levels of hepatitis B surface antigen (HBsAg) in the supernatant were quantified using an HBsAg quantitative determination kit (Shenzhen New Industries Biomedical Engineering Co., Ltd., Shenzhen, China) on a MAGLUMI X3 series automatic chemiluminescence immunoassay analyzer according to the manufacturer's instructions.