This column highlights recently published articles that are of interest to the readership of this publication. We encourage ABRF members to forward information on articles they feel are important and useful to Clive Slaughter, AU-UGA Medical Partnership, 1425 Prince Avenue, Athens GA 30606. Tel: (706) 713-2216; Fax: (706) 713-2221; Email: [email protected], or to any member of the editorial board. Article summaries reflect the reviewer’s opinions and not necessarily those of the Association.
Al’Khafaji et al. make use of long-read cDNA sequencing for high-fidelity delineation of alternatively spliced mRNA transcripts. For construction of their cDNA sequencing libraries, they develop a novel protocol for precise concatenation of transcripts, which they employ to optimize sequencing throughput. On the PacBio Sequel IIe platform, Q30 consensus quality is reached in ∼10 circular sequencing passes. Using the single-molecule real-time (SMRT) 8M sequencing cell, libraries of insert size 15-20 kb are of optimal length for reaching ∼10 circular passes. But transcript lengths are typically much shorter, 0.2-5.0 kb. The difference represents an opportunity for multiplexing by concatenation of cDNAs to optimize sequencing throughput. The authors first select cDNAs containing oligo-dT. During amplification by polymerase chain reaction, they append deoxy-uracil (dU) containing barcode adapters. The dU sequences are then removed enzymically. The barcode adapters remain, and serve to direct ligation between cDNAs in combinations chosen to provide concatemers with optimal length for sequencing. This system is demonstrated to identify RNA isoforms unambiguously, without the need for in silico reconstruction. The authors employ the methodology for single-cell investigation of CD8+ tumor-infiltrating T cells. They estimate a 12- to 32-fold increase in the number of differentially spliced genes identified, relative to the number identified in the absence of concatenation. The data for differentially spliced genes permit the T cells to be clustered into known states of differentiation. The methodology is expected to contribute to discovery of new mRNA isoforms, identification of gene fusions, discovery of neoantigens, and analysis of T and B cell antigen receptor repertoires, among other applications.
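The scale of the multiplexing opportunity can be illustrated with a rough calculation. The Python sketch below is not taken from the paper. It simply divides the 15-20 kb target insert size by a cDNA length within the 0.2-5.0 kb range quoted above, and it ignores the length contributed by the barcode adapters (a simplifying assumption). The function name `cdnas_per_concatemer` is hypothetical.

```python
def cdnas_per_concatemer(target_insert_bp: int, cdna_bp: int) -> int:
    """Whole number of cDNAs of length `cdna_bp` that fit in one insert.

    Assumption: adapter length is ignored, so this is an upper bound.
    """
    if target_insert_bp <= 0 or cdna_bp <= 0:
        raise ValueError("lengths must be positive")
    # Integer division on base-pair counts avoids floating-point rounding.
    return max(1, target_insert_bp // cdna_bp)


if __name__ == "__main__":
    # cDNA lengths spanning the 0.2-5.0 kb range given in the summary.
    for cdna_bp in (200, 1_000, 5_000):
        low = cdnas_per_concatemer(15_000, cdna_bp)
        high = cdnas_per_concatemer(20_000, cdna_bp)
        print(f"{cdna_bp} bp cDNA: {low}-{high} per 15-20 kb insert")
```

On these assumptions, even the longest typical transcripts could be joined three or four to an insert, and shorter ones in far greater numbers, which is why concatenation recovers throughput that would otherwise be spent on redundant passes over short molecules.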
ABO blood group is determined by glycans on erythrocyte glycolipids and N-glycoproteins. These glycans define the A, B and H antigens. Group O blood is in high demand for transfusion because its use in medical emergencies generally avoids adverse reactions in the recipient. Consequently, there has long been interest in the possibility of employing glycosidases to convert B and A antigens to H to yield erythrocytes of a type known as ‘enzyme converted to O’ (ECO). Efforts to achieve such conversion have met with considerable success, but the methodology has not yet been adopted for clinical use, in part because of safety concerns resulting from unexplained positive results in crossmatches between recipient plasmas and ECO erythrocytes, despite apparently efficient conversion of the A and B antigens. The A, B and H glycans are all susceptible to extension with additional saccharide units. Such extended structures have been suspected to contribute to these mismatches. Noting that O-glycans on mucins secreted by the intestinal mucosa are also decorated with ABO epitopes, and that the human gut symbiont Akkermansia muciniphila requires human intestinal mucins as carbon and nitrogen sources, the authors sought new exoglycosidases from this bacterium for potential use in production of ECO erythrocytes. In the present report, they evaluate several new enzymes, and identify combinations of enzymes that convert A and B antigens to ECO with enhanced efficiency, and that additionally convert the extended forms. Use of these enzymes demonstrates that the extended structures do indeed contribute significantly to positive crossmatch reactivity. Conversion of the extended structures reduces the incidence of incompatible plasmas and diminishes the severity of remaining positive reactions. The enzymes are active in phosphate buffered saline, and can be removed from erythrocytes with moderate washing. 
These findings stimulate optimism that clinically useful reagents may yet be identified by rational engineering of the new enzymes, and by further exploitation of enzymes produced by the gut microbiota.
Abramson et al. describe the capabilities of AlphaFold3, a protein structure prediction model newly introduced by Google DeepMind, London, U.K. The new model extends the capabilities of AlphaFold2, notably in enabling prediction of the structures of proteins during interaction with other molecules, such as DNA, RNA, and low molecular weight ligands, and in modeling structural alterations that result from post-translational covalent modification. In AlphaFold3, both the network architecture and the training procedure are revised to accommodate more general chemical structures and to improve learning efficiency with respect to the use of data. The network de-emphasizes multiple sequence alignment of proteins, and introduces a diffusion-based architecture that operates directly on atomic coordinates without the complexity of rotational frames or equivariant processing used by AlphaFold2. Without resorting to separate protein structure prediction and ligand docking steps, the authors demonstrate a large improvement in protein-ligand structure prediction using their unified deep learning framework. The authors note that a continuing limitation of models such as AlphaFold3 is their prediction of static structures of the kind seen in the Protein Data Bank rather than the dynamic structures that exist in solution. They also note that, in disordered regions, the diffusion-based AlphaFold3 tends to introduce spurious structural order (hallucinations), albeit of low confidence. Remediation of this problem requires special negative weighting, whereas the non-generative AlphaFold2 more favorably generates a distinctive ribbon-like appearance in disordered regions. The AlphaFold3 utility is not made available for download. Users are required to access it via a server. Limitations are imposed on the number of predictions available per day and on modeling interactions with possible drugs.
Two further groups present approaches to unified deep learning for prediction of the structures of proteins interacting with ligands or bearing post-translational modifications. Both groups retain the modeling of protein conformations as linear amino acid sequences. Krishna et al. supplement the sequence repertoire with additional residues to represent DNA and RNA nucleotides. Both groups superimpose upon this architecture the structure of small molecule ligands or covalent modifications represented as graphs with atoms as nodes and chemical bonds as edges. They supplement this architecture with diffusion models to predict local folding around a given small molecule. Qiao et al. name their utility NeuralPlexer. Krishna et al. develop their utility from RoseTTAFold2, and name their iteration of this architecture RoseTTAFold All-Atom.
Lipid particles in the size range 5-200 nm include metabolic lipoproteins, extracellular vesicles involved in inflammatory responses, and viruses. They also include synthetic liposomes and lipid nanoparticles used for delivery of drugs or vaccine RNAs. Sych et al. here describe single-particle fluorescence profiling for such nanoparticles. The authors record fluorescence fluctuations in multiple channels while fluorescently labeled particles in solution diffuse within the focal volume of a confocal microscope. In this way, they measure diffusion coefficient (related to particle size) and content of individual particles. The authors use this methodology to measure messenger RNA encapsulation efficiency of lipid nanoparticles, and antibody binding to virus particles. They also demonstrate discrimination of high-density lipoprotein (HDL), low-density lipoprotein (LDL) and very-low-density lipoprotein (VLDL) particles in single donors, and distinguish the distribution of these particles among different healthy donors. The methodology employs commercially available instrumentation and freely available software.
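The relationship between diffusion coefficient and particle size mentioned above can be made concrete with the Stokes-Einstein equation, D = kT / (6πηr), for a sphere of hydrodynamic radius r in a fluid of viscosity η. The Python sketch below is an illustration of that general relation only, not the authors' analysis software. The default temperature (25 °C) and viscosity (that of water, about 0.89 mPa·s) are assumptions, and both function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant (exact SI value)


def diffusion_coefficient_um2_per_s(diameter_nm, temperature_k=298.15,
                                    viscosity_pa_s=0.89e-3):
    """Stokes-Einstein diffusion coefficient of a sphere, in um^2/s.

    Assumptions: spherical particle, dilute solution, water at 25 C.
    """
    radius_m = diameter_nm * 1e-9 / 2
    d_m2_per_s = (BOLTZMANN_J_PER_K * temperature_k
                  / (6 * math.pi * viscosity_pa_s * radius_m))
    return d_m2_per_s * 1e12  # m^2/s -> um^2/s


def hydrodynamic_diameter_nm(d_um2_per_s, temperature_k=298.15,
                             viscosity_pa_s=0.89e-3):
    """Invert Stokes-Einstein: hydrodynamic diameter (nm) from D (um^2/s)."""
    d_m2_per_s = d_um2_per_s * 1e-12
    radius_m = (BOLTZMANN_J_PER_K * temperature_k
                / (6 * math.pi * viscosity_pa_s * d_m2_per_s))
    return 2 * radius_m * 1e9


if __name__ == "__main__":
    # Diameters spanning the 5-200 nm range discussed in the summary.
    for diameter in (5, 50, 100, 200):
        d = diffusion_coefficient_um2_per_s(diameter)
        print(f"{diameter} nm particle: D = {d:.2f} um^2/s")
```

Because D is inversely proportional to radius, particles across the 5-200 nm range differ in diffusion coefficient by a factor of 40, which gives a sense of the size resolution that a diffusion-based measurement can offer.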
The biological problems that quantitative proteomic surveys are employed to elucidate demand not only high proteome coverage and reproducibility, but also high sample throughput for comparisons between large numbers of individuals, phenotypes, and technical replicates. Serrano et al. here document the capabilities of methodology involving the Orbitrap Astral mass spectrometer system from Thermo Fisher Scientific, San Jose, CA for rapid acquisition of proteomic data at high proteome coverage and depth. The instrument integrates a quadrupole-Orbitrap system with a high transmission Astral analyzer, which provides product ion spectra at a rate of 200 Hz with high detection sensitivity and dynamic range. The authors perform chromatographic separation of 1-µg peptide samples on a 40-cm nanocapillary column with 7-, 15-, 30-, or 60-min gradients, consuming 8, 41, 56, or 85 min of LC-MS/MS instrument time (injection-to-injection) respectively. Employing a data-independent scanning protocol with a narrow (2 Th) isolation window, they acquire from a human cell line 7,852, 9,831, 10,411, and 10,645 unique protein groups from 94,267, 195,612, 234,406 and 254,754 unique peptides respectively. This represents a high proportion of the 18,397 human protein groups the authors estimate to have been credibly detected.
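The trade-off between depth and throughput in these figures is easier to see when they are tabulated. The Python sketch below uses only the numbers quoted above. The derived quantities (samples per 24 h of continuous acquisition, and coverage as a fraction of the 18,397 protein groups) are calculated here for illustration and are not values reported by the authors. The names `RUNS`, `samples_per_day`, and `coverage` are hypothetical.

```python
CREDIBLY_DETECTED_PROTEIN_GROUPS = 18_397

# (gradient_min, instrument_min, protein_groups, unique_peptides),
# as quoted in the summary above.
RUNS = [
    (7, 8, 7_852, 94_267),
    (15, 41, 9_831, 195_612),
    (30, 56, 10_411, 234_406),
    (60, 85, 10_645, 254_754),
]


def samples_per_day(instrument_min):
    """Injections per 24 h, assuming uninterrupted back-to-back runs."""
    return 24 * 60 / instrument_min


def coverage(protein_groups):
    """Fraction of the estimated credibly detected human protein groups."""
    return protein_groups / CREDIBLY_DETECTED_PROTEIN_GROUPS


if __name__ == "__main__":
    for gradient, instrument, groups, peptides in RUNS:
        print(f"{gradient:>2}-min gradient: "
              f"{samples_per_day(instrument):6.1f} samples/day, "
              f"{coverage(groups):.1%} coverage, "
              f"{peptides / groups:.1f} peptides per protein group")
```

Laid out this way, the figures show diminishing returns: lengthening the gradient from 7 to 60 min raises the number of protein groups by roughly a third, while instrument time per sample increases more than tenfold.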
Whilst the extent to which proteomic data are shared has improved over the last decade through requirements by journals and funding agencies, the review by Shome et al. posits that it remains far from adequate to sustain necessary community scrutiny and to mine datasets for answers to questions that emerge after initial publication. Instead, investigators tend to share the minimum amount of data demanded for publication. Shome et al. express the opinion that the reasons for reluctance to share data include inadequate incentives to do so, lack of awareness of the scientific or ethical issues, and concern that academic credit or financial reward for the original investigators will be compromised. The present review suggests ways these concerns might be addressed, and lists repositories for proteomic data, along with an assessment of the strengths and weaknesses of each to help authors choose the most suitable repository for their data and metadata. However, the present authors also draw attention to investigators’ responsibility to safeguard the privacy of individuals who supply samples for proteomic analysis when data are shared. Personal identifiers should be removed, and, where necessary, access to data should be appropriately controlled. It is hoped that the measures advocated in this review will lead to favorable change in the cultural norms of proteomics investigators with regard to information accessibility.
A genome-wide association study (GWAS) identifies locations in the genome at which common allelic variants of gene loci are associated with a heritable condition of interest. However, linkage disequilibrium between the closely linked loci within such regions is generally so strong that GWAS alone is insufficient to identify the particular locus or loci within them that are principally responsible for the detected association. Schnitzler et al. here propose a systematic way to identify the particular loci responsible. It consists of five steps. First, identify a cell type involved in the development of the condition of interest. Second, build a map of the variants in features such as enhancers, coding regions and splice sites among loci implicated by GWAS. Third, use CRISPR interference Perturb-seq to systematically knock down all candidate loci in the relevant GWAS locations, monitor the effects of each perturbation with single cell RNA sequencing (scRNA-seq) in the chosen cell type, and use unsupervised machine learning for de novo identification of co-regulated loci that are expressed within that cell type, unbiased by prior knowledge of gene sets or pathways. Such co-regulation is suggestive of contribution to relevant ‘gene programs.’ Fourth, test for statistically significant association between the candidate risk variants and these ‘gene programs.’ Fifth, study the effects of the identified loci. Schnitzler et al. apply this scheme to atherosclerosis. Having identified endothelial cells to be crucial in development of atherosclerosis, they knock down all loci that lie within ±500 kb of 306 GWAS signals associated with coronary artery disease. Of these loci, 228 are not associated with plasma lipoproteins. The authors show that 43 of these loci affect gene expression in model endothelial cells.
Machine learning analysis of the expression data unexpectedly indicates that many of these loci converge on gene programs that correspond to branches of the cerebral cavernous malformation (CCM) signaling pathway. The data further indicate the unanticipated involvement of one locus, TLNRD1, in the CCM pathway. The authors study the CCM2 and TLNRD1 loci in detail, and show that alleles at these loci that down-regulate CCM, and knockdown of these loci, promote the expression of atheroprotective genes, actin stress fiber formation and endothelial barrier function. These changes mimic the atheroprotective effects of laminar blood flow, which are disrupted by the turbulent flow that occurs in the locations where atheromas generally form. These beneficial effects contrast with the deleterious consequences of rare, monogenic loss of CCM function, which causes cavernous malformations in the brain and spinal cord during development. The discovery of this mechanism is a significant advance in cardiovascular biology. The methodology that led to it may be anticipated to contribute importantly to functional genomics in the future.
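The fourth step of the scheme of Schnitzler et al., testing whether candidate loci are associated with a gene program more often than chance would predict, can be illustrated with a generic over-representation test. The Python sketch below uses a one-sided hypergeometric test, chosen here only as a familiar example. It is not claimed to be the statistical procedure used by the authors, and all numbers in the usage example are invented for illustration.

```python
from math import comb


def hypergeom_upper_tail(overlap, population, program_size, candidates):
    """P(X >= overlap) under the hypergeometric distribution.

    `candidates` genes are drawn without replacement from `population`
    genes, of which `program_size` belong to the gene program. X is the
    number of drawn genes that fall within the program.
    """
    if not (0 <= program_size <= population and 0 <= candidates <= population):
        raise ValueError("program_size and candidates must lie within population")
    upper = min(program_size, candidates)
    total = comb(population, candidates)
    # math.comb returns 0 when k exceeds n, so impossible terms drop out.
    tail = sum(
        comb(program_size, k) * comb(population - program_size, candidates - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 2,000 perturbed genes, a 100-gene program,
    # 43 candidate genes, 8 of which fall within the program.
    p = hypergeom_upper_tail(overlap=8, population=2_000,
                             program_size=100, candidates=43)
    print(f"one-sided enrichment p-value: {p:.3g}")
```

In practice many programs are tested at once, so a correction for multiple testing would be needed before any single p-value is interpreted.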
The use of methods for gene silencing that rely upon alteration of DNA sequence raises concern about specificity and unintended consequences, especially in the clinical setting. An alternative approach to gene silencing is transcriptional suppression. But this approach presents the challenge of formulating methods that will achieve silencing that is long-lasting. The present investigators have devised an approach that utilizes an engineered transcriptional repressor consisting of a Krüppel-associated box (KRAB) zinc-finger DNA binding domain linked to the catalytic domain of DNA methyltransferase 3A (DNMT3A) and its cofactor DNMT3-like. This construct works on targeted genes by concerted removal of activating histone marks and addition of repressive histone marks. These activities are accompanied by a repressive local increase in methylation of CpG dinucleotides. Such epigenetic changes can be inherited during cell division through the activity of the endogenous methyltransferase DNMT1. They represent the basis for durable silencing. The approach also has the favorable feature that although expression of the engineered transcriptional repressor is necessary to initiate silencing, its long-term expression is dispensable. The authors here describe application of this methodology to durable silencing of the mouse Pcsk9 gene. The protein encoded by this gene stimulates degradation of the low-density lipoprotein (LDL) receptor, decreasing its expression in liver cells. Suppression of Pcsk9 is of considerable clinical interest as a way to increase expression of LDL receptors and reduce the level of plasma LDL. The engineered repressor’s mRNA is delivered via lipid nanoparticles in a single administration. Suppression for nearly one year is demonstrated in mice. Suppression efficiency is comparable to that achieved with gene editing, and it persists even after partial hepatectomy to force liver regeneration, indicating that silencing is indeed heritable at the somatic cell level.
The results encourage future testing for clinical safety and specificity.
A large-scale GWAS is presented here to investigate associations between common allelic variants and serum or plasma levels of metabolites measured by nuclear magnetic resonance (NMR) spectroscopy. The study encompasses more than 135,000 participants. A total of 233 metabolic traits are quantified: 213 lipid, lipoprotein or fatty acid parameters, and 20 non-lipid parameters, which include amino acids, ketone bodies, glycolysis/gluconeogenesis-related metabolites, markers of fluid balance (albumin and creatinine), and an inflammation-related marker (glycoprotein acetylation). A striking feature of the results is the widespread occurrence of polygenic inheritance among the traits included in the study. The authors highlight the importance of consistency in selection of sample type and fasting status in studies of this kind. They also caution that choice of genetic marker to represent the genetic etiology of variation in the level of a particular metabolite may strongly affect the results of Mendelian randomization studies aimed at determining causal relationships between the metabolic parameter and clinical conditions. As with all GWAS, it is important to recall that SNP heritability is only a sub-component of broad-sense heritability, that SNP heritability values are typically low, and that only a minor portion of SNP heritability is explained by the lead SNPs. The authors also caution that 27 of the 33 cohorts of subjects contributing to the present study are of European origin. Nevertheless, the data are foundational for study of the genetic basis of metabolic variation. Summary statistics are made publicly available.
A detailed study of contaminants leachable from plastic labware during analysis of lipids by liquid chromatography-mass spectrometry is presented in this pair of papers. In accord with general practice, extraction of lipids is here performed by incubation of samples in a mixture of chloroform and methanol (2:1, v/v), followed by addition of water to induce partitioning in a mixture of final composition 8:4:3 v/v/v chloroform, methanol, and water. The exposure of plasticware to such organic solvents is responsible for extraction of substances introduced during labware manufacture. In the first paper, the authors test polypropylene tubes (including microcentrifuge tubes) from different manufacturers and batches. They discriminate contaminants derived from plasticware from those derived from the solvents themselves, and compare contamination from plastics and from borosilicate glass. There is wide variation in the degree of contamination, but even the most favorable plastic tubes introduce an astonishing 847 different m/z signals. The authors identify some of these. Particularly disturbing are 21 primary amide and fatty acid surfactants that are identical to endogenous biological lipids. Unfortunately, pre-washing the plasticware has only marginal benefit. The consequences of this contamination are, of course, increasingly severe with samples containing limited quantities of analytes. In the second paper, the authors document severe ion suppression of low abundance lipids by contaminants. The authors make a detailed repository of information on the contaminants publicly available as a resource for investigators.
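The solvent proportions quoted above imply a simple rule for the partitioning step: the volume of water added is one quarter of the volume of the 2:1 chloroform-methanol extract. The Python sketch below works through that arithmetic. It assumes that the sample itself contributes no water (a simplification), and the function name `water_to_add_ul` is hypothetical.

```python
from fractions import Fraction


def water_to_add_ul(extract_volume_ul):
    """Water (in uL) that brings a 2:1 (v/v) chloroform-methanol extract
    to a final composition of 8:4:3 chloroform:methanol:water.

    Assumption: the sample contributes no water of its own.
    """
    extract = Fraction(extract_volume_ul)
    chloroform = extract * Fraction(2, 3)
    methanol = extract * Fraction(1, 3)
    # In the final mixture, water is 3 parts for every 8 parts chloroform.
    water = chloroform * Fraction(3, 8)
    # Cross-check against methanol: 3 parts water for every 4 parts methanol.
    assert water == methanol * Fraction(3, 4)
    return water


if __name__ == "__main__":
    for volume in (600, 1200, 3000):
        print(f"{volume} uL extract -> add {float(water_to_add_ul(volume)):.0f} uL water")
```

Exact fractions are used so that the cross-check between the chloroform and methanol ratios holds without floating-point rounding.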
Two groups address the mismatch in transmission electron microscopy (TEM) between the circular electron beam that illuminates the specimen and the rectangular profile of the area covered by the camera. The consequence of this mismatch is that the camera collects information from less than 70% of the area exposed to the electron beam. The remaining area cannot be used subsequently for high resolution imaging because of damage to the specimen caused by the beam. Both groups design square or rectangular apertures for the electron beam to create beam profiles of corresponding shape for use in cryo-TEM. Chua et al. install commercially available aperture plates with square holes between condenser lenses 2 and 3. To achieve alignment of the beam with the square detector, they rotate and align the beam by adjusting the intensity of the post-objective projection lens 2. This necessitates recalibration for pixel size, image shift and eucentric focus. Brown et al. instead fabricate an aperture plate with a series of rectangles and squares of different sizes etched onto a silicon wafer (followed by gold coating to prevent electron beam charging), and select the ones that anticipate rotation of the square beam as the condenser lens is adjusted. They detect no penalty in image resolution. Chua et al. additionally demonstrate compatibility with cryo-electron tomography, in which the specimen is tilted to reconstruct 3-D images.
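The figure of less than 70% follows from simple geometry. A square inscribed in a circle occupies 2/π of the circle's area, or about 64%, and any other rectangle occupies less. The Python sketch below computes this fraction for a detector of a given aspect ratio. It assumes an idealized case in which the circular beam exactly circumscribes the detector area. In practice the beam may be larger still, so these values are upper bounds. The function name `imaged_fraction` is hypothetical.

```python
import math


def imaged_fraction(aspect_ratio=1.0):
    """Fraction of a circular beam's area covered by an inscribed rectangle.

    `aspect_ratio` is width / height of the detector area. For a rectangle
    with sides w and h inscribed in a circle of diameter d, w^2 + h^2 = d^2,
    so the area ratio is 4a / (pi * (1 + a^2)) with a = w / h.
    """
    if aspect_ratio <= 0:
        raise ValueError("aspect_ratio must be positive")
    a = aspect_ratio
    return 4 * a / (math.pi * (1 + a * a))


if __name__ == "__main__":
    # 1.0 is a square detector; the other ratios are arbitrary examples.
    for ratio in (1.0, 1.4, 2.0):
        print(f"aspect ratio {ratio}: {imaged_fraction(ratio):.1%} of the beam area imaged")
```

A beam shaped to match the detector removes this loss, which is the motivation for the square and rectangular apertures described above.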
Brightness and photostability are both crucial factors contributing to the usefulness of fluorescent proteins as tracers in cell biology. The recent discovery of StayGold, a green fluorescent protein with greatly enhanced photostability but undiminished brightness relative to green fluorescent proteins in general use, has been recognized as a major advance. However, StayGold is a dimeric protein. This feature is disadvantageous because it confers a propensity to perturb interactions between biological membranes, induce oligomerization of tagged proteins, and distort measurements involving fluorescence resonance energy transfer (FRET). Three groups now announce the development of monomeric variants of StayGold. Ando et al. describe mStayGold, produced by directed mutation to disrupt the dimer interface, followed by selection of variants for high brightness and photostability. Zhang et al. identify mBaoJin in a functional screen for loss of transcription factor activity that depends on dimerization, then perform several rounds of mutagenesis. Ivorra-Molla et al. describe StayGold-E138D, in which the dimer interface is disrupted by a single, directed mutation of glutamic to aspartic acid. Systematic comparison of these three constructs in diverse applications remains to be completed, but their development represents a further substantial advance in capabilities for functional fluorescence imaging.
Macrocyclic compounds have diverse biological properties. They are of intense interest as drug compounds: 67 had received FDA approval by 2023, including 26 dosed orally and 41 parenterally. Most macrocyclic drugs are presently used as antibacterial agents, but antivirals, antifungals, and agents for oncology, autoimmunity and immunosuppression are also in use, among other indications. Most of these drugs are derived from natural products, but the two present papers broaden the scope of systematic exploration of this class of compounds for pharmaceutical purposes. Salveson et al. take a computational approach. They explore 14.9 million closed cycles composed of combinations of 130 monomers that include alpha, beta, gamma, and 17 other amino acid chemotypes. They describe computational methodology to identify those compounds that adopt primarily a single, cyclic conformation. They search macrocycles containing 9- to 32-membered rings composed of 3 or 4 residues. They synthesize 18 structures predicted to occupy low-energy states. Of these, 15 are close to the computed models. The methodology is used to identify selective inhibitors of HDAC6 and of the SARS-CoV-2 protease Mpro, and inhibitors of interaction between BCL2 homology antagonist/killer (Bak) and myeloid cell leukemia-1 (MCL1). Merz et al. focus on development of methods for high-throughput combinatorial synthesis. They explore cyclization by head-to-tail dithioether linkage, and amide acylation for enhancement of membrane permeability. They deploy their methodology in the development of selective inhibitors of thrombin. These two studies contribute substantially to methodology for the design of an increasingly interesting class of bioactive substances.